In Go, or similarly chess, the AI can play a stupendous number of games against itself and get accurate feedback for every single game. Everything needed to create your own training set comes just from knowing the rules. But outside of such games, how does an AI create its own training data when there is no function to tell you how well you are doing? This might be a dumb question; I don't have any idea how LLMs work.
One such function is “what happens next?” which may work as well in the real world as on textual training data. Certainly it’s part of how human babies learn, via schemas.
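To make that concrete, here is a minimal sketch of why "what happens next?" needs no external grader: the data itself supplies the answer, so every position in a text stream is a free (context, target) training pair. The word-level tokenization is just for illustration.

```python
# "What happens next?" as a self-supervised objective: no labels needed,
# because the text itself says what comes next.
text = "the cat sat on the mat"
tokens = text.split()

# Each (context, next-token) pair is a free training example.
examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
# e.g. (["the"], "cat"), (["the", "cat"], "sat"), ...
```

A model is then scored on how well it predicts the target from the context, which is the feedback signal the comment above is describing.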
Creating something is much harder than verifying it.
A simple setup for improving coding skills is the following:
1. GPT is given a coding task, described as a high-level prompt.
2. It generates unit tests to verify that the implementation is correct.
3. It generates code to implement the algorithm.
4. It runs the generated code against the generated unit tests. If the interpreter/compiler reports errors, go back to Step 3, modify the code accordingly, and try again.
5. If no errors are found, take the generated code as a positive example and update the model weights with reinforcement learning.
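The loop above can be sketched in a few lines of Python. This is only an illustration: `generate` stands in for a model call (here it returns canned text so the sketch is runnable), and real setups would sandbox the execution and batch many tasks before an RL update.

```python
import os
import subprocess
import sys
import tempfile

def generate(prompt: str) -> str:
    """Stand-in for a model call (hypothetical); returns canned text here."""
    if "unit tests" in prompt:
        return "from solution import add\nassert add(2, 3) == 5\n"
    return "def add(a, b):\n    return a + b\n"

def self_improvement_step(task: str, max_retries: int = 3):
    # Step 2: the model writes tests from the task description alone.
    tests = generate(f"Write unit tests for: {task}")
    code = ""
    for _ in range(max_retries):
        # Step 3: the model writes an implementation.
        code = generate(f"Implement in Python: {task}")
        # Step 4: run the generated tests against the generated code.
        with tempfile.TemporaryDirectory() as d:
            with open(os.path.join(d, "solution.py"), "w") as f:
                f.write(code)
            with open(os.path.join(d, "test_solution.py"), "w") as f:
                f.write(tests)
            result = subprocess.run(
                [sys.executable, "test_solution.py"],
                cwd=d, capture_output=True,
            )
        if result.returncode == 0:
            # Step 5: tests pass -> keep as a positive RL example.
            return task, code, 1.0
        # Otherwise loop back to Step 3 with another attempt.
    return task, code, -1.0  # negative example after exhausting retries
```

The returned reward (+1/-1) is what the reinforcement-learning update would consume; note the scheme only works as well as the model's generated tests, which is why verification being easier than generation matters.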
The most naive approach would be to procedurally generate immense amounts of Python code, then ask the model to predict whether the code will compile, whether it will crash, what its outputs will be given certain inputs, etc.
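A toy version of that data generator, as I understand the idea: build random snippets, then get ground-truth labels for free by actually compiling and running them. The snippet grammar here is deliberately tiny and made up for illustration; a model would then be trained to predict the label from the snippet text alone.

```python
import contextlib
import io
import random

def random_snippet(rng: random.Random) -> str:
    """Procedurally build a tiny (sometimes invalid) Python snippet."""
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    op = rng.choice(["+", "-", "*", "/", "//", ")"])  # ")" forces a syntax error
    return f"print({a} {op} {b})"

def label(snippet: str) -> dict:
    """Ground-truth answers: does it compile, does it crash, what does it print?"""
    try:
        compile(snippet, "<gen>", "exec")
    except SyntaxError:
        return {"compiles": False, "crashes": None, "output": None}
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(snippet, {})
    except Exception:  # e.g. ZeroDivisionError when b == 0
        return {"compiles": True, "crashes": True, "output": None}
    return {"compiles": True, "crashes": False, "output": buf.getvalue()}

# (snippet, label) pairs become supervised training data for the predictor.
rng = random.Random(0)
dataset = [(s, label(s)) for s in (random_snippet(rng) for _ in range(100))]
```

The point is that the labels cost nothing beyond compute: the Python interpreter itself is the oracle, the same way the rules of Go are the oracle for self-play.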