In Go, or similarly chess, the AI can play a stupendous number of games against itself and get accurate feedback for every single game. Everything needed to create your own training set comes just from knowing the rules. But outside of such games, how does an AI create its own training data when there is no function to tell you how well you are doing? This might be a dumb question; I don't have any idea how LLMs work.
One such function is “what happens next?” which may work as well in the real world as on textual training data. Certainly it’s part of how human babies learn, via schemas.
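To make that concrete, here is a minimal sketch of why "what happens next?" needs no external grader: the data itself supplies the answer, so every position in a text stream is a free (context, target) training pair. The word-level tokenization is just for illustration.

```python
# "What happens next?" as a self-supervised objective: no labels needed,
# because the text itself says what comes next.
text = "the cat sat on the mat"
tokens = text.split()

# Each (context, next-token) pair is a free training example.
examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
# e.g. (["the"], "cat"), (["the", "cat"], "sat"), ...
```

A model is then scored on how well it predicts the target from the context, which is the feedback signal the comment above is describing.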
Creating something is much harder than verifying it.
A simple setup for improving coding skills is the following:
1. GPT is given a coding task, described as a high-level prompt.
2. It generates unit tests to verify that the implementation is correct.
3. It generates code to implement the algorithm.
4. It runs the generated code against the generated unit tests. If the interpreter/compiler reports errors, go back to Step 3, modify the code accordingly, and try again.
5. If no errors are found, take the generated code as a positive example and update the model weights with reinforcement learning.
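The loop above can be sketched in a few lines of Python. This is only an illustration: `generate` stands in for a model call (here it returns canned text so the sketch is runnable), and real setups would sandbox the execution and batch many tasks before an RL update.

```python
import os
import subprocess
import sys
import tempfile

def generate(prompt: str) -> str:
    """Stand-in for a model call (hypothetical); returns canned text here."""
    if "unit tests" in prompt:
        return "from solution import add\nassert add(2, 3) == 5\n"
    return "def add(a, b):\n    return a + b\n"

def self_improvement_step(task: str, max_retries: int = 3):
    # Step 2: the model writes tests from the task description alone.
    tests = generate(f"Write unit tests for: {task}")
    code = ""
    for _ in range(max_retries):
        # Step 3: the model writes an implementation.
        code = generate(f"Implement in Python: {task}")
        # Step 4: run the generated tests against the generated code.
        with tempfile.TemporaryDirectory() as d:
            with open(os.path.join(d, "solution.py"), "w") as f:
                f.write(code)
            with open(os.path.join(d, "test_solution.py"), "w") as f:
                f.write(tests)
            result = subprocess.run(
                [sys.executable, "test_solution.py"],
                cwd=d, capture_output=True,
            )
        if result.returncode == 0:
            # Step 5: tests pass -> keep as a positive RL example.
            return task, code, 1.0
        # Otherwise loop back to Step 3 with another attempt.
    return task, code, -1.0  # negative example after exhausting retries
```

The returned reward (+1/-1) is what the reinforcement-learning update would consume; note the scheme only works as well as the model's generated tests, which is why verification being easier than generation matters.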
The most naive approach would be to procedurally generate immense amounts of Python code, then ask the model to predict whether the code will compile, whether it will crash, what its outputs will be given certain inputs, etc.
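A toy version of that data generator, as I understand the idea: build random snippets, then get ground-truth labels for free by actually compiling and running them. The snippet grammar here is deliberately tiny and made up for illustration; a model would then be trained to predict the label from the snippet text alone.

```python
import contextlib
import io
import random

def random_snippet(rng: random.Random) -> str:
    """Procedurally build a tiny (sometimes invalid) Python snippet."""
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    op = rng.choice(["+", "-", "*", "/", "//", ")"])  # ")" forces a syntax error
    return f"print({a} {op} {b})"

def label(snippet: str) -> dict:
    """Ground-truth answers: does it compile, does it crash, what does it print?"""
    try:
        compile(snippet, "<gen>", "exec")
    except SyntaxError:
        return {"compiles": False, "crashes": None, "output": None}
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(snippet, {})
    except Exception:  # e.g. ZeroDivisionError when b == 0
        return {"compiles": True, "crashes": True, "output": None}
    return {"compiles": True, "crashes": False, "output": buf.getvalue()}

# (snippet, label) pairs become supervised training data for the predictor.
rng = random.Random(0)
dataset = [(s, label(s)) for s in (random_snippet(rng) for _ in range(100))]
```

The point is that the labels cost nothing beyond compute: the Python interpreter itself is the oracle, the same way the rules of Go are the oracle for self-play.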