> R1-Zero removes the human bottleneck I disagree. It only removes the bottlenec...

Bjorkbat · on Jan 30, 2025

I'm still skeptical of the notion that we can remove the human bottleneck on code just because code has verifiable solutions.

It's true only to the extent that there's sufficient test coverage to catch unwanted side effects. That's easy to achieve for straightforward problems, and far more difficult for complex or open-ended ones.
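To make the coverage point concrete, here is a toy sketch. Every name in it (`dedupe_sorted`, `reward`, `stricter_reward`) is hypothetical and made up for illustration, not taken from any real training setup. It shows a candidate solution that earns full reward from a verifier that checks only the return value, even though it silently mutates the caller's data.

```python
# Toy illustration only: all names here are hypothetical.

def dedupe_sorted(items):
    """Candidate solution: returns the unique items in sorted order."""
    items.sort()  # unwanted side effect: reorders the caller's list in place
    return sorted(set(items))


def reward(candidate):
    """Thin verifier: checks only the return value."""
    return 1.0 if candidate([3, 1, 3, 2]) == [1, 2, 3] else 0.0


def stricter_reward(candidate):
    """Verifier that also checks the input was left untouched."""
    data = [3, 1, 3, 2]
    snapshot = list(data)  # copy taken before the call
    ok = candidate(data) == [1, 2, 3] and data == snapshot
    return 1.0 if ok else 0.0


print(reward(dedupe_sorted))           # thin verifier accepts the candidate
print(stricter_reward(dedupe_sorted))  # stricter verifier rejects it
```

The stricter check only works because someone thought of that particular side effect in advance, which is the part that gets harder as problems become more complex or open-ended.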

mtrovo · on Jan 31, 2025

The fact that both systems scored well on ARC-AGI-1 shows they can handle unseen challenges without heavy human input, unless I'm missing something about why you see humans as the best interface for real-world exploration.

janalsncm · on Jan 30, 2025

In the case of ARC they are referring to verifiable math and reasoning problems. They still used SFT and model-based rewards for other domains.