
> R1-Zero removes the human bottleneck

I disagree. It only removes the bottleneck on collecting reasoning chains for math and code, not the bottleneck in general. The general case requires physical testing, not just calculation; otherwise scientists would not need experimental labs. Discovery comes from searching the real world, which is where interesting things happen. The best interface between AI and the world is still humans; the code and math domains are just lucky to work without real-world interaction.



I'm still skeptical of the notion that we can remove the human bottleneck on code just because code has verifiable solutions.

That's true only to the extent that test coverage is sufficient to catch unwanted side effects. That's easy with straightforward problems and far more difficult with complex or open-ended ones.


The fact that both systems scored well on ARC AGI 1 shows they can handle unseen challenges without heavy human input, unless I'm missing something about why you see humans as the best interface for real-world exploration.


In the case of ARC they are referring to verifiable math and reasoning problems. They still used SFT and model-based rewards for other domains.



