
The R1-Zero paper shows how many training steps the RL took, and it's not many. The RL likely cost only a small fraction of what training the foundation model cost.

