Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Reinforcement learning.

People like being told they are right, and when a response contains that formulation, on average, given the choice, people will pick it more often than a response that doesn't, and the LLM will adapt.





Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: