
Not dumber. More biased.

Important distinction, especially if we're looking to push back out toward the Pareto frontier of the problem.

RLHF is still very much in its infancy and, in my personal experience, doesn't come close to optimally navigating the bias-variance tradeoff.



My understanding is that OpenAI did indeed find diminished capability across a range of tasks after doing RLHF. You're correct to question this though, as I believe the opposite was true of GPT-3, where it improved certain tasks.

The benefits from a business perspective were still clear, however, and of course the instruction-tuned GPT-4 model still outperformed GPT-3 in general.

There are probably some weird edge cases and nuances that I'm missing, and I'd be happy to be corrected.


No, dumber. Sure, more biased too if you want, but also dumber. OpenAI have indicated as much.


Also generally less creative and insightful.

"No I won't do it" becomes a good option no matter what if you turn safety too high.



