
Luckily I write way more infrequently :)

This one right here: https://news.ycombinator.com/item?id=46384118

It’s absolutely not enough to “keep an eye on it on your phone”. You need to know that the implementation of the tests is real. LLMs routinely take shortcuts in tests to make them green. On one occasion a model flat out mocked everything from the live code, and this was a very, very simple Python REST API. The tests, of course, were green.
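A minimal sketch of that failure mode, not the actual code from that incident. All names here (`create_user`, `broken_create_user`, `hollow_test`, `real_test`) are hypothetical. The hollow test replaces the code under test with a mock, so it stays green even when the handler is broken:

```python
from unittest.mock import Mock


def create_user(payload):
    """Hypothetical handler: reject an empty name, otherwise create the user."""
    name = payload.get("name", "").strip()
    if not name:
        return 400, {"error": "name is required"}
    return 201, {"name": name}


def broken_create_user(payload):
    """Deliberately buggy variant: skips validation entirely."""
    return 201, {"name": payload.get("name", "")}


def hollow_test(handler):
    # The shortcut: the handler is swapped for a mock, so no live code runs.
    handler = Mock(return_value=(201, {"name": "x"}))
    status, _ = handler({"name": ""})
    return status == 201


def real_test(handler):
    # Exercises the actual handler with an invalid payload.
    status, _ = handler({"name": ""})
    return status == 400


if __name__ == "__main__":
    print("hollow test vs. broken handler:", hollow_test(broken_create_user))
    print("real test vs. broken handler:  ", real_test(broken_create_user))
    print("real test vs. correct handler: ", real_test(create_user))
```

The hollow test cannot fail no matter what the handler does, which is why a green run alone says nothing until you have read the test bodies.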



I haven't caught Opus 4.5 cheating on a test yet. I saw plenty of that with older models.



