Hacker News

Nice find. Sonnet 4.5 = Fail, Gemini 2.5 Pro = Fail, Qwen 30b = Pass!


I just tried Opus 4.1 = Pass (after it corrected itself mid-answer) and Gemini 2.5 Flash = Pass (I was surprised that it gave the correct answer immediately).



