"Proper" may be doing some work here, but such a test was run last year and GPT-4.5 and LLaMa-3.1-405B both passed. Oddly, GPT-4.5 was judged as human significantly more often than chance. https://arxiv.org/abs/2503.23674
When you cannot win, you regulate the scoreboard. This is why Europe keeps getting lapped. They mistake control for competence. They cannot build platforms at scale, so they try to govern everyone else’s.
Our data in the cloud, hallowed be thy computation, your kingdom come, your will be done, on our devices as it is in the cloud. Give us today our daily feed.