You're missing the point. There is no evidence to support their claims which means they are more than likely leaking the memory into the LLM prompt & it is cheating by simply loading constants into memory instead of computing anything. This is why formal specifications are used to constrain optimization. Without proof that the code is equivalent you might as well just load constants into memory & claim victory.
Do you make a habit of not presuming even basic competence? You believe that Anthropic left the task running for hours, got a score back, and never bothered to examine the solution? Not even out of curiosity?
Also if it was cheating you'd expect the final score to be unbelievably low. Unless you also suppose that the LLM actively attempted to deceive the human reviewers by adding extra code to burn (approximately the correct number of) cycles.
This has nothing to do w/ me & consistently making it a personal problem instead of addressing the claims is a common tactic for people who do not know what it means to present evidence for their claims. Anthropic has not provided the necessary evidence for me to conclude that their LLM is not cheating. I have no opinion on their competence b/c that is not what is at issue. They could be incompetent & not notice that their LLM is cheating at their take home exam but I don't care about that.
You are implying that you believe them to be incompetent since otherwise you would not expect evidence in this instance. They also haven't provided independent verification of their claims - do you suspect them of lying as well?
How do you explain the specific score that was achieved if as you suggest the LLM simply copied the answer directly?
Either they have proof that their LLM is not cheating or they don't. The linked post does not provide evidence that the LLM is not cheating. I don't have to explain anything on my end b/c my claim is very simple & easily refuted w/ the proper evidence.