Your argument is fine, but it's different from the claim the OP is making. You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output. Subjectively, people might still prefer one over the other due to anything from design to marketing, but that's very different from the claim that X is better than Y for coding (see: "A colleague was convinced Claude is better"). Basically, "I prefer Claude" is a different claim from "Claude is better", and the latter has a higher bar of proof.
> You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output.
You definitely can in principle; that’s the entire point of the comment you are responding to. If one tool completes it in 10 minutes with little hand-holding, and the other takes an hour at 4× the cost while needing a lot of steering, the former is arguably better even if the end result is the same.
Whether that’s specifically true and demonstrable of GPT and Claude is another question, but your blanket statement doesn’t hold as a general rule.
That's a fair callout, and as you correctly pointed out, my statement was too general in mentioning only 'output'. To define 'better' you would indeed need to agree on the dimensions you evaluate candidates against.
I think a more appropriate rephrasing would be: 'You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference on the dimensions you care about.' In the case of the latest Claude Code vs. Codex with GPT-5.5, the two are similar enough on the dimensions people will care about when evaluating them (as opposed to differing wildly in cost or time taken).
The colleague implicitly agreed that comparing the output was a valid way to settle the matter as they took part in the test, so they weren't using "better" in the way you propose.
I wasn’t really discussing the colleague, but either way, from:
> A colleague was convinced Claude is better so we played a game. We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code.
I don’t think it’s obvious that they specifically agreed that losing the game meant that. They might just have thought “sure, it might be fun”, if they even gave it that much thought.
“So we played a game” is rather vague, and I feel it’s a bit of a leap to read it as: “as an explicit outcome of their claim that Claude is better, we made a formal bet as to whether they could tell the difference in the output, the failure of which would mean a full retraction of their statement”.
Claude and Codex are tools. You can't tell the difference in the output between something that was done with a ratcheting wrench vs a standard combination wrench, but your mechanic certainly knows the ratcheting wrench is better (for most tasks).
I've not used Codex to compare against, so I'm not claiming X is better than Y, but comparing tools simply on their output is naive.
You need to see the response in light of the original discussion. Quoting it here for clarity, since I should have included it in the first place: "We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code."
So the same person was using similarly competitive tools and showing that the output was hard to tell apart (with the indirect implication that the implementation was fairly trivial with both). A better analogy would not be different processes with widely different tools, but, for example, two power drills. Sure, folks could still prefer one over the other, but that's a different claim than saying X is objectively better than Y when both compete directly on very similar dimensions.
Assuming you meant Claude Code: I'd love to learn more about "Codex and Claude are very different", because maybe I'm over-generalizing from my own use case, where I use both of them interchangeably for the same thing (coding web and mobile apps).
That’s actually what my comment was based on; raw code output isn’t the only measure of quality. Engineers write better code if they have the tools they prefer.
Exactly, I was confused too. The authors clearly make the point the parent comment raises, albeit towards the end of the article: the 'J' bundle meant these firms were not set up for success once they had 'caught up' and needed to innovate not just on process but from the ground up, envisioning new categories (e.g. the iPhone).
Revenue is not the right metric when you compare space trips to trips inside a city. The more relevant numbers are EBITDA, operating cash flow, and profits.
Has anyone done the math on:
1. cost to build out and run the data centers
2. cost of compute (hardware and energy)
3. depreciation of legacy GPUs, and thus their value at the end of 3 years
And then compared that against the $45B revenue from Anthropic to see whether it's mostly break-even, or whether one of Anthropic/SpaceX came out ahead on the contract?
Maybe it is a win/win. Anthropic gets desperately needed compute at a fair price. SpaceXAI sells compute at a fair price and gets desperately needed revenues.
xAI purchased Tesla's allocation of GPUs in exchange for Tesla purchasing xAI's allocation of GPUs at a later date. They claim this was done because Tesla didn't have its datacentre ready to receive those GPUs at that time. I don't see how anyone was robbed here, or who.
SpaceX is already indicating their strategy on this, because they’re renting their last-gen data center to Anthropic and keeping the current-gen data center for themselves. Rinse and repeat.
Ed Zitron (https://www.wheresyoured.at/) has done the math, and it's pretty bleak. His somewhat voluminous rantings contain raw figures on investments, data centre builds, energy availability and depreciation.
He believes Oracle has already signed its own death warrant, and that Meta is close behind. MS, Amazon and Google have massive revenue streams to sustain them, but looking at the numbers, each has to earn from AI the equivalent of its existing real revenue. I can't see that happening.
And, looking at the data from multiple angles, he believes that Nvidia is either massively overstating its GPU sales or that there are warehouses full of unused GPUs. There just isn't the energy capacity to run them all, let alone the data centres to put them in.
He is complaining that there are no 1GW+ data centers, with evidence like this:
> For example, CNBC’s MacKenzie Sigalos reported in October 2025 that Amazon’s Indiana-based (allegedly) 2.2GW Project Rainier data center was “operational,” but only seven out of a planned 30 buildings were actually operational, and her comment of “with two more campuses [of indeterminate capacity] underway.” This comment was buried two videos and 600 words into a piece that declared the data center was “now operational,” with the express intent of making you think the whole thing was operational.
But if you read the report, that "buried" comment is far from buried: the whole thing is about how it is still under construction!
Of course 1GW data centers don't all come online at once! You get them online in the parts you can as soon as you can!
From a previous comment of mine – the quotes are all from a single article:
He comes across as just a ludicrously unpleasant, spite-filled person.
> I'm fucking tired of having to write this sentence.
> I am so very bored of having this conversation
> I don't care about this number!
> Shut the fuck up!
> This isn't the early days of shit.
> Didn't we just talk about this? Fine, fine.
> $3.25 billion a quarter is absolutely pathetic.
> This isn’t real business! Sorry!
> He said in one of his stupid and boring blogs that
> This man is full of shit! Hey, tech media people reading this — your readers hate this shit! Stop printing it! Stop it!
> It's here where I'm going to choose to scream.
> Dario Amodei — much like Sam Altman — is a liar, a crook, a carnival barker and a charlatan, and the things he promises are equal parts ridiculous and offensive.
> Why are we humoring these oafs?
> Despite Newton's fawning praise
> Nobody talks like this! This isn’t how human beings sound! I don’t like reading it!
> Ewww.
> I'm sorry, I know I sound like a hater, and perhaps I am, but this shit doesn't impress me even a little.
> I know, I know, I'm a hater, I'm a pessimist, a cynic, but I need you to fucking listen to me: everything I am describing is unfathomably dangerous
> expensive, stupid, irksome, quasi-useless new product
> I know this has been a rant-filled newsletter, but I'm so tired of being told to be excited about this warmed-up dogshit.
> I refuse to sit here and pretend that any of this matters.
> I'm tired of the delusion. I'm tired of being forced to take these men seriously.
When I read this kind of thing, it’s very apparent that this is being driven entirely by spite not insight. He’s just so angry about everything. There are 57 exclamation marks in this article!
In the '90s we had people talking like this about the Internet. They're all over FB now, with a detour in between to say stuff like "my ISP can track me!?"
Do you find the video understanding work there to be 'silly little slop' too, or did you only look at the GIFs on the page and not read about the understanding work in a 3B model?
This is not groundbreaking by any means, but achieving this in a 3B model and sharing the approach + weights advances the engineering and is certainly more of a contribution than 'silly little slop videos', imo.
If that's the case, one way to test the theory and understanding (assuming some parts of the reservoir and signal channel can be reliably identified) would be to prune the high-confidence reservoir, significantly reducing the model size, and check whether predictions remain good. I don't believe the authors mention this (though I only skimmed the paper rather than reading it in detail, so I may be wrong).
"What slows down a team where agents do the implementation is the production of specifications precise enough for an agent to pick up and run. Roadmap, written down. Acceptance criteria, written down. The “what we actually want” forced into precision, be it via a test suite, a ticket, or a written design."
This is merely speed of development, not the velocity of a company towards higher value. Many PMs are confidently writing these down in elaborate detail (using the same AI tools) without a clear, deep understanding of the user problems, of why the requirements will be adopted by their target users, or even of who the target users really are.
So yes, this will lead to faster end-to-end execution. But whether the product gets used or sits unused will depend on things beyond the above.
Agree with your points on the primary two questions and the circular argument in the original article.
However, re: "How is it that atoms/electrons/photons suddenly start experiencing pain? What is it, in terms of atoms/forces, that's experiencing the pain?", that's an interesting question, but it doesn't necessarily refute #1. If you start with #1, "Consciousness is an unknown physical something (force/particle/quantum whatever)", then it has 'perceivable' properties of its own, different from those of its constituent atoms or electrons. A toy example is the 'wetness' of water. If you only look at atoms and molecules, with no way to 'experience' water, then it's hard to conceive how water can have such properties (though in the case of water it is tractable).
Consciousness *may* be something similar. If it is (e.g. the purest form of energy), then it is not inconceivable that it has some properties that are not tractable if we only look at more granular manifestations of it.
Agreed! I'm skeptical of consciousness requiring some exotic new physics (a quantum phenomenon or a new form of energy or somesuch) but we can't prove that it doesn't.
Honestly, if someday a scientist proves that consciousness is a fundamental force like gravity, I would say, "yup, that makes sense!" even if I don't think it's likely.