Hacker News | neosat's comments

Agree. Audio is strongly temporal, so there is almost certainly some positional encoding one way or another.

Your argument is fine but different from the claim the OP is making. You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output. Subjectively, people might still prefer one over the other due to anything from design to marketing, but that's very different from the claim that X is better than Y for coding (see: "A colleague was convinced Claude is better"). Basically, "I prefer Claude" is a different claim than "Claude is better", and the latter has a higher bar of proof.

> You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output.

You definitely can in principle; that’s the entire point of the comment you are responding to. If one tool completes it in 10 minutes with little hand holding, and the other does it in one hour at 4× the cost and while needing a lot of steering, the former is arguably better even if the end result is the same.

Whether that’s specifically true and demonstrable of GPT and Claude is another question, but your blanket statement doesn’t hold as a general rule.


That's a fair callout and I agree my statement was too general in just mentioning 'output', as you correctly pointed out. To define 'better' you would indeed need to agree on the dimensions you would evaluate candidates against.

I think a more appropriate rephrasing would be 'You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference on dimensions you care about'. In the case of the latest Claude Code vs. Codex (with GPT 5.5), both are similar enough in the dimensions people will care about when evaluating (vs. differing wildly in cost or time taken).
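To make "better on the dimensions you care about" concrete, here is a minimal sketch of a dominance check: X counts as better than Y only if it is no worse on every agreed dimension and strictly better on at least one. The dimension names and the `RunMetrics` type are hypothetical, and it assumes lower is better for each field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    """Measurements for one (model + harness) on the same task. Lower is better."""
    minutes: float         # wall-clock time to finish
    relative_cost: float   # cost relative to the cheaper tool
    steering_prompts: int  # hypothetical count of manual corrections


def dominates(a: RunMetrics, b: RunMetrics) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    pairs = [
        (a.minutes, b.minutes),
        (a.relative_cost, b.relative_cost),
        (a.steering_prompts, b.steering_prompts),
    ]
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


# Time and cost follow the 10-minute vs one-hour-at-4x example from this thread;
# the steering counts are made up for illustration.
tool_x = RunMetrics(minutes=10, relative_cost=1.0, steering_prompts=1)
tool_y = RunMetrics(minutes=60, relative_cost=4.0, steering_prompts=8)
near_tie = RunMetrics(minutes=10, relative_cost=1.0, steering_prompts=1)

print(dominates(tool_x, tool_y))    # True
print(dominates(tool_x, near_tie))  # False
```

When neither tool dominates the other, "better" turns into a weighting question, which is where personal preference comes back in.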


This obviously correct take will get pushback, so let me add some other examples:

- which tool required more detailed goal-setting in the prompt?

- did one tool ask follow-up questions up front vs spread out over implementation?

- did either tool match existing coding styles?

- did either tool remind you about potential conflicts between what you asked it to build and other parts of the codebase?

There are a lot of ways to compare agents besides just the code. (Similarly, working engineers are not evaluated just on their code output.)


The colleague implicitly agreed that comparing the output was a valid way to settle the matter as they took part in the test, so they weren't using "better" in the way you propose.

I wasn’t really discussing the colleague, but either way, from:

> A colleague was convinced Claude is better so we played a game. We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code.

I don’t think it’s obvious that they specifically agreed that losing the game meant that. They might just have thought “sure, it might be fun”, if they even gave it that much thought.

“So we played a game” is rather vague and I feel it’s a bit of a leap to read it as: “as an explicit outcome of their claim that Claude is better, we made a formal bet as to whether they could tell the difference in the output, the failure of which would mean a full retraction of their statement”.
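If someone did want the game to carry that kind of weight, one rough way to score it is to ask how likely the number of correct identifications would be under pure guessing. A minimal sketch, assuming each guess is an independent 50/50 call; the tally is hypothetical, since the thread does not say how many PRs were involved:

```python
from math import comb


def chance_p_value(correct: int, total: int) -> float:
    """Probability of at least `correct` right answers out of `total` by coin-flip guessing."""
    return sum(comb(total, k) for k in range(correct, total + 1)) / 2 ** total


# Hypothetical tally: 6 of 10 PRs attributed to the right tool.
p = chance_p_value(correct=6, total=10)
print(f"{p:.3f}")  # 0.377
```

With a handful of PRs, even a decent hit rate is hard to tell apart from luck, which is another reason not to read too much into one round.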


Claude and Codex are tools. You can't tell the difference in the output between something that was done with a ratcheting wrench vs a standard combination wrench, but your mechanic certainly knows the ratcheting wrench is better (for most tasks).

I've not used Codex to compare against, so I'm not claiming X is better than Y, but comparing tools simply on their output is naive.


" You cannot simply make a claim that (model + harness) X is better than Y, but then have no discernible difference in the output"

Sorry I think this misses the mark.

Because it's not the output but the process.

And sometimes the outcomes are not discernible.

Codex and Claude are very different.

I use them for different things.

Their behaviour difference is obvious.

Of course it'd be impossible for anyone to tell by looking at my code base 'how it was written'.


You need to see the response in light of the original discussion. Referencing here for clarity since I should have included it in the first place: "We used the claude code and codex harness and I implemented some prs they needed with gpt5.5 and opus4.7 and asked them to identify which came from which only from the code."

So the same person was using similarly competitive tools and showing that the output was hard to discern (indirectly, the implication was also that the implementation was fairly trivial in both of those). A better analogy would not be different processes and widely different tools but, for example, two power drills. Sure, folks could still prefer one over the other, but that's a different claim than saying X is objectively better than Y when both are directly competing on very similar dimensions.

Assuming you meant Claude Code: I'd love to learn more about "Codex and Claude are very different", because maybe I'm assuming just based on my use case, where I use both of them interchangeably for the same thing (coding web and mobile apps).


It's not reasonable to compare results from two different tool sets, especially as they are guided by humans.

The only way a reasonable comparison could be made would be to compare completely automated results from either technology - that would be useful.

For example - creating a 'pre-baked script' and running it on both to see the output.

Codex and Claude are obviously very different, though it's hard to characterize how those differences might apply exactly to a given problem.

Two 'very different power saws' will ultimately build the same home.


> A colleague was convinced Claude is better

That’s actually what my comment was based on; raw code output isn’t the only measure of quality. Engineers write better code if they have the tools they prefer.


The colleague participated in the test though, so apparently the colleague didn't object to "better" being interpreted as "makes better output".

Exactly, I was confused too. The authors clearly mention what the parent comment talks about, albeit towards the end of the article, that the 'J' bundle meant that these firms were not set up for success once they 'caught up' and were required to innovate not just process but from the ground up to envision new categories (e.g. iPhone).


Revenue is not the right metric when you compare space trips to trips inside a city. The more relevant numbers are EBITDA, Operating cash flow, Profits.


Has anyone done the math on: (1) the cost to build out and run the data centers, (2) the cost of compute (hardware and energy), and (3) the depreciation of legacy GPUs and thus their value at the end of 3 years?

And then compare that with the $45B revenue from Anthropic to see if it's mostly break-even or if one of Anthropic/SpaceX came out ahead on the contract.
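For anyone who wants to plug in numbers, the comparison boils down to something like the sketch below. Only the $45B revenue and the 3-year horizon come from this comment; every other input is a placeholder, and it assumes the revenue figure covers the same period as the costs.

```python
def contract_margin(revenue, build_cost, annual_run_cost, years, residual_value):
    """Revenue minus net cost, where net cost = build + running costs - resale value."""
    net_cost = build_cost + annual_run_cost * years - residual_value
    return revenue - net_cost


# Placeholder inputs, to be replaced with real figures.
margin = contract_margin(
    revenue=45e9,          # from the comment
    build_cost=10e9,       # placeholder
    annual_run_cost=2e9,   # placeholder (energy, staff, maintenance)
    years=3,               # from the comment
    residual_value=3e9,    # placeholder resale value of the GPUs after 3 years
)
print(f"margin under these placeholder inputs: ${margin / 1e9:.0f}B")
```

The sign of the result is what answers the break-even question; the residual value term is where the depreciation assumption bites.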


Maybe it is a win/win. Anthropic gets desperately needed compute at a fair price. SpaceXAI sells compute at a fair price and gets desperately needed revenues.


Tesla loses out on that revenue since it was their chips to begin with, right?


xAI purchased Tesla's allocation of GPUs, in exchange for Tesla purchasing xAI's allocation of GPUs at a later date. They claim this was done because Tesla didn't have its datacentre ready to receive those GPUs at that time. I don't see how or who was robbed here.


No. xAI is buying Nvidia, not using Tesla chips.


SpaceX is already indicating their strategy on this, because they’re renting their last-gen data center to Anthropic and keeping the current-gen data center for themselves. Rinse and repeat.


It's the same gen.


It has $25 billion of AI capital expenditure in the S-1. So it generally looks like a solid deal for SpaceX.


Well Colossus 1 has 230k GPUs, including 30k GB200s and Colossus 2 has 550k GB200s & GB300s.

So my guess on costs would be ~$10B for Colossus 1 and ~$20B for Colossus 2.


A GB300 rack is like $5-6 million, so that seems a bit low.
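Rough arithmetic behind that, as a sketch: convert the 550k GPU count for Colossus 2 into racks and multiply by the per-rack price. It assumes 72 GPUs per rack (the NVL72 configuration) and that every GPU sits in such a rack; neither is stated in this thread.

```python
import math


def rack_cost_range(gpu_count, gpus_per_rack=72, rack_cost_low=5e6, rack_cost_high=6e6):
    """Return (rack count, low cost, high cost) for a fleet of rack-scale systems."""
    racks = math.ceil(gpu_count / gpus_per_rack)
    return racks, racks * rack_cost_low, racks * rack_cost_high


racks, low, high = rack_cost_range(550_000)
print(f"{racks} racks, ${low / 1e9:.1f}B to ${high / 1e9:.1f}B")
# 7639 racks, $38.2B to $45.8B
```

That covers the racks only, with no buildings, power, or networking, so under these assumptions ~$20B would only fit if the site is not fully populated.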


Yeah, maybe these aren't good guesses. I was basing it off CapEx and Elon's tweet about them. Maybe C2 isn't completely filled yet.


Ed Zitron https://www.wheresyoured.at/ has done the math, and it's pretty bleak. His somewhat voluminous rantings contain raw figures on investments, data centre builds, energy availability and depreciation.

He believes Oracle has already signed its own death warrant, and that Meta is close behind. MS, Amazon and Google have massive revenue streams to sustain them, but looking at the numbers, each has to earn from AI the equivalent of their existing real revenue. I can't see that happening.

And he believes from multiple perspectives of the data that Nvidia is either massively overstating its GPU sales, or that there are warehouses full of unused GPUs. There just isn't the energy capacity to run them all, let alone data centres to put them in.


> Ed Zitron

His math is wrong though. He still claims H100s are worthless but in fact they are worth more now than when they were new.

And everything I've read from him is just.. weird? Like he has an anti-AI agenda and he interprets everything through that?

Look at his latest public piece: https://www.wheresyoured.at/where-are-all-the-data-centers/

He is complaining that there are no 1GW+ data centers, with evidence like this:

> For example, CNBC’s MacKenzie Sigalos reported in October 2025 that Amazon’s Indiana-based (allegedly) 2.2GW Project Rainier data center was “operational,” but only seven out of a planned 30 buildings were actually operational, and her comment of “with two more campuses [of indeterminate capacity] underway.” This comment was buried two videos and 600 words into a piece that declared the data center was “now operational,” with the express intent of making you think the whole thing was operational.

But if you read the report that "buried" comment is far from buried - the whole thing is about how it is still under construction!

Of course 1GW data centers don't all come online at once! You get them online in the parts you can as soon as you can!


> Ed Zitron https://www.wheresyoured.at/ has done the math, and it's pretty bleak.

Ed Zitron is constantly wrong and writes like a child having a tantrum, I don’t understand why you take him seriously?

https://www.theargumentmag.com/p/ais-biggest-critic-has-lost...

From a previous comment of mine – the quotes are all from a single article:

He comes across as just a ludicrously unpleasant, spite-filled person.

> I'm fucking tired of having to write this sentence.

> I am so very bored of having this conversation

> I don't care about this number!

> Shut the fuck up!

> This isn't the early days of shit.

> Didn't we just talk about this? Fine, fine.

> $3.25 billion a quarter is absolutely pathetic.

> This isn’t real business! Sorry!

> He said in one of his stupid and boring blogs that

> This man is full of shit! Hey, tech media people reading this — your readers hate this shit! Stop printing it! Stop it!

> It's here where I'm going to choose to scream.

> Dario Amodei — much like Sam Altman — is a liar, a crook, a carnival barker and a charlatan, and the things he promises are equal parts ridiculous and offensive.

> Why are we humoring these oafs?

> Despite Newton's fawning praise

> Nobody talks like this! This isn’t how human beings sound! I don’t like reading it!

> Ewww.

> I'm sorry, I know I sound like a hater, and perhaps I am, but this shit doesn't impress me even a little.

> I know, I know, I'm a hater, I'm a pessimist, a cynic, but I need you to fucking listen to me: everything I am describing is unfathomably dangerous

> expensive, stupid, irksome, quasi-useless new product

> I know this has been a rant-filled newsletter, but I'm so tired of being told to be excited about this warmed-up dogshit.

> I refuse to sit here and pretend that any of this matters.

> I'm tired of the delusion. I'm tired of being forced to take these men seriously.

When I read this kind of thing, it’s very apparent that this is being driven entirely by spite not insight. He’s just so angry about everything. There are 57 exclamation marks in this article!

https://news.ycombinator.com/item?id=43085885#43086361

Pay too much attention to this kind of thing and it will poison your mind.


In the 90s we had people talking like this about The Internet. They're all over on FB now, with a detour in between to say stuff like "my isp can track me!?"


Do you find the video understanding work there also to be 'silly little slop', or did you only look at the gifs on the page and not read about the understanding work in a 3B model?

This is not ground-breaking by any means, but achieving this in a 3B model and sharing the approach + weights advances engineering and is certainly more of a contribution than 'silly little slop videos' imo.


It’s not a 3B model, it has 3B active parameters. The full model is much larger.


That's true, I should have mentioned active. Actual params are likely closer to 12B-14B, given the 40GB VRAM usage.
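The back-of-envelope version of that estimate, as a sketch: with 16-bit weights (2 bytes per parameter), 40GB caps the model at about 20B parameters, and a 12B-14B figure corresponds to assuming roughly 60-70% of the memory holds weights. Both the precision and the weight share are guesses, not numbers from the model card.

```python
def params_from_vram(vram_gb: float, weight_fraction: float, bytes_per_param: float = 2.0) -> float:
    """Estimate parameter count in billions from VRAM use.

    weight_fraction is the share of VRAM assumed to hold weights;
    bytes_per_param=2 corresponds to 16-bit weights.
    """
    return vram_gb * weight_fraction / bytes_per_param


# 40 GB is the figure from the comment; the 60-70% weight share is a guess.
low = params_from_vram(40, weight_fraction=0.6)
high = params_from_vram(40, weight_fraction=0.7)
print(f"{low:.0f}B to {high:.0f}B parameters")
```

A quantized checkpoint would change `bytes_per_param` and push the estimate up, so this is only as good as the precision assumption.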


If that's the case, a way to test the theory and understanding (assuming some parts of the reservoir and signal channel can be reliably identified) would be to prune the high-confidence reservoir, significantly reducing the model size while still getting good predictions. I don't believe the authors mention this (though I skimmed and didn't read the full paper in detail, so I may be wrong).


"What slows down a team where agents do the implementation is the production of specifications precise enough for an agent to pick up and run. Roadmap, written down. Acceptance criteria, written down. The “what we actually want” forced into precision, be it via a test suite, a ticket, or a written design."

This is merely speed of development and not the velocity of a company towards higher value. There are many PMs confidently writing these down elaborately (using the same AI tools) without a clear, deep understanding of the user problems, of why the requirements will be adopted by their target users, or even of who the target users really are.

So yes, this will lead to faster end-to-end execution. But whether the product is used or sits unused will depend on things beyond the above.


Agree with your points on the primary two questions and the circular argument in the original article. However, re: "How is it that atoms/electrons/photons suddenly start experiencing pain? What is it, in terms of atoms/forces, that's experiencing the pain?", that's an interesting question but not necessarily a fundamental refutation of #1. If you start with #1 "Consciousness is an unknown physical something (force/particle/quantum whatever)" then it has 'perceivable' properties of its own, different from those of its constituent atoms or electrons. A toy example is the 'wetness' of water. If you only look at atoms and molecules with no way to 'experience' water, then it's hard to conceive how water can have such properties (though in the case of water it is tractable).

Consciousness *may* be something similar. If it is (e.g. the purest form of energy) then it is not inconceivable that it has some properties that are not tractable if we only look at more granular manifestations of it.


Agreed! I'm skeptical of consciousness requiring some exotic new physics (a quantum phenomenon or a new form of energy or somesuch) but we can't prove that it doesn't.

Honestly, if someday a scientist proves that consciousness is a fundamental force like gravity, I would say, "yup, that makes sense!" even if I don't think it's likely.


Apart from a cool project, this evolved my perspective on what an MCP is, along with some cool architecture insights and inspiring ideas. Thank you!


Glad to hear it! This stuff is fascinating and rapidly evolving. I’ve been learning by doing. Happy hacking.

