Hacker News | _se's comments

It's always fun to ask Opus what it thinks about articles like this. Here's what I got with no history or system prompt:

https://burkeholland.github.io/posts/opus-4-5-change-everyth...

Read this article and ultrathink critically about it. Provide your perspective.

The article makes a strong experiential case for improved velocity in AI-assisted development but contains several argumentation weaknesses and conflations worth examining.

The projects described are legitimately non-trivial: Firebase backend integration, Facebook OAuth, iOS apps in Swift (a language the author doesn't know), GitHub Actions pipelines, scheduled cloud functions. Getting these working in hours rather than weeks represents a real capability shift. The author is honest about his limitations and uncertainties, particularly the security concerns.

Where the argument breaks down:

1. "Replace developers" vs "dramatically augment developers"

The author's own workflow contradicts the headline claim. He's still:

- Making architectural decisions (choosing Firebase)
- Handling errors Opus couldn't see (XAML via Visual Studio)
- Writing custom prompts to shape output quality
- Manually auditing security
- Making product and UX decisions

This is developer work. The tool changed; the role didn't disappear.

2. The 80% security confidence undermines his thesis

He admits he's shipping apps with "80% confidence" in security and calls it "too damn low." This is the crux: the AI accelerated production but didn't replace the judgment required to responsibly ship production software. The velocity gain exposed a competence gap rather than closing it.

3. Sample bias in project selection

All examples are:

- Greenfield (no existing codebase)
- Single developer
- Personal/family use
- Standard patterns with excellent documentation (Firebase, SwiftUI, React Native)
- No regulatory, compliance, or scale requirements
- No team collaboration or handoff considerations

These constraints define a specific problem space where AI excels. Extrapolating to "developers are replaceable" ignores the majority of professional software work.

4. "Code doesn't need human readability" is underbaked

His argument is circular: "Why optimize for human readability when the AI is doing all the work?" But:

- His 80% security confidence exists because he can't read the code
- He had to use external tools (VS) when Opus couldn't diagnose errors
- What happens when context windows are exceeded and the LLM loses track?
- Model behavior changes between versions; human-readable code is version-agnostic

The custom prompt he shares actually encodes many good engineering practices (minimal coupling, explicit state, linear control flow) that benefit LLMs and humans. The "no comments needed" claim conflates what's optimal for LLM regeneration with what's optimal for debugging production issues at 3am.

What's actually being demonstrated

The honest version of this article would be: Opus 4.5 dramatically compresses the gap between "can write code" and "can ship a personal app" for a specific class of greenfield projects. That's genuinely transformative for hobbyists, indie developers, and people solving their own problems. But that's different from "replacing developers." The article demonstrates a power tool; power tools don't eliminate tradespeople.


There's something eerily recursive about Opus 4.5's sensible take calming the anxiety about Opus 4.5's capabilities and impact. It's probably the right take, but I feel weird that the most pragmatic response to this article comes from said model.

DHH has long passed the point where anyone should be caring about his technical opinions. This is a zero-substance post.


> DHH has long passed the point where anyone should be caring about his technical opinions. This is a zero-substance post.

Can you elaborate?


What can be stated without evidence can be dismissed without evidence. It is, IMO, pretty clear that there is no substance to this post, without knowing anything about the author.

In general, most such claims today are without substance, because they are made without any real metrics, and the metrics we actually need we just don't have. That is: we need to quantify the technical debt of LLM code; how often it has errors relative to human-written code; how critical and costly those errors are in each case relative to the cost of developer wages; whether the LLM usage is just boilerplate/webshit or work on legacy codebases involving non-trivial logic and/or context; and whether the velocity and usefulness of LLM-generated code decreases as the codebase grows.

Otherwise, anyone can make vague claims that might even be in earnest, only to have e.g. studies show that in fact the productivity is reduced, despite the developer "feeling" faster. Vague claims are useless at this point without concrete measurements and numbers.


This study does a good job of measuring the productivity impact. It found 1% uplift in dev productivity from using AI.

https://youtu.be/JvosMkuNxF8?si=J9qCjE-RvfU6qoU0


Actually, it didn't.

From the video summary itself:

> We’ll unpack why identical tools deliver ~0% lift in some orgs and 25%+ in others.

At https://youtu.be/JvosMkuNxF8?t=145 he says the median is 10% more productivity, and looking at the chart we can see a 19% increase for the top teams (from July 2025).

The paper this is based on doesn't seem to be available which is frustrating though!


I think you are quoting productivity measured before checking that the code actually works and correcting it. After re-work, productivity drops to 1%. Timestamp 14:04.

That was from a single company, not across the cohort.

My bad. What was the result when they measured productivity after rework across the entire cohort?

They don't publish it as far as I can see!

In any case, IMHO I think AI SWE has happened in 3 phases:

Pre-Sonnet 3.7 (Feb 2025): Autocomplete worked.

Sonnet 3.7 to Codex 5.2/Opus 4.5 (Feb 2025-Nov 2025): Agentic coding started working, depending on your problem space, ambition, and the model you chose.

Post Opus 4.5 (Nov 2025): Agentic coding works in most circumstances

This study was published July 2025. For most of the study timeframe it isn't surprising to me that it was more trouble than it was worth.

But it's different now, so I'm not sure the conclusions are particularly relevant anymore.

As DHH pointed out: AI models are now good enough.


Sorry for the late response!

My guess is they didn't publish it because they only measured it at one company, if they had the data across the cohort they would have published.

The general result that review/re-work can cancel out the productivity gains is supported by other studies:

AI generated code is 1.7x more buggy vs human generated code: https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-gen...

Individual dev productivity gains are offset by peers having to review the verbose (and buggy) AI code: https://www.faros.ai/blog/ai-software-engineering

On agentic being the saviour for productivity, Meta measured a 6-12% productivity boost from agents programming: https://www.youtube.com/watch?v=1OzxYK2-qsI&si=ABTk-2RZM-leT...

"But it's different now" :)


Great example of something that actually has some substance beyond meaningless anecdotes.


The claim was:

> DHH has long passed the point where anyone should be caring about his technical opinions.

I asked for evidence, you are replying to something else.


The Marshall fire was 4 years ago. Almost to the day.


AI slop garbage


5 years? That's so interesting bro. Tell us more about what your thoughts on ChatGPT were back in 2020.

Go ahead and double check when the LLM craze started and perhaps reconsider making things up.


May 2018: AI winter is well on its way (piekniewski.info) [1]

January 2020: Researchers: Are we on the cusp of an ‘AI winter’? (bbc.co.uk) [2]

I'm sure you can easily find more. Felt good to be called a "bro", though, made me feel younger.

[1] HN discussion, almost 500 comments: https://news.ycombinator.com/item?id=17184054

[2] HN discussion on BBC article, ~110 comments: https://news.ycombinator.com/item?id=22069204


No, the new "Year of the Linux Desktop" is "This Year Software Engineers Won't Exist Any More Because Of LLMs". Very obviously.


Can you really not see the irony in posting this comment? Have you no self-awareness whatsoever?

Peak HN, absolutely hilarious!


The hottest discussion on the first page is how big of a meanie Andrew Kelley is. Yeah, the site's a vibe for sure.


Don't tell the Posthog guys about this. Far too much collaboration going on here!!!


Giving people cardio advice while being completely unaware of what zone 2 training is.

Peak HN right here. The epitome of confidently incorrect.


Type checking is done statically without running the program. You don't need to execute any run-time logic to perform a check like this. What you are suggesting is a much, much weaker form of verification that doesn't require any type system at all.
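To make the distinction concrete, here's a minimal Python sketch (the names are hypothetical, and mypy stands in for any static checker): the type error is rejected by analysis alone, before the program runs, while the run-time assert only verifies the single execution it happens to observe.

```python
def parse_port(value: str) -> int:
    """Convert a config string like "8080" to a port number."""
    return int(value)

# A static type checker (e.g. mypy) rejects this call without ever
# executing the program, because the literal 8080 is an int, not a str:
#
#     parse_port(8080)  # mypy: incompatible type "int"; expected "str"
#
# A run-time check, by contrast, only fires on the one execution it
# observes -- a much weaker form of verification:
port = parse_port("8080")
assert isinstance(port, int)
```

The static check covers every possible call site in the codebase at once; the assert only covers the calls that actually happen while it's running.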

