Read this article and ultrathink critically about it. Provide your perspective.
The article makes a strong experiential case for improved velocity in AI-assisted development but contains several argumentation weaknesses and conflations worth examining.
The projects described are legitimately non-trivial: Firebase backend integration, Facebook OAuth, iOS apps in Swift (a language the author doesn't know), GitHub Actions pipelines, scheduled cloud functions. Getting these working in hours rather than weeks represents a real capability shift. The author is honest about his limitations and uncertainties, particularly the security concerns.
Where the argument breaks down:
1. "Replace developers" vs "dramatically augment developers"
The author's own workflow contradicts the headline claim. He's still:
Making architectural decisions (choosing Firebase)
Handling errors Opus couldn't see (XAML via Visual Studio)
Writing custom prompts to shape output quality
Manually auditing security
Making product and UX decisions
This is developer work. The tool changed; the role didn't disappear.
2. The 80% security confidence undermines his thesis
He admits he's shipping apps with "80% confidence" in security and calls it "too damn low." This is the crux: the AI accelerated production but didn't replace the judgment required to responsibly ship production software. The velocity gain exposed a competence gap rather than closing it.
3. Sample bias in project selection
All examples are:
Greenfield (no existing codebase)
Single developer
Personal/family use
Standard patterns with excellent documentation (Firebase, SwiftUI, React Native)
No regulatory, compliance, or scale requirements
No team collaboration or handoff considerations
These constraints define a specific problem space where AI excels. Extrapolating to "developers are replaceable" ignores the majority of professional software work.
4. "Code doesn't need human readability" is underbaked
His argument is circular: "Why optimize for human readability when the AI is doing all the work?" But:
His 80% security confidence exists because he can't read the code
He had to use external tools (VS) when Opus couldn't diagnose errors
What happens when context windows are exceeded and the LLM loses track?
Model behavior changes between versions; human-readable code is version-agnostic
The custom prompt he shares actually encodes many good engineering practices (minimal coupling, explicit state, linear control flow) that benefit LLMs and humans. The "no comments needed" claim conflates what's optimal for LLM regeneration with what's optimal for debugging production issues at 3am.
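As a concrete illustration of that overlap (my own hypothetical sketch, not code from the article or the prompt), here is roughly what "explicit state, minimal coupling, linear control flow" looks like in TypeScript:

    // Hypothetical sketch: all state lives in one explicit value, the function
    // takes everything it needs as parameters, and control flow runs top to bottom.
    type SyncState = {
      lastRunAt: Date | null;
      pendingIds: string[];
    };

    function planSync(state: SyncState, now: Date): { toFetch: string[]; next: SyncState } {
      // No globals, no callbacks, no hidden coupling: an LLM regenerating this
      // and a human auditing it follow the same linear story.
      const stale =
        state.lastRunAt === null || now.getTime() - state.lastRunAt.getTime() > 60_000;
      const toFetch = stale ? state.pendingIds : [];
      const next: SyncState = { lastRunAt: now, pendingIds: [] };
      return { toFetch, next };
    }

    console.log(planSync({ lastRunAt: null, pendingIds: ["a", "b"] }, new Date()));

Nothing in that style is LLM-specific; the same properties are what make code auditable by the human who has to sign off on shipping it.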
What's actually being demonstrated
The honest version of this article would be: Opus 4.5 dramatically compresses the gap between "can write code" and "can ship a personal app" for a specific class of greenfield projects. That's genuinely transformative for hobbyists, indie developers, and people solving their own problems.
But that's different from "replacing developers." The article demonstrates a power tool; power tools don't eliminate tradespeople.
There's something eerily recursive about Opus 4.5's sensible take calming the anxiety about Opus 4.5's capabilities and impact. It's probably the right take, but it feels weird that the most pragmatic response to this article comes from said model.
What can be stated without evidence can be dismissed without evidence. It's pretty clear to me, without knowing anything about the author, that there is no substance to this post.
In general, most such claims today are without substance, because they are made without any real metrics, and the metrics we actually need we simply don't have. We need to quantify the technical debt of LLM code, how often it contains errors relative to human-written code, and how critical / costly those errors are in each case relative to the cost of developer wages. We also need to be clear whether the LLM usage is just boilerplate / webshit or work on legacy codebases involving non-trivial logic and/or context, and whether, e.g., the velocity / usefulness of LLM-generated code decreases as the codebase grows.
Otherwise, anyone can make vague claims that might even be in earnest, only for studies to show that productivity actually dropped, despite the developer "feeling" faster. Vague claims are useless at this point without concrete measurements and numbers.
> We’ll unpack why identical tools deliver ~0% lift in some orgs and 25%+ in others.
At https://youtu.be/JvosMkuNxF8?t=145 he says the median is 10% more productivity, and looking at the chart we can see a 19% increase for the top teams (from July 2025).
The paper this is based on doesn't seem to be available, which is frustrating though!
I think you are quoting productivity measured before checking that the code actually works and correcting it. After rework, productivity drops to 1%. Timestamp 14:04.
In any case, IMHO AI SWE has happened in 3 phases:
Pre-Sonnet 3.7 (Feb 2025): Autocomplete worked.
Sonnet 3.7 to Codex 5.2/Opus 4.5 (Feb 2025-Nov 2025): Agentic coding started working, depending on your problem space, ambition, and the model you chose.
Post Opus 4.5 (Nov 2025): Agentic coding works in most circumstances.
This study was published in July 2025. Given the models available for most of the study's timeframe, it isn't surprising to me that it was more trouble than it was worth.
But it's different now, so I'm not sure the conclusions are particularly relevant anymore.
As DHH pointed out: AI models are now good enough.
Type checking is done statically without running the program. You don't need to execute any run-time logic to perform a check like this. What you are suggesting is a much, much weaker form of verification that doesn't require any type system at all.
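A minimal TypeScript illustration of that distinction (my own example; the names are made up): the compiler rejects the bad call before anything runs, whereas a runtime check only fires on the execution paths you happen to exercise.

    // Checked by `tsc` without executing the program.
    function area(width: number, height: number): number {
      return width * height;
    }

    // area("10", 5);       // compile-time error: Argument of type 'string' is
    //                      // not assignable to parameter of type 'number'.
    const ok = area(10, 5); // type-checks, so this holds for every call of this shape
    console.log(ok);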
https://burkeholland.github.io/posts/opus-4-5-change-everyth...