In case the author is reading this: I have the receipts showing a real step function in how much software I build, especially lately. I'm not going to put a number on it, because that makes no sense, but I certainly push a lot of code that reasonably seems to work.
The reason it doesn't show up online is that I mostly write software for myself and for work, with the primary goal of making things better, not faster. More tooling, better infra, better logging, more prototyping, more experimentation, more exploration.
Here's my open-source work: https://github.com/orgs/go-go-golems/repositories . These are not just one-offs (although there are plenty of those in the vibes/ and go-go-labs/ repositories), but long-lived codebases / frameworks that build upon each other and have gone through many, many iterations.
I have linked my github above. I don't know how that fares in the bigger scope of things, but I went from 0 opensource to hundreds of tools and frameworks and libraries. Putting a number on "productivity" makes no sense to me, I would have no idea what that means.
I generate between 10-100k lines of code per day these days. But is that a measure of productivity? Not really...
He said "generate". This is trivial to do. And probably this is what Amodei meant when he said 90% of code would be AI by now. It doesn't meant that generated code is actually useful and gets checked in.
Trivial is a pretty big word in this context. Expanding an idea into some sort of code is indeed a matter of waiting. The idea, the prompt, the design of the overall workflow to leverage the capabilities of llms/agents in a professional/long-lived codebase context is far from trivial, imo.
I tuned in to a random spot at a random episode, didn't see any coding but did get to hear you say:
"I'm a person who hates art now...I never want to see art again. All I want to see is like, AI stuff. That's how bad it's gotten. Handmade? nuh-uh. Handmade code? ... anything by humans, just over. I'm just gonna watch pixels."
I'm always a very serious person while I wait for people to join the stream. I'm sorry you weren't impressed, but tbf that's not really my goal, I just like building things and yapping about it.
I'm not sure why you bother yapping about it yourself. It's too human. Just give an LLM a list of lowercase bullet points and have an AI voiceover read them. It'll be 10x more efficient.
I very often put some random idea into the llm slot machine that is Manus, use the result as a starting point to remold into a proper tool, and extract the relevant pieces as reusable packages. I've got a pretty wide treesitter/lsp/git based set of packages to manage llm output and assist with better code reviews.
Also, every llm PR comes with _extensive_ documentation / design documents / changelogs, by the nature of how these things work, which helps both humans and llm-assisted code review tools.
Since I get downvoted because I guess people don’t believe me, I’m sitting at breakfast reading a book. I suddenly think about yaml streaming parsing, start a gpt research, dig a bit deeper into streaming parser approaches, and launch a deep research on streaming parsing which I will print out and read tomorrow at breakfast and go through by hand. I then take some of the gpt discussion and paste it into Manus, saying:
“Write a streaming go yaml parser based on the tokenizer (probably use goccy yaml if there is no tokenizer in the standard yaml parser), and provide an event callback to the parser which can then be used to stream and print to the output.
Make a series of test files and verify they are streamed properly.”
This is the slot machine. It might work, it might be 50% jank, it might be entirely jank. It'll be a few thousand lines of code that I will skim and run. In the best case, it's a great foundation to build on more properly. In the worst case, it was an interesting experiment and I will learn something about either prompting Manus, or streaming parsing, or both.
I certainly won’t dedicate my full code review attention to what was generated. Think of it more as a hyper specific google search returning stackoverflow posts that go into excruciating detail.
Same. On many days 90% of my code output by lines is Claude generated and things that took me a day now take well under an hour.
Also, a good chunk of my personal OSS projects are AI assisted. You probably can't tell from looking at them, because I have strict style guides that suppress the "AI style", and I don't really talk about how I use AI in the READMEs. Do you also expect I mention that I used Intellisense and syntax highlighting too?
The author’s main point is that there hasn’t been an uptick in total code shipped, as you would expect if people are 10x-ing their productivity. Whether folks admit to using AI in their workflow is irrelevant.
Their main point is "AI coding claims don't add up", as shown by the amount of code shipped. I personally do think some of the more incredible claims about AI coding add up, and am happy to talk about it based on my "evidence", ie the software I am building. 99.99% of my code is ai generated at this point, with the occasional one line I fill in because it'd be stupid to wait for an LLM to do it.
For example, I've built 5-6 iphone apps, but they're kind of one-offs and I don't know why I would put them up on the app store, since they only scratch my own itches.
I'd suspect that a very large proportion of code has always been "private code" written for personal or intra-organizational purposes, and which never get released publicly.
But if we expect the ratio of this sort of private code to publicly-released code to remain relatively stable, which I think is a reasonable expectation, then we'd expect there to be a proportional increase in both private and public code as a result of any situation that increased coding productivity generally.
So the absence of a notable increase in the volume of public code either validates the premise that LLMs are not actually creating a general productivity boost for software development, or instead points to its productivity gains being concentrated entirely in projects that never do get released, which would raise the question of why that might be.
Oh yeah, I love building one off tools with it. I am working on a game mod with a friend, we are hand writing the code that runs when you play it, but we vibe code all sorts of dev tools to help us test and iterate on it faster.
Do internal, narrow purpose dev tools count as shipped code?
This seems to be a common thread. For personal projects where most details aren't important, they are good at meeting the couple things that are important to you and filling in the rest with reasonable, mostly-good-enough guesses. But the more detailed the requirements are, the less filler code there is, and the more each line of code matters. In those situations it's probably faster to type the line of code than to type the English equivalent and hand-hold the assistant through the editing process.
I don't think so, although I think at that point experience heavily comes into play. With GPT-5 especially, I can basically point cursor/codex at a repo and say "refactor this to this pattern" and come back 25 minutes later to a pretty much impeccable result. In fact that's become my favourite pastime lately.
I linked some examples higher up, but I've been maintaining a lot of packages that I started slightly before chatgpt and then refactored and worked on as I progressively moved to the "entirely AI generated" workflow I have today.
I don't think it's an easy skill (not saying that to make myself look good, I spent an ungodly amount of time exploring programming with LLMs and still do), akin to thinking at a strategic level vs at a "code" level.
Certain design patterns also make it much easier to deal with LLM code: state reducers (redux/zustand for example), event-driven architectures, component-based design systems, and building many CLI tools that the agent can invoke to iterate and correct things. The same goes for certain "tools" like sqlite/tmux: by just telling the LLM "btw you can use tmux/sqlite", you let it pass hurdles that would otherwise make it spiral into slop-ratatouille.
I also think that a language like go was a really good coincidence, because it is so amenable to LLM-ification.
I don’t think this is necessarily true. People that didn’t ship before still don’t ship. My ‘unshipped projects’ backlog is still nearly as large. It’s just got three new entries in the past two months instead of one.
>Do you also expect I mention that I used Intellisense and syntax highlighting too?
No, but I expect my software to have been verified for correctness and soundness by a human being with a working mental model of how the code works. But I guess that's not a priority anymore if you're willing to sacrifice $2400 a year to Anthropic.
$2400? Mate, I have a free GitHub Copilot subscription (Microsoft hands them out to active OSS developers), and work pays for my Claude Code via our cloud provider backend (and it costs less per working day than my morning Monster can). LLM inference is _cheap_ and _getting cheaper every month_.
> No, but I expect my software to have been verified for correctness, and soundness by a human being with a working mental model of how the code works.
This is not exclusive with AI tools:
- Use AI to write dev tools to help you write and verify your handwritten code. Throw the one-off dev tools in the bin when you're done.
- Handwrite your code, generate test data, review the test data like you would a junior engineer's work.
- Handwrite tests, AI generate an implementation, have the agent run tests in a loop to refine itself. Works great for code that follows a strict spec. Again, review the code like you would a junior engineer's work.
Agree. In the hands of a seasoned dev, not only does productivity improve but so does the quality of the output.
If I’m working against a deadline I feel more comfortable spending time on research and design, knowing I can spend less time on implementation. In the end, it takes the same amount of time, though hopefully with an increase in reliability, observability, and extensibility. None of these things show up in the author’s faulty dataset and experiment.
Not sure what you mean? This was a demo in a live session that took about 30 minutes, including ui ideation (see pngs). It’s a reasonably well featured app and the code is fairly minimal. I wouldn’t be able to write something like that in 30 minutes by hand.