For some reason people are perfectly able to understand this in the context of, say, cursive, calculator use, etc., but when it comes to their own skillset somehow it's going to be really different.
No, it hasn't. I did not have a problem before AI with people sending in gigantic pull requests that made absolutely no sense, and justifying them with generated responses that they clearly did not understand. This is not a thing that used to happen. That's not to say people wouldn't have done it if it were possible, but there was a barrier to submitting a pull request that no longer exists.
I'm mostly surprised that people found the output quality of Opus 4.6 good enough... 4.7 so far is a pretty sizable improvement for the stuff I care about. I don't really care how cheap 4.6 was per task when 90% of the tasks weren't actually being done correctly. Or maybe it's that people like the LLM agreeing with them blindly while sneakily doing something else under the hood? Did people enjoy Claude routinely disregarding their instructions? I'm not really sure I understand; I truly found 4.6 immensely frustrating (from the get-go, not just the "pre-nerf" version, whatever that means). 4.7 is a buggy mess, it's slow, and it costs a lot per token. It's also a huge breath of fresh air, because it actually seems to make a good-faith effort at doing the thing you asked it to do, and doesn't waste your time with irrelevant nonsense just to look busy or because it thinks that's what you want (it still does all of these things to some extent, but so far much less than 4.6 did).
Disclaimer: I'm always running on max and don't really have token limits so I am in a position not to care about cost per token. But I am not surprised by the improved benchmark results at all, 4.6 was really not nearly as strong of a model as people seem to remember it being.
It's not reality. I'm really not a fan of the way that people excuse the really terrible code LLMs write by claiming that people write code just as bad. Even if that were true, it is not true that when you ask those people to do otherwise they simply pretend to have done it and forget you asked later.
Yes, and both are right. It's a matter of which one works as expected and makes fewer mistakes more often. And as someone using Claude Code heavily now, I would say we're already at a point where AI wins.
> it is not true that when you ask those people to do otherwise they simply pretend to have done it and forget you asked later.
I had a coworker who did more or less exactly that. You'd leave a comment on a ticket about something extra to be done, he'd answer "yes, sure", and after a few days he'd close the ticket without doing the thing you asked. Depending on your workload at the moment, you might not notice until months later, when the missing thing would come back to bite you.
You may have had one. It clearly made a pretty negative impression on you because you are still complaining about them years later. I find it pretty misanthropic when people ascribe this kind of antisocial behavior to all of their coworkers.
It's still relatively recent. Anyway, I'm not saying everyone is like this (not even a significant fraction), but such people do exist.
At the same time it's not true that current LLMs only write terrible code.
"Even if that were true, it is not true that when you ask those people to do otherwise they simply pretend to have done it and forget you asked later."
The point is, that's not the typical experience and people like that can be replaced. We don't willingly bring people like that on our teams, and we certainly don't aim to replace entire teams with clones of this terrible coworker prototype.
Not only have I never had a coworker as bad as these people describe, the point is as you say: why would I want an LLM that works like these people's shitty coworkers?
My worst coworkers right now are the ones using Claude to write every word of code and don't test it. These are people who never produced such bad code on their own.
So the LLMs aren't just as bad as the bad coworkers, they're turning good coworkers into bad ones!
A couple of reasons, but mainly speed and availability.
I can give Claude a job anytime and it will do it immediately.
And yes, I will have to double-check anything important, but I am way better and faster at checking than at doing it myself.
So obviously I don't want a shitty LLM as a coworker, but a competent one. The progress they've made is pretty astonishing, though, and they are good enough now that I've started really integrating them.
In the long run, good code makes everyone much happier than code that is bad because people are being "nice" and letting things slide in code review to avoid confrontation.
...but seriously... there was the "up until 1850" LLM or whatever... can we make an "up until 1920 => 1990 [pre-internet] => present day" and then keep prodding the "older ones" until they "invent their way" to the newer years?
We knew more in 1920 than we did in 1850, but can a "thinking machine" of 1850-knowledge invent 1860's knowledge via infinite monkeys theorem/practice?
The same way that in 2025/2026, Knuth has just invented his way to 2027-knowledge with this paper/observation/finding? If I only had a beowulf cluster of these things... ;-)
But a query optimizer only matters once you have an established business with large customers.
You seem to be implying Salesforce’s business is successful because they have their own query optimizer. But the causality is reversed. Salesforce has their own query optimizer because they’ve built a successful business.
My point is that a lot of people think it'd be really easy to build the next Salesforce until they actually try to compete with Salesforce in the market. Like it or not, if you want to build a Salesforce competitor (or try to get your company to build its own) you're going to be compared to actual Salesforce, not the version of Salesforce that existed when the market was new.
> But the aha moment for me was what’s maintainable by AI vs by me by hand are on different realms
I don't find that LLMs are any more likely than humans to remember to update all of the places where they wrote redundant functions. Generally far less likely, actually. So forgive me for taking this claim with a massive grain of salt.
The environment is why I quit my job and started working for myself in January. I hated it. And not to sound like an arrogant ass because there were a LOT of way smarter people than me at $PREVIOUS_EMPLOYER, but having to have meetings to set our meetings, having to explain things that aren't statistically meaningful to people who don't understand stats anyway, and getting code reviews (when I could get them scheduled) from dudes who hadn't touched a keyboard in 5 years was... soul sucking? I'm not doing that anymore. Or ever again.
I mean, maybe it's because I had a more hands-on, blue-collar-adjacent job before I got into tech? Maybe it's because I'm a fool and couldn't play the game of "pretend to work and look busy." But - and I know this might be kind of messed up - I really like not having to explain things in a series of emails to people other than the customers. I really like not having to answer to anyone but myself and my customers. If I want to do something, well, I just do it now. That's a nice place to be. Riskier for sure, but I think the prior environment would have killed me, so maybe not.
Also, I have time to do shit that's interesting? Who would have guessed how much more time I'd have in the day when I didn't have 4.5 hours of meetings per day? Hell, I'm taking 2 classes at the university for fun (weird, right?!) - I never could have done that before, because I would have had to make a slide deck for Thursday's All-Hands or whatever and couldn't have missed the SUPER IMPORTANT MEETING that Jake has on the schedule that he'll show up to unprepared, or just not show up to at all.
It's ironic that I have to tell you of all people this, but many users of C (or at least, backends of compilers targeted by C) do actually want the compiler to aggressively optimize around UB.