For some reason people are perfectly able to understand this in the context of, say, cursive, calculator use, etc., but when it comes to their own skillset somehow it's going to be really different.
No, it hasn't. I did not have a problem before AI with people sending in gigantic pull requests that made absolutely no sense, and justifying them with generated responses that they clearly did not understand. This is not a thing that used to happen. That's not to say people wouldn't have done it if it were possible, but there was a barrier to submitting a pull request that no longer exists.
I'm mostly surprised that people found the output quality of Opus 4.6 good enough... 4.7 so far is a pretty sizable improvement for the stuff I care about. I don't really care how cheap 4.6 was per task when 90% of the tasks weren't actually being done correctly. Or maybe it's that people like the LLM agreeing with them blindly while sneakily doing something else under the hood? Did people enjoy Claude routinely disregarding their instructions? I'm not really sure I understand; I truly found 4.6 immensely frustrating (from the get-go, not just the "pre-nerf" version, whatever that means). 4.7 is a buggy mess, it's slow, and it costs a lot per token. It's also a huge breath of fresh air, because it actually seems to make a good-faith effort at doing the thing you asked it to do, and doesn't waste your time with irrelevant nonsense just to look busy or because it thinks that's what you want (it still does all of these things to some extent, but so far much less than 4.6 did).
Disclaimer: I'm always running on max and don't really have token limits so I am in a position not to care about cost per token. But I am not surprised by the improved benchmark results at all, 4.6 was really not nearly as strong of a model as people seem to remember it being.
It's not reality. I'm really not a fan of the way that people excuse the really terrible code LLMs write by claiming that people write code just as bad. Even if that were true, it is not true that when you ask those people to do otherwise they simply pretend to have done it and forget you asked later.
Yes, and both are right. It's a matter of which one works as expected and makes fewer mistakes more often. And as someone using Claude Code heavily now, I would say we're already at a point where AI wins.
> it is not true that when you ask those people to do otherwise they simply pretend to have done it and forget you asked later.
I had a coworker who did more or less exactly that. You'd leave a comment on a ticket about something extra to be done, he'd answer "yes, sure", and after a few days he'd close the ticket without doing the thing you asked. Depending on your workload at the moment, you might not notice until months later, when the missing thing would come back to bite you.
You may have had one. It clearly made a pretty negative impression on you because you are still complaining about them years later. I find it pretty misanthropic when people ascribe this kind of antisocial behavior to all of their coworkers.
It's still relatively recent. Anyway, I'm not saying everyone is like this (not even a significant fraction), but such people do exist.
At the same time it's not true that current LLMs only write terrible code.
"Even if that were true, it is not true that when you ask those people to do otherwise they simply pretend to have done it and forget you asked later."
The point is, that's not the typical experience and people like that can be replaced. We don't willingly bring people like that on our teams, and we certainly don't aim to replace entire teams with clones of this terrible coworker prototype.
Not only have I never had a coworker as bad as these people describe, the point is as you say: why would I want an LLM that works like these people's shitty coworkers?
My worst coworkers right now are the ones using Claude to write every word of code and don't test it. These are people who never produced such bad code on their own.
So the LLMs aren't just as bad as the bad coworkers, they're turning good coworkers into bad ones!
A couple of reasons, but mainly speed and availability.
I can give Claude a job anytime and it will do it immediately.
And yes, I will have to double-check anything important, but I am way better and faster at checking than at doing it myself.
So obviously I don't want a shitty LLM as a coworker, but a competent one. The progress they've made is pretty astonishing, though, and they are good enough now that I've started really integrating them.
In the long run, good code makes everyone much happier than code that is bad because people are being "nice" and letting things slide in code review to avoid confrontation.
...but seriously... there was the "up until 1850" LLM or whatever... can we make an "up until 1920 => 1990 [pre-internet] => present day" and then keep prodding the "older ones" until they "invent their way" to the newer years?
We knew more in 1920 than we did in 1850, but can a "thinking machine" of 1850-knowledge invent 1860's knowledge via infinite monkeys theorem/practice?
The same way that in 2025/2026, Knuth has just invented his way to 2027-knowledge with this paper/observation/finding? If I only had a beowulf cluster of these things... ;-)
But a query optimizer only matters once you have an established business with large customers.
You seem to be implying Salesforce’s business is successful because they have their own query optimizer. But the causality is reversed. Salesforce has their own query optimizer because they’ve built a successful business.
My point is that a lot of people think it'd be really easy to build the next Salesforce until they actually try to compete with Salesforce in the market. Like it or not, if you want to build a Salesforce competitor (or try to get your company to build its own) you're going to be compared to actual Salesforce, not the version of Salesforce that existed when the market was new.
> But the aha moment for me was what’s maintainable by AI vs by me by hand are on different realms
I don't find that LLMs are any more likely than humans to remember to update all of the places where they wrote redundant functions. Generally far less likely, actually. So forgive me for taking this claim with a massive grain of salt.
The environment is why I quit my job and started working for myself in January. I hated it. And not to sound like an arrogant ass because there were a LOT of way smarter people than me at $PREVIOUS_EMPLOYER, but having to have meetings to set our meetings, having to explain things that aren't statistically meaningful to people who don't understand stats anyway, and getting code reviews (when I could get them scheduled) from dudes who hadn't touched a keyboard in 5 years was... soul sucking? I'm not doing that anymore. Or ever again.
I mean, maybe it's because I had a more hands-on, blue-collar-adjacent job before I got into tech? Maybe it's because I'm a fool and couldn't play the game of "pretend to work and look busy." But - and I know this might be kind of messed up - I really like not having to explain things in a series of emails to people other than the customers. I really like not having to answer to anyone but myself and my customers. If I want to do something, well, I just do it now. That's a nice place to be. Riskier for sure, but I think the prior environment would have killed me, so maybe not.
Also, I have time to do shit that's interesting? Who would have guessed how much more time I'd have in the day when I didn't have 4.5 hours of meetings per day? Hell, I'm taking 2 classes at the university for fun (weird, right?!) - I never could have done that before, because I would have had to make a slide deck for Thursday's All-Hands or whatever and couldn't have missed the SUPER IMPORTANT MEETING that Jake has on the schedule that he'll show up to unprepared, or just not show up to at all.
It's ironic that I have to tell you of all people this, but many users of C (or at least, backends of compilers targeted by C) do actually want the compiler to aggressively optimize around UB.