
> I often ask it "I have this bug. Why?" And it almost always figures it out and fixes it. Huge code base.

Is your AI PR publicly available on GitHub?


No. I don't do any open source work. I work for a private company.


These two things are not mutually exclusive.


> Not my experience. It excels in existing codebases too.

Why don't you prove it?

1. Find an old, large codebase on Codeberg (avoiding the octopus for obvious reasons)

2. Video stream the session and make the LLM convo public

3. Ask your LLM to remove jQuery from the db and submit regular commits to a public remote branch

Then we will be able to judge if the evidence stands


I don't have to prove it. I do it every single day at work in a real production codebase that my business relies on.

And I don't remove jQuery every day. Maybe the OP is right that Opus 4.6 sucks at removing jQuery. I don't know. I've never asked an AI to do it.

> The moment you point it at a real, existing codebase - even a small one - everything falls apart.

This statement is absolutely not true in my experience. Codex has been amazing for me on existing codebases.


Extraordinary claims require extraordinary evidence. "Works on my machine" ain't it.


Is it an extraordinary claim that Opus 4.6 or GPT 5.3 works amazingly well on existing codebases in my experience?

That's funny. I feel like it's the opposite. Claiming that Opus 4.6 or GPT 5.3 falls apart as soon as you point it at an existing codebase, big or small, is a much more extraordinary claim.


What are the obvious reasons?


I thought it would be obvious: OpenAI has used repos on GitHub as training data. It would be like testing someone with a publicly available past paper.

Are you planning on carrying out the experiment? Regardless of the outcome, it would be of value to developers.


Why wouldn't they train on Codeberg too?

It's pretty hard to block automated uses of "git clone".


Why would they? GitHub has 28 million public repos; Codeberg only hit 300k last year. Anyway, Codeberg was just a placeholder for 'a repo source _less_ likely to be in their training data'. Codeberg was a quick candidate for a place to find a big, old codebase with non-sensitive data.

It is indeed hard, but the folks at Codeberg are certainly an order of magnitude better at it than GitHub: they opted out of the main AI crawlers, regularly block IPs known to belong to AI startups, and they allow you to make your repos accessible only to logged-in users.

You seem to be going off on a tangent here. The main point was about performing a well-documented test anyway.


My question about the "obvious" thing was genuine - it wasn't obvious to me.


For the oblivious: /s


This one is a lot harder to tell, because some AI bros claim similar things and are completely serious. Just look at Show HN now: there used to be ~20-40 posts per day, but now there are 20 per HOUR.

(Please oh please can we have a Show HN AI. I'm not interested in people's weekend vibe-coded app to replace X popular tool. I want to check out cool projects where people invested their passion and time.)


Consensus. People like to follow what the majority does even if it's suboptimal.


not just like to follow, but are forced to follow.


> Crucially the end user should then be ASKED which to enable

This doesn't work for literally 99.9% of users out there. This is classic HN Dropbox-comment syndrome.

You need overridable defaults.


You should wonder whether any of those devs will train themselves to become engineers, and whether the supply of engineers will fall below the demand for them. If either of those comes true, you will likely struggle to keep your employment stats relatively the same (i.e. you will struggle in very specific ways), unless you are the kind of person who doesn't need to interview to land a gig at a top-10 tech company.


> I'm struggling to think of any scenario that doesn't also put most white collar professions out of work alongside me

You don't need to be out of a job to struggle. Just for your pay to remain flat (or drop), for your working conditions to degrade (you think jQuery spaghetti was a mess? good luck with AI spaghetti slop), or for competition to increase, because now most devving involves tedious fixing of AI code, and the genuinely programming-heavy jobs are as fought over as dev roles at Google/Jane Street/etc.

Devving isn't going anywhere, but just like you don't punch cards anymore, you shouldn't expect your role in the coming decades to be the same as in the 1990s-2025 period.


How come OpenAI and Anthropic both released their models at pretty much the same time? Does anyone know if the timing is coincidental?


I would bet it was to be ready before the Super Bowl ads.


> are there modes of thinking that fundamentally require something other than what current LLM architectures do?

Possibly. There are likely also modes of thinking that fundamentally require something other than what current humans do.

Better questions are: are there any kinds of human thinking that cannot be expressed in a "predict the next token" language? Is there any kind of human thinking that maps onto the token-prediction pattern so poorly that training a model for it would not be feasible regardless of training data and compute resources?
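To make "predict the next token" concrete (a toy sketch of my own, not anything the commenters or the real models do at scale): strip the idea down to a bigram model that, given the previous token, greedily emits the most frequent follower seen in training. The corpus and function names here are illustrative.

```python
from collections import Counter, defaultdict

# Tiny "training corpus" (illustrative, made up for this sketch).
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count, for each token, which tokens follow it and how often.
follow = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow[prev][nxt] += 1

def predict_next(token):
    """Greedy next-token prediction: the most common successor seen in training."""
    if token not in follow:
        return None  # token never appeared with a successor
    return follow[token].most_common(1)[0][0]

print(predict_next("the"))  # → cat ("cat" follows "the" twice, more than any other token)
```

Real LLMs replace the count table with a learned distribution over a huge context window, but the interface is the same: previous tokens in, one next token out. The question above is whether any mode of human thinking fails to fit that interface at all.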

At the end of the day, the real-world value is utility, and some of their cognitive handicaps are likely addressable. Think of the evolution of flight by natural selection: flight is useful enough to make it worth adapting the whole body so that flight becomes not just possible but efficient. Sleep falls into this category too, imo.

We will likely see similar with AI. To compensate for some of their handicaps, we might adapt our processes or systems so the original problem can be solved automatically by the models.


Waiting until the moment they get good enough is not a smart thing to do either. If you are a farmer and know it is going to snow at some point in the next 5 months, you make plans NOW; you don't wait until the temperatures drop and you see the snow falling. Right now, people are waiting for the snowfall before moving their proverbial chickens indoors.


Top AI researchers like Yann LeCun have said that LLMs are a dead end.

It seems to me that LLM performance is plateauing and no longer improving exponentially. The recent hubbub about rewriting a worse GCC for $20,000 is another example of overhype and regurgitated training data.

You don't know for sure that it is going to "snow" (AI reaching general intelligence). Snow happens frequently; AI reaching general intelligence has never happened. If it ever happens, 99% of jobs are gone, and there is really nothing you can do to prepare other than maybe buy guns and ammo, and even that might not do anything against robotic soldiers.

People were worried about AI taking their jobs 60 years ago when perceptrons came out, and anyone who avoided a tech career because of that back then would have lost out majorly.


There is no reason why an AI model capable of pushing a significant chunk of devs into lower-paid, highly competitive dev jobs through automation needs to be an artificial general intelligence. There is a lack of nuance in thinking that AI is either dumb or has human-level general intelligence. As much as devs hate to admit it, you don't need that much of what we understand as general intelligence to write software. Only a portion of your intelligence is needed, and arguably not all of it at the same time.

While general-purpose models might be plateauing soon (arguably they have been for a while), highly specialised models (especially for programming) haven't necessarily plateaued yet. And anyway, existing functionality seems like a good foundation on which to build systems that remove the need to hire as many devs. It's not "being out of a job" that should worry you. Open up your binary thinking and consider that facing a 2008-style job market for the rest of your career is not the same as permanent unemployment, but it is not a market you would want either.

That is the real concern.


You don't need to be a genius or a rocket scientist to write code, but LLMs don't even reach the bar for anything but the simplest things. Take a look at the video I posted earlier for an example.

And specialised models for programming HAVE plateaued.

https://livebench.ai/#/?sort=Agentic+Coding+Average

From Claude 4.1 to 4.5 was only an 18% gain, and from 4.5 to 4.6 it even DECLINED. Codex 5.1 to 5.2 also shows a decline.


https://arxiv.org/abs/2510.26787

Testing the top LLMs on wework, the highest-performing one succeeded at a rate of only 2.5%.

Can you imagine not being fired when you can only do 2.5% of all tasks?

This study is dated October 30th, very recent.


> Can you imagine not being fired when you can only do 2.5% of all tasks?

You are not competing against LLMs, though. You are competing against people (who in a pre-LLM world wouldn't be in tech) using LLM tools to beat you on value. In the new world, either you are a top-1% dev or you beat everyone in a race to the bottom on price. The middle will become vanishingly small. Think of manufacturing in developed countries.

