
A compiler is another point of honor and pride that the models have taken from the nerds. In the past, people would debate for hours about the “dragon book” vs. “writing interpreters” and present their cool bespoke compilers in Show HN posts. Now models can produce 100,000 lines of code over two weeks, with no human intervention, that actually works and can compile significant projects. Which way now, nerd? The models are getting better, are you?

The article has some really odd low-level descriptions of bash orchestration, which I suppose are important to illustrate how barebones the setup was. However, I always find it odd when we’re talking about agents that are lauded as borderline superintelligence and there is still low-level bash being slung around; it feels like we’re talking about things at the wrong level.

The point about writing extremely high quality tests reminds me a bit of the “hot mess theory of AI” (https://alignment.anthropic.com/2026/hot-mess-of-ai/), also from Anthropic, where they essentially say that on long-horizon tasks a model is more likely to fall into incoherence than to purposefully pursue incorrect results. This is phrased in the article as “Claude will work autonomously to solve whatever problem I give it. So it’s important that the task verifier is nearly perfect, otherwise Claude will solve the wrong problem”.

The author also observes something that I’ve realised after the initial joy of seeing an agent one-shot a task wore off: for a 30-minute agent task, 25 minutes may be spent exploring the environment. While it would be an offence to give a human unvetted model-generated documentation and runbooks (I’m looking at you, emoji-ridden README.md files becoming more common across Show HN), models should commit things like this to memory for themselves to avoid repeatedly paying the “discovery tax” on every new action. Errors, hallucinations, or changes that cause the generated docs to fail create more busywork for the agent, but agent time is less valuable than finite human life.
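A rough sketch of the “discovery tax” idea, to make it concrete: cache whatever the agent learned about the environment, and only re-explore when the files those notes depend on have changed. Everything here is an assumption for illustration (the `load_or_explore` helper, the JSON notes file, the `explore` callback); it is not how the article's harness works.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(paths):
    """Hash the names and contents of the files the notes depend on."""
    digest = hashlib.sha256()
    for path in sorted(str(p) for p in paths):
        digest.update(path.encode())
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def load_or_explore(notes_file, watched, explore):
    """Return (notes, cache_hit).

    Reuses saved notes while the watched files are unchanged; otherwise
    calls the (hypothetical) explore() callback and saves its result.
    """
    notes_file = Path(notes_file)
    current = fingerprint(watched)
    if notes_file.exists():
        cached = json.loads(notes_file.read_text())
        if cached.get("fingerprint") == current:
            return cached["notes"], True  # skip the discovery tax
    notes = explore()  # the expensive part: poking around the environment
    notes_file.write_text(json.dumps({"fingerprint": current, "notes": notes}))
    return notes, False
```

The fingerprint only catches stale notes when a watched file changes. Notes that were wrong from the start (a hallucinated command, say) still need a separate check, such as running the command and seeing whether it works.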




