I would say it varies from 0x to a modest 2x. It can help you write good code quickly, but I only spent about 20-30% of my time writing code anyway before AI. It definitely makes debugging and research tasks much easier as well. I would confidently say my job as a senior dev has gotten a lot easier and less stressful as a result of these tools.
One other thing I have seen, however, is the 0x case, where you have given too much control to the LLM, it codes both you and itself into Pan's Labyrinth, and you end up having to take a weed whacker to the whole project or start from scratch.
That's why you don't use LLMs as a knowledge source without giving them tools.
"Agents use tools in a loop to achieve a goal."
If you don't give them any tools, you get hallucinations and half-truths.
But give one a tool to do, say, web searches and it's going to be a lot smarter. That's where 90% of the innovation with "AI" today is coming from. The raw models aren't getting that much smarter anymore, but the scaffolding and frameworks around them are.
Tools are the main reason Claude Code is as good as it is compared to the competition.
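To make "tools in a loop" concrete, here is a minimal sketch of the pattern. It is not any particular vendor's API: the `ask_model` callable, the shape of its reply, and the `web_search` stub are all assumptions for illustration.

```python
# Minimal "tools in a loop" sketch. ask_model stands in for whatever LLM
# backend you use; the reply format and the web_search stub are assumptions.

def web_search(query: str) -> str:
    """Placeholder: a real implementation would call a search API."""
    return f"(search results for: {query})"

TOOLS = {"web_search": web_search}

def run_agent(ask_model, goal: str, max_steps: int = 5) -> str:
    """Ask the model, run any tool it requests, feed the result back, repeat."""
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        reply = ask_model(history)                  # dict returned by the model
        call = reply.get("tool_call")
        if call:
            result = TOOLS[call["name"]](**call["args"])
            history.append({"role": "tool", "name": call["name"], "content": result})
        else:
            return reply["content"]                 # final answer, no tool needed
    return "(stopped after max_steps)"
```

The point is that the loop and the tools, not the raw model, are where most of that extra capability comes from.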
> The raw models aren't getting that much smarter anymore, but the scaffolding and frameworks around them are.
Yes, that is my understanding as well, though it gets me thinking: if that is true, then what real value does the LLM on the server offer compared to doing that locally + tools?
You still can't beat an acre of specialized compute with any kind of home hardware. That's pretty much the power of cloud LLMs.
For a tool-use loop, local models are getting to "OK" levels. When they get to "pretty good", most of my own stuff can run locally, since it's basically just coordinating tool calls.
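For what it's worth, the "coordinating tool calls locally" part can be as simple as pointing the same loop at a local server. A sketch assuming a locally running server that speaks the OpenAI-compatible chat API (llama.cpp's server and Ollama both expose one); the port and model name here are assumptions:

```python
from openai import OpenAI

# Point the client at a local OpenAI-compatible endpoint instead of the cloud.
# 11434 is Ollama's default port; use whatever your local server listens on.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="llama3.1",  # whichever model is loaded locally
    messages=[{"role": "user", "content": "Plan the next tool call for: ..."}],
)
print(reply.choices[0].message.content)
```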
Of course, step one is always to think critically and evaluate for bad information. I think for research I mainly use it for things that are testable/verifiable; for example, I used it for a tricky proxy chain setup. I did try to use it to learn a language a few months ago, which I think was counterproductive for the reasons you mentioned.
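As an example of "testable/verifiable": a proxy chain setup can be checked end to end rather than taken on faith. A rough sketch, assuming a local SOCKS proxy on port 1080 and an IP echo service (both assumptions; requests also needs its socks extra installed):

```python
# Rough check that traffic really goes through the proxy: compare the IP an
# echo service sees with and without it. Requires: pip install "requests[socks]"
import requests

ECHO = "https://ifconfig.me/ip"
direct = requests.get(ECHO, timeout=10).text.strip()
proxied = requests.get(ECHO, timeout=10,
                       proxies={"https": "socks5h://127.0.0.1:1080"}).text.strip()

assert direct != proxied, "proxy is not being used"
print(f"direct: {direct}  via proxy: {proxied}")
```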
How can you critically assess something in a field you're not already an expert in?
That Python you just got might look good, but it could be rewritten from 50 lines down to 5: it's written in 2010 style, it's not using modern libraries, and it's not using modern syntax.
And it really is 50 to 5. That is the scale we're talking about in a good 75% of AI-produced code unless you challenge it constantly: not using modern syntax to reduce boilerplate, over-guarding against impossible states, ridiculous amounts of error handling. It is basically a junior dev on steroids.
Most of the time you have no idea that most of that code is totally unnecessary unless you're already an expert in that language AND the libraries it's using. And you're rarely an expert in both, or you wouldn't even be asking; it would have been quicker to write the code than to write the prompt for the AI.
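To illustrate the kind of gap being described (a toy example, not code from the thread): both functions below count word frequencies in a file.

```python
from collections import Counter
from pathlib import Path

# The verbose, "2010-style" shape this kind of output often takes:
def word_counts_verbose(path):
    f = open(path, "r")
    try:
        text = f.read()
    finally:
        f.close()
    counts = {}
    words = text.split()
    for i in range(len(words)):
        w = words[i].lower()
        if w in counts:
            counts[w] = counts[w] + 1
        else:
            counts[w] = 1
    return counts

# The modern, idiomatic equivalent:
def word_counts(path):
    return Counter(w.lower() for w in Path(path).read_text().split())
```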
I use web search (DDG) and I don't think I have ever tried more than one query in the vast majority of cases. Why? Because I know where the answer is; I'm using the search engine as an index to where I can find it. Like "csv python" to find that page in the docs.