
> Break down sessions into separate clear, actionable tasks. Don't try to "draw the owl" in one mega session.

This is the key one I think. At one extreme you can tell an agent "write a for loop that iterates over the variable `numbers` and computes the sum" and it'll do this successfully, but the scope is so small there's not much point in using an LLM. On the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad.

A lot of successful LLM adoption for code is finding this sweet spot. Overly specific instructions don't make you feel productive, and with overly broad instructions you end up redoing too much of the work.




This is actually an aspect of using AI tools I really enjoy: Forming an educated intuition about what the tool is good at, and tastefully framing and scoping the tasks I give it to get better results.

It cognitively feels very similar to other classic programming activities, like modularization at any level from architecture to code units/functions, thoughtfully choosing how to lay out and chunk things. It's always been one of the things that make programming pleasurable for me, and some of that feeling returns when slicing up tasks for agents.


"Become better at intuiting the behavior of this non-deterministic black box oracle maintained by a third party" just isn't a strong professional development sell for me, personally. If the future of writing software is chasing what a model trainer has done with no ability to actually change that myself I don't think that's going to be interesting to nearly as many people.

It sounds like you're talking more about "vibe coding" i.e. just using LLMs without inspecting the output. That's neither what the article nor the people to whom you're replying are saying. You can (and should) heavily review and edit LLM generated code. You have the full ability to change it yourself, because the code is just there and can be edited!

And yet the comments are chock full of cargo-culting about different moods of the oracle and ways to get better output.

I think this is underrating the role of intuition in working effectively with deterministic but very complex software systems like operating systems and compilers. Determinism is a red herring.

Whether it's interesting or not is irrelevant to whether it produces usable output that could be economically valuable.

Yeah, still waiting for something to ship before I form a judgement on that

Claude Code is made with Anthropic's models and is very commercially successful.

Something besides AI tooling. This isn't Amway.

Since they started doing that it's gained a lot of bugs.

Should have used codex. (jk ofc)

I agree that framing and scoping tasks is becoming a real joy. The great thing about this strategy is there's a point at which you can scope something small enough that it's hard for the AI to get it wrong and it's easy enough for you as a human to comprehend what it's done and verify that it's correct.

I'm starting to think of projects now as a tree structure where the overall architecture of the system is the main trunk and from there you have the sub-modules, and eventually you get to implementations of functions and classes. The goal of the human in working with the coding agent is to have full editorial control of the main trunk and main sub-modules and delegate as much of the smaller branches as possible.

Sometimes you're still working out the higher-level architecture, too, and you can use the agent to prototype the smaller bits and pieces which will inform the decisions you make about how the higher-level stuff should operate.


[Edit: I may have been replying to another comment in my head as now I re-read it and I'm not sure I've said the same thing as you have. Oh well.]

I agree. This is how I see it too. It's more like a shortcut to an end result that's very similar to (or much better than) what I would've reached through typing it myself.

The other day I realised that I'm using my experience to steer it away from bad decisions a lot more than I'd noticed. It feels like it does all the real work, but I have to remember it's my/our (decades of) experience writing code playing a part too.

I'm genuinely confused when people come in at this point and say that it's impossible to do this and produce good output and end results.


I feel the same, but also, within like three years this might look very different. Maybe you'll give the full end-to-end goal upfront and it just polls you when it needs clarification or wants to suggest alternatives, and it self-manages, cleanly delegating work to itself.

Or maybe something quite different but where these early era agentic tooling strategies still become either unneeded or even actively detrimental.


> it just polls you when it needs clarification

I think anyone who has worked on a serious software project would say that this means it would be polling you constantly.

Even if we posit that an LLM is equivalent to a human, humans constantly clarify requirements/architecture. IMO on both of those fronts the correct path often reveals itself over time, rather than being knowable from the start.

So in this scenario it seems like you'd be dealing with constant pings and need to really make sure your understanding of the project is growing with the LLM's development efforts as well.

To me this seems like the best case of the current technology: the models have been getting better and better at doing what you tell them in small chunks, but you still need to be deciding what they should be doing. These chunks don't feel as though they're getting bigger unless you're willing to accept slop.


> Break down sessions into separate clear, actionable tasks.

What this misses, of course, is that you can just have the agent do this too. Agents are great at making project plans, especially if you give them a template to follow.


It sounds to me like the goal there is to spell out everything you don't want the agent to make assumptions about. If you let the agent make the plan, it'll still make those assumptions for you.

If you've got a plan for the plan, what else could you possibly need!

You joke, but the more I iterate on a plan before any code, the more successful the first pass is.

1) Tell claude my idea with as much as I know, ask it to ask me questions. This could go on for a few rounds. (Opus)

2) Run a validate skill on the plan, reviewer with a different prompt (Opus)

3) codex reviews the plan, always finds a few small items after the above 2.

4) claude opus implements in 1 shot, usually 99% accurate, then I manually test.

If I stay on target with those steps I always have good outcomes, but it is time consuming.
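
To make step 1 concrete: the core of that "ask me questions first" loop is tiny. Here's a rough sketch using the anthropic Python SDK; the model id and prompt wording are placeholders, not what I literally run.

  # Step 1 sketch: have the model ask clarifying questions before any plan.
  # The model id and prompt wording are placeholders.
  import anthropic

  client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

  history = [{
      "role": "user",
      "content": "Here is my idea: <idea goes here>. Before proposing any "
                 "plan, ask me clarifying questions, a few at a time.",
  }]

  while True:
      reply = client.messages.create(
          model="claude-opus-4-20250514",  # placeholder model id
          max_tokens=1024,
          messages=history,
      )
      text = reply.content[0].text
      print(text)
      answer = input("> ")  # answer its questions; type "done" to stop
      if answer.strip().lower() == "done":
          break
      history.append({"role": "assistant", "content": text})
      history.append({"role": "user", "content": answer})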


I do something very similar. I have an "outside expert" script I tell my agent to use as the reviewer. It only bothers me when neither it NOR the expert can figure out what the heck it is I actually wanted.

In my case I have Gemini CLI, so I tell Gemini to use the little python script called gatekeeper.py to validate its plan before each phase with Qwen, Kimi, or (if nothing else is getting good results) ChatGPT 5.2 Thinking. Qwen & Kimi are via fireworks.ai so it's much cheaper than ChatGPT. The agent is not allowed to start work until one of the "experts" approves it via gatekeeper. Similarly, it can't mark a phase as complete until the gatekeeper approves the code as bug free, up to standards, and passing all unit tests & linting.

Lately Kimi is good enough, but when it's really stuck it will sometimes bother ChatGPT. Seldom does it get all the way to the bottom of the pile and need my input. Usually it's when my instructions turned out to be vague.

I also have it use those larger thinking models for "expert consultation" when it's spent more than 100 turns on any problem and hasn't made progress by its own estimation.
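
For anyone wondering what gatekeeper.py amounts to, a bare-bones sketch of the idea looks something like this (the endpoint, model id, and env var name are illustrative, not my exact setup):

  # gatekeeper.py (simplified sketch): send a plan to an "expert" model behind
  # an OpenAI-compatible endpoint, exit non-zero unless it approves.
  # Endpoint, model id, and env var name are illustrative.
  import os
  import sys

  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.fireworks.ai/inference/v1",
      api_key=os.environ["FIREWORKS_API_KEY"],
  )

  plan = open(sys.argv[1]).read()

  resp = client.chat.completions.create(
      model="accounts/fireworks/models/kimi-k2-instruct",  # illustrative
      messages=[
          {"role": "system", "content": "You are a strict plan reviewer. "
           "Reply APPROVE or REJECT on the first line, then list concrete "
           "problems."},
          {"role": "user", "content": plan},
      ],
  )

  verdict = resp.choices[0].message.content
  print(verdict)
  sys.exit(0 if verdict.strip().upper().startswith("APPROVE") else 1)

The agent runs it, and a non-zero exit code means "not approved, keep revising the plan".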


I actually enjoy writing specifications. So much so that I made it a large part of my consulting work for much of my career. So it makes sense that working with Gen-AI that way is enjoyable for me.

The more detailed I am in breaking down chunks, the easier it is for me to verify and the more likely I am going to get output that isn't 30% wrong.


> On the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad.

Amusingly, this was my experience in giving Lovable a shot. The onboarding process was literally just setting me up for failure by asking me to describe the detailed app I was attempting to build.

Taking it piece by piece in Claude Code has been significantly more successful.


Exactly. The LLMs are quite good at "code inpainting", e.g. "give me the outline/constraints/rules and I'll fill in the blanks".

But not so good at making (robust) new features out of the blue
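
A concrete example of the kind of outline that inpaints well: spell out the signature, constraints, and edge cases, and leave only the body blank (the function and its requirements here are made up for illustration):

  # The kind of "outline" that inpaints well: signature, constraints, and
  # edge cases spelled out, body left blank for the model to fill in.
  # (The function and its requirements are made up for illustration.)
  def dedupe_keep_order(items: list[str]) -> list[str]:
      """Return items with duplicates removed, preserving first-seen order.

      Constraints:
      - compare case-insensitively, but return the original casing
      - drop empty strings entirely
      - must be O(n); no sorting
      """
      ...  # <- ask the model to fill in just this body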


So many times I catch myself asking a coding agent, e.g. “please print the output”, and it will update the file with “print (output)”.

Maybe there’s something about not having to context switch between natural language and code that just makes it _feel_ easier sometimes.


And lately, the sweet spot has been moving upwards every 6-8 weeks with the model release cycle.

> the scope is so small there's not much point in using an LLM

Actually that's how I did most of my work last year. I was annoyed by existing tools so I made one that can be used interactively.

It has full context (I usually work on small codebases), and can make an arbitrary number of edits to an arbitrary number of files in a single LLM round trip.

For such "mechanical" changes, you can use the cheapest/fastest model available. This allows you to work interactively and stay in flow.

(In contrast to my previous obsession with the biggest, slowest, most expensive models! You actually want the dumbest one that can do the job.)
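
The core of such a loop can be surprisingly small. A toy sketch of the idea, not my actual tool (the edit format and model id are placeholders):

  # Toy sketch of "full context in, many edits out" in a single round trip.
  # The edit format and model id are placeholders, not the actual tool.
  import pathlib
  import re

  from openai import OpenAI

  client = OpenAI()  # any cheap, fast model behind an OpenAI-style API

  files = {str(p): p.read_text() for p in pathlib.Path("src").rglob("*.py")}
  context = "\n\n".join(f"### {path}\n{body}" for path, body in files.items())
  instruction = input("edit> ")  # e.g. "rename load_config to load_settings"

  resp = client.chat.completions.create(
      model="gpt-4o-mini",  # the dumbest model that can do the job
      messages=[{
          "role": "user",
          "content": f"{context}\n\nTask: {instruction}\n\n"
                     "Reply only with blocks of the form:\n"
                     "<<<FILE path>>>\nnew full contents\n<<<END>>>",
      }],
  )

  # Apply every returned file in one shot, then go again.
  for path, body in re.findall(r"<<<FILE (.+?)>>>\n(.*?)\n<<<END>>>",
                               resp.choices[0].message.content, re.S):
      pathlib.Path(path).write_text(body)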

I call it "power coding", akin to power armor, or perhaps "coding at the speed of thought". I found that staying actively involved in this way (letting the LLM handle only the function level) helped keep my mental model synchronized, whereas if I let it work independently, I'd have to spend more time catching up on what it had done.

I do use both approaches though, just depends on the project, task or mood!


Do you have the tool open sourced somewhere? I have been thinking of using something similar



