The "human in the loop at key checkpoints" pattern has been the most practically useful for us. We found that giving the agent full autonomy end-to-end produces subtly broken code that passes tests but violates implicit invariants you never thought to write down. Short loops with a human sanity check at decision forks catch that class of failure early.
The thing I keep wrestling with is where exactly to place those checkpoints. Too frequent and you've just built a slow pair programmer. Too infrequent and you're doing expensive archaeology to figure out where it went sideways. We've landed on "before any irreversible action" as a useful heuristic, but that requires the agent to have some model of what's irreversible, which is its own can of worms.
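One way to make "before any irreversible action" concrete is to give the agent an explicit allowlist/denylist rather than asking it to infer reversibility. A minimal sketch, assuming a hypothetical action-dispatch layer (the action names and the `IRREVERSIBLE` set are illustrative, not from any real framework):

```python
# Sketch: pause for a human only at actions tagged irreversible.
# The set below is a hypothetical example; a real one is project-specific.
IRREVERSIBLE = {"delete_file", "force_push", "run_migration", "send_email"}

def needs_checkpoint(action: str) -> bool:
    """True when this action should stop and wait for human review."""
    return action in IRREVERSIBLE

def execute(action: str, approve=lambda a: input(f"Allow {a}? [y/N] ") == "y"):
    # Reversible actions run autonomously; irreversible ones require sign-off.
    if needs_checkpoint(action) and not approve(action):
        return f"blocked: {action}"
    return f"ran: {action}"
```

The classification is still hand-maintained, which dodges rather than solves the "agent needs a model of irreversibility" problem, but it keeps the judgment call with a human and makes the checkpoint policy auditable.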
Has anyone found a principled way to communicate implicit codebase conventions to an agent beyond just dumping a CLAUDE.md or similar file? We've tried encoding constraints as linter rules but that only catches surface stuff, not architectural intent.
The checkpoint pattern you describe is exactly right. I've been dealing with this as well. Instead of vibe coding, it's vibe system engineering, and I don't care for it. So I thought about it and came up with a framework for describing and reasoning about different pipelines. I based it on the types of LLM failures I was seeing in my own pipeline: omissions, incorrect output, and inconsistencies with existing code.
I wanted something I could use to objectively decide whether one test (or gate, as I call them) is better than another, and how they work together as a holistic system.
My personal tool encodes a workflow with stages and gates; a gate must pass before work is handed off to the next stage. Once I did this I went from ~73% first-pass approval to over 90%, just by adding structured checks at stage boundaries.
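The stages-and-gates shape can be sketched roughly like this. This is my reconstruction, not the actual tool: the stage and gate names are hypothetical, and real gate checks would be project-specific (e.g. "does the spec cover every stated requirement?" at the spec-to-code boundary):

```python
# Sketch of a staged pipeline where each gate must pass before handoff.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    run: Callable[[dict], dict]        # produces or enriches an artifact
    gate: Callable[[dict], list[str]]  # returns problems; empty list = pass

def run_pipeline(stages: list[Stage], artifact: dict) -> dict:
    for stage in stages:
        artifact = stage.run(artifact)
        problems = stage.gate(artifact)
        if problems:
            # Handoff refused: the failure surfaces at the boundary where
            # it occurred, not three stages downstream.
            raise ValueError(f"{stage.name} gate failed: {problems}")
    return artifact

# Toy gate covering two of the failure classes named above:
def spec_gate(artifact: dict) -> list[str]:
    problems = []
    if not artifact.get("requirements"):     # omission
        problems.append("no requirements captured")
    if artifact.get("contradicts_existing"): # inconsistent with existing code
        problems.append("conflicts with existing behavior")
    return problems
```

The useful property is that gates are ordinary functions, so you can compare two candidate gates on the same artifacts and measure which one catches more of a given failure class.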