The idea of giving it a task that may take six hours and reviewing it also gives me shivers.
I'm a very happy Codex customer, but everything turns to disgusting slop if I don't provide:
(1) Up-to-date AGENTS.md and an excellent prompt
(2) A full file-level API with function signatures, return types and function-level guidance if it's a complex one
(3) Multiple rounds of feedback until the result is finely sculpted
Overall it's very small units of work - one file or two, tops.
I've been letting the above standards go for the last couple of weeks due to crunch and looking at some of the hotspots of slop now lying around has me going all Homelander-face [1] at the sight of them.
Those hotspots are a few hundred lines in the worst cases; I'm definitely not ready to deal with the fallout of any unit of work that takes more than 20 minutes.
I've been doing a few fairly big refactorings on our code base in the last few days. It does a decent job, and I generally don't put a lot of effort into my prompts.
It seems to pick up a lot from my code base. I do have an AGENTS.md with some basics on how to run stuff and what to do, and that seems to help keep it from going off on a wild goose chase, trying to figure out how to run stuff by doing the wrong things.
I think it has been quite a journey from when I first started using Codex around July to now, and it has improved a lot. It actually seems to do well in larger code bases that have a lot of existing structure and examples of how things are done. It does a lot of things without my asking, simply because there's a lot of other code that does it that way.
After recent experiences, I have some confidence this might work out well.
[1] https://i.kym-cdn.com/entries/icons/original/000/050/702/ab7...