I'm finding the latest models are pretty good at debugging, if you give them the tools to debug properly
If they can run a tool from the terminal, see all the output in text format, and have a clear 'success' criterion, then they're usually able to figure out the issue and fix it (often with spaghetti-code patching, but it does at least fix the bug)
I think the testing/verification part is going to keep getting better as we figure out better tools the AI can use here (e.g., parsing the accessibility tree of a web UI so it can click around in it and verify)
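The "run a tool, read the text output, check a success criterion" loop above can be sketched in a few lines. This is a minimal illustration, not any particular agent framework's API; `run_and_check` and the predicate are hypothetical names:

```python
import subprocess

def run_and_check(cmd, success_check):
    """Run a terminal command, capture all output as text, and apply a
    caller-supplied success predicate.

    Mirrors the three ingredients above: a tool runnable from the
    terminal, full output in text form, and a clear success criterion.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    return success_check(result.returncode, output), output

# Example criterion: exit code 0 and no "FAILED" marker in the output.
ok, log = run_and_check(
    ["python", "-c", "print('all tests passed')"],
    lambda code, out: code == 0 and "FAILED" not in out,
)
```

The point of the predicate is that "success" is explicit and machine-checkable, so the model can loop on patch-run-check without a human judging each attempt.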