It’s absolutely not enough to “keep an eye on it on your phone”. You need to know that the tests are actually implemented for real. LLMs routinely take shortcuts in tests to make them green. On one occasion it flat-out mocked everything from the live code, and this was a very, very simple Python REST API. The tests, of course, were green.
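To make that concrete, here is a rough sketch of the pattern, not the actual code from that project. The names `get_user`, `real_test` and `over_mocked_test` are made up for illustration, and the bug is planted on purpose. The over-mocked test stays green because it never touches the live code at all:

```python
from unittest.mock import MagicMock


# Hypothetical "live" code: a tiny lookup handler with a deliberate bug.
def get_user(user_id: int) -> dict:
    users = {1: "alice", 2: "bob"}
    # Planted bug: off-by-one lookup, so id 1 resolves to "bob".
    return {"id": user_id, "name": users.get(user_id + 1)}


def real_test() -> bool:
    # Exercises the live function, so the planted bug is visible.
    return get_user(1)["name"] == "alice"


def over_mocked_test() -> bool:
    # The handler is replaced wholesale by a mock with a canned answer,
    # so this only checks that the mock returns what it was told to return.
    handler = MagicMock(return_value={"id": 1, "name": "alice"})
    return handler(1)["name"] == "alice"


if __name__ == "__main__":
    print("real test passes:", real_test())
    print("over-mocked test passes:", over_mocked_test())
```

Reading the test body is the only way to spot this; the pass/fail summary looks identical either way.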
This one right here: https://news.ycombinator.com/item?id=46384118