Given that the models will attempt to check their own work with almost the identical verification that a human engineer would, it's hard to say whether humans aren't just implicitly checking by relying on the same shared verification methods (e.g. "let me run the tests", "let me run the application with specific arguments to see if the behavior works").
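For concreteness, this is roughly the kind of shared verification loop I mean, sketched in Python; the `myapp` CLI module, the `--format json` flag, and the expected output string are made-up stand-ins, not anything from a real project:

```python
import subprocess
import sys

# Minimal sketch of the "shared verification" idea: the same checks a human
# engineer would run by hand, scripted. The project layout (a pytest suite,
# a CLI module named `myapp`) is hypothetical.

def run_tests() -> bool:
    """'Let me run the tests' -- run the project's test suite."""
    result = subprocess.run([sys.executable, "-m", "pytest", "-q"])
    return result.returncode == 0

def run_app_with_args(args: list[str], expected_substring: str) -> bool:
    """'Let me run the application with specific arguments' and check the output."""
    result = subprocess.run(
        [sys.executable, "-m", "myapp", *args],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and expected_substring in result.stdout

if __name__ == "__main__":
    ok = run_tests() and run_app_with_args(["--format", "json"], '"status": "ok"')
    print("verification passed" if ok else "verification failed")
```

Whether a human runs these by hand or an agent runs them in a loop, the signal being checked is the same.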
> Given that the models will attempt to check their own work with almost the identical verification that a human engineer would
That's not the case at all though. The LLM doesn't have a mental model of what the expected final result is, so how could it possibly verify that?
It has a description in text format of what the engineer thinks he wants. The text format is inherently limited and lossy and the engineer is unlikely to be perfect at expressing his expectations in any case.
Not yet, but the more you insist, the more it will be. But what is your proposal for differentiating between just prompting without looking at the code and just using an LLM to generate code?