LLMs work best for code when both (a) there's sufficient relevant training data (i.e., we're not doing something particularly novel) and (b) there's sufficient context from the current codebase to pick up expected patterns, the peculiarities of the domain models, and so on.
Drop (a) and get comical hallucinations; drop (b) and quickly find that LLMs are deeply mediocre at top-level architectural and framework/library choices.
Perhaps there's also a (c) related to precision. You can write code to issue a SQL query and return JSON from an API endpoint in multiple just-fine ways. Misplace a pthread_mutex_lock, however, and you're in trouble. I certainly don't trust LLMs to get things like this right!
(It's worth mentioning that "novelty" is a tough concept in the context of LLM training data. For instance, maybe nobody has implemented a font rasterizer in Rust before, but plenty of people have written font rasterizers and plenty of others have written Rust; LLMs seem quite good at synthesizing the two.)