When you say new mathematical theorems, they absolutely can. So can infinite monkeys on typewriters, though LLMs have a much better heuristic to arrive at valid theorems.
The same applies to valid new programs.
The issue I have with this is pretending that the word "new" is sufficient justification for giving all the credit/attribution and subsequent reward (reputational, financial, etc.) to the person who wrote the prompt instead of distributing it to the people in the whole chain of work according to how much work and what quality of work they did.
How many man-hours did it take to create the training data? How many to create the LLM training algorithm and to produce the electricity to run it? How many to write the prompts?
The most work by many, many orders of magnitude was put in by the first group. They often did it with altruistic goals in mind and released their work under permissive or copyleft licenses.
And now somebody found a way to monetize this effort without giving them anything in return. In fact, they will have to pay to access the LLMs which are based on their own work.
Copyright and plagiarism are perhaps the wrong terms to use when talking about it. I think copyright should absolutely apply, but it was originally designed to protect creative works, not code.
Either way it's a form of industrialized exploitation and we should use all available tools to defend against it.
I don't necessarily disagree about the copyleft stuff.
Transformers do sometimes overfit to exact token sequences from training data, but that isn't really what the architecture does in general.