Hacker News

I mean, in a sense yes, the training set is gospel. But these systems are also (generally) tested against held-out data.

When you have to model 100 TB of images with 4 GB of weights, there is no way to do it without learning some kind of patterns and regularities that generalise outside the training set. Most generated items will be novel, and most training items will not be reproducible.
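The arithmetic behind that argument is worth spelling out. A back-of-envelope sketch (the dataset and model sizes are the hypothetical figures from above, and the one-billion-image count is an illustrative assumption, not a real dataset statistic):

```python
# Compression-ratio argument: 100 TB of training images vs 4 GB of weights.
dataset_bytes = 100 * 1024**4      # hypothetical 100 TB corpus
weights_bytes = 4 * 1024**3        # hypothetical 4 GB model
ratio = dataset_bytes / weights_bytes
print(ratio)  # 25600.0 -- the model is ~25,000x smaller than the data

# If that corpus held, say, one billion images (an assumed count),
# the weights have only a few dozen bits to "spend" per training item:
n_images = 1_000_000_000
bits_per_image = weights_bytes * 8 / n_images
print(round(bits_per_image, 1))  # 34.4 bits per image
```

A few dozen bits per image is nowhere near enough to store the images themselves, which is why wholesale memorisation of most items is not possible; the capacity has to go into shared structure instead.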

It doesn’t seem radical to suggest that the copying issue will continue to recede as we get better models.

And there are areas of research specifically concerned with _provably_ showing that you cannot identify what items a model was trained on.
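Differential privacy is the main line of work here; DP-SGD gives formal guarantees that the trained weights leak only a bounded amount about any single training item. A minimal sketch of its core update step (per-example gradient clipping plus Gaussian noise); the clipping norm and noise multiplier are illustrative, not tuned, and real implementations also track a privacy budget:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_noisy_mean(per_example_grads, clip_norm=1.0, noise_mult=1.1):
    """Core DP-SGD step: clip each per-example gradient to clip_norm,
    average, then add calibrated Gaussian noise so no single example
    can dominate the update. Parameters here are illustrative."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    sigma = noise_mult * clip_norm / len(per_example_grads)
    return mean + rng.normal(0.0, sigma, size=mean.shape)
```

Because every example's influence on the update is bounded and then masked by noise, membership-inference attacks (asking "was this item in the training set?") provably cannot succeed beyond a quantified advantage.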

Lots of reasons to be optimistic in my view.



