Data *is* the source code here, though. Training code is effectively a build script. The data that goes into training a model does not function like assets in videogames; you can't swap out the training dataset after release and get substantially the same thing. If anything, you can think of the weights themselves as the asset, and even if the vendor is granting most users a license to copy and modify it (unlike with videogames), the asset itself isn't open source.

So, the only bit that's actually open-sourced in these models is the inference code. But that's a trivial part: people can get equivalents elsewhere or reproduce it from published papers. In this sense, even if you think calling the models "open source" is correct, it doesn't really mean much, because the only parts that matter are not open-sourced.