The problem is not in the image models but in the training data and its context. MJ reads "British museum" as the image source, while Nano Banana reads it as the setting.
It is licensed under the Apache 2.0 license. The model can generate 6-second videos at 720p resolution and 15 FPS from a prompt and an image. The architecture consists of a 175M-parameter VideoVAE and a 2.8B-parameter VideoDiT model, and it uses only 9.3 GB of GPU memory in BF16 mode with CPU offloading.
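For a rough sense of scale, here is a back-of-the-envelope sketch using only the numbers quoted above. It assumes 2 bytes per parameter for BF16 and counts weights only. Activations, latents, and the effect of CPU offloading are not modeled, so the result is not directly comparable to the 9.3 GB figure. The helper name `weight_gb` is made up for illustration.

```python
# Rough arithmetic from the quoted specs; weights only, nothing else modeled.
BYTES_PER_BF16_PARAM = 2  # BF16 stores each parameter in 16 bits


def weight_gb(num_params: float) -> float:
    """Approximate weight size in GB (10^9 bytes) at BF16 precision."""
    return num_params * BYTES_PER_BF16_PARAM / 1e9


vae_gb = weight_gb(175e6)   # 175M-parameter VideoVAE
dit_gb = weight_gb(2.8e9)   # 2.8B-parameter VideoDiT
frames = 6 * 15             # 6 seconds at 15 FPS

print(f"VideoVAE weights: {vae_gb:.2f} GB")
print(f"VideoDiT weights: {dit_gb:.2f} GB")
print(f"Frames per clip:  {frames}")
```

So the weights alone come to a bit under 6 GB in BF16, and each clip is 90 frames.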