I hate the delay in the code release due to company review policies... but hopefully it will be out this week during CVPR.
The initial version, written in JAX/Flax, will live at https://github.com/google-research/magvit (not yet online as of 06/19). We are also going to release model weights trained on non-proprietary datasets, along with generated samples, so long as they're approved.
I'm also happy to help with any potential PyTorch reimplementations.
Text-to-video with a base image seems like an extremely powerful use case to me. Using Midjourney for a high-fidelity "Will Smith eating spaghetti" image, and then continuing it with Magvit, would yield much better results than the current state of the art.