
I'm more interested in the technical details than the publicity. Pretty much anyone these days can learn what a diffusion model is, how it's implemented, and what the control flow is. What about these new multimodal LLMs? They have no problems with text, and they generate images using tokens, but how exactly? There are no open-source implementations that I know of, and I'm struggling to find details.
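For what it's worth, the commonly described recipe is: a VQ-style tokenizer maps image patches to discrete codebook indices, and the LLM then predicts those indices autoregressively like ordinary text tokens. Here's a minimal NumPy sketch of that recipe under those assumptions — the codebook, the toy next-token predictor, and all names here are illustrative stand-ins, not any specific model's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy VQ codebook: K discrete codes, each a small patch embedding.
K, D = 16, 4                          # codebook size, patch-embedding dim
codebook = rng.normal(size=(K, D))

def quantize(patches):
    """Map each patch embedding to the index of its nearest codebook entry."""
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)       # one discrete token per patch

def toy_next_token_logits(prefix):
    """Stand-in for a transformer: deterministic logits from the prefix."""
    h = np.zeros(K)
    for i, t in enumerate(prefix):
        h[(t + i) % K] += 1.0         # arbitrary but fixed 'dynamics'
    return h

def generate(n_tokens, bos=0):
    """Greedy autoregressive decoding of an image as a token sequence."""
    seq = [bos]
    for _ in range(n_tokens):
        seq.append(int(toy_next_token_logits(seq).argmax()))
    return seq[1:]

def detokenize(tokens):
    """Look tokens back up in the codebook to recover patch embeddings."""
    return codebook[np.array(tokens)]

tokens = generate(n_tokens=9)         # e.g. a 3x3 grid of patches
image_patches = detokenize(tokens)    # shape (9, D)
```

In a real system the detokenizer is a learned decoder that turns the code sequence back into pixels, but the interesting part is that image generation reduces to next-token prediction over a discrete vocabulary.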


This video is very good. https://youtu.be/EzDsrEvdgNQ?si=EWp3U1GMkwg1bMQQ

One thing I'd add is that generating the tokens at the target resolution from the start is no longer the only approach to autoregressive image generation.

Rather than predicting each patch at the target resolution right away, the model starts with the image (as patches) at a very small resolution and progressively scales up. Paper here - https://arxiv.org/abs/2404.02905
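The coarse-to-fine loop can be sketched roughly like this. Caveat: the per-scale predictor below is a toy stand-in, and in the actual paper the whole token map at each scale is predicted in one parallel step by a transformer conditioned on all coarser scales, not just the previous one:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 16                                   # toy codebook size

def upsample(token_map, size):
    """Nearest-neighbour upsampling of a square 2-D token map."""
    idx = np.arange(size) * token_map.shape[0] // size
    return token_map[np.ix_(idx, idx)]

def toy_predict_scale(context, size):
    """Stand-in for the model: emit the full size x size token map in one
    step, conditioned on the upsampled coarser map."""
    coarse = upsample(context, size)
    return (coarse + 1) % K              # arbitrary but fixed 'refinement'

def generate_coarse_to_fine(scales=(1, 2, 4, 8)):
    # Start from a tiny token map and refine it scale by scale.
    token_map = rng.integers(0, K, size=(scales[0], scales[0]))
    for s in scales[1:]:
        token_map = toy_predict_scale(token_map, s)
    return token_map

final = generate_coarse_to_fine()
print(final.shape)                       # (8, 8)
```

The appeal is that each step predicts a whole resolution level at once, so the number of sequential decoding steps grows with the number of scales rather than with the number of patches.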



