Sure, but the various machinima/demo file-formats of games are just application state-data formats, not video formats per se. The difference comes in what can decode them.
An application state-data format can only be decoded by the original application, because the necessary context lives in the application rather than in the file—in this case, the game engine that translates user input into game state and then into displayed frames, plus the library of visual assets the game uses to render those frames.
A video format is self-contained, and usually not domain-specific. Many encoders and many decoders can be written to target a video format, and the decoders should not have to ship with an asset library (let alone a game engine) in order to properly render specific videos.
A format like I'm talking about—one that doesn't know anything about application state, but does understand that it's compositing and placing a set of embedded assets each frame, rather than only knowing about pixels/gradels—seems like something generically useful to me. (Heck, we're close to support for such a format already, since many video players already understand the idea of compositing arbitrary stuff with placement instructions on the screen each frame, care of support for the https://en.wikipedia.org/wiki/SubStation_Alpha subtitle format. That format is exactly the kind of "vector video" I'm talking about, except the only primitives it can position and style are text elements. Add RGBA-textured rectangles as another primitive type to it, and you'd get a video format!)
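To make the "SSA plus RGBA-textured rectangles" idea concrete, here's a rough sketch of what a decoder's inner loop might look like. Everything here—the `Placement` record, the asset-table-plus-display-list framing, the function names—is invented for illustration; the point is just that the decoder needs nothing beyond the stream itself, since the assets travel with it.

```python
# Hypothetical sketch: a "vector video" frame is a display list of
# placement instructions over a shared table of embedded RGBA textures.
# All names/structures here are illustrative, not any real format.
from dataclasses import dataclass

@dataclass
class Placement:
    asset_id: str  # which embedded asset to draw
    x: int         # top-left position on the canvas
    y: int

def composite(canvas, assets, display_list):
    """Source-over composite each placed asset onto an RGBA canvas.
    canvas and assets are row-major lists of (r, g, b, a) tuples."""
    for p in display_list:
        texture = assets[p.asset_id]
        for row, scanline in enumerate(texture):
            for col, (r, g, b, a) in enumerate(scanline):
                cy, cx = p.y + row, p.x + col
                if not (0 <= cy < len(canvas) and 0 <= cx < len(canvas[0])):
                    continue  # clip against canvas bounds
                br, bg, bb, ba = canvas[cy][cx]
                inv = 255 - a
                canvas[cy][cx] = (
                    (r * a + br * inv) // 255,
                    (g * a + bg * inv) // 255,
                    (b * a + bb * inv) // 255,
                    min(255, a + ba * inv // 255),
                )
    return canvas

# The assets ship once with the stream; each frame is just a new
# (small) display list referencing the same asset table.
assets = {"dot": [[(255, 0, 0, 255)]]}          # 1x1 opaque red texture
frame = [Placement("dot", 1, 0)]                 # this frame: draw it at (1, 0)
canvas = [[(0, 0, 0, 255) for _ in range(3)]]    # 1x3 black canvas
composite(canvas, assets, frame)
```

Note that per-frame data is tiny (a few placement records) while the heavy pixel data is amortized across the whole stream—the same property that makes module files so compact relative to rendered audio.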
And yes, I'm basically talking about the visual equivalent of a https://en.wikipedia.org/wiki/Module_file (embedded samples/synth patches + sequencing information); or, if you prefer another analogy, "what Flash movies are if you exclude the ability to execute ActionScript."