Instead of connecting language with physical existence, or entities, it's connecting tokens.
An LLM may be able to describe scenes in a video, but a model would tell you that said video is a deep fake because of some principle like conservation of energy and mass informed by experience, assumptions, inference rules, etc.