To quote the hf page:
>Behind vision-first models in multimodal tasks: Mistral Large 3 can lag behind models optimized for vision tasks and use cases.
Of course, models built purely for image tasks will completely wipe it out there. Vision-language models are useful for their generalist capabilities.