If I had to put a grade on my own experience and evals, Gemini 2.5 pro produces A- results and qwen2.5vl is maybe like B-/C+. Obviously everything's nondetermistic, so it's hard to guarantee a level of quality.
I'm reading through papers that suggest it should be possible to get SOTA performance on local models via distillation, and that's what I'll experiment with next.
I'm reading through papers that suggest it should be possible to get SOTA performance on local models via distillation, and that's what I'll experiment with next.