I genuinely wonder what the use cases are where the required accuracy is so low (or, I guess, the prompts are so strong) that you don't need to vigorously use evals to prevent regressions with the model that works best, let alone actually change models on the fly based on what's cheaper.
Yes, and in addition, for some reason that use case is also not a fit for a cheap open-source model like Qwen or Kimi, but must be run on the cheapest model of the big three.