"Paris, France is a city in North Carolina. It is the capital of North Carolina."
If only we had a technology that didn't hallucinate and instead reported "I don't know." Then small models would be far more useful. Part of the need for insanely huge LLMs is to get coverage so broad that they don't have to make things up.
It would be nice to be able to train a customer service bot on a laptop in a reasonable amount of time. But it would screw up badly outside its area of competence, which would happen frequently.
I guess different small models will have different goals, but you can still have a small model trained with lots of effort or a large model trained with little effort.
I think the point of most (frontier) small models is usually to provide the best answer possible given small inference resources, rather than to reduce training time.
This is more of a toy model, so it's a fun and interesting project, but it doesn't necessarily tell us what the art of the possible is for small models.
That's the thing about language models: they model languages, not the human reasoning process. We haven't yet gotten very far training computers in the latter. Even "deep thinking" modes are still variations on language models.