Yann LeCun @ Meta has been similarly skeptical that the performance of LLMs and generative AI systems can improve dramatically without a change in approach.
My general thought (as a layman) is that there are currently thousands of really smart people looking for that new approach. The field is saturated with researchers, many of whom in other times would have become physicists or chemists or mathematicians but are now focusing on ML.
As long as industry is willing to pay, they will keep looking. And there is so much incredibly valuable low-hanging fruit right now that amounts to engineering challenges that I don't think overall funding for research in the field will dry up anytime soon.
My recollection of CLIP is that it’s more of an image-text co-embedding, where you have two encoders, one which encodes images into vectors and one which encodes captions into vectors. Through a contrastive loss (positive pairs are images with their own captions, negative pairs are random image-caption combinations), embeddings of positive image-caption pairs are brought together (i.e., made similar) and embeddings of negative image-caption pairs are made more dissimilar.
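To make that concrete, here is a minimal NumPy sketch of a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of embedding pairs. The function name and temperature value are illustrative, not CLIP's actual code; the real model also learns the temperature and uses framework autograd ops, but the pairing logic is the same: matching image-caption pairs sit on the diagonal of the similarity matrix and everything off-diagonal acts as a negative.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of image/text embeddings.

    Row i of img_emb and row i of txt_emb are assumed to come from the
    same captioned image (the positive pair); every other image-caption
    combination in the batch serves as a negative.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature        # (N, N) similarity matrix
    labels = np.arange(len(logits))           # positives on the diagonal

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits, labels) +
                  cross_entropy(logits.T, labels))
```

Minimizing this pulls each positive pair's similarity up relative to all the in-batch negatives, which is what "brought together / made more dissimilar" means operationally.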
This is quite an interesting finding. And if it holds, it does suggest to me that we will see all sorts of excuses for the lack of further significant progress coming from AI companies in the near future.
No, because we never know what's around the corner. Even Moore's law didn't peak. As long as there is fierce competition, things will continue to move forward.
But this hasn't applied to self-driving cars: huge amounts of money have been poured into the problem, and very smart people have worked on it, yet the results have not lived up to expectations.
Hardware products have a much harder time becoming more popular. They are often restricted to a few countries and require significant investment from customers, whereas software can be distributed overnight to a hundred different countries at once.
Hence, the economics of both are vastly different. It is much harder to reach the viral inflection point with hardware than software, even with great products.
Not in the field, but first, transformers aren’t really new; they’ve been around for close to a decade now (only probably underused). But mostly, second: I don’t think I’ve ever seen such an explosion of funding in a single tech since the internet bubble. That could probably drain the pool of low-hanging fruit pretty fast.