I believe you that it had to do with the selloff, but I think efficiency improvements are good news for NVIDIA: each card just got 20x more useful.
That still means that AI firms don't have to buy as many of Nvidia's chips, which is the whole thing Nvidia's price was predicated on. FB, Google and Microsoft just had their billions of dollars in Nvidia GPU capex blown out by a $5M side project. Tech firms are probably not going to be as generous shelling out whatever overinflated price Nvidia was asking for as they were a week ago.
Although there's the Jevons paradox possibility that more efficient AI will drive even more demand for AI chips because more uses will be found for them. But possibly not super high end NVDA chips but instead little Apple iPhone AI cores or smartwatch AI cores, etc.
Although not all commodities will behave the way fossil fuels did in the Jevons paradox. It could be the case that demand for AI doesn't grow fast enough to keep demand for chips as high as it was, as efficiency improves.
> But possibly not super high end NVDA chips but instead little Apple iPhone AI cores or smartwatch AI cores, etc.
We tried that, though. NPUs are in all sorts of hardware, and they're entirely wasted silicon for most users, most of the time. They don't do LLM inference, they don't generate images, and they don't train models. Too weak to work, too specialized to be useful.
Nvidia "wins" by comparison because they don't specialize their hardware. The GPU is the NPU, and it's power scales with the size of GPU you own. The capability of a 0.75w NPU is rendered useless by the scale, capability and efficiency of a cluster of 600w dGPU clusters.
Wrong conclusion, IMO. This makes inference more cost effective which means self-hosting suddenly becomes more attractive to a wider share of the market.
GPUs will continue to be bought up as fast as fabs can spit them out.
The number of people interested in self-hosting AI at the moment is a tiny, tiny percentage of enthusiast computer users, who indeed get to play with self-hosted LLMs on consumer hardware now. But the promise of these AI companies is that LLMs will be the "next internet", or even the "next electricity" according to Sam Altman, all of it running exclusively on Nvidia chips in mega-datacenters, and that promise was priced into Nvidia's share price as of last Friday. It appears to be on shaky ground now.
> That still means that AI firms don't have to buy as many of Nvidia's chips
Couldn't you say that about Blackwell as well? Blackwell is 25x more energy-efficient for generative AI tasks and offers up to 2.5x faster AI training performance overall.
The industry is compute-starved, and that makes total sense.
The transformer architecture that current LLMs are based on is 8 years old. So why did it take until only 2 years ago to get to LLMs?
Simple: Nvidia first had to push compute at scale. Try training GPT-4 on Voltas from 2017. Good luck with that!
Current LLMs are possible thanks to the compute Nvidia has provided in the past decade. You could technically use 20-year-old CPUs for LLMs, but you might need to connect a billion of them.
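Ballpark arithmetic for that quip (peak FLOPS only; every figure below is a rough assumption):

    # Ballpark only: peak FLOPS, ignoring memory, interconnect and utilization.
    cpu_2005_flops = 1e10        # ~10 GFLOPS for a mid-2000s desktop CPU (assumed)
    a100_fp16_flops = 3.12e14    # ~312 TFLOPS peak FP16 per A100
    cluster_gpus = 25_000        # assumed size of a GPT-4-class training cluster

    cpu_equivalents = cluster_gpus * a100_fp16_flops / cpu_2005_flops
    print(f"~{cpu_equivalents:.1e} twenty-year-old CPUs")   # order of 10^8 to 10^9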
Always hilarious to see westerners concerned about privacy when it comes to China, yet not concerned at all about their own governments that know far more about you. Do they think some Chinese policeman is going to come to their door? Never heard of Snowden or the five eyes?
You can rent 10k H100s for 20 days with that money. Go knock yourself out, because that's probably more compute than DeepSeek got for that money. And that's at public cloud pricing for a single H100. I'm sure if you ask for 10k H100s you'll get them at half price, so easily 40 days of training.
DeepSeek has fooled everyone by quoting such a small figure; people think they only need to "buy" $5M worth of GPUs, but that's wrong. That money is the training cost, i.e. the rented GPU hours.
Somebody still had to install those 10k GPUs, and that means paying $300M to Nvidia.
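To make the back-of-envelope explicit (the hourly rate and per-card price below are assumptions, not quotes):

    # Back-of-envelope only; hourly rate and per-card price are assumptions.
    budget_usd = 5_000_000        # the widely cited DeepSeek training figure
    gpus = 10_000                 # cluster size from the comment above
    rate_per_gpu_hour = 1.0       # assumed $/GPU-hour; adjust for your provider

    gpu_hours = budget_usd / rate_per_gpu_hour
    days = gpu_hours / (gpus * 24)
    print(f"{gpu_hours:,.0f} GPU-hours ~= {days:.1f} days on {gpus:,} GPUs")
    # halve the rate and you get ~40 days, as above

    # Buying the cluster outright, assuming ~$30k per H100:
    print(f"purchase cost ~= ${gpus * 30_000:,.0f}")   # $300,000,000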
They only got more useful if the AI goldrush participants actually strike, well, gold. Otherwise it's not useful at all. Afaict it remains to be seen whether any of this AI stuff has actual commercial value. It's all just speculation predicated on thoughts and prayers.
When your business is selling a large number of cards to giant companies, you don't want them to become 20x more useful, because then people will buy fewer of them to do the same amount of work.
each card is not 20x more useful lol. there's no evidence yet that the deepseek architecture would even yield a substantially (20x) more performant model with more compute.
if there's evidence to the contrary I'd love to see it. in any case I don't think an H800 is even 20x better than an H100 anyway, so the 20x increase has to be wrong.
We need GPUs for inference, not just training. The Jevons Paradox suggests that reducing the cost per token will increase the overall demand for inference.
Also, everything we know about LLMs points to an entirely predictable correlation between training compute and performance.
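For what it's worth, a rough sketch of that correlation using the approximate Chinchilla loss fit (Hoffmann et al. 2022); the constants are that paper's published approximations, not anything specific to today's models:

    # Approximate Chinchilla fit: loss falls predictably with params and tokens,
    # but with diminishing returns (constants from Hoffmann et al. 2022).
    def chinchilla_loss(params, tokens):
        E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
        return E + A / params**alpha + B / tokens**beta

    for n, d in [(1e9, 20e9), (70e9, 1.4e12), (700e9, 14e12)]:
        print(f"{n:.0e} params, {d:.0e} tokens -> loss ~ {chinchilla_loss(n, d):.2f}")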
Jevons paradox doesn't really suggest anything by itself. Jevons paradox is something that occurs in some instances of increased efficiency, but not all. I suppose the important question here is "What is the price elasticity of demand for inference?"
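A toy way to see why that question decides the Jevons outcome (constant-elasticity demand curve; the numbers are made up):

    # Q = k * P^(-e)  =>  total spend = P*Q = k * P^(1-e)
    def total_spend(price, k=1.0, elasticity=1.5):
        return price * (k * price ** (-elasticity))

    for e in (0.5, 1.5):
        before = total_spend(1.0, elasticity=e)
        after = total_spend(0.5, elasticity=e)   # price of inference halves
        print(f"elasticity {e}: spend {before:.2f} -> {after:.2f}")
    # |e| > 1: cheaper tokens -> total compute spend rises (Jevons)
    # |e| < 1: cheaper tokens -> total compute spend falls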
Personally, in the six months prior to the release of the deepseekv3 api, I'd made probably 100-200 api calls per month to llm services. In the past week I made 2.8 million api calls to dsv3.
Processing each English (word, part-of-speech, sense) triple in various ways. Generating (very silly) example sentences for each triple in various styles. Generating 'difficulty' ratings for each triple. Two examples:
High difficulty:
id = 37810
word = dendroid
pos = noun
sense = (mathematics) A connected continuum that is arcwise connected and hereditarily unicoherent.
elo = 2408.61936886416
sentence2 = The dendroid, that arboreal structure of the Real, emerges not as a mere geometric curiosity but as the very topology of desire, its branches both infinite and indivisible, a map of the unconscious where every detour is already inscribed in the unicoherence of the subject's jouissance.
Low difficulty:
id = 11910
word = bed
pos = noun
sense = A flat, soft piece of furniture designed for resting or sleeping.
elo = 447.32459484266
sentence2 = The city outside my window never closed its eyes, but I did, sinking into the cold embrace of a bed that smelled faintly of whiskey and regret.
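For the curious, a minimal sketch of what one of those per-triple calls could look like, assuming DeepSeek's OpenAI-compatible endpoint (the base URL, model name and prompt are illustrative; check the provider's docs):

    # Minimal per-triple call sketch; endpoint, model and prompt are assumptions.
    from openai import OpenAI

    client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

    def example_sentence(word, pos, sense, style="noir"):
        resp = client.chat.completions.create(
            model="deepseek-chat",
            messages=[{
                "role": "user",
                "content": (f"Write one {style}-style example sentence using "
                            f"'{word}' ({pos}) in the sense: {sense}"),
            }],
        )
        return resp.choices[0].message.content

    print(example_sentence(
        "bed", "noun",
        "A flat, soft piece of furniture designed for resting or sleeping."))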
the jevons paradox isn't about any particular product or company's product, so is irrelevant here. the relevant resource here is compute, which is already a commodity. secondly, even if it were about GPUs in particular, there's no evidence that nvidia would be able to sustain such high margins if fewer were necessary for equivalent performance. things are currently supply constrained, which gives nvidia price optionality.
> there's no evidence yet that the deepseek architecture would even yield a substantially more performant model with more compute.
It's supposed to. There were reports that a longer 'thinking' phase makes the o3 model better than o1. I.e., at least at inference time, compute power still matters.
> It's supposed to. There were reports that a longer 'thinking' phase makes the o3 model better than o1. I.e., at least at inference time, compute power still matters.
compute matters, but performance doesn't scale proportionally with compute, from what I've heard about o3 vs o1.
you shouldn't take my word for it - go on the leaderboards and look at the top models from now, and then the top models from 2023 and look at the compute involved for both. there's obviously a huge increase, but it isn't proportional