I believe you that it had to do with the selloff, but I think efficiency improvements are good news for NVIDIA: each card just got 20x more useful.
That still means that AI firms don't have to buy as many of Nvidia's chips, which is the whole thing Nvidia's price was predicated on. FB, Google and Microsoft just had their billions of dollars in Nvidia GPU capex blown out by a $5M side project. Tech firms are probably not going to be as generous shelling out whatever overinflated price Nvidia was asking for as they were a week ago.
Although there's the Jevons paradox possibility that more efficient AI will drive even more demand for AI chips because more uses will be found for them. But possibly not super high end NVDA chips but instead little Apple iPhone AI cores or smartwatch AI cores, etc.
Although not all commodities will behave the way fossil fuels did in the Jevons paradox. It could be the case that demand for AI doesn't grow fast enough to keep demand for chips as high as it was, as efficiency improves.
> But possibly not super high end NVDA chips but instead little Apple iPhone AI cores or smartwatch AI cores, etc.
We tried that, though. NPUs are in all sorts of hardware, and they're entirely wasted silicon for most users, most of the time. They don't do LLM inference, they don't generate images, and they don't train models. Too weak to work, too specialized to be useful.
Nvidia "wins" by comparison because they don't specialize their hardware. The GPU is the NPU, and it's power scales with the size of GPU you own. The capability of a 0.75w NPU is rendered useless by the scale, capability and efficiency of a cluster of 600w dGPU clusters.
Wrong conclusion, IMO. This makes inference more cost effective which means self-hosting suddenly becomes more attractive to a wider share of the market.
GPUs will continue to be bought up as fast as fabs can spit them out.
The number of people interested in self-hosting AI at the moment is a tiny, tiny percentage of enthusiast computer users, who indeed get to play with self-hosted LLMs on consumer hardware now. But the promise of these AI companies is that LLMs will be the "next internet", or even the "next electricity" according to Sam Altman, all of it running exclusively on Nvidia chips in mega-datacenters, and that promise was priced into Nvidia's share price as of last Friday. It appears to be on shaky ground now.
> That still means that AI firms don't have to buy as many of Nvidia's chips
Couldn't you say that about Blackwell as well? Blackwell is 25x more energy-efficient for generative AI tasks and offers up to 2.5x faster AI training performance overall.
The industry is compute-starved, and that makes total sense.
The transformer architecture that current LLMs are based on is 8 years old. So why did it take until only 2 years ago to get to LLMs?
Simple: Nvidia first had to push compute at scale. Try training GPT-4 on Voltas from 2017. Good luck with that!
Current LLMs are possible thanks to the compute Nvidia has provided in the past decade. You could technically use 20-year-old CPUs for LLMs, but you might need to connect a billion of them.
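Ballpark arithmetic for that quip (peak FLOPS only; every figure below is a rough assumption):

    # Ballpark only: peak FLOPS, ignoring memory, interconnect and utilization.
    cpu_2005_flops = 1e10        # ~10 GFLOPS for a mid-2000s desktop CPU (assumed)
    a100_fp16_flops = 3.12e14    # ~312 TFLOPS peak FP16 per A100
    cluster_gpus = 25_000        # assumed size of a GPT-4-class training cluster

    cpu_equivalents = cluster_gpus * a100_fp16_flops / cpu_2005_flops
    print(f"~{cpu_equivalents:.1e} twenty-year-old CPUs")   # order of 10^8 to 10^9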
Always hilarious to see westerners concerned about privacy when it comes to China, yet not concerned at all about their own governments that know far more about you. Do they think some Chinese policeman is going to come to their door? Never heard of Snowden or the five eyes?
You can rent 10k H100s for 20 days with that money. Go knock yourself out, because that's probably more compute than DeepSeek got for that money. And that's at public cloud pricing for a single H100. I'm sure if you ask for 10k H100s you'll get them at half price, so easily 40 days of training.
DeepSeek has fooled everyone by quoting such a small figure; people think they only need to "buy" $5M worth of GPUs, but that's wrong. That money is the training cost, i.e. the rented GPU hours.
Somebody still had to install those 10k GPUs, and that means paying $300M to Nvidia.
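To make the back-of-envelope explicit (the hourly rate and per-card price below are assumptions, not quotes):

    # Back-of-envelope only; hourly rate and per-card price are assumptions.
    budget_usd = 5_000_000        # the widely cited DeepSeek training figure
    gpus = 10_000                 # cluster size from the comment above
    rate_per_gpu_hour = 1.0       # assumed $/GPU-hour; adjust for your provider

    gpu_hours = budget_usd / rate_per_gpu_hour
    days = gpu_hours / (gpus * 24)
    print(f"{gpu_hours:,.0f} GPU-hours ~= {days:.1f} days on {gpus:,} GPUs")
    # halve the rate and you get ~40 days, as above

    # Buying the cluster outright, assuming ~$30k per H100:
    print(f"purchase cost ~= ${gpus * 30_000:,.0f}")   # $300,000,000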
They only got more useful if the AI goldrush participants actually strike, well, gold. Otherwise it's not useful at all. Afaict it remains to be seen whether any of this AI stuff has actual commercial value. It's all just speculation predicated on thoughts and prayers.
When your business is selling a large number of cards to giant companies, you don't want them to become 20x more useful, because then people will buy fewer of them to do the same amount of work.
each card is not 20x more useful lol. there's no evidence yet that the deepseek architecture would even yield a substantially (20x) more performant model with more compute.
if there's evidence to the contrary I'd love to see it. in any case I don't think an H800 is even 20x better than an H100 anyway, so the 20x increase has to be wrong.
We need GPUs for inference, not just training. The Jevons Paradox suggests that reducing the cost per token will increase the overall demand for inference.
Also, everything we know about LLMs points to an entirely predictable correlation between training compute and performance.
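For what it's worth, a rough sketch of that correlation using the approximate Chinchilla loss fit (Hoffmann et al. 2022); the constants are that paper's published approximations, not anything specific to today's models:

    # Approximate Chinchilla fit: loss falls predictably with params and tokens,
    # but with diminishing returns (constants from Hoffmann et al. 2022).
    def chinchilla_loss(params, tokens):
        E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
        return E + A / params**alpha + B / tokens**beta

    for n, d in [(1e9, 20e9), (70e9, 1.4e12), (700e9, 14e12)]:
        print(f"{n:.0e} params, {d:.0e} tokens -> loss ~ {chinchilla_loss(n, d):.2f}")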
Jevons paradox doesn't really suggest anything by itself. Jevons paradox is something that occurs in some instances of increased efficiency, but not all. I suppose the important question here is "What is the price elasticity of demand for inference?"
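A toy way to see why that question decides the Jevons outcome (constant-elasticity demand curve; the numbers are made up):

    # Q = k * P^(-e)  =>  total spend = P*Q = k * P^(1-e)
    def total_spend(price, k=1.0, elasticity=1.5):
        return price * (k * price ** (-elasticity))

    for e in (0.5, 1.5):
        before = total_spend(1.0, elasticity=e)
        after = total_spend(0.5, elasticity=e)   # price of inference halves
        print(f"elasticity {e}: spend {before:.2f} -> {after:.2f}")
    # |e| > 1: cheaper tokens -> total compute spend rises (Jevons)
    # |e| < 1: cheaper tokens -> total compute spend falls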
Personally, in the six months prior to the release of the deepseekv3 api, I'd made probably 100-200 api calls per month to llm services. In the past week I made 2.8 million api calls to dsv3.
Processing each English (word, part-of-speech, sense) triple in various ways. Generating (very silly) example sentences for each triple in various styles. Generating 'difficulty' ratings for each triple. Two examples:
High difficulty:
id = 37810
word = dendroid
pos = noun
sense = (mathematics) A connected continuum that is arcwise connected and hereditarily unicoherent.
elo = 2408.61936886416
sentence2 = The dendroid, that arboreal structure of the Real, emerges not as a mere geometric curiosity but as the very topology of desire, its branches both infinite and indivisible, a map of the unconscious where every detour is already inscribed in the unicoherence of the subject's jouissance.
Low difficulty:
id = 11910
word = bed
pos = noun
sense = A flat, soft piece of furniture designed for resting or sleeping.
elo = 447.32459484266
sentence2 = The city outside my window never closed its eyes, but I did, sinking into the cold embrace of a bed that smelled faintly of whiskey and regret.
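For the curious, a minimal sketch of what one of those per-triple calls could look like, assuming DeepSeek's OpenAI-compatible endpoint (the base URL, model name and prompt are illustrative; check the provider's docs):

    # Minimal per-triple call sketch; endpoint, model and prompt are assumptions.
    from openai import OpenAI

    client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

    def example_sentence(word, pos, sense, style="noir"):
        resp = client.chat.completions.create(
            model="deepseek-chat",
            messages=[{
                "role": "user",
                "content": (f"Write one {style}-style example sentence using "
                            f"'{word}' ({pos}) in the sense: {sense}"),
            }],
        )
        return resp.choices[0].message.content

    print(example_sentence(
        "bed", "noun",
        "A flat, soft piece of furniture designed for resting or sleeping."))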
the jevons paradox isn't about any particular product or company's product, so is irrelevant here. the relevant resource here is compute, which is already a commodity. secondly, even if it were about GPUs in particular, there's no evidence that nvidia would be able to sustain such high margins if fewer were necessary for equivalent performance. things are currently supply constrained, which gives nvidia price optionality.
> there's no evidence yet that the deepseek architecture would even yield a substantially more performant model with more compute.
It's supposed to. There were reports that a longer 'thinking' phase makes the o3 model better than o1. I.e., at least at inference time, compute power still matters.
> It's supposed to. There were reports that a longer 'thinking' phase makes the o3 model better than o1. I.e., at least at inference time, compute power still matters.
compute matters, but performance doesn't scale proportionally with compute, from what I've heard about o3 vs o1.
you shouldn't take my word for it - go on the leaderboards and look at the top models from now, and then the top models from 2023 and look at the compute involved for both. there's obviously a huge increase, but it isn't proportional