
> They can't just slap more memory on the board

Why not? It doesn't have to be balanced. RAM is cheap. You would get an affordable card that can hold a large model and still do inference e.g. 4x faster than a CPU. The 128GB card doesn't have to do inference on a 128GB model as fast as a 16GB card does on a 16GB model, it can be slower than that and still faster than any cost-competitive alternative at that size.

The extra RAM also lets you load a sparse mixture-of-experts model entirely into the GPU. That will perform well even on lower-end GPUs with less bandwidth, because you don't have to stream the whole model for each token. You still need enough RAM for the whole model, though, because you don't know ahead of time which parts you'll need.
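A rough way to see the bandwidth argument, as a back-of-the-envelope sketch: assume token generation is purely memory-bandwidth bound, so the ceiling on tokens per second is bandwidth divided by the bytes of weights read per token. Every number below (bandwidth, model size, active fraction) is a made-up illustration, not a figure from the thread or from any real card or model.

```python
def tokens_per_second(bandwidth_gb_s, gb_read_per_token):
    """Upper bound on decode speed if memory bandwidth is the only bottleneck."""
    return bandwidth_gb_s / gb_read_per_token

# Hypothetical numbers, chosen only for illustration.
BANDWIDTH_GB_S = 400    # assumed memory bandwidth of a lower-end GPU
MODEL_GB = 100          # assumed total weight size; all of it must fit in VRAM
ACTIVE_FRACTION = 0.25  # assumed share of the weights a sparse MoE touches per token

# A dense model reads every weight for every token.
dense_tps = tokens_per_second(BANDWIDTH_GB_S, MODEL_GB)
# A sparse MoE reads only the active experts, but which ones varies per token.
moe_tps = tokens_per_second(BANDWIDTH_GB_S, MODEL_GB * ACTIVE_FRACTION)

print(f"dense: {dense_tps:.1f} tok/s, sparse MoE: {moe_tps:.1f} tok/s")
```

Under these assumptions the capacity requirement is identical for both models, while the bandwidth requirement per token drops with the active fraction.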



To get 128GB of RAM on a GPU you'd need at least a 1024-bit bus. GDDR6X chips are 16Gbit with 32 pins, so you'd need 64 of them. Good luck trying to fit that many around the GPU die, since the traces need to be the same length and you want to keep them as short as possible. There's also a good chance you can't run a clamshell setup, because 32 GDDR6X chips would kick off way too much heat to be cooled on the back of a GPU, so you'd have to double the bus width to 2048 bits. Such a ridiculous setup would obviously be extremely expensive and would use way too much power.

A more sensible alternative would be going with HBM, except good luck getting any capacity for that since it's all being used for the extremely high margin data center GPUs. HBM is also extremely expensive, both in terms of the cost of buying the chips and due to its advanced packaging requirements.
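The GDDR6X arithmetic in the comment above can be checked with a short sketch. It assumes, as the comment does, 16Gbit chips with a 32-bit interface, and that clamshell mode puts two chips on one 32-bit channel. The function name `gddr_layout` is made up for illustration.

```python
def gddr_layout(capacity_gb, chip_gbit=16, pins_per_chip=32, clamshell=True):
    """Return (chip_count, bus_width_bits) needed for a target capacity."""
    chip_gb = chip_gbit // 8            # 16 Gbit -> 2 GB per chip
    chips = capacity_gb // chip_gb      # chips needed to reach the capacity
    # Clamshell: two chips (front and back of the board) share one channel.
    chips_per_channel = 2 if clamshell else 1
    bus_width = (chips // chips_per_channel) * pins_per_chip
    return chips, bus_width

print(gddr_layout(128, clamshell=True))   # (64, 1024)
print(gddr_layout(128, clamshell=False))  # (64, 2048)
```

Either way it is 64 chips. Clamshell only decides whether 32 of them sit on the back of the board or the bus doubles to 2048 bits.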


You do not need a 1024-bit bus to put 128GB of some DDR variant on a GPU. You could do a 512-bit bus with dual rank memory. The 3090 had a 384-bit bus with dual rank memory and going to 512-bit from that is not much of a leap.

This assumes you use 32Gbit chips, which will likely be available in the near future. Interestingly, the GDDR7 specification allows for 64Gbit chips:

> the GDDR7 standard officially adds support for 64Gbit DRAM devices, twice the 32Gbit max capacity of GDDR6/GDDR6X

https://www.anandtech.com/show/21287/jedec-publishes-gddr7-s...
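For comparison, here is the same kind of arithmetic for the 512-bit proposal, as a sketch. It reads "dual rank" as two chips per 32-bit channel (my interpretation of the comment), and the helper name `capacity_gb` is hypothetical.

```python
def capacity_gb(bus_width_bits, chip_gbit, ranks=2, channel_bits=32):
    """Total capacity in GB for a given bus width and chip density."""
    channels = bus_width_bits // channel_bits  # one 32-bit channel per chip position
    chips = channels * ranks                   # dual rank: two chips per channel
    return chips * chip_gbit // 8              # Gbit -> GB

print(capacity_gb(384, 8))    # 24  (the 3090 configuration: 24 x 8Gbit chips)
print(capacity_gb(512, 32))   # 128 (512-bit bus with 32Gbit chips)
print(capacity_gb(512, 64))   # 256 (if 64Gbit GDDR7 chips ever ship)
```

So 128GB on a 512-bit bus takes 32 chips, provided 32Gbit parts are available.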


Yeah, the idea that you're limited by bus width is kind of silly. If you're using ordinary DDR5 then consider that desktops can handle 192GB of memory with a 128-bit memory bus, implying that you get 576GB with a 384-bit bus and 768GB at 512-bit. That's before you even consider using registered memory, which is "more expensive" but not that much more expensive.

And if you want to have some real fun, cause "registered GDDR" to be a thing.
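The DDR5 scaling in the comment above is a straight proportion, sketched here. It takes the 192GB on a 128-bit bus figure from the comment as the baseline and assumes capacity scales linearly with bus width, i.e. the same number of DIMMs per channel. The function name is made up.

```python
def scaled_capacity_gb(bus_bits, base_capacity_gb=192, base_bus_bits=128):
    """Scale a known capacity linearly with memory bus width."""
    return base_capacity_gb * bus_bits // base_bus_bits

for bus in (128, 384, 512):
    print(f"{bus}-bit bus: {scaled_capacity_gb(bus)}GB")
```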



