
mildly unrelated: so when I ask GPT-4 a question, it is routed to an instance with about 166-194GB of memory?

> Further details on GPT-4's size and architecture have been leaked. The system is said to be based on eight models with 220 billion parameters each, for a total of about 1.76 trillion parameters, connected by a Mixture of Experts (MoE).

    For a 7B parameter model using 4-8GB:
        Average = (4+8)/2 = 6GB
        Memory per billion parameters = 6/7 = ~0.857 GB/B

    For a 13B parameter model using 8-15GB:
        Average = (8+15)/2 = 11.5GB
        Memory per billion parameters = 11.5/13 = ~0.885 GB/B

    For a 30B parameter model using 13-33GB:
        Average = (13+33)/2 = 23GB
        Memory per billion parameters = 23/30 = ~0.767 GB/B

    For a 70B parameter model using 31-75GB:
        Average = (31+75)/2 = 53GB
        Memory per billion parameters = 53/70 = ~0.757 GB/B

    The average of these values is: (0.857 + 0.885 + 0.767 + 0.757)/4 = ~0.817 GB/B

    Estimated memory usage = 220 * 0.817 = ~179.74GB
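
A quick Python sketch of the same back-of-the-envelope estimate (the memory ranges are the ones quoted above; averaging the midpoint of each range is my own assumption):

    import statistics

    # Quoted memory ranges (GB) for open-weight models, keyed by billions of parameters.
    memory_ranges = {7: (4, 8), 13: (8, 15), 30: (13, 33), 70: (31, 75)}

    # GB per billion parameters, using the midpoint of each range (an assumption).
    gb_per_billion = [(low + high) / 2 / params
                      for params, (low, high) in memory_ranges.items()]

    avg = statistics.mean(gb_per_billion)
    print(f"avg: {avg:.3f} GB per billion params")   # ~0.82
    print(f"one 220B expert: {220 * avg:.1f} GB")    # ~180 GB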


That's interesting math. I don't think they are using 4 bits, or even 8. My bet would be on 16 bits. (Bear in mind that's just speculation, for the sake of the math.)

So we are talking about 4x your numbers per specialist model:

180GB * 4 = 720GB. If you account for the larger context, call it 750GB.

Anyone remember how many specialists they are supposedly using for each request?

If it's 2, we are talking about 1.5TB of processed weights for each generated token. With 4, it's 3TB/token.

At $0.06 per 1k tokens, we get:

3TB * 1k / $0.06 = 50 petabytes of processed weights per dollar.
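
A rough sketch of that cost math (the 750GB of 16-bit weights per expert and the 2-4 experts per token are speculation from this thread, and the $0.06 per 1k tokens is the price quoted above):

    # Everything here is speculative: ~750GB of 16-bit weights per expert,
    # 2-4 experts consulted per token, $0.06 per 1k generated tokens.
    GB_PER_EXPERT = 750
    PRICE_PER_1K_TOKENS = 0.06  # USD

    for experts in (2, 4):
        tb_per_token = experts * GB_PER_EXPERT / 1000   # TB of weights read per token
        pb_per_dollar = tb_per_token * 1000 / PRICE_PER_1K_TOKENS / 1000
        print(f"{experts} experts: {tb_per_token:.1f} TB/token, "
              f"{pb_per_dollar:.0f} PB of weights read per dollar")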

Doesn't seem so expensive now.


Probably. It's no secret that OpenAI has a ton of computing hardware.

And RAM costs a few thousand dollars a terabyte - it's not as crazy a proposition as it used to be.



