mildly unrelated: so when I ask GPT-4 a question, it is routed to an instance with about 166-194GB of memory?
> Further details on GPT-4's size and architecture have been leaked. The system is said to be based on eight models with 220 billion parameters each, for a total of about 1.76 trillion parameters, connected by a Mixture of Experts (MoE).
For a 7B parameter model using 4-8GB:
Average = (4+8)/2 = 6GB
Memory usage per parameter = 6/7 = ~0.857GB/B

For a 13B parameter model using 8-15GB:
Average = (8+15)/2 = 11.5GB
Memory usage per parameter = 11.5/13 = ~0.885GB/B

For a 30B parameter model using 13-33GB:
Average = (13+33)/2 = 23GB
Memory usage per parameter = 23/30 = ~0.767GB/B

For a 70B parameter model using 31-75GB:
Average = (31+75)/2 = 53GB
Memory usage per parameter = 53/70 = ~0.757GB/B
The average of these values is: (0.857 + 0.885 + 0.767 + 0.757)/4 = ~0.817 GB/B
Estimated memory usage = 220 * 0.817 = ~179.74GB
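If it helps, here is the same back-of-the-envelope calculation in a few lines of Python (the memory ranges are just the ones listed above, nothing measured):

```python
# GB-per-billion-parameters estimate from the quoted memory ranges above.
mem_ranges_gb = {7: (4, 8), 13: (8, 15), 30: (13, 33), 70: (31, 75)}

ratios = []
for params_b, (low, high) in mem_ranges_gb.items():
    avg_gb = (low + high) / 2
    ratios.append(avg_gb / params_b)
    print(f"{params_b}B: avg {avg_gb}GB -> {avg_gb / params_b:.3f} GB/B")

mean_ratio = sum(ratios) / len(ratios)             # ~0.816 GB per billion params
print(f"220B expert: ~{220 * mean_ratio:.0f} GB")  # ~180 GB
```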
That's some interesting math. I don't think they are using 4 bits, or even 8. My bet would be on 16 bits. (Bear in mind that's just speculation, for "math's sake".)
So we are talking about roughly 4x your numbers per specialist model (16 bits being 4x a 4-bit quantization):
180GB * 4 = 720GB. If you account for the larger context, let's say 750GB.
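For reference, the raw bits-per-weight arithmetic for a single 220B expert (just params * bits / 8, ignoring KV cache and any runtime overhead):

```python
# Napkin math: raw weight storage for one 220B-parameter expert at a few precisions.
params = 220e9

for bits in (4, 8, 16):
    gb = params * bits / 8 / 1e9
    print(f"{bits}-bit weights: ~{gb:.0f} GB")   # ~110 / ~220 / ~440 GB

# The 720GB figure above instead scales the earlier ~180GB estimate by 4x,
# i.e. it treats that estimate as a roughly 4-bit figure.
print(f"4x the ~180GB estimate: {4 * 180} GB")
```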
Anyone remember how many specialists they are supposedly using for each request?
If it's 2, we are talking about 1.5TB of processed weights for each generated token. With 4, it's 3TB/token.
At $0.06 per 1k tokens we get:
3TB/token * 1k tokens / $0.06 = 50 petabytes of processed data per dollar.
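Putting the whole chain of speculation together in one place (every input here is a number from this thread, nothing confirmed):

```python
# Per-expert footprint at 16-bit, times active experts per token,
# times tokens per dollar at the quoted $0.06 / 1k-token rate.
per_expert_gb = 750          # ~720GB of weights plus some context overhead, as above
price_per_1k_tokens = 0.06   # USD, the quoted price

for active_experts in (2, 4):
    tb_per_token = per_expert_gb * active_experts / 1000
    tokens_per_dollar = 1000 / price_per_1k_tokens
    pb_per_dollar = tb_per_token * tokens_per_dollar / 1000
    print(f"{active_experts} experts: {tb_per_token:.1f} TB/token, "
          f"~{pb_per_dollar:.0f} PB of processed weights per dollar")
```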