That's interesting math. I don't think they are using 4 bits, or even 8. My bet would be on 16 bits. (Bear in mind that's just speculation, for "math's sake".)
So we are talking about 4x your numbers per specialist model:
180GB * 4 = 720GB. If you count the greater context, let's say 750GB.
Anyone remember how many specialists they are supposedly using for each request?
If it's 2, we are talking about 1.5TB of processed weights for each generated token. With 4, it's 3TB/token.
At $0.06 per 1k tokens we get
3TB*1k/0.06 = 50 petabytes of processed data per dollar.
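The back-of-the-envelope math above can be sketched as a few lines of Python. All the inputs are the same speculative assumptions as in the thread (180 GB of weights at 4-bit as the baseline, ~750 GB per specialist with context overhead, 2 or 4 specialists, $0.06 per 1k tokens):

```python
# Back-of-the-envelope check of the numbers above.
# All figures are speculative assumptions from the thread, not known values.
weights_gb_16bit = 180 * 4       # 4x the 4-bit estimate -> 720 GB
per_specialist_gb = 750          # rounded up for the larger context

for n_specialists in (2, 4):
    tb_per_token = n_specialists * per_specialist_gb / 1000
    # TB per token * 1000 tokens, divided by $0.06, converted to PB
    pb_per_dollar = tb_per_token * 1000 / 0.06 / 1000
    print(f"{n_specialists} specialists: {tb_per_token} TB/token, "
          f"{pb_per_dollar:.0f} PB per dollar")
```

With 2 specialists this gives 25 PB per dollar, and with 4 specialists the 50 PB per dollar quoted above.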
Doesn't seem so expensive now.