mildly unrelated: so when I ask GPT-4 a question, it is routed to an instance with about 166-194GB of memory?
> Further details on GPT-4's size and architecture have been leaked. The system is said to be based on eight models with 220 billion parameters each, for a total of about 1.76 trillion parameters, connected by a Mixture of Experts (MoE).
For a 7B parameter model using 4-8GB:
Average = (4+8)/2 = 6GB
Memory usage per parameter = 6/7 = ~0.857GB/B

For a 13B parameter model using 8-15GB:
Average = (8+15)/2 = 11.5GB
Memory usage per parameter = 11.5/13 = ~0.885GB/B

For a 30B parameter model using 13-33GB:
Average = (13+33)/2 = 23GB
Memory usage per parameter = 23/30 = ~0.767GB/B

For a 70B parameter model using 31-75GB:
Average = (31+75)/2 = 53GB
Memory usage per parameter = 53/70 = ~0.757GB/B
The average of these values is: (0.857 + 0.885 + 0.767 + 0.757)/4 = ~0.817 GB/B
Estimated memory usage = 220 * 0.817 = ~179.74GB
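If it helps, here is the same back-of-the-envelope calculation in a few lines of Python (the memory ranges are just the ones listed above, nothing measured):

```python
# GB-per-billion-parameters estimate from the quoted memory ranges above.
mem_ranges_gb = {7: (4, 8), 13: (8, 15), 30: (13, 33), 70: (31, 75)}

ratios = []
for params_b, (low, high) in mem_ranges_gb.items():
    avg_gb = (low + high) / 2
    ratios.append(avg_gb / params_b)
    print(f"{params_b}B: avg {avg_gb}GB -> {avg_gb / params_b:.3f} GB/B")

mean_ratio = sum(ratios) / len(ratios)             # ~0.816 GB per billion params
print(f"220B expert: ~{220 * mean_ratio:.0f} GB")  # ~180 GB
```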
That's some interesting math. I don't think they are using 4 bits, or even 8. My bet would be on 16 bits. (Bear in mind that's just speculation, for "math's sake".)
So we are talking about roughly 4x your numbers per specialist model (16 bits being 4x a 4-bit quantization):
180GB * 4 = 720GB. If you account for the larger context, let's say 750GB.
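For reference, the raw bits-per-weight arithmetic for a single 220B expert (just params * bits / 8, ignoring KV cache and any runtime overhead):

```python
# Napkin math: raw weight storage for one 220B-parameter expert at a few precisions.
params = 220e9

for bits in (4, 8, 16):
    gb = params * bits / 8 / 1e9
    print(f"{bits}-bit weights: ~{gb:.0f} GB")   # ~110 / ~220 / ~440 GB

# The 720GB figure above instead scales the earlier ~180GB estimate by 4x,
# i.e. it treats that estimate as a roughly 4-bit figure.
print(f"4x the ~180GB estimate: {4 * 180} GB")
```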
Anyone remember how many specialists they are supposedly using for each request?
If it's 2, we are talking about 1.5TB of processed weights for each generated token. With 4, it's 3TB/token.
At $0.06 per 1k tokens we get:
3TB/token * 1k tokens / $0.06 = 50 petabytes of processed data per dollar.
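Putting the whole chain of speculation together in one place (every input here is a number from this thread, nothing confirmed):

```python
# Per-expert footprint at 16-bit, times active experts per token,
# times tokens per dollar at the quoted $0.06 / 1k-token rate.
per_expert_gb = 750          # ~720GB of weights plus some context overhead, as above
price_per_1k_tokens = 0.06   # USD, the quoted price

for active_experts in (2, 4):
    tb_per_token = per_expert_gb * active_experts / 1000
    tokens_per_dollar = 1000 / price_per_1k_tokens
    pb_per_dollar = tb_per_token * tokens_per_dollar / 1000
    print(f"{active_experts} experts: {tb_per_token:.1f} TB/token, "
          f"~{pb_per_dollar:.0f} PB of processed weights per dollar")
```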