It's not complicated to explain. The model can handle 4000 tokens at once, so all you can do is work within that window. You can use part of it to quote the previous interactions and part of it for the response. If your content is too large, you need to summarise it first; there are AIs for that too. If the output is too large, you need to split it across multiple rounds. It is pretty hard to work around this limitation for long-form tasks, for example writing a whole novel.
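The "quote previous interactions, leave room for the response" idea can be sketched as a simple token budget. This is a toy illustration, not any real framework's API: the whitespace `count_tokens` is a stand-in for a proper BPE tokenizer, and the function names are made up.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace word."""
    return len(text.split())

def build_prompt(history: list[str], budget: int = 4000, reserve: int = 500) -> str:
    """Keep the most recent turns that fit in (budget - reserve) tokens,
    leaving `reserve` tokens of the window free for the model's response."""
    limit = budget - reserve
    kept, used = [], 0
    for turn in reversed(history):        # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > limit:
            break                         # older turns no longer fit
        kept.append(turn)
        used += cost
    return "\n".join(reversed(kept))      # restore chronological order

history = ["hello there", "word " * 3900, "most recent question"]
prompt = build_prompt(history, budget=4000, reserve=500)
```

Anything that falls off the budget is exactly what you would hand to a summarisation model instead of dropping outright.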
I think we need LLMs capable of reading a whole book at once, about 100K tokens. I hope researchers innovate on the transformer's memory system a bit; there are ideas and papers, but they don't seem to get attention.
Complexity is quadratic in sequence length. For 512 tokens the attention matrix has 262K entries, but for 4000 tokens it grows to 16M and goes OOM on a single GPU. We need about 100K-1M tokens to load whole books at once.
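The arithmetic behind those numbers is just seq_len squared. A back-of-envelope sketch (fp32, a single head and layer; real models multiply this by heads and layers):

```python
def attn_matrix_bytes(seq_len: int, bytes_per_el: int = 4) -> int:
    """Memory for one N x N attention score matrix in fp32."""
    return seq_len * seq_len * bytes_per_el

for n in (512, 4_000, 100_000):
    entries = n * n
    print(f"{n:>7} tokens -> {entries:,} entries, "
          f"{attn_matrix_bytes(n) / 2**30:.2f} GiB per head per layer")
```

At 100K tokens a single head's score matrix alone is tens of GiB, which is why book-length contexts don't fit by brute force.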
Since 2017 there have been hundreds of attempts to bring O(N^2) down to O(N), but none of them has replaced vanilla attention in large models yet; they tend to lose accuracy. FlashAttention (https://arxiv.org/abs/2205.14135) may have a shot, though it computes exact attention and wins by reducing memory traffic rather than by changing the asymptotic complexity.