GPT-4 was a long time ago, and honestly mostly useless. But a lot of that progress was already present in the intervening models, and it's easy to forget it happened when comparing GPT-5 to the state of the art a month ago rather than two years ago.

This is hard to quantify exactly, since very few benchmarks have the kind of scale where comparing two deltas would be meaningful. But if we pick the Artificial Analysis composite score[0] as the baseline, GPT-3.5 Turbo was at 11, GPT-4 at 25, and GPT-5 at 69. It's just that most of the post-GPT-4 improvement came with o1 and o3.
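
To make the comparison concrete, here's a quick sketch of the two deltas using only the three scores quoted above. It assumes the composite scale is roughly linear, which may well not hold, so treat the gaps as indicative rather than exact.

```python
# Composite scores as quoted in this comment (not re-checked against the site).
scores = {"GPT-3.5 Turbo": 11, "GPT-4": 25, "GPT-5": 69}

names = list(scores)
# Point gain between each consecutive pair of models.
deltas = {
    f"{prev} -> {curr}": scores[curr] - scores[prev]
    for prev, curr in zip(names, names[1:])
}

for step, gain in deltas.items():
    print(f"{step}: +{gain}")
# GPT-3.5 Turbo -> GPT-4: +14
# GPT-4 -> GPT-5: +44
```

So on this scale the GPT-4 to GPT-5 gap is about three times the GPT-3.5 Turbo to GPT-4 gap, with the caveat that most of it was already delivered by the intermediate models.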

Feels like a pretty fair statement.

[0] https://artificialanalysis.ai/#frontier-language-model-intel...