If this cannot eliminate hallucinations, or at least reduce them to the point of being statistically unlikely, and assuming it has more params than GPT-4's trillion parameters, doesn't that mean the scaling law is dead?
I interpret this to mean we're in the ugly part of the old scaling law, where `ln(x)` for `x > $BIGNUMBER` starts to become punishing, not that the scaling law has been empirically refuted in any way. Maybe someone can crunch the numbers and figure out whether the benchmarks empirically validate the scaling law relative to GPT-4o (assuming e.g. 200 million params vs 5T params).
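To make the "crunch the numbers" idea concrete, here's a minimal sketch assuming a Chinchilla-style power law for loss vs. parameter count, `L(N) = E + A / N^alpha`. The constants and model sizes below are placeholders (the sizes are just the figures tossed around in this thread), not fitted values, so this only shows the shape of the comparison, not a real verdict on the benchmarks:

```python
# Illustrative only: a Chinchilla-style power law for loss vs. parameter count.
# E, A, alpha are made-up placeholders; swap in real fitted values to actually
# crunch the numbers against benchmark results.
def predicted_loss(n_params: float, E: float = 1.7, A: float = 400.0, alpha: float = 0.34) -> float:
    return E + A / n_params**alpha

# Assumed sizes from the thread: 200M, ~1T (GPT-4 rumor), 5T.
for n in (200e6, 1e12, 5e12):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.3f}")
```

With any constants of this shape, the predicted gap between the 1T and 5T rows comes out tiny compared to the gap below 1T, which is the "ugly part of the curve" point.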
I mean, the scaling laws were always logarithmic, and a logarithm gets arbitrarily close to flat if you can't drive it with exponential growth in the input; even if you can, the output is barely linear. The scaling laws always predicted that scaling models up would stop being practical, or at least slow down, at some point.
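A tiny sketch of that point, assuming capability scales roughly like `log2(scale)` (a toy model, not any particular paper's fit): each additional "unit" of capability costs another doubling of scale, so linear gains require exponential growth in compute/params.

```python
import math

# Toy model: capability ~ log2(scale). Each constant additive gain in
# capability requires multiplying the scale by a constant factor,
# i.e. exponential growth in the input for linear growth in the output.
for doublings in range(6):
    scale = 2 ** doublings
    print(f"scale x{scale:>2}: capability gain over baseline = {math.log2(scale):.1f} units")
```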
Right, but the quantum leap in capabilities that came from GPT-2 -> GPT-3 -> GPT-3.5 Turbo (which I personally felt didn't fare as well at coding as its predecessor) -> GPT-4 won't be replicated anytime soon with pure text/chat generation models.