
If this cannot eliminate hallucinations, or at least reduce them to the point of being statistically unlikely, and I assume it has more params than GPT-4's trillion parameters, doesn't that mean the scaling law is dead?


I interpret this to mean we're in the ugly part of the old scaling law, where `ln(x)` for `x > $BIGNUMBER` starts becoming punishing, not that the scaling law has been empirically refuted in any way. Maybe someone can crunch the numbers and figure out whether the benchmarks empirically validate the scaling law or not, relative to GPT-4o (assuming e.g. 200 billion params vs 5T params).
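
To put rough numbers on that, here's a quick sketch using just the parameter term of the Chinchilla loss fit (Hoffmann et al. 2022); I'm ignoring the data term entirely, and the param counts are only the guesses from above, not real figures for either model:

    # Parameter term of the Chinchilla loss fit: loss(N) = E + A / N**alpha.
    # Constants from the paper; data term ignored; param counts are guesses.
    E, A, alpha = 1.69, 406.4, 0.34

    def loss(n_params):
        return E + A / n_params**alpha

    for n in (2e11, 1e12, 5e12):  # 200B, 1T, 5T params
        print(f"{n:.0e} params -> predicted loss {loss(n):.3f}")
    # ~1.748, ~1.724, ~1.710: each big jump in params buys a smaller drop.

So even a 25x parameter jump would only shave a few hundredths off the predicted loss, which is consistent with the curve flattening rather than the law breaking.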


Why would they want to eliminate that? Altman said that hallucinations are how LLMs express creativity.


I mean the scaling laws were always logarithmic, and logarithms become arbitrarily close to flat if you can't keep driving them with exponential growth in inputs; even if you can, the payoff is barely linear. The scaling laws always predicted that scaling models up would stop being practical, or at least slow down, at some point.
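
A tiny numerical illustration of the "exponential in, linear out" point (pure toy math, nothing model-specific):

    import math

    # ln(x) gains a fixed ~0.693 per *doubling* of x: you need exponential
    # input growth just to keep the output moving linearly.
    for k in range(4):
        x = 10**6 * 2**k
        print(f"x = {x:>9,} -> ln(x) = {math.log(x):.3f}")
    # With merely linear input growth the increments shrink toward zero:
    # ln(2e6)-ln(1e6) = 0.693, ln(3e6)-ln(2e6) = 0.405, ln(4e6)-ln(3e6) = 0.288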


Right, but the quantum leap in capabilities from GPT-2 -> GPT-3 -> GPT-3.5 Turbo (which I personally felt fared worse at coding than its predecessor) -> GPT-4 won't be replicated anytime soon with pure text/chat generation models.


Sure, that's also predicted by a logarithmic scaling law: you get extremely rapid growth early on, and then diminishing returns set in.



