My summary: Anything that drives predictions of "the next token" must build both a model of "how the world works" and a model of "what state the world is in right now". The authors train a transformer and demonstrate that it learns internal structure representing both of these.
Crucially, it tends to find the simplest such representation that still solves the problem, which is why the model is ultimately only sufficient for the problems it was trained to solve.
(I could be wrong here. Please correct me if that's the case.)
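To make the "representation of world state" claim concrete, here's a toy sketch of the standard linear-probing idea: if a model's hidden activations really encode the current world state, a simple linear readout trained on those activations should recover it well above chance. The activations below are synthetic stand-ins (a noisy "state direction" in a 32-dim space), not from a real transformer, so this only illustrates the technique, not the paper's actual experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: a binary world state, and hidden activations that
# encode it as a (noisy) direction in activation space.
n, d = 1000, 32
world_state = rng.integers(0, 2, size=n)           # the "true" state per example
direction = rng.normal(size=d)                     # hypothetical state direction
hidden = np.outer(world_state, direction) + 0.5 * rng.normal(size=(n, d))

# Linear probe: least-squares fit from hidden activations to world state.
w, *_ = np.linalg.lstsq(hidden, world_state, rcond=None)
pred = (hidden @ w) > 0.5
accuracy = (pred == world_state).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

If the probe's accuracy is far above the 0.5 chance level, the state is linearly decodable from the activations, which is (roughly) the kind of evidence such papers present for an internal world-state representation.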
I think the comment you're replying to means exactly what you're saying: it will find solutions that are "easy" for the optimizer to reach, and therefore solutions that are simple in the sense that some optimizer converges to them.