Re 3) Even if you don't care about long context length, Mamba is much cheaper per token of auto-regressive output. Each new token only requires computing the next step of a linear RNN, whereas a transformer has to attend back over all previous outputs, which rapidly grows in cost and memory.
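
A minimal sketch of the contrast (this uses a plain linear recurrence as a stand-in, not Mamba's actual selective state-space update; all matrices and shapes are made up for illustration). The recurrent step does a fixed amount of work per token, while the attention step touches a cache that grows with the sequence length t:

  # Illustrative only: constant-work recurrent decode vs. attention over a growing KV cache.
  import numpy as np

  d = 8  # hidden/state dimension (arbitrary)

  # Linear-RNN-style decode: O(d^2) per token, fixed-size state.
  A = np.eye(d) * 0.9          # hypothetical state-transition matrix
  B = np.random.randn(d, d)    # hypothetical input projection
  def rnn_step(state, x):
      return A @ state + B @ x  # cost independent of sequence length

  # Transformer-style decode: attend over all cached keys/values,
  # so per-token cost and memory grow with t.
  def attn_step(q, K_cache, V_cache):
      scores = K_cache @ q / np.sqrt(d)        # O(t * d)
      weights = np.exp(scores - scores.max())
      weights /= weights.sum()
      return weights @ V_cache                 # O(t * d)

  state = np.zeros(d)
  K_cache, V_cache = np.zeros((0, d)), np.zeros((0, d))
  for t in range(1000):
      x = np.random.randn(d)
      state = rnn_step(state, x)               # constant work and memory
      K_cache = np.vstack([K_cache, x])        # cache keeps growing
      V_cache = np.vstack([V_cache, x])
      y = attn_step(x, K_cache, V_cache)       # work grows with t

So by token 1000 the attention step is scanning 1000 cached entries, while the recurrent step still touches only its d-dimensional state.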

