
Crazy that there are now five and a half companies that all have roughly state of the art LLMs.

> We developed a new training technique which we refer to as MetaP that allows us to reliably set critical model hyper-parameters such as per-layer learning rates and initialization scales. We found that chosen hyper-parameters transfer well across different values of batch size, model width, depth, and training tokens.

This sounds interesting. Anyone have a link to the paper or other documentation on MetaP?



It's quite similar to μP (Maximal Update Parametrization):

https://github.com/microsoft/mup
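
Meta hasn't published a MetaP paper, so the closest public reference for this kind of hyperparameter transfer is μP. A minimal sketch of the microsoft/mup workflow, for anyone curious what "set it on a small model, transfer it to a big one" looks like in code (the MLP, widths, and lr below are made-up placeholders, not anything from Llama 4):

    import torch
    import torch.nn as nn
    from mup import MuReadout, set_base_shapes, MuAdam

    class MLP(nn.Module):
        def __init__(self, width, d_in=64, d_out=10):
            super().__init__()
            self.fc1 = nn.Linear(d_in, width)
            self.fc2 = nn.Linear(width, width)
            # muP swaps the output layer for MuReadout so its init
            # and learning rate scale correctly with width
            self.readout = MuReadout(width, d_out)

        def forward(self, x):
            h = torch.relu(self.fc2(torch.relu(self.fc1(x))))
            return self.readout(h)

    # base/delta models tell mup which dimensions scale with "width";
    # the target model is the one you actually train
    base = MLP(width=8)
    delta = MLP(width=16)
    model = MLP(width=1024)

    # rescales the target model's initialization per muP
    # (call this before any custom re-initialization)
    set_base_shapes(model, base, delta=delta)

    # MuAdam applies muP's per-layer lr scaling, so a learning rate
    # tuned on a narrow proxy model transfers to the wide one
    opt = MuAdam(model.parameters(), lr=1e-3)

The payoff is that you tune lr and init scale once on a cheap narrow model and the optimum stays put as you widen it. Meta's quote suggests MetaP extends that kind of transfer beyond width, to depth, batch size, and training tokens as well.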



