The Chinchilla scaling law describes the compute-optimal balance between the number of model parameters and the training-data size for a fixed training compute budget. See

https://dynomight.net/scaling/
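
A back-of-the-envelope sketch of that trade-off, using the common approximations C ≈ 6·N·D training FLOPs and D ≈ 20·N tokens per parameter (these round constants are approximations to Chinchilla's fitted values, not the paper's exact numbers):

    # Rough Chinchilla heuristic: training compute C ~= 6 * N * D FLOPs,
    # with a compute-optimal ratio of ~20 tokens per parameter (D ~= 20 * N).
    def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
        """Return (params, tokens) that are roughly compute-optimal."""
        # Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2.
        n_params = (compute_flops / 120) ** 0.5
        n_tokens = 20 * n_params
        return n_params, n_tokens

    # Example: roughly the Chinchilla run's own budget
    # (70B params x 1.4T tokens at 6 * N * D ~= 5.9e23 FLOPs).
    params, tokens = chinchilla_optimal(5.9e23)
    print(f"~{params / 1e9:.0f}B params, ~{tokens / 1e12:.1f}T tokens")  # ~70B, ~1.4T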



For training, yes, but these models are optimized for inference, since inference will be run many more times than training. The original Llama models were trained on far more data than the Chinchilla-optimal amount.
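
To put a number on "way past" (token counts as reported in the LLaMA paper; the arithmetic is just an illustration of the same heuristic):

    # LLaMA-1 7B was reportedly trained on ~1T tokens, versus the
    # ~20-tokens-per-parameter Chinchilla heuristic (~140B tokens for 7B).
    n_params = 7e9
    chinchilla_tokens = 20 * n_params   # ~1.4e11 tokens
    actual_tokens = 1e12                # ~1T tokens per the LLaMA paper
    print(f"~{actual_tokens / chinchilla_tokens:.0f}x Chinchilla-optimal")  # ~7x

Spending extra training compute on a smaller model makes every subsequent inference call cheaper, which is exactly the trade-off being described.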



