The Chinchilla scaling law describes the compute-optimal balance between the number of model parameters and the training-data size for a fixed training compute budget. See

https://dynomight.net/scaling/
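
A back-of-the-envelope sketch of that trade-off, using the common approximations C ≈ 6·N·D training FLOPs and D ≈ 20·N tokens per parameter (these round constants are approximations to Chinchilla's fitted values, not the paper's exact numbers):

    # Rough Chinchilla heuristic: training compute C ~= 6 * N * D FLOPs,
    # with a compute-optimal ratio of ~20 tokens per parameter (D ~= 20 * N).
    def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
        """Return (params, tokens) that are roughly compute-optimal."""
        # Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2.
        n_params = (compute_flops / 120) ** 0.5
        n_tokens = 20 * n_params
        return n_params, n_tokens

    # Example: roughly the Chinchilla run's own budget
    # (70B params x 1.4T tokens at 6 * N * D ~= 5.9e23 FLOPs).
    params, tokens = chinchilla_optimal(5.9e23)
    print(f"~{params / 1e9:.0f}B params, ~{tokens / 1e12:.1f}T tokens")  # ~70B, ~1.4T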



For training, yes, but these models are optimized for inference, since inference will be run many more times than training. The original Llama models were trained on far more data than the Chinchilla-optimal amount.
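
To put a number on "way past" (token counts as reported in the LLaMA paper; the arithmetic is just an illustration of the same heuristic):

    # LLaMA-1 7B was reportedly trained on ~1T tokens, versus the
    # ~20-tokens-per-parameter Chinchilla heuristic (~140B tokens for 7B).
    n_params = 7e9
    chinchilla_tokens = 20 * n_params   # ~1.4e11 tokens
    actual_tokens = 1e12                # ~1T tokens per the LLaMA paper
    print(f"~{actual_tokens / chinchilla_tokens:.0f}x Chinchilla-optimal")  # ~7x

Spending extra training compute on a smaller model makes every subsequent inference call cheaper, which is exactly the trade-off being described.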



