1. It’s faster and cheaper to train a smaller model 2. Better than tokens is to ... | Hacker News


ljlolel on July 24, 2023 | on: Llama2.c: Inference llama 2 in one file of pure C

1. It’s faster and cheaper to train a smaller model

2. Better than tokens is to train on probability distributions (distillation) and trees of probability distributions
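For readers unfamiliar with the second point, here is a minimal sketch of what "training on probability distributions" usually means in distillation. It is not code from the commenter or from llama2.c. The function names, the example logits, and the temperature value are all illustrative assumptions. It covers only the flat-distribution case, not the "trees of probability distributions" variant, which the comment does not describe further.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def hard_label_loss(student_logits, target_index):
    """Ordinary next-token training: cross-entropy against one correct token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    The student is pushed toward the teacher's whole distribution,
    not just toward a single sampled token.
    """
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student (predicted) distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a 4-token vocabulary (made up for illustration).
teacher_logits = [4.0, 3.5, 0.1, -2.0]
student_logits = [1.0, 0.2, 0.3, -0.5]

print("hard-label loss:  ", hard_label_loss(student_logits, 0))
print("distillation loss:", distillation_loss(student_logits, teacher_logits))
```

The hard-label loss only tells the student that token 0 was correct. The distillation loss also carries the teacher's view that token 1 was nearly as plausible, which is extra signal per training example.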

nickpsecurity on July 28, 2023

I've never seen anything about training on probability distributions or trees of them. Do you have articles with examples you could share with us?

I did try a quick search for it and found some interesting papers. The links are below in case anyone else finds them useful.

https://arxiv.org/abs/2212.11481

https://towardsdatascience.com/a-new-way-to-predict-probabil...

https://arxiv.org/pdf/1912.07913.pdf

https://dukespace.lib.duke.edu/dspace/bitstream/handle/10161...
