

elorant on Jan 27, 2025 | on: Nvidia’s $589B DeepSeek rout

There are multiple variations of the model starting from 1.5B parameters.

bufferoverflow on Jan 27, 2025

Those are distillations of the model.

rsanek on Jan 27, 2025

Have you used those? In my experience, even the 70B distillation is far worse than what you can expect from o1 or the R1 available on the web.

elorant on Jan 27, 2025

No, I haven't. I've used Perplexity's R1 but I don't know how many parameters it has. It's quite good, although too slow.
