

MarsIronPI 20 days ago | on: Show HN: 1-Bit Bonsai, the First Commercially Viab...

It's because they're natively trained with 1 bit, so it's not losing anything. Now, the question might be how they manage to get decent predictive performance with so little precision. That I don't know.

syntaxpr 20 days ago

It's not training. They transpose rows/columns of the matrices to group 128 parameters with a similar (shared) scale factor. It's a Qwen-3 model.
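
To make the "128 parameters with a shared scale factor" idea concrete, here is a minimal sketch of group-wise 1-bit quantization in plain Python. It is an illustration under stated assumptions, not the scheme this model is known to use: the function names are hypothetical, the scale is taken to be the mean absolute value of each group (a common choice for sign quantization), and the row/column reordering step mentioned above is not shown.

```python
import random

GROUP_SIZE = 128  # group size mentioned in the comment; everything else is assumed


def quantize_group(weights):
    """Reduce one group to a sign per weight plus a single shared scale.

    Assumption: the shared scale is the group's mean absolute value.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def quantize_row(row, group_size=GROUP_SIZE):
    """Split a row of weights into consecutive groups and quantize each one."""
    return [
        quantize_group(row[start:start + group_size])
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups):
    """Rebuild approximate weights: every weight becomes +scale or -scale."""
    return [sign * scale for signs, scale in groups for sign in signs]


def mean_abs_error(original, approx):
    """Average absolute difference between original and rebuilt weights."""
    return sum(abs(a - b) for a, b in zip(original, approx)) / len(original)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical weight row: 512 small Gaussian values, so 4 groups of 128.
    row = [rng.gauss(0.0, 0.02) for _ in range(512)]
    groups = quantize_row(row)
    approx = dequantize_row(groups)
    print("groups:", len(groups))
    print("mean abs error:", mean_abs_error(row, approx))
```

Each group stores one bit per weight plus one scale, so the closer the magnitudes within a group are to each other, the less is lost. That would be the motivation for grouping parameters with similar scale.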

MarsIronPI 19 days ago

I'm not sure what you mean. Could you please elaborate?
