
The rule of thumb is that inference speed halves with every doubling of parameter count (which, obviously, also doubles the memory footprint). This follows from autoregressive decoding being memory-bandwidth bound: each generated token requires streaming all of the weights through memory once.
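
A quick back-of-the-envelope sketch of that rule, assuming decoding is purely weight-read bound (the bandwidth figure and quantization width below are hypothetical, just for illustration):

    # Upper-bound decode speed when reading the weights dominates:
    # tokens/s ~= memory bandwidth / model size in bytes.
    def tokens_per_second(num_params: float,
                          bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
        model_bytes = num_params * bytes_per_param
        return (bandwidth_gb_s * 1e9) / model_bytes

    # Assumed example: 4-bit weights (~0.5 bytes/param) on a device
    # with ~100 GB/s of memory bandwidth.
    print(tokens_per_second(7e9, 0.5, 100))   # ~28.6 tok/s
    print(tokens_per_second(14e9, 0.5, 100))  # ~14.3 tok/s, half the speed

Doubling the parameter count doubles the bytes read per token, so the estimated tokens/s halves, matching the rule of thumb.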

You can check out real-world performance on various devices here: https://llm.mlc.ai/


