For Intel CPUs, Phi-2 (2.7B) and TinyLlama (1.1B) run reasonably well using llam... | Hacker News


ethan_smith 6 months ago | on: I want everything local – Building my offline AI w...

For Intel CPUs, Phi-2 (2.7B) and TinyLlama (1.1B) run reasonably well using llama.cpp with 4-bit quantization. GGUF models with INT4 quantization need roughly 0.6 GB of RAM per billion parameters for the weights, plus some headroom for the context (KV cache), so even older machines can handle smaller models.
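The weight footprint is just parameter count times bits per weight. The rough sketch below assumes llama.cpp's Q4_0 block layout: 32 weights stored as 16 bytes of 4-bit values plus one 2-byte fp16 scale, which works out to 4.5 bits per weight. It counts weights only. Real memory use is higher because of the KV cache, compute buffers and GGUF metadata, so treat these numbers as lower bounds. The function name is illustrative and does not come from any library.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes).

    Hypothetical helper: ignores KV cache, runtime buffers and file metadata.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Assumption: Q4_0 packs 32 weights into 16 bytes of nibbles + a 2-byte fp16 scale,
# i.e. 18 bytes per 32 weights = 4.5 bits per weight.
Q4_0_BITS = 18 * 8 / 32
FP16_BITS = 16

if __name__ == "__main__":
    for name, params in [("TinyLlama", 1.1), ("Phi-2", 2.7)]:
        q4 = weight_memory_gb(params, Q4_0_BITS)
        fp16 = weight_memory_gb(params, FP16_BITS)
        print(f"{name} ({params}B): ~{q4:.2f} GB at Q4_0, ~{fp16:.2f} GB at FP16")
```

For comparison, the same arithmetic at FP16 (16 bits per weight) gives about 2 GB per billion parameters. A 4-bit model therefore needs roughly a quarter of that, plus the small per-block scale overhead.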

akawry 6 months ago

Take a look at ik_llama.cpp: https://github.com/ikawrakow/ik_llama.cpp

CPU performance is much better than mainline llama.cpp, and it offers more quantization types.
