Hacker News

For comparison, running Ollama with Q4 Mixtral on my 24GB 3090 Ti locally (with 22.5GB of the 26GB model on the GPU), I get 14 tok/s generation, so the Hetzner server and the 4000 really aren't bad at all (notably, my 3090 also draws only 70W during inference).
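For anyone wanting to reproduce this kind of number: Ollama prints a timing summary when run with `--verbose`, and the generation speed is the `eval rate` line. A minimal sketch that pulls that figure out of the verbose output (the exact line format shown in `sample` is an assumption based on typical Ollama output, not taken from this thread):

```python
import re

def parse_eval_rate(verbose_output: str) -> float:
    """Extract generation speed (tokens/s) from Ollama's --verbose
    timing summary. Assumes a line like 'eval rate: 14.02 tokens/s'."""
    m = re.search(r"eval rate:\s*([\d.]+)\s*tokens/s", verbose_output)
    if m is None:
        raise ValueError("no 'eval rate' line found in output")
    return float(m.group(1))

# Hypothetical example of the timing block Ollama emits after a reply:
sample = """
total duration:       12.3s
eval count:           172 token(s)
eval rate:            14.02 tokens/s
"""
print(parse_eval_rate(sample))  # 14.02
```

(Power draw, like the 70W figure above, comes from `nvidia-smi` rather than Ollama.)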

