For comparison, running Q4 Mixtral under Ollama locally on my 24GB RTX 3090 Ti (with 22.5GB of the 26GB model resident on the GPU), I get 14 tok/s generation, so the Hetzner server and the RTX 4000 really aren't bad at all. Notably, my 3090 Ti also draws about 70W during inference.