For comparison, running Q4 Mixtral under Ollama locally on my 24GB RTX 3090 Ti (with 22.5GB of the 26GB model resident on the GPU), I get 14 tok/s generation, so the Hetzner server and the RTX 4000 really aren't bad at all. Notably, my 3090 Ti also draws about 70W during inference.