Is LMDeploy the Ultimate Solution? Why It Outshines VLLM, TRT-LLM, TGI, and MLC (bentoml.com)
16 points by helloericsf on June 20, 2024 | hide | past | favorite | 8 comments


How does Exllama rank among these? Heard good things about it.


Seems interesting! https://github.com/turboderp/exllama "A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights."
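For a sense of why a quantized rewrite matters, here's a back-of-the-envelope memory calculation (illustrative only; real deployments also need memory for activations and the KV cache):

```python
# Rough memory math: weight storage for a 7B-parameter Llama-class
# model at fp16 vs. 4-bit quantized. Illustrative, not exact.
def weight_gb(n_params, bits_per_weight):
    return n_params * bits_per_weight / 8 / 1e9

n = 7e9
print(f"fp16:  {weight_gb(n, 16):.1f} GB")   # ~14 GB
print(f"4-bit: {weight_gb(n, 4):.1f} GB")    # ~3.5 GB
```

That roughly 4x reduction is what lets a 7B model fit on a single consumer GPU.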


4-bit quantization tends to come at the cost of output quality. https://github.com/ggerganov/llama.cpp/issues/9
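The quality loss comes from rounding weights to a coarse grid. A toy symmetric 4-bit quantizer makes the round-trip error concrete (real schemes like GPTQ are group-wise and much more careful, so this is only a sketch):

```python
import numpy as np

# Toy symmetric 4-bit quantization of a weight tensor. The round-trip
# error is the source of the output-quality loss being discussed.
def quantize_4bit(w):
    scale = np.abs(w).max() / 7          # signed 4-bit range: -8..7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_4bit(w)
err = np.abs(w - dequantize(q, s)).mean()
print(f"mean abs round-trip error: {err:.4f}")   # nonzero by construction
```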


Quality loss with quantization is expected. With GPTQ, the loss seems to be within an acceptable range, based on the perplexity scores shown.
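For anyone unfamiliar with the metric: perplexity is the exponential of the mean negative log-likelihood per token, so a quantized model that assigns slightly lower probabilities to the reference text gets a slightly higher score. Toy numbers below, not real model output:

```python
import math

# Perplexity from per-token log-probabilities: exp(mean NLL).
def perplexity(token_logprobs):
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical log-probs: the quantized model is slightly worse,
# so its perplexity is slightly higher.
fp16_lp = [-2.1, -1.8, -2.4, -1.9]
q4_lp   = [-2.2, -1.9, -2.5, -2.0]
print(perplexity(fp16_lp), perplexity(q4_lp))
```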


Why aren't there more of these benchmark studies? How did TGI make the cut?


Why was ONNX not part of the tested runtimes? Seems like an oversight.


Personally, I've never seen ONNX used for LLMs.


ONNX is not a good option for the kind of autoregressive generation LLMs do.
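One way to see the mismatch: autoregressive decoding appends to a KV cache every step, so tensor shapes grow throughout generation, which is awkward for a static-graph export format. A toy greedy-decode loop (random stand-in "model", purely illustrative) shows the shape growth:

```python
import numpy as np

# Toy autoregressive decode loop. The point is structural: the cache
# grows by one entry per step, so shapes are dynamic, which is what
# makes static-graph runtimes a poor fit for this workload.
rng = np.random.default_rng(0)
vocab, d = 100, 16
embed = rng.standard_normal((vocab, d))

def step(token_id, kv_cache):
    h = embed[token_id]
    kv_cache.append(h)                  # cache grows every step
    ctx = np.mean(kv_cache, axis=0)     # stand-in for attention over the cache
    logits = embed @ ctx
    return int(np.argmax(logits)), kv_cache

tok, cache, out = 1, [], []
for _ in range(5):
    tok, cache = step(tok, cache)
    out.append(tok)
print(out, "cache length:", len(cache))
```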



