Is LMDeploy the Ultimate Solution? Why It Outshines VLLM, TRT-LLM, TGI, and MLC (bentoml.com)
16 points by helloericsf on June 20, 2024 | 8 comments
ssheng on June 20, 2024:
How does Exllama rank among these? Heard good things about it.
helloericsf on June 20, 2024 (in reply to ssheng):
Seems interesting!
https://github.com/turboderp/exllama
"A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights."
helloericsf on June 20, 2024 (in reply to ssheng):
4-bit quantization tends to come at a cost in output quality.
https://github.com/ggerganov/llama.cpp/issues/9
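For context, here is a minimal sketch (not from the thread) of naive symmetric round-to-nearest 4-bit quantization; the function names and the single per-tensor scale are illustrative assumptions, and real schemes such as GPTQ or llama.cpp's Q4 formats use per-group scales and error compensation precisely to shrink the loss being discussed:

    import numpy as np

    def quantize_4bit(w):
        # One scale for the whole tensor (coarse); the int range is [-8, 7].
        scale = np.abs(w).max() / 7.0
        q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)  # toy weight row
    q, scale = quantize_4bit(w)
    w_hat = dequantize(q, scale)

    # The rounding residual below is the "quality loss" in question.
    err = np.abs(w - w_hat)
    print(f"mean abs error {err.mean():.6f}, max abs error {err.max():.6f}")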
ssheng on June 20, 2024 (in reply to helloericsf):
Quality loss with quantization is expected. With GPTQ, the loss seems to be within an acceptable range based on the perplexity score shown.
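For reference, perplexity is just the exponential of the average negative log-likelihood the model assigns to the true next tokens, so a small rise after quantization means next-token prediction degraded only slightly. A minimal sketch (not from the thread; the toy numbers are made up):

    import math

    def perplexity(token_log_probs):
        # token_log_probs: natural-log probability the model assigned
        # to each true next token in the evaluation text.
        nll = -sum(token_log_probs) / len(token_log_probs)
        return math.exp(nll)

    # Toy numbers: a model assigning probability 0.25 to every true token
    # has perplexity 4, i.e. it is as uncertain as a uniform 4-way choice.
    print(perplexity([math.log(0.25)] * 8))  # -> 4.0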
ShawnBasquiat on June 20, 2024:
Why aren't there more of these benchmark studies? How did TGI make the cut?
timliu9 on June 20, 2024:
Why was ONNX not part of the tested runtimes? Seems like an oversight.
helloericsf on June 20, 2024 (in reply to timliu9):
Personally, I've never seen ONNX used for LLMs.
chaoyu on June 20, 2024 (in reply to timliu9):
ONNX is not a good option for LLM-style autoregressive generation.