I'm not sure what you mean by "used to be", the llama.cpp github repository was committed to just 4 hours ago.
This project cites llama.cpp as inspiration, but it's much simpler: it only supports Llama 2, only fp32, and only runs on a single CPU thread.
> I'm not sure what you mean by "used to be", the llama.cpp github repository was committed to just 4 hours ago.
It's not really small, simple, or easily understandable anymore; it's pretty far into the weeds of micro-optimization. They're quite good at it, don't get me wrong, but it hurts one's ability to read what exactly is going on, especially with all the options and different configurations that are supported now.
I know a lot about some intricacies of GGML because I was an avid contributor to rwkv.cpp for a few weeks, but I still don't understand llama.cpp. It's just on a completely different level.
Yeah, this is something that's often forgotten. I'm guilty of a few large refactors on rwkv.cpp myself, where reading the old code won't necessarily tell you how things work today. I'd be surprised if llama.cpp doesn't have a few of those too.