I'm not sure what you mean by "used to be", the llama.cpp github repository was committed to just 4 hours ago.
This project cites llama.cpp as inspiration, but it's much simpler: it only supports Llama 2, only fp32, and only runs on a single CPU thread.
> I'm not sure what you mean by "used to be", the llama.cpp github repository was committed to just 4 hours ago.
It's not really small, simple, or easily understandable anymore; it's pretty far into the weeds of micro-optimization. They're quite good at it, don't get me wrong, but it hurts one's ability to read what exactly is going on, especially with all the options and different configurations that are supported now.
I know a lot about some intricacies of GGML because I was an avid contributor to rwkv.cpp for a few weeks, but I still don't understand llama.cpp. It's just on a completely different level.
Yeah, this is something that's often forgotten. I'm guilty of a few large refactors on rwkv.cpp myself, where reading the old code won't necessarily tell you how things work today. I'd be surprised if llama.cpp doesn't have a few of those too.