Yes, but even that can still be run (slowly) on CPU-only systems with as little as ~32 GB of RAM. Memory virtualization is a thing. If you get used to using it like email rather than chat, it's still super useful even if you're waiting half an hour for your reply. Presumably you have a fast distill on tap for interactive stuff.
I run my models in an agentic framework with fast models that can ask slower models or APIs when needed. It works perfectly, 60 percent of the time lol.
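The routing idea above can be sketched in a few lines. This is a hypothetical minimal version, with the model calls stubbed out; in a real setup `fast_model` would be the local distill and `slow_model` the big CPU-bound model or a remote API, and the confidence score would come from the model itself or a verifier.

```python
from dataclasses import dataclass

@dataclass
class Reply:
    text: str
    confidence: float  # 0.0-1.0, self-reported or scored by a verifier

def fast_model(prompt: str) -> Reply:
    # Stand-in for a small local distill: quick, sometimes unsure.
    if "hard" in prompt:
        return Reply("not sure", 0.3)
    return Reply(f"fast answer to: {prompt}", 0.9)

def slow_model(prompt: str) -> Reply:
    # Stand-in for the big slow model you wait half an hour for.
    return Reply(f"careful answer to: {prompt}", 0.99)

def route(prompt: str, threshold: float = 0.7) -> Reply:
    # Try the fast model first; escalate only when it isn't confident.
    reply = fast_model(prompt)
    if reply.confidence >= threshold:
        return reply
    return slow_model(prompt)
```

The threshold is where the "60 percent of the time" lives: set it too low and the fast model answers stuff it shouldn't, too high and you're always waiting on the slow path.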