Because I'm not sure exactly what you're looking for when you say 'compares to' -- whether accuracy, speed, or architecture -- I'll hit all 3, but sorry if it's a bit much.
1. Accuracy: For simple tasks (like sentiment analysis on straightforward examples), it won't be much more accurate than a classical linear classifier, if at all.
1a. Accuracy on more diverse or challenging tasks: Because a linear classifier is just so damned simplistic, it simply cannot handle anything even resembling a reasoning task. Meanwhile, when specifically trained, this architecture managed 8/10 on textual entailment tasks, which are generally considered the entry-level gold standard for reasoning ability.
2. Speed: It's slower than a classical classifier, unsurprisingly, given the ~1B params it's pushing. They're both still pretty much blazing fast, but the tiny classical classifier will definitely be faster.
3. Architecture:
Here's where it gets interesting.
The architecture of the core model here differs significantly from a classical linear classifier:
Classical Classifier:
Input: BGE embedding (in this hypothetical)
Output: Class labels through softmax
Internal Architecture: No nonlinearity, no hidden layers, direct projection
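To make "direct projection" concrete, here's a minimal sketch of that classical classifier in numpy. The dimensions are hypothetical (a 768-dim BGE-style embedding, 3 classes) -- the point is just that the entire model is one weight matrix and a softmax:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, NUM_CLASSES = 768, 3  # hypothetical sizes, not from the actual setup

# The whole model: one weight matrix and a bias.
W = rng.normal(scale=0.02, size=(NUM_CLASSES, EMB_DIM))
b = np.zeros(NUM_CLASSES)

def classify(embedding: np.ndarray) -> np.ndarray:
    """Direct projection to class logits: no hidden layers, no nonlinearity."""
    logits = W @ embedding + b
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

probs = classify(rng.normal(size=EMB_DIM))
```

That's the whole thing -- which is exactly why it's fast and why it can't do anything resembling reasoning.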
General Classifier:
Input: BGE Embedding
Output: Class labels through nearest neighbor cosine similarity search of vocabulary
Internal architecture: A sparse input-projection layer, a layer that combines the 3 inputs after their upward projection, and 14 hidden layers with nonlinearity (GELU), layernorms, skip connections -- all of the standard stuff you'd expect in an LLM, but...not in an LLM.
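For contrast with the linear case, here's a rough numpy sketch of the general classifier's shape: project the 3 inputs up, combine them, run them through a residual GELU/layernorm stack, then pick the class by cosine similarity against a label-vocabulary matrix. All sizes (hidden width, vocabulary size, using a dense rather than sparse input projection) are stand-ins I've made up for illustration, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, HID, N_LAYERS, VOCAB = 768, 1024, 14, 5  # hypothetical sizes

def layernorm(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / (x.std() + 1e-5)

def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Upward projection for each of the 3 inputs, a combining layer,
# 14 hidden layers, and a vocabulary of label embeddings to search.
W_up = [rng.normal(scale=0.02, size=(HID, EMB_DIM)) for _ in range(3)]
W_combine = rng.normal(scale=0.02, size=(HID, 3 * HID))
W_hidden = [rng.normal(scale=0.02, size=(HID, HID)) for _ in range(N_LAYERS)]
label_vocab = rng.normal(size=(VOCAB, HID))

def classify(inputs: list[np.ndarray]) -> int:
    # 1. project each input up, then combine the three
    h = W_combine @ np.concatenate([W @ x for W, x in zip(W_up, inputs)])
    # 2. hidden stack: layernorm -> GELU MLP -> skip connection
    for W in W_hidden:
        h = h + gelu(W @ layernorm(h))
    # 3. nearest neighbor by cosine similarity over the label vocabulary
    sims = (label_vocab @ h) / (np.linalg.norm(label_vocab, axis=1) * np.linalg.norm(h))
    return int(np.argmax(sims))

pred = classify([rng.normal(size=EMB_DIM) for _ in range(3)])
```

The key design difference from the linear classifier is that the output space isn't a fixed softmax head -- it's whatever lives in the label vocabulary, so you can swap labels without retraining the projection.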
I hope that clears up your questions! If not, I'm happy to tell you more.