Try enjoying reading for purposes other than spending as few brain tokens as possible to acquire maximum info. It takes time to understand another person's perspective. Sophie's Choice wouldn't be as good a movie if you watched the 30-second TL;DR.
To each their own, I found it tedious and annoying. I quit reading maybe 1/4 of the way in. By then I already had loud alarms going off telling me to read the comments, because I'm sure many of the points are easy for a real expert to debunk - too much feels off.
Well I found the text to be obviously inflated with AI, becoming much more verbose than necessary, even if syntactically, grammatically and structurally it was correct.
I would write that prose. It's very powerful to use small sentences with small words to drive a point home. Like when you are in some drawn-out argument about the future with your spouse and your child comes in the room. She says quietly, "please stop fighting, I'm hungry". How do you argue with that? You can't, it's just true.
That too. Honestly, if AI is the wonder-miracle people act like it is, I'd expect it to be able to spot complex backdoors: multiple services that each look benign when red-teamed, but which, used in conjunction, provide access to the lowest CPU ring, all the obfuscated undocumented CPU instructions, and of course all the JTAG debugging functions in the firmware.
My vote: AI induced psychosis via sycophantic assurances that the results are real. Plus a heap of Dunning-Kruger by allowing someone with just enough knowledge to be dangerous to get far enough to waste everyone's time.
LLM prose is very bland and smooth, in the same way that bland white factory bread is bland and smooth. It also typically uses a lot of words to convey very simple ideas, simply because the output is typically decompressed from a small prompt. LLMs are capable of very good data transformation and good writing, but not when they are asked to write an article based on a single sentence.
That's true. I.e. it's not that they're not capable of doing better, it's just whoever's prompting them is typically too lazy to add an extra sentence or three (or a link) to steer it to a different region of the latent space. There's easily a couple dozen dimensions almost always left at their default values; it doesn't take much to alter them and nudge the model to sample from a more interesting subspace style-wise.
(Still, it makes sense to do it as a post-processing style-transfer pass, as verbosity is a feature while the model is still processing the "main" request - each token produced is a unit of computation; the more terse the answer, the dumber it gets (these days it's somewhat mitigated by "thinking" and agentic loops)).
The real improvement will be when the software engineers get into the training loop. Then we can have MoE models that use cache-friendly expert utilisation and maybe even learned prefetching for what the next experts will be.
> maybe even learned prefetching for what the next experts will be
Experts are predicted by layer and the individual layer reads are quite small, so this is not really feasible. There's just not enough information to guide a prefetch.
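A toy sketch of why this is hard, assuming a standard top-k gated MoE (all names here are illustrative, not from any real model): the expert indices for a layer come out of that layer's own router, which only runs once the layer's hidden state exists, so there is no earlier signal to prefetch on.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2

# Hypothetical router weights for two consecutive MoE layers.
W_router = [rng.standard_normal((d_model, n_experts)) for _ in range(2)]

def route(h, W):
    """Top-k gating: pick the top_k experts with the highest router logits.

    The chosen expert indices depend on h, this layer's hidden state, so
    they are only known immediately before the experts are needed.
    """
    logits = h @ W
    return np.argsort(logits)[-top_k:]

h = rng.standard_normal(d_model)
for layer, W in enumerate(W_router):
    experts = route(h, W)  # decided right before the expert weights are read
    h = np.tanh(h)         # stand-in for the actual expert computation
```

The prefetch window is the gap between `route()` returning and the expert weights being read, which in a real kernel is tiny relative to the latency of fetching weights from slower memory.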
On CPU, with bigger K, you would put the centroids in a search tree to take advantage of the sparsity, while a GPU would calculate the full NxK distance matrix. So from my understanding the bottleneck they are fixing doesn't show up on CPU.
Most data I've used is geospatial with D<=4 (xyzt), so for me search trees worked great. But for things like descriptor or embedding clustering, yes, trees wouldn't be useful.
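A minimal sketch of the two approaches being contrasted, assuming low-dimensional data and SciPy's `cKDTree` for the tree side (the sizes here are made up for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(42)

# Hypothetical setup: N low-dimensional points (e.g. xyzt, D=4)
# each assigned to its nearest of K centroids.
N, K, D = 10_000, 256, 4
points = rng.standard_normal((N, D))
centroids = rng.standard_normal((K, D))

# GPU-style approach: materialise the full N x K squared-distance matrix
# and take the argmin along the centroid axis.
d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
brute_idx = d2.argmin(axis=1)

# CPU-style approach: put the centroids in a k-d tree, so each query
# visits only a sparse subset of the K centroids.
tree = cKDTree(centroids)
_, tree_idx = tree.query(points, k=1)
```

Both paths return the same nearest-centroid assignment; the tree avoids the NxK intermediate, which is exactly the part that stops paying off as D grows into embedding-sized dimensions.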