I've seen many a JPEG explainer, but this one wins for most aesthetic. The interactive visuals were also nice. My only criticism is the abrupt ending; it should have concluded with the "now let's put it all together" slider.
I'm surprised the point/comment ratio is this skewed. There's so much meat in the post to chew on. I like your writing. This was one of those blogs where I can tell you spent a massive amount of time on the technical, but simplified it to layman's terms. I hope you keep putting out stuff :).
I have a couple questions:
1. I think this quote should be raising *many more* eyebrows.
> The astounding thing about Goliath wasn’t that it was a huge leap in performance, it was that the damn thing functioned at all. To this day, I still don’t understand why this didn’t raise more eyebrows.
You put a cat's brain into a dog's head and it's still breathing! It didn't flatline immediately! And it's already yesterday's news? This seems like the biggest takeaway. Why isn't every <MODEL_PROVIDER> attempting LLM-surgery at this moment? Have you noticed any increased discourse in this area?
2. You mentioned you spent the beginning of your career looking at brains in biotech. How did you end up in a basement of GPUs, working not in biotech, but still kind of looking at brains?
Cheers. I will go back through my other old projects (optogenetics, hacking CRISPR/Cas9, etc.), and put them on my blog.
On your questions:
1) A few other papers have been mentioned in the thread, like SOLAR 10.7B. They duplicated the whole transformer stack, and it kinda helped. But as I found experimentally, that's probably not a great idea. You are duplicating 'organs' (i.e. input-processing stuff) that should only have one copy. Also, that paper didn't see immediate improvements; they had to do continued pre-training to see benefits. At that point, I'm guessing the big labs stopped bothering. Limited by hardware, I had to find unusual angles to approach this topic.
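For readers unfamiliar with the trick: the SOLAR-style "duplicate the stack" merge can be sketched in a few lines. This toy version just manipulates a list of layer names; the function name and `overlap` parameter are my own illustrative choices, not the paper's implementation. The default `overlap=8` mirrors their 32-layer base becoming a 48-layer model.

```python
def depth_upscale(layers, overlap=8):
    """Toy sketch of depth up-scaling: stack two copies of a model's
    layer list, dropping `overlap` layers at the seam so the middle
    of the network isn't naively doubled back-to-back."""
    top = layers[:-overlap]    # first copy, minus its final layers
    bottom = layers[overlap:]  # second copy, minus its initial layers
    return top + bottom

base = [f"layer_{i}" for i in range(32)]
merged = depth_upscale(base)
print(len(merged))  # 48
```

Note what the sketch makes obvious: the embedding-adjacent 'organs' (`layer_0` and the final layers) still appear only once, while the interior gets repeated, which is exactly the asymmetry the comment above is pointing at.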
2) Nah, no more wetware for me. I did a half decade of research at a big neurobiology institute, and while it was very enjoyable, I can truly say that grant writing and paper review are 'not my thing'. The reason this info was delayed so long is that I wanted a paper in the AI field to go along with my papers in other fields. But as a hobbyist with no official affiliation, and the attention span of a gnat, I gave up and started a blog instead. Maybe someone will cite it?
> You put a cat's brain into a dog's head and it's still breathing! It didn't flatline immediately! And it's already yesterday's news?
I think it isn't surprising, given how, for example, kernels in the first layers of visual CNNs converge to Gabor filters, which are also the neuron transfer functions in the first layers of cat, human, etc. visual cortices, and that there is math proving that such kernels are optimal (under some reasonable conditions).
And so I'd expect that the layers inside an LLM reach, or come close to, some optimum that is universal across brains and LLMs (the main drivers of such optimality being energy (various L2-like metrics), information compression, and entropy).
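For the curious, the Gabor kernels mentioned above are simple to construct: a Gaussian envelope modulating a sinusoid. This is a minimal sketch with parameter names and defaults chosen for illustration, not taken from any particular CNN or neuroscience paper:

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0, psi=0.0, gamma=0.5):
    """2D Gabor filter: Gaussian envelope * cosine carrier.
    theta rotates the filter; lam is the carrier wavelength;
    gamma squashes the envelope perpendicular to the carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier runs along angle theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

k = gabor_kernel()
print(k.shape)  # (15, 15)
```

Sweeping `theta` over a handful of angles reproduces the familiar "bank of oriented edge detectors" that first-layer CNN kernels famously converge toward.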
This doesn't match my own experience. I dream of the day the stuff I don't find interesting can get automated, but again and again I find myself having to do things by hand.
I wonder if this is similar to chess and Go getting 'solved': hard problem spaces that only the biggest brains could tackle. Maybe it turns out creating highly performant, distributed systems with a plethora of unit tests is a cakewalk for LLMs, while trying to make a 'simple web app' for a niche microscopy application is like trying to drive around San Francisco.
> In practice, that means more logic fits in context, and sessions stretch longer before hitting limits. The AI maintains a broader view of your codebase throughout.
This is one of those 'intuitions' that I've also had. However, I haven't found any convincing evidence for or against it so far.
In a similar vein, this is why `reflex`[0] intrigues me. IMO their value prop is "LLMs love Python, so let's write entire apps in Python". But again, I haven't seen any hard numbers.
I'm not diminishing the ethics debate, but it's crazy to me how easy it was for two non-technical rich dudes in a garage to build Clearview AI (and before vibe-coding!):
1. scrape billions of faces from the internet
2. `git clone` any off-the-shelf facial-recognition repo
Unfortunately, we will see more and more cases like this as AI rises. I don't believe it's the only app that could do this kind of labeled face search, etc.
Am I the only one who finds it amusing that companies like Google and Facebook sent Clearview legal letters complaining about scraping data from their sites?
I'm surprised at the lukewarm reception. Admittedly I don't follow the image-to-3D space as much, but last time I checked in, the gloopy fuzzy outputs did not impress me.
I want to highlight what I believe is the coolest innovation: their novel O-Voxel data structure. I'm still trying to wrap my head around how they figured out the conversion from voxel-space to mesh-space. Those two worlds don't work well together.
A 2D analogy: it's as if they figured out an efficient, bidirectional, one-shot method of converting PNGs into SVGs, without iteration. Crazy.
@simonw's successful port of JustHTML from Python to JavaScript proved that agent iteration + an exhaustive test suite is a powerful combo [0].
I don't know if TLA+ is going to suddenly appear as 'the next language I want to learn' in Stack Overflow's 2026 Developer Survey, but I bet we're going to see a rise in testing frameworks/languages. Anything to make it easier for an agent to spit out tokens or write smaller tests for itself.
Not a perfect piece of evidence, but I'm really interested to see how successful Reflex[1] is in this upcoming space.
We also constantly move our heads and refocus our eyes. We can get a rough idea of depth from only a static stereo pair, but in reality we ingest vastly more information than that and constantly update our internal representation in real time.
> But that just describes basically everyone, none of us have no agency, but all of us are also caught up in larger systems we can't opt out of.
But isn't the drama between the billionaire heiress and her starving-artist lover more interesting than the lawyer girlfriend deciding whether she wants to marry her below-average-salary boyfriend?