Their moat is CUDA, the CUDA libraries, and everything built on top of them.
When a new architecture drops, it's always PyTorch running on CUDA; other PyTorch backends are best-effort. Even if they reach feature parity, many industry power users have gone closer to the metal to squeeze out performance, and that code is too specific to Nvidia hardware.
If something is going to beat Nvidia, it won't be something that reaches feature parity with slightly better economics (like AMD; Nvidia could also just cut its margins). It needs to be a novel approach worth rewriting the codebase for (maybe Cerebras, maybe a new player).
> Their moat is cuda and cuda libraries and everything built on top
Sure, but to state the obvious, that only matters for people using CUDA!
There are also whole segments of the AI market where CUDA is irrelevant, like Google with its TPUs and Amazon with its Trainium chips.
If the AI boom is really going to happen, then inference volume needs to ramp up and dominate training costs, and the winners will be whoever can do inference most cheaply, which probably won't be anyone paying the Nvidia tax!
CUDA matters more for development, and for the hyperscalers serving models through CUDA APIs (bespoke business models). Anthropic currently supports both CUDA and Trainium, and xAI (which seems to be fizzling out) is on CUDA, although there was some talk of Musk getting Samsung to make "AI chips" of some sort.
As for AMD, I'm sure the developers at AMD's biggest sites (the exascale national labs) get a whole different level of support than consumers do, and no doubt a toolset that works great in those fixed environments.
I don't understand why AMD can't offer a drop-in replacement for CUDA that implements an identical API.
How much actual diversity is there among standard AI workloads? I would expect this is an 80/20 thing where 80% of the workload uses 20% of the features.
Three things: they can, there's precedent for it (Google v. Oracle, over the Java APIs), and they already have something!
AMD built something called HIP, an API and set of libraries that mirror CUDA's but target AMD hardware. It's the closest thing we have to a drop-in replacement for Nvidia's software moat.
It works for simple stuff but falls far behind on frontier kernels (like FlashAttention-3), novel approaches (e.g. Mamba), and networking (e.g. NCCL). It's also rough around the edges, so whatever you save on GPUs you lose in engineering cost.
My previous company (Rivos) tried to compete in this GPU game while investing in a good software stack: a drop-in replacement, cheaper, with decent software.
But that vision was rough. Any new player has to implement the bad APIs too for backward compatibility, following the specs isn't enough because much of the AI stack depends on observable behavior (Hyrum's Law), and Nvidia simply had a long head start. The company is now dead (acquired by Meta), and AFAIK there isn't another player.
Best case, AMD puts more effort into its software stack, but I just don't think they have enough internal talent to compete.
Training will remain an Nvidia thing, and that's where most of the money sits, unless the AI research scene suddenly pivots to JAX, which I don't see happening any time soon. If anything, I've seen internal efforts at Google to make PyTorch work nicely with TPUs. Some players, like Anthropic, have started using JAX for training, but the small players all use Nvidia; I'm guessing that has something to do with Nvidia partnering aggressively with startups.
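To illustrate the "drop-in" part: HIP mirrors much of the CUDA runtime API with a hip prefix (cudaMalloc becomes hipMalloc, cudaMemcpy becomes hipMemcpy, and so on), which is why AMD's hipify tools can translate a lot of host code mechanically. The toy translator below is a hypothetical illustration, not hipify itself; it only renames the runtime header and cuda-prefixed identifiers. As the comment says, renaming is the easy part compared with matching tuned kernels and libraries like NCCL.

```python
import re

# Toy illustration only: real tools (hipify-perl, hipify-clang) cover far more.
HEADER_MAP = {
    "#include <cuda_runtime.h>": "#include <hip/hip_runtime.h>",
}


def toy_hipify(src: str) -> str:
    """Rename the CUDA runtime header and cudaXxx identifiers to their HIP names."""
    for cuda_hdr, hip_hdr in HEADER_MAP.items():
        src = src.replace(cuda_hdr, hip_hdr)
    # cudaMalloc -> hipMalloc, cudaMemcpyHostToDevice -> hipMemcpyHostToDevice, ...
    return re.sub(r"\bcuda([A-Z]\w*)", r"hip\1", src)


example = """#include <cuda_runtime.h>
float *d;
cudaMalloc(&d, n * sizeof(float));
cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
cudaFree(d);"""

print(toy_hipify(example))
```

Anything that goes beyond the mirrored API surface, such as hand-tuned kernels or inline PTX, gets no help from a rename like this.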
I think AMD have essentially given up on the consumer / small scale GPU compute market, while being extremely successful selling their AI chips to much bigger customers. Some of the biggest supercomputers (clusters) in the world, such as the Lawrence Livermore and Oak Ridge exascale computers, are AMD Instinct based, but the tools and level of support they get is not going to be the same as someone at home trying to get ROCm running on their gaming card.
I wonder how big the consumer and small-scale market is compared with these massive installations.
> I don't understand why AMD can't offer a drop-in replacement for cuda which implements an identical API.
AMD, Apple and Intel all sell raster GPUs. Their GPU architecture is not optimized for general-purpose compute, and reorienting around that goal would create a "Fifteen Competing Standards" scenario pretty quickly. It's as much of a hardware issue as it is a software one, and none of these businesses like to cooperate (see: the last 15 years of Khronos drama).
In AMD's case, they don't see a need to sell consumer GPUs with a true CUDA analog since their datacenter product is architecturally distinct from their consumer GPUs. Consumers come to AMD for cheap graphics performance, and adding extra hardware on top of the compute units (AMD's counterpart to Nvidia's SMs) would be a waste of money for many (or most) customers. This is why you see such a rift between CDNA and RDNA chips on compute workloads, and why it's unlikely that we'll see a CUDA-equivalent product out of AMD any time soon.
At some point there will be models that are ‘good enough’ and run on Chinese chips, mobile processors, and run-of-the-mill chips from Apple. Whether it's a 1-bit or ternary model, innovations that limit the size of the context, or something else, it's coming. The balance has already shifted toward making these systems less resource-intensive, which is a clear need given the enormous data center costs.
Personally, I think technology isn't bad in a vacuum, nor necessarily bad for society; it just reveals that our system is ill-equipped to ensure it's used well.
We could have fun defining what good usage looks like, but we're so far from it that it would just make me sad.
Unclear if it's the only cause, but wafer-scale is great for very low latency while losing on throughput per dollar compared to classic Nvidia-style GPUs.
I don't think they can close the gap: SRAM is simply more expensive per bit than HBM, and their architecture needs a lot of it.
So the price necessarily confines it to niche use cases like HFT or intelligent duplex voice assistants. Personally, I'm still semi-bullish.
Obsolete because of what? Because with limited hardware you’re never aiming for state of the art, and for fine-tuning, you don’t steer for too long anyway.
I don't know why people still mess with Tesseract in 2026; attention-based OCR models (and, more recently, VLMs) have outperformed any LSTM-based approach since at least 2020.
My guess is that it's the entry point to OCR and the internet is flooded with material about it, just like pandas for data processing.
I was surprised to learn (from this article) that there are local models that can do this (though I'm not sure any of them run on hardware I actually have, unlike Tesseract, which works fine on the scanning hardware I set up for it ~5 years ago). For privacy reasons, cloud-based OCR is a non-starter...
Quite. I threw a so-so photo of an old, long receipt at Qwen 3.5 0.8B (runs in <2GB), and it nailed it, spitting out 20+ items in under a second. AI is good at many things, but picking modern dependencies, not so much.
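Rough arithmetic for why a model of about 0.8 billion parameters fits in under 2GB, assuming 16-bit weights (an assumption; the comment doesn't say which precision or runtime was used). The helper name weight_gb is hypothetical, and it counts weights only.

```python
def weight_gb(n_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Counts weights only; KV cache, activations and runtime overhead
    come on top of this.
    """
    return n_params * bits_per_param / 8 / 1e9


# Assumed parameter count of ~0.8 billion at a few common precisions.
for bits in (16, 8, 4):
    print(f"0.8B params @ {bits}-bit: {weight_gb(800_000_000, bits):.1f} GB")
```

At 16 bits that is about 1.6 GB of weights, which leaves a little headroom under 2GB for the rest.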
You drop the memory-bandwidth requirements thanks to the packed bit representation, so the FMA can become the bottleneck, and you sidestep having to upconvert the bits to whatever floating-point format the FMA instruction needs.
Typically, for a 1-bit matmul you can get away with XORs and popcounts, which should have a better throughput profile than FMA once you account for the SIMD nature of the inputs and outputs.
It could probably be made more efficient with a column-major layout.
Since we're in CPU land, we mostly deal with dot products sized to fit the cache; I'm not assuming a tiled matmul instruction, which would be unlikely to support this odd 1-bit format anyway.
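A minimal sketch of the XOR/popcount trick, assuming both weights and activations are binarized to +1/-1 and packed into integers (bit set means +1). Python's arbitrary-size ints stand in for the 64-bit words or SIMD registers a real kernel would use, and bin(...).count("1") stands in for a hardware popcount instruction. The function names are hypothetical.

```python
def pack_signs(v: list[int]) -> int:
    """Pack a +1/-1 vector into an int: bit i is set when v[i] == +1."""
    bits = 0
    for i, x in enumerate(v):
        if x == 1:
            bits |= 1 << i
        elif x != -1:
            raise ValueError("entries must be +1 or -1")
    return bits


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n.

    Equal signs contribute +1 and differing signs -1, so with
    m = popcount(a XOR b) differing positions: dot = (n - m) - m = n - 2m.
    """
    mismatches = bin(a_bits ^ b_bits).count("1")  # popcount
    return n - 2 * mismatches


def binary_matvec(rows: list[int], x_bits: int, n: int) -> list[int]:
    """1-bit matrix-vector product: one XOR + popcount per packed row."""
    return [binary_dot(r, x_bits, n) for r in rows]


W = [[1, -1, 1, 1], [-1, -1, 1, -1]]
x = [1, 1, -1, 1]
rows = [pack_signs(r) for r in W]
print(binary_matvec(rows, pack_signs(x), len(x)))  # [0, -4]
```

No multiply or float conversion appears anywhere, which is the point being made: the inner loop is pure bitwise work plus a count.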
Haven't looked closely, but on modern x86 CPUs it might be possible to do much better with the GF2P8AFFINEQB instruction (GFNI), which lets us do 8x8 bit-matrix multiplications efficiently. Not sure how you'd handle the 2-bit part, of course.
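For reference, GF2P8AFFINEQB (Intel's GFNI extension) applies an 8x8 bit matrix to every byte of a register and XORs in a constant byte: y = A·x ⊕ b over GF(2). The scalar sketch below shows only that math; the bit ordering is our own convention, not the instruction's operand layout, and the function names are hypothetical. Note that over GF(2) each output bit is the parity of the products rather than their integer sum, so this is not a direct substitute for a popcount-based dot product; how to map a 1-bit (or 2-bit) kernel onto it is left open, as in the comment.

```python
def gf2_matvec8(rows: list[int], x: int) -> int:
    """Multiply an 8x8 bit matrix by an 8-bit vector over GF(2).

    rows[i] is the 8-bit mask of row i. Output bit i is the parity
    (XOR-sum) of rows[i] AND x. Bit ordering is our own convention.
    """
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y


def gf2_affine8(rows: list[int], x: int, b: int) -> int:
    """Affine map y = A*x XOR b over GF(2), the per-byte math of the instruction."""
    return gf2_matvec8(rows, x) ^ b


def gf2_matmul8(a_rows: list[int], b_cols: list[int]) -> list[int]:
    """A*B over GF(2), with B given as 8 column masks; returns C's columns."""
    return [gf2_matvec8(a_rows, col) for col in b_cols]


identity = [1 << i for i in range(8)]
reverse = [1 << (7 - i) for i in range(8)]  # anti-diagonal: reverses bit order
print(hex(gf2_matvec8(reverse, 0x01)))  # 0x80
```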
Confusing, since this is specific to an architecture that no one making money will use (8B is consumer space, not enterprise).
The produced code shouldn't hold much interesting IP?
Don’t think that’s a fair interpretation of what I said.
Liquid money rich? No.
Could they be lured away by big tech packages? Also no, for most employees.
AFAIK, big tech didn't aggressively poach OpenAI-type talent; they did offer $10M+ pay packages, but only to a select few research scientists. Some people came and went, but it mostly boiled down to culture.
In my experience, tracking objective things like "nutrition" and "sleep hours" is immensely useful for reflecting on what went wrong, while tracking subjective things like "mood" or "stress" is useless: hedonic adaptation flattens them, and heavy swings make problems obvious without any tracking.
What's key is frictionless data entry and being able to visualize metrics on the data easily. I've got a decent setup with the iPhone Action button + Obsidian + QuickAdd scripts on Obsidian Sync (mobile + laptop). For visualization, I use Obsidian Bases and Obsidian notes that run Dataview code blocks and Chart.js; couldn't be happier.
I could track things that aren't interesting to reflect on, like vitamin D supplementation, for accountability, but I've never bothered, especially for things taken ~daily.
Strongly agree with this. I’ve been using Apple’s “mood” log for about two years now, and it is extremely helpful for me to have a concrete view of the history of my general affect.
“This entire month I’ve been feeling good, I want to pinpoint why,” or “it’s clear since stressor X entered my life, my affect is lower; how can I resolve this?”
These long term trends are harder for me to track without data. It might be easy for others, but not me!
As someone with schizoaffective disorder, bipolar type: if you haven't been diagnosed with a mood disorder, tracking "swings in your mood" when you have no clinical disorder seems like a disorder of its own.
I have had people tell me they were "manic". Then I showed them videos I took when I was manic and they see what I mean when I tell them they are not manic.
We have come to a place where we do not want even normal fluctuation in mood, and that is an illness of its own, but it is a cultural illness.
Maybe for some it's a lot more extreme than for others, but even if it's not so dramatic as to be categorized as a mental illness wouldn't you want to know if, say, there were a direct correlation between whether you went for your morning run and your mood later in the day?
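The "direct correlation" question can be answered from a simple log. A minimal sketch, assuming a daily 0/1 column for the morning run and a 1-5 evening mood rating (both hypothetical, as is the data below); with a 0/1 column, Pearson's r is the point-biserial correlation. Python 3.10+ also ships statistics.correlation, which computes the same coefficient.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(vx * vy)


# Hypothetical week: 1 = went for a morning run, mood rated 1-5 that evening.
ran = [1, 0, 1, 1, 0, 0, 1]
mood = [4, 2, 5, 4, 3, 2, 4]
print(round(pearson(ran, mood), 2))
```

With only a week of entries the number is mostly noise, and even a strong value says nothing about which way the causation runs.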
Is this something that needs to be tracked to bring into your awareness? We have a memory storage device sitting on top of our spine. When I drink I feel drunk. Easy. If the change is noticeable you will notice it and remember it.
I am just trying to save you time and help you escape the cycle of "optimizations", which is where all this data logging leads.
Yes? As I wrote, the mere act of writing down your feelings forces you to acknowledge them and see patterns in them. Sometimes while writing something down I realise "wait, I've been through this before" or "every time this person is around, I feel this way". It helped me be more self-aware, for my own good.
It turns out that our memory storage device uses a very lossy form of compression. Memories get simplified and distorted over time. Heck, I can't even remember when something started hurting, so how should I notice a year-long pattern of thinking around a certain topic?
The language you use to describe this is fun. I make a self-tracking app called Reflect and would love your opinion of it, even if it doesn't suit your needs exactly.