One set of models runs on 8GB VRAM / 16GB RAM and another set runs on 24GB VRAM / 64GB RAM. They're very useful for easy and moderately complex code, respectively.
The latest open, small models are incredibly useful even at smaller sizes when configured properly (quant size, sampling params, careful use of context etc).
Bit of a hype madhouse whenever a new model is released, but it's pretty easy to separate plain hype from people showing reproducible experiments, specific llama.cpp configs, GitHub links, etc.
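To make the "configured properly" part concrete, here's a rough Go sketch (not my exact setup) of calling a local llama-server (llama.cpp) through its OpenAI-compatible endpoint with the sampling params spelled out. The port, model name and prompt are placeholders; the quant and context size live in however you launched the server:

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    // Minimal sketch: send a request to a locally running llama-server
    // with explicit sampling parameters. Assumes the server is already
    // up on localhost:8080 with a model loaded; values are illustrative.
    func main() {
        body, _ := json.Marshal(map[string]any{
            "model":       "local-model", // placeholder name
            "temperature": 0.7,
            "top_p":       0.9,
            "max_tokens":  512,
            "messages": []map[string]string{
                {"role": "user", "content": "Summarize this function: ..."},
            },
        })

        resp, err := http.Post("http://localhost:8080/v1/chat/completions",
            "application/json", bytes.NewReader(body))
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        var out map[string]any
        json.NewDecoder(resp.Body).Decode(&out)
        fmt.Println(out)
    }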
This. And when possible, first ask the AI to add more granular logging around the code where the problem is, then re-run the code and feed the new log into a new context.
I've used this to debug some moderately complex bugs in golang and godot code and it works really well: the combo of a fresh context with the (sometimes overly) granular debug logging and only the required, specific source code.
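To give a flavor of what I mean by granular logging, here's a made-up Go function (the name, fields and logic are hypothetical, not from any actual bug) with the kind of log lines I ask the model to sprinkle around the suspect path before re-running:

    package main

    import "log"

    // Hypothetical example: log the inputs, the intermediate decision,
    // and the result, so the re-run produces a trace you can paste into
    // a fresh context together with just this function.
    func applyDiscount(orderID string, total float64, tier int) float64 {
        log.Printf("applyDiscount: start orderID=%s total=%.2f tier=%d", orderID, total, tier)

        rate := 0.0
        if tier >= 2 {
            rate = 0.10
        }
        log.Printf("applyDiscount: resolved rate=%.2f for tier=%d", rate, tier)

        discounted := total * (1 - rate)
        log.Printf("applyDiscount: done orderID=%s discounted=%.2f", orderID, discounted)
        return discounted
    }

    func main() {
        _ = applyDiscount("A-42", 100.0, 2)
    }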
Keep it simple and run a fresh, new context for each prompt.
I use the pi-mono coding agent with several different new open models running locally.
The simpler and more precise the prompt the better it works. Some examples:
"Review all golang code files in this folder. Look for refactor opportunities that make the code simpler, shorter, easier to understand and easier to maintain, while not changing the logic, correctness or functionality of the code. Do not modify any code; only describe potential refactor changes."
After it lists a bunch of potential changes, it's then enough to write "Implement finding 4. XYZ" and sometimes add "Do not make any other changes" to keep the resulting agent actions focused.
What laptop has that much VRAM and RAM for $3500 with good/okay-ish Linux support? I was looking to upgrade my asus zephyrus g14 from 2021 and things were looking very expensive. Decided to just keep it chugging along for another year.
Then again, I was looking in the UK, maybe prices are extra inflated there.
A3B-35B is better suited for laptops with enough VRAM/RAM.
This dense model, however, will be bandwidth-limited on most cards.
The mobile RTX 5090 sits at 896 GB/s, versus 1.8 TB/s for the desktop 5090, and most mobile chips have far less bandwidth than that, so speeds won't be great across the board the way they are on desktops.
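Rough back-of-envelope, assuming a dense model where every generated token has to read the full weight file: decode speed is bounded by roughly bandwidth divided by model size. The 18 GB weight size below is an assumption (something like a ~30B dense model at 4-bit); real numbers vary with quant, KV cache and overhead:

    package main

    import "fmt"

    // Back-of-envelope only: tokens/sec upper bound ~= memory bandwidth / weight size.
    func main() {
        modelGB := 18.0 // assumed weight size resident on the GPU

        for _, gpu := range []struct {
            name string
            gbs  float64
        }{
            {"mobile RTX 5090 (896 GB/s)", 896},
            {"desktop RTX 5090 (1.8 TB/s)", 1800},
        } {
            fmt.Printf("%s: ~%.0f tok/s upper bound\n", gpu.name, gpu.gbs/modelGB)
        }
    }

So even on the mobile 5090 you're looking at maybe ~50 tok/s best case for a model that size, and less once real-world overhead kicks in.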
Every other news story for the past month has been about lacking capacity. Everyone is having scaling issues, with more demand than they can cover. Anthropic has been struggling for a few months, especially visible when the EU timezones are still up and the US east coast comes online. Everything grinds to a halt. MS has been pausing new subscriptions for GitHub Copilot, also because of a lack of capacity. And yet people are still on bubble this, collapse that? I don't get it. Is it becoming a meme? Are people seriously seeing something I don't? For the past 3 years models have kept improving, capabilities have gone from toy to actually working, and there's no sign of it stopping. It's weird.
Both are possible: increasing demand and a bubble collapse.
The way this could happen is if model commoditization increases - e.g. some AI labs keep publishing large open models that increasingly close the gap to the closed frontier models.
Also, if consumer hardware keeps getting better and models get so good that most people can get most of their usage satisfied by smaller models running on their laptops, they won't pay a ton for large frontier models.
I’m going to stick my neck out a bit and predict that model commoditization will never happen as long as humans keep producing new content and innovation for models to train on. Sure, some open models will be good enough to write software against, but that’s but a fraction of the overall market for this technology.
I'm sure you were kidding, but seriously, the fact that AI-produced music pretty much all sounds the same is a good indicator that AI isn't particularly creative.
It’s not about creativity. The incentive to produce drops to zero when an LLM is just going to slurp it up and regurgitate it without some form of compensation (notoriety, money, whatever).
There's a massive amount of demand at the current price point, but that doesn't rule out a bubble, considering the current cost to consumers is lower than what capacity expansion costs.
Though nowadays it feels like the bubble is going to end up being mainly an OpenAI issue. The others are at least vaguely trying to balance expansion with revenue, without counting on inventing a computer god.
Is the internet bigger or smaller today than it was in 1998?
Demand for internet and web services is significantly higher today than in 2000, but a bubble still popped. Heck, a regular old recession or depression, completely unrelated to AI, could happen next year and collapse the industry. I mean, housing is more expensive than ever nearly 20 years after collapsing in the Great Recession.
The problem that I have with dotcom comparisons is that people miss what popped and what remained after that bubble. Catsdotcom and Dogsdotcom popped. But the tech remained, and now we have FAANG++.
If we apply the same logic, any of oAI, xAI, Anthropic might pop, but realistically they won't, and even if they do, some other players will take their spots, the tech will survive, and more importantly the demand will still be there. This cat isn't going back into the bag. People want this now. More than all the providers can give them. Today. The demand won't suddenly disappear now that "we got a hit", as someone put it recently.
In 2008 there was a subprime mortgage crisis that caused the housing market to crash. Nearly all of the banks that participated in it survived. There was, and still is, significant demand for houses financed through mortgages.
A bubble can burst, most if not all of the big players can still be around 20 years later, and yet significant value and capital can still be destroyed in the process.
Same for the dot-com era. There was demand for the internet, it just couldn't meet the expectations of the day, and yet here we are all these years later with something like 100x more internet services than before. Saying the AI bubble will pop is not a prediction that all AI companies will cease to exist immediately. Amazon lost 80% of its stock price in 2000. Is Amazon bigger or smaller today than it was then?
Reminds me of when hedge funds started laying increasingly shorter fiber-optic cable lines to achieve the lowest possible latency for high-frequency trading.
TPUs are for training. But even still, once you've trained, you need to run the model too. And these kinds of models already have a huge latency hit, so there's not much harm in running them away from the trading switches.