More

vibe42 · 2026-04-24T13:23:31 1777037011

I run both MoE and dense models on laptops.

One set of models run on 8GB VRAM / 16GB RAM and another set runs on 24GB VRAM / 64GB RAM. Both are very useful for easy and easy-to-moderate complex code, respectively.

The latest open, small models are incredibly useful even at smaller sizes when configured properly (quant size, sampling params, careful use of context etc).

vibe42 · 2026-04-24T02:16:03 1776996963

https://old.reddit.com/r/LocalLLaMA/

Bit of a hype madhouse whenever a new model is released, but it's pretty easy to filter out simple hype from people showing reproducible experiments, specific configs for llama.cpp, github links etc.

vibe42 · 2026-04-23T18:37:16 1776969436

Outside Trading.

vibe42 · 2026-04-22T19:57:19 1776887839

This. And when possible, first asking the AI to add more granular logging around the code where the problem is - then re-run the code and feed the new log in a new context.

I've used this to debug some moderately complex bugs in golang and godot code and it works really well - the combo of having a new context with the (sometimes overly) granular debug logging and only the required, specific source code.

vibe42 · 2026-04-22T19:54:17 1776887657

Keep it simple and run a fresh, new context for each prompt.

I use the pi-mono coding agent with several different new open models running locally.

The simpler and more precise the prompt the better it works. Some examples:

"Review all golang code files in this folder. Look for refactor opportunities that make the code simpler, shorter, easier to understand and easier to maintain, while not changing the logic, correctness or functionality of the code. Do not modify any code; only describe potential refactor changes."

After it lists a bunch of potential changes, it's then enough to write "Implement finding 4. XYZ" and sometimes add "Do not make any other changes" to keep the resulting agent actions focused.

vibe42 · 2026-04-22T19:40:33 1776886833

With the pi-mono coding agent (running local, open models) this works very well:

"Do not modify any code; only describe potential changes."

I often add it to the end when prompting to e.g. review code for potential optimizations or refactor changes.

vibe42 · 2026-04-22T16:04:33 1776873873

Q4-Q5 quants of this model runs well on gaming laptops with 24GB VRAM and 64GB RAM. Can get one of those for around $3,500.

Interesting pros/cons vs the new Macbook Pros depending on your prefs.

And Linux runs better than ever on such machines.

doix · 2026-04-22T16:07:38 1776874058

What laptop has that much VRAM and RAM for $3500 with good/okay-ish Linux support? I was looking to upgrade my asus zephyrus g14 from 2021 and things were looking very expensive. Decided to just keep it chugging along for another year.

Then again, I was looking in the UK, maybe prices are extra inflated there.

green7ea · 2026-04-22T18:54:41 1776884081

I got a HP g1a for about 3k€ with 64gb of ram when it came out

kroaton · 2026-04-22T16:09:32 1776874172

A3B-35B is better suited for laptops with enough VRAM/RAM. This dense model however will be bandwidth limited on most cards.

The 5090RTX mobile sits at 896GB/s, as opposed to the 1.8TB/s of the 5090 desktop and most mobile chips have way smaller bandwith than that, so speeds won't be incredible across the board like with Desktop computers.

jadbox · 2026-04-22T16:19:43 1776874783

I find A3B-35B as an ideal model for small local projects- definitely the best for me so far

vibe42 · 2026-04-22T13:15:01 1776863701

Their latest open models are pretty competitive with other open models, and some innovation around the smaller sizes (2-4 GB).

They're helping close to the distance to realistic quality inference on phones and other smaller devices.

ethbr1 · 2026-04-22T16:35:02 1776875702

> They're helping close to the distance to realistic quality inference on phones and other smaller devices.

If someone monopolized OS marketshare for mid- to low-priced devices, that does seem like it would be a useful research focus.

Whereas offering the same with compute-inefficiency cloud inference would be economically unviable at scale.

Free on-device Google premium closed-source models* = free Google Maps 2.0

* As long as you ship Google Apps and Play Services

vibe42 · 2026-04-22T13:12:47 1776863567

The pics of the cooling system is pretty good sci-fi / cyberpunk / steampunk inspo.

If the whole AI bubble spectularly collapes, at least we got a lot of cool pics of custom hardware!

NitpickLawyer · 2026-04-22T13:17:52 1776863872

> If the whole AI bubble spectularly collapes

Every other news for the past month has been about lacking capacity. Everyone is having scaling issues with more demand than they can cover. Anthropic has been struggling for a few months, especially visible when EU tz is still up and US east coast comes online. Everything grinds to a halt. MS has been pausing new subscriptions for gh Copilot, also because a lack of capacity. And yet people are still on bubble this, collapse that? I don't get it. Is it becoming a meme? Are people seriously seeing something I don't? For the past 3 years models have kept on improving, capabilities have gone from toy to actually working, and there's no sign of stopping. It's weird.

vibe42 · 2026-04-22T13:36:31 1776864991

Both are possible; increasing demand and bubble collapse.

The way this could happen is if model commoditization increases - e.g. some AI labs keep publishing large open models that increasingly close the gap to the closed frontier models.

Also, if consumer hardware keep getting better and models get so good that most people can get most of their usage satisfied by smaller models running on their laptop, they won't pay a ton for large frontier models.

otterley · 2026-04-22T20:29:38 1776889778

I’m going to stick my neck out a bit and predict that model commoditization will never happen as long as humans keep producing new content and innovation for models to train on. Sure, some open models will be good enough to write software against, but that’s but a fraction of the overall market for this technology.

byproxy · 2026-04-22T20:48:23 1776890903

> as long as humans keep producing new content and innovation

Well.. we won't have to as we'll have models to do it for us!

otterley · 2026-04-22T20:53:29 1776891209

I'm sure you were kidding, but seriously, the fact that AI-produced music pretty much all sounds the same is a good indicator that AI isn't particularly creative.

what · 2026-04-23T03:16:46 1776914206

It’s not about creativity. The incentive to produce drops to zero when an LLM is just going to slurp it up and regurgitate it without some form of compensation (notoriety, money, whatever).

hgoel · 2026-04-22T13:45:40 1776865540

There's a massive amount of demand at the current price point, this does not exclude a bubble considering that the current cost to consumers is lower than what capacity expansion costs.

Though nowadays it feels like the bubble is going to end up being mainly an OpenAI issue. The others are at least vaguely trying to balance expansion with revenue, without counting on inventing a computer god.

kemotep · 2026-04-22T16:04:01 1776873841

Is the internet bigger or smaller than it was in 1998 compared to today?

Demand for internet and web services is significantly higher today than in 2000 but a bubble still popped. Heck a regular old recession or depression, completely unrelated to AI could happen next year and could collapse the industry. I mean housing is more expensive than ever nearly 20 years after collapsing in the Great Recession.

NitpickLawyer · 2026-04-22T16:48:07 1776876487

The problem that I have with dotcom comparisons is that people miss what popped and what remained after that bubble. Catsdotcom and Dogsdotcom popped. But the tech remained, and now we have FAANG++.

If we apply the same logic, any of oAI, xAI, Anthropic might pop, but realistically they won't, and even if they do, some other players will take their spots, and the tech will survive, and more importantly the demand will still be there. This cat isn't going back into the bag. People want this now. More than all the providers can give them. Today. The demand won't suddenly disappear now that "we got a hit" like someone put it recently.

kemotep · 2026-04-22T19:34:10 1776886450

I feel like this fails to address my point.

In 2008 there was a subprime mortgage crisis that caused the housing market to crash. Nearly all banks who participated in this survived. There was and still is significant demand for houses, financed through mortgages.

The bubble can burst, most if not all the big players still survive 20 years later and yet significant value and capital can still be destroyed in the process.

Same for the dot com. There was demand for the internet, it couldn’t meet the expectations of the day, and yet here we are with like 100x more internet services than before all these years later. Saying the AI bubble will pop is not a prediction that all AI companies will cease to exist immediately. Amazon lost 80% of their stock price in 2000. Is Amazon bigger or smaller than they were in 2000 today?

jere · 2026-04-22T20:11:27 1776888687

Username checks out

vibe42 · 2026-04-22T13:11:29 1776863489

Training their own, closed, internal models on their own data sets? Probably a good way to squeeze out some market trading signals.

nickandbro · 2026-04-22T13:19:26 1776863966

Reminds me of when hedge funds started laying increasingly shorter fiber-optic cable lines to achieve the lowest possible latency for high-frequency trading.

written-beyond · 2026-04-22T13:31:09 1776864669

I thought these TPUs were primarily used for inference?

vlovich123 · 2026-04-22T13:59:03 1776866343

TPU8t is for training. But even still, once you’ve trained, you need to run the model too. And these kinds of models already have a huge latency hit so there’s not much hurting running it away from the trading switches.

knowaveragejoe · 2026-04-22T14:11:49 1776867109

As the article states, there's both training and inference dedicated chips.