I'm pretty sure Qwen is faster? The MoE version of Qwen has 3B active parameters, while Gemma 4 has 4B active. Similarly, the dense Qwen is 27B while Gemma is 31B. All else being equal (though I know all else isn't equal), Qwen should be faster in both cases. I haven't actually measured with any precision, but on my AMD hardware (Strix Halo or dual Radeon Pro V620) they seem quite similar in both cases: both MoE models are fast enough for interactive use, and both dense models are notably smarter but much slower, with a long time to first response and single-digit tokens per second once they start talking.
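For a rough napkin check of why the active-parameter count matters (assuming decode is memory-bandwidth-bound, a ~4-bit quant, and around 256 GB/s of usable bandwidth on Strix Halo; every number here is an assumption, not a measurement):

```python
# Rough ceiling on decode speed: each generated token streams the active
# weights through memory once, so tok/s <= bandwidth / active-weight bytes.
# Every number here is an assumption for illustration, not a measurement.
GBPS = 256               # assumed usable memory bandwidth (Strix Halo ballpark)
BYTES_PER_PARAM = 0.55   # ~4-bit quant plus a bit of overhead

for name, active_params_billion in [("MoE, 3B active", 3), ("MoE, 4B active", 4),
                                    ("dense 27B", 27), ("dense 31B", 31)]:
    active_gb = active_params_billion * BYTES_PER_PARAM
    print(f"{name}: <= {GBPS / active_gb:.0f} tok/s theoretical ceiling")
```

Real throughput lands well below these ceilings once you add prompt processing, KV-cache reads, and kernel overhead, but it shows why the active-parameter count rather than the total size dominates decode speed.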
qwen-3.6 is really interesting. The dense 27B model is pretty slow for me, whereas the sparse 31B is blazingly fast, but it also needs to be since it's so chatty. It produces pages and pages of stream-of-consciousness stuff. The 27B does this to a lesser extent, but it's slow enough that I can actually read it, whereas the 31B just blasts by.
I haven't yet compared either to Gemma 4. I tried that out the day after it came out, with the patched llama.cpp that added support for it, but I couldn't make tool calling work, so it was kind of useless. I should try again to see if things have changed, but judging by what people say, qwen-3.6 seems stronger for coding anyway.
Qwen without thinking is just as fast. I have four parameter settings based on the recommendations. For a hard coding problem, the thinking coding mode works well but takes a while to arrive at an answer. If you want faster turnaround, instruction mode works without thinking.
Flash is the fast (duh) model, though. It's not always beneficial to use Pro. In practice: 1/ set to Flash 3.1; 2/ force Pro... sometimes, mainly when the CLI fails to predict which model to use.
note that it will sometimes fall back to flash 2, which sucks
Flash will absolutely destroy a complex codebase. It's like a drunk junior programmer. Don't trust it with anything more complex than autocomplete.
Pro is expensive, but good. However, they've decreased the pitiful stipend they used to include in even the Ultra plan to the point where it's barely usable. I pivoted back to ChatGPT Pro after the recent downgrade they gave Ultra users. Google's Ultra plan costs 2.5x as much and delivers about half the usage.
Tangent: this is one of those situations where slang is harmful to understanding. When I saw "will absolutely destroy" my first interpretation was a positive connotation. Of course further context made it clear you were being straightforward, and this isn't aimed at you. Along these lines, "drop" has become a problematic term: "Acme co dropped support for Foo" means it's EOL, but "Foo dropped today" implies it just landed. Idioms are hard enough when they don't serve as borderline autoantonyms.
To wrap up this extended digression, if anyone else finds this sort of thing interesting, and could use a good laugh, check out Ismo (a standup comic from Finland who makes truly hilarious observations about English as a second language).
Yeah, I don't get the user who said Gemini is generous with the quota; I get more use out of Codex with its 5-hour limits than Gemini gives me in a week.
Even 900 MHz sucks vs 433 MHz. The lower the frequency, the better it penetrates matter for the same amplitude.
Lower than 430 MHz you start to run into severe bandwidth issues, though. And it's not allowed to transmit LoRa/DSS on 430 MHz in the US without a license, hence the 900 MHz.
At 2.4 GHz the real-world usefulness is limited; might as well use Wi-Fi. The only advantage is short-range bandwidth while keeping LoRa compatibility.
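For a rough sense of the frequency trade-off, here's the standard free-space path loss formula at the common LoRa bands (this ignores material penetration and antenna differences, so it's only part of the picture):

```python
import math

def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB (standard formula: distance in km, frequency in MHz)."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44

# Same 5 km link at the common LoRa bands: higher frequency loses more,
# before even counting walls/foliage, which hit 2.4 GHz hardest.
for f in (433, 915, 2400):
    print(f"{f} MHz: {fspl_db(5, f):.1f} dB")
```

That works out to roughly a 6.5 dB advantage for 433 MHz over 915 MHz and about 15 dB over 2.4 GHz at the same distance and power.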
Google is in a different position to others in that they're the only frontier lab with a cloud infra business. It obviously makes sense to sell GPUs on cloud infra as people want to rent them. In that respect Google buys a ton of GPUs to rent out.
What's unclear to me is how much Google uses GPUs for their own stuff. Yes Gemini runs on GPUs now, so that Google can sell Gemini on-prem boxes (recent release announced last week), but is any training or inference for Gemini really happening on GPUs? This is unclear to me. I'd have guessed not given that I thought TPUs were much cheaper to operate, but maybe I'm wrong.
Caveat, I work at Google, but not on anything to do with this. I'm only going on what's in the press for this stuff.
It mentions that Gemini can run on eight NVIDIA GPUs, but not which GPU or which Gemini model. Either way, assuming 288 GB of memory per GPU, that puts an upper bound of 288 * 8 = 2304 GB on the size of the Gemini model, which as far as I know has been a secret until now.
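Back-of-envelope on what that memory ceiling implies, assuming the 288 GB figure is per-GPU HBM and ignoring KV cache and runtime overhead (both assumptions):

```python
# Loose ceiling on parameter count that fits in 8 GPUs' worth of memory,
# ignoring KV cache, activations, and runtime overhead.
total_gb = 288 * 8   # 2304 GB, per the figure above

for precision, bytes_per_param in [("bf16", 2), ("fp8", 1), ("int4", 0.5)]:
    max_params = total_gb * 1e9 / bytes_per_param
    print(f"{precision}: at most ~{max_params / 1e12:.1f}T parameters")
```

Even at the loosest precision that's a ceiling in the low single-digit trillions of parameters.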
My info is most likely outdated; I left Google Research 4 years ago. Back then, TPU instances were plentiful and GPUs scarce. Nobody wanted to mess with an immature, crashing compiler and very steep performance cliffs (performance was excellent only if you stayed within the guardrails, and stepping outside them was still supported and didn't even produce a warning, since it was so common in code).
But I believe most of it has changed for the better for TPUs.
They all hope to make a lot of money off of it.
Meshcore has a marketing team spamming Reddit all day and a map to make you believe people use it right now. Then you connect to the mesh and you're utterly alone there. At least Meshtastic has real users lol.
best is to use your own model router atm, depending on the task
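Something like this minimal sketch is usually enough; the model names and the keyword heuristic below are placeholders for illustration, not any particular provider's API:

```python
# Minimal sketch of a task-based model router. The model names and the
# keyword heuristic are placeholders for illustration only.
def pick_model(task: str) -> str:
    t = task.lower()
    if any(k in t for k in ("refactor", "architecture", "debug")):
        return "big-slow-model"       # hard, multi-file work: pay for the smart model
    if any(k in t for k in ("rename", "format", "write a test")):
        return "small-fast-model"     # mechanical edits: cheap and fast is fine
    return "medium-default-model"     # everything else

print(pick_model("refactor the auth module"))   # -> big-slow-model
print(pick_model("rename this variable"))       # -> small-fast-model
```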