
If those smaller models are sufficient for your use cases, go for it. But for how much longer will companies release smaller models for free? They invested so much, and they have to recoup that money. Much will depend on investor pressure and the financial environment (tax deductions, etc.).

Open Source endeavors will have a hard time mustering the resources to train competitive models. Maybe we will see larger cooperatives, like an Apache Software Foundation for ML?



It's not just about smaller models. I recently bought a MacBook M4 Max with 128GB RAM. You can run surprisingly large models locally with unified memory (albeit somewhat slowly), and now AMD has brought that capability to the x86 world with Strix. But I agree that how long Google, Meta, Alibaba, etc. will continue to release open-weight models is a big question. It's obviously just a catch-up strategy aimed at the moats of OpenAI and Anthropic; once they catch up, the incentive disappears.


Even Google and Facebook are releasing distills of their models (Gemma 3 is very good; competitive with Qwen3, and sometimes better).

There are a number of reasons to do this: you want local inference, you want attention from devs and potential users, etc.

Also, the smaller self-hostable models are where most of the improvement is happening these days. Eventually they'll catch up with where the big ones are today. At this point I honestly wouldn't worry too much about "gatekeepers."


Pricing for commodities does not allow for “recouping costs”. All it takes is one company seeing models as a complementary good to their core product, worth losing money on, and nobody else can charge more.

I’d support an Apache for ML but I suspect it’s unnecessary. Look at all of the money companies spend developing Linux; it will likely be the same story.


"Maybe we will see larger cooperatives, like a Apache Software Foundation for ML?"

I suspect the Linux Foundation might be a more likely source, considering its backers and how many resources those backers have provided to the LF. Whether that's aligned with the LF's goals ...


> Open Source endeavors will have a hard time mustering the resources to train competitive models.

Perhaps, but see also SETI@home and similar @home/BOINC projects.


Seems like you don't have to train from scratch. You can distill a new model from an existing one just by buying API credits, effectively copying the model.


"Just" is doing a lot of heavy lifting there. It definitely helps with getting data but actually training your model would be very capital intensive, ignoring the cost of paying for those outputs you're training on.


Your "API credits" don't buy the model. You just buy some resource to use the model that is running somewhere else.


What the parent poster means is that you can use the API to generate many question/answer pairs on which you then train your own model. For a more detailed explanation of this and other related methods, I can recommend this paper: https://arxiv.org/pdf/2402.13116
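The data-collection side of this can be sketched in a few lines. This is a minimal illustration, not the method from the linked paper: `query_teacher` is a hypothetical stand-in for a real hosted-model API call (that call is where the API credits would actually be spent), and the output is JSONL instruction/response records, a common format for supervised fine-tuning of a student model.

```python
# Sketch of collecting distillation data from a teacher model's API.
# `query_teacher` is a placeholder; swap in a real chat-completions call.
import json

def query_teacher(prompt: str) -> str:
    # In practice: send `prompt` to the hosted teacher model and
    # return its completion. Stubbed here so the sketch is runnable.
    return f"[teacher answer to: {prompt}]"

def build_distillation_set(prompts):
    """Turn a list of prompts into (instruction, response) pairs."""
    return [
        {"instruction": p, "response": query_teacher(p)}
        for p in prompts
    ]

prompts = ["Explain unified memory.", "What is model distillation?"]
dataset = build_distillation_set(prompts)

# Serialize as JSONL; most fine-tuning toolchains accept this shape.
for record in dataset:
    print(json.dumps(record))
```

You would then fine-tune your own smaller model on the resulting file, which is the "train your own model" step the parent describes.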


You don't understand what Gigachad is talking about. You can buy API credits to gain access to a model in the cloud, and then use that to train your own local model through a process called distillation.



