Interesting, the pacing seemed very slow when conversing in English, but when I spoke to it in Spanish, it sounded much faster. It's really impressive that these models are going to be able to do real-time translation and much more.
The Chinese are going to end up owning the AI market if the American labs don't start competing on open weights. Americans may end up in a situation where they have some $1000-2000 device at home with an open Chinese model running on it, if they care about privacy or owning their data. What a turn of events!
sitting here in the US, reading that China is strongly urging the adoption of Linux and pushing for open CPU architectures like RISC-V and also self-hosted open models
It is in their selfish interest to push for open weights.
That's not to say they are being selfish, or to judge in any way the morality of their actions. But because of that incentive, you can't logically infer moral agency in their decision to release open-weights, IP-free CPUs, etc.
Leaving China aside, it's arguably immoral that our leading AI models are closed and concentrated in the hands of billionaires with questionable ethical histories (at best).
I mean China's push for open weights/source/architecture probably has more to do with them wanting legal access to markets than it does with those things being morally superior.
Of course, but that translates into a benefit for most people, even Americans. In my case (European), I can't help but support the Chinese companies in this respect, as we would be especially in trouble if closed models become the norm.
Depends on whether you want to be in it. A ladder might be enough to peek over the top and rip it off. Do it better. Which seems to be what is happening.
That only works for China's domestic market. As long as the IP they are "taking inspiration from" is protected in the target markets, they effectively lock themselves out by doing that.
In the case of technology like RISC-V, pretty much all the value-add is unprotected, so they can sell those products in the US/EU without issue.
This is exactly what I do. I have two 3090s at home, running Qwen3 on them. This is tied into my Home Assistant install, and I use ESP32 devices as voice satellites. It works shockingly well.
I run Home Assistant on an RPi4 and have an ESP32-based Core2 with mic (https://shop.m5stack.com/products/m5stack-core2-esp32-iot-de...), along with a 16GB 4070 Ti Super in an always-on Windows system I only use for occasional gaming and serving media. I'd love to set up something like you have. Can you recommend a starting place, or ideally, a step-by-step tutorial?
I've never set up any AI system. Would you say setting up such a self-hosted AI is at a point now where an AI novice can get an AI system installed and integrated with an existing Home Assistant install in a couple hours?
I mean - the AI itself will help you get all that set up.
Claude Code is your friend.
I run Proxmox on an old Dell R710 in my closet that hosts my Home Assistant VM (amongst others), and I've set up my "gaming" PC (which hasn't done any gaming in quite some time) to dual-boot Windows or Debian/Proxmox; I just keep it booted into Debian as another Proxmox node. That PC also has a 4070 Super set up for passthrough to a VM, and on that VM I've got various services utilizing the GPU. This includes some that are used by my Hetzner bare-metal servers for things like image/text embeddings, as well as local LLM use (though rather minimal due to VRAM constraints) and some image/video object detection with my security cameras (slowly working on a remote water-gun turret to keep the raccoons from trying to eat the kittens that stray cats keep having in my driveway/workshop).
Install Claude Code (or OpenCode, which is also good), use Opus (get the Max plan), give it a directory it can use as its working directory (don't open it in ~/Documents and just start doing things), and prompt it with something as simple as this:
"I have an existing home assistant setup at home and I'd like to determine what sort of self-hosted AI I could setup and integrate with that home assistant install - can you help me get started? Please also maintain some notes in .md files in this working directory with those note files named and organized as you see appropriate so that we can share relevant context and information with future sessions. (example: Hardware information, local urls, network layout, etc) If you're unsure of something, ask me questions. Do not perform any destructive actions without first confirming with me."
Plan mode. _ALWAYS_ use plan mode to get the task set up. If there's something about the plan you don't like, say no and give it notes; it will return with a new plan. Agree to the plan once it's right, then work through it outside plan mode. If it gets off the plan, drop back into plan mode to get the/a plan set, then let it go again and just steer it in regular mode.
I don't have the Max plan, but on the Pro plan I tried for a month, I was able to blow through my 5-hour limit with a single prompt (with a 70k-context codebase attached). The idea of paying so much money to get a few questions per "workday" seems insane to me.
I just wanted to touch on this despite being days later, in hopes you see this - I've seen this sort of feedback about the Pro plan quite a bit. I skipped it and went for Max, so I don't have any experience with it, but I can tell you that I've _never hit my/any usage limit_ with the Max plan.
Like, I don't know if my account is broken or everyone else just uses things differently. I use claude code, I have it hard-stuck to Opus 4.1 - I don't even touch Sonnet. I _abuse_ the context - I used to /compact early or /clear often depending on the task... but these days (Opus seems much better with nearly full context than Sonnet was) if I'm still on the same task/group of tasks or I think that the current context would be useful for the next thing/task/step I don't even /compact anymore. I've found that if I just run it right up to full and let it auto /compact it does a _really_ good job picking up where it left off. (Which wasn't always the case) Point being - I'm exclusively using Opus 4.1 while also constantly cycling through and maxing out context only to restart with /compact'd context so it's not even starting empty and just keep going.
Hours a day like this. Never hit a limit. (I've said elsewhere that I do believe the general time I work, which is late evening and early morning in North America, has something to do with this, but I don't actually know.)
That's great to hear. I was mostly impressed with Qwen3 Coder on my 4090, but am hobbled by the small memory footprint of the single card. What motherboard are you using with your 3090s? Like the others, I too am curious about those ESP32s and what software you run on them.
Keep up the good hacking - it's been fun to play with this stuff!
I actually am not using the 3090s as one unit. I have Qwen3-30B-A3B as my primary model and it fits on a single GPU, then I have all the TTS/STT on the other GPU.
For the physical hardware I use the esp32-s3-box[1]. The esphome[2] suite has firmware you can flash to make the device work with HomeAssistant automatically. I have an esphome profile[3] I use, but I'm considering switching to this[4] profile instead.
For the actual AI, I basically set up three Docker containers: one for speech-to-text[5], one for text-to-speech[6], and then ollama[7] for the actual AI. After that it's just a matter of pointing Home Assistant at the various services, as it has built-in support for all of these things.
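If you're wiring this up yourself, a quick way to sanity-check the ollama container before pointing Home Assistant at it is something like the sketch below (the model tag and port are assumptions; adjust to whatever you actually pulled):

    # Sketch: confirm the ollama container answers before wiring it into Home Assistant.
    # Assumes ollama on its default port; the tag "qwen3:30b-a3b" is illustrative.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "qwen3:30b-a3b",
            "messages": [{"role": "user", "content": "Say hi in five words."}],
            "stream": False,
        },
        timeout=120,
    )
    print(resp.json()["message"]["content"])

The STT/TTS containers don't need glue code like this; Home Assistant's built-in integrations point at them directly.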
I assume it's very similar to what Home Assistant's backing commercial entity Nabu Casa sells with the "Home Assistant Voice PE" device, which is also ESP32-based. The code is open and uses the ESPHome framework, so it's fairly easy to recreate on custom HW you have lying around.
He is referring to the M5 Atoms, I believe. I strongly recommend the ESP32-S3-Box now; you can fire up Bobba's special firmware for it (search on GitHub), and it's a blast with Home Assistant.
When has the average American ever been willing to spend a $1,000-2,000 premium for privacy-respecting tech? They already save $20-200 to buy IoT cameras which provide all audio and video from inside their home directly to the government without a warrant (Ring vs Reolink/etc).
To be fair, it isn't $1000-2000 extra, it's the new laptop/PC you just bought that is powerful enough (now, or in the near future) to run these open-weight models.
Wiredpancake got flagged to death but they’re right. MacWhisper provides a great example of good value for dead-simple user-friendly on-device processing.
You mean like a home with a yard large enough to keep the neighbors out of sight?
Granted, based on how annoyingly chill we are with advertisements and government surveillance, I suppose this desire for privacy never extended beyond the neighbors.
> Americans may end up in a situation where they have some $1000-2000 device at home with an open Chinese model running on it, if they care about privacy or owning their data.
I think HN vastly overestimates the market for something like this. Yes, there are some people who would spend $2,000 to avoid having prompts go to any cloud service.
However, most people don’t care. Paying $20 per month for a ChatGPT subscription is a bargain and they automatically get access to new versions as they come.
I think the at-home self hosting hobby is interesting, but it’s never going to be a mainstream thing.
There is going to be a big market for private AI appliances, in my estimation at least.
Case in point: I give Gmail OAuth access to nobody. I nearly got burned once and I really don’t want my entire domain nuked. But I want to be able to have an LLM do things only LLMs can do with my email.
“Find all emails with ‘autopay’ in the subject from my utility company for the past 12 months, then compare it to the prior year’s data.” GPT-OSS-20b tried its best but got the math obviously wrong. Qwen happily made the tool calls and spat out an accurate report, and even offered to make a CSV for me.
Surely if you can't trust npm packages or MS not to hand out god tokens to anyone who asks nicely, you shouldn't trust a random MCP server with your credentials or your model. So I had Kilocode build my own. For that use case, local models just don't quite cut it. I loaded $10 into OpenRouter, told it what I wanted, and selected GPT-5 because it's half off this week. 45 minutes, $0.78, and a few manual interventions later, I had a working Gmail MCP that is my very own. It gave me some great instructions on how to configure an OAuth app in GCP, and I was able to get it running queries within minutes from my local models.
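To give a sense of the shape such a server takes, here's a stripped-down sketch of a read-only Gmail search tool exposed over MCP - not the server described above, just an illustration. It assumes OAuth credentials already live in a token.json (readonly scope) created via that GCP OAuth app, and the tool and file names are hypothetical:

    # Minimal sketch of a Gmail search tool over MCP (stdio transport).
    # Hypothetical names; assumes token.json holds gmail.readonly OAuth credentials.
    from google.oauth2.credentials import Credentials
    from googleapiclient.discovery import build
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("gmail-search")

    @mcp.tool()
    def search_messages(query: str, max_results: int = 50) -> list[dict]:
        """Run a Gmail search (e.g. 'subject:autopay newer_than:1y') and
        return subject/date/snippet for each hit."""
        creds = Credentials.from_authorized_user_file(
            "token.json", ["https://www.googleapis.com/auth/gmail.readonly"]
        )
        service = build("gmail", "v1", credentials=creds)
        resp = service.users().messages().list(
            userId="me", q=query, maxResults=max_results
        ).execute()
        results = []
        for ref in resp.get("messages", []):
            msg = service.users().messages().get(
                userId="me", id=ref["id"], format="metadata",
                metadataHeaders=["Subject", "Date"],
            ).execute()
            headers = {h["name"]: h["value"] for h in msg["payload"]["headers"]}
            results.append({
                "subject": headers.get("Subject", ""),
                "date": headers.get("Date", ""),
                "snippet": msg.get("snippet", ""),
            })
        return results

    if __name__ == "__main__":
        mcp.run()

A tool-calling local model then only needs Gmail's search syntax (e.g. 'subject:autopay newer_than:1y') to pull the data behind the kind of autopay report above.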
There is a consumer play on the horizon for a ~$2499-$5000 box that can run your personal staff of agents. We need about one more generation of models and another generation of low-to-mid inference hardware to make it commercially feasible to turn a profit. It would need to pay for itself easily in the lives of its adopters. Then the mass market could open up. A more obvious path goes through SMBs who care about control and data sovereignty.
If you’re curious, my power bill is up YoY, but there was a rate hike, definitely not my 4090;).
Totally agree on the consumer and SMB play (which is why we're stealthily working on it :). I'm curious what capabilities the next generation of models (and HW) will provide that don't exist now. Considering Ryzen 395 / Digits / etc. can achieve 40-50+ T/s on capable mid-size models (e.g., OSS120B/Qwen-Next/GLM Air) with some headroom for STT and a lean TTS, I think now is the time to enter, but it seems to me the two key things that are lacking are 1) reliable low-latency multimodal streaming voice frameworks for STT+TTS and 2) reliable, fast, and secure UI/computer use (without relying on optional accessibility tags/metadata).
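For what it's worth, those 40-50+ T/s figures are consistent with a simple bandwidth-bound back-of-envelope estimate; the numbers below are illustrative assumptions (roughly Ryzen-AI-Max-class bandwidth and an OSS-120B-style MoE active set), not benchmarks:

    # Rough decode-throughput ceiling for a memory-bandwidth-bound MoE model.
    # All numbers are illustrative assumptions, not measurements.
    bandwidth_gb_s = 256       # effective memory bandwidth for a Ryzen AI Max-class box
    active_params = 5e9        # parameters read per token (MoE active experts only)
    bits_per_weight = 4.5      # ~4-bit quantization plus format overhead
    bytes_per_token = active_params * bits_per_weight / 8
    ceiling = bandwidth_gb_s * 1e9 / bytes_per_token
    print(f"theoretical ceiling ~ {ceiling:.0f} tok/s")  # ~91 tok/s

Real decode rates land well below that ceiling once KV-cache reads and runtime overhead are included, which is roughly where the observed 40-50 T/s sits.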
My greatest concern for local AI solutions like this is the centrality of email and the obvious security concerns surrounding email auth.
Depends on the setup, but programmatic access to a Gmail account that's used for admin purposes would allow for hijacking via key/password exfiltration of anything in the mailbox, sending unattended approvals, and autonomous conversations with third parties that aren't on the lookout for impersonation. In the average case, the address book would probably get scraped and the account would be used to blast spam to the rest of the internet.
Moving further, if the OAuth Token confers access to the rest of a user's Google suite, any information in Drive can be compromised. If the token has broader access to a Google Workspace account, there's room for inspecting, modifying, and destroying important information belonging to multiple users. If it's got admin privileges, a third party can start making changes to the org's configuration at large, sending spam from the domain to tank its reputation while earning a quick buck, or engage in phishing on internal users.
The next step would be racking up bills in Google's Cloud, but that's hopefully locked behind a different token. All the same, a bit of lateral movement goes a long way ;)
I agree the market is niche atm, but I can't help but disagree with your outlook long term. Self hosted models don't have the problems ChatGPT subscribers are facing with models seemingly performing worse over time, they don't need to worry about usage quotas, they don't need to worry about getting locked out of their services, etc.
All of these things have a dark side, though; but it's likely unnecessary for me to elaborate on that.
Given that $2000 might only buy you about 10 date nights with dinner and drinks, the value proposition might actually be pretty good if posterity is not a feature requirement.
The sales case for having LLMs at the edge is to run inference everywhere on everything. Video games won't go to the cloud for every AI call, but they will use on-device models that will run on the next iteration of hardware.
The US is probably ahead, but they're so obsessed with moats, IP, and safety that their lagginess is self-imposed.
China has nothing to lose and everything to gain by releasing stuff openly.
Once China figures out how to make high-performance FPGA chips really cheap, it's game over for the US. The only power the US has is over GPU supply... and even then it's pretty weak.
Not to mention NVIDIA crippling its own country with low-VRAM cards. China is taking older cards, stripping the RAM, and using it to upgrade other older cards.
> Americans may end up in a situation where they have some $1000-2000 device at home with an open Chinese model running on it
Wouldn't worry about that, I'm pretty sure the government is going to ban running Chinese tech in this space sooner or later. And we won't even be able to download it.
Not saying any of the bans will make any kind of sense, but I'm pretty sure they're gonna say this is a "strategic" space. And everything else will follow from there.
When DeepSeek first hit the news, an American senator proposed adding it to ITAR so they could send people to prison for using it. Didn't pass, thankfully.
For criminal concerns regarding retroactive ITAR additions, yes. However, significant civil financial penalties, if Congress so wished, could still be constitutional, as the ex post facto clause has been held to apply exclusively to criminal matters, starting with Calder v. Bull [1].
History is littered with unconstitutional, enforced laws, as well. Watched a lot of Ken Burns docs this weekend while sick. “The West” has quite a few examples.
There are a lot of things in the US Constitution. But the Supreme Court is the final arbiter, and they're moving closer and closer to "whatever you say, big daddy."
It seems it needs around a $2,500 GPU; do you have one?
I tried Qwen online via its website interface a few months ago, and found it to be very good.
I've run some offline models, including DeepSeek-R1 70B on CPU (pretty slow, my server has 128 GB of RAM but no GPU), and I'm looking into what kind of setup I would need to run an offline model on GPU myself.
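If it helps with the sizing, a rough rule of thumb is weights times bits-per-weight plus some headroom; the sketch below is a floor estimate that ignores KV-cache growth with long contexts (the 20% overhead factor is a guess):

    # Back-of-envelope VRAM estimate: weights * bits/weight, plus ~20% headroom.
    # Treat as a floor; KV cache grows with context length.
    def vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
        return params_billion * bits_per_weight / 8 * overhead

    print(f"70B @ 4-bit ~ {vram_gb(70, 4):.0f} GB")  # ~42 GB -> roughly two 24 GB cards
    print(f"70B @ 8-bit ~ {vram_gb(70, 8):.0f} GB")  # ~84 GB
    print(f"32B @ 4-bit ~ {vram_gb(32, 4):.0f} GB")  # ~19 GB -> fits a single 24 GB card

So a 70B model at 4-bit wants roughly two 24 GB cards, while a ~30B model fits comfortably on one.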
Is there an AI market for open weights? For companies like Alibaba, Tencent, Meta, or Microsoft it makes a lot of sense: they can build on open weights without losing value, which is potentially beneficial for share prices. The only winners are application and cloud providers; I don't see how they can make money from the weights themselves, to be honest.
I don't know if there is a market for it, but I know that open weights put pressure on the closed-model companies to release their weights and lose their privileged positions.
The only money to be made is in compute, not the open weights themselves. What's the point of a market when there's a commons like Hugging Face or ModelScope? Alibaba made ModelScope to compete with HF, and that's a commons, not a market, either, if that tells you anything.
By analogy, you can legally charge for copies of your custom Linux distribution, but what's the point when all the others are free?
It promotes an open research environment where external researchers have the opportunity to learn, improve, and build. And it keeps the big companies in check: they can't become monopolies or duopolies and increase API prices (as is usually the playbook) if you can get the same-quality responses from a smaller provider on OpenRouter.
So we're celebrating the real numbers, but maybe we should hoist up the illusory numbers? Back in the day, they thought that some numbers were "imaginary" numbers (e.g. sqrt(-1)) and nowadays, engineers use those imaginary numbers all the time and they feel as real as the reals.
So, here's to math keeping our imagination limber and extending our ideas of what's real.
That's interesting! I'm supposing you've been really afraid before, and you'll know it's quite an agitated state to be in. So normally, without fear, folks will be more at peace, more at ease.
As far as sociopaths and egomaniacs go, my experience is that they are usually quite fearful, though they try to mask it with compensating measures. You might feel bad for them if they weren't so often spoiling other people's fun with their antics.
Your LLM (CC) doesn't have your whole codebase in context, so it can run off and make changes without considering that some remote area of the codebase is (subtly?) depending on the part that Claude just changed. This can be mitigated to some degree depending on the language and tests in place.
The LLM (CC) might identify a bug in the codebase, fix it, and then figure, "Well, my work here is done," and just leave it as-is without considering ramifications or whether the same sort of bug might be found elsewhere.
I could go on, but my point is simply to validate the issues people will be having, while also acknowledging those seeing the value of an LLM like CC. It does provide useful work (e.g. large tedious refactors, prototyping, tracking down a variety of bugs, and so on...).
This is why you keep CLAUDE.md updated; there it'll write down what is where and other relevant info about the project.
Then it doesn’t need to feel (or rg) through the whole codebase.
You also use plan mode to figure out the issue, write the implementation plan in a .md file. Clear context, enter act mode and tell it to follow the plan.
You can probably give Claude access to tools like ast-grep; that will help it see all references. I still agree some dynamic references might be left; the only way around that is to prompt well enough. I tested this on a Ruby on Rails codebase, so I had to deal with this.
Who makes the markets in India? Is it the big Indian banks, or do these multinational trading firms act as market makers? If so, how do they distinguish between their trading and market making activities? It seems like it'd be relatively easy to rig a market (control the price) with enough capital and management over the trading.
What does SpaceX have to do with the Musk/Trump spat? Shouldn't those SpaceX contracts be based on how well the country is served by them and at what price?
Trump needs to take his lumps on his BBB. That bill is full of pork for billionaires and cuts funding for poor folks. It should come as no surprise that people don't like it.
> Shouldn't those SpaceX contracts be based on how well the country is served by them and at what price?
I'm always amazed when I read questions like this.
I mean ... have you been paying attention?
Law firms getting security clearances canceled, incarceration without due process, Harvard defunded, memecoins, gutting of the federal government, &c. &c. &c. &c.
Every data point screams malevolence and lack of concern for the common good of the nation.
And you're confused about decision-making over SpaceX that "seems" to ignore how the country is best served?
Don't get me wrong. You pose a valid question. In fact, only a person who himself cares about the common good would ask this type of question.
But man, the big flashing warning signs should be answering your question for you.
> What does SpaceX have to do with the Musk/Trump spat?
Well, SpaceX is owned by Musk. Therefore Trump, if seeking to hurt Musk, could attempt to hurt SpaceX.
The ends justify the means. The country's best interests are collateral damage; the benefit that SpaceX offers the country is not relevant to Trump's ego/feelings having been hurt.
Too little inequality means there is nothing to strive for; the ambitious won't have much reward to work for.
Too much inequality means too few participants in the economic decision making process which leads to instability (i.e. the mad king phenomenon). We are getting closer to the point of instability now, as we reach higher levels of inequality.
How much do you think the reward was that set in motion all the free and open-source software underlying the many OSes, DBs, and dev tools that we still use for free today?
A fair amount? Yes, many projects started out as someone messing around, but paid developers funded by commercial entities followed for most of the successful open source we use today.
So then we have proof by example that money is not a necessary incentive for people to create, no? And even if the FOSS projects eventually evolved into foundations that solicit some form of funding, they’re still mostly non-profits that are collecting just enough money to sustain their continued operations and aren’t really chasing profits and growth like regular companies do.
Money was exactly what allowed FOSS projects to succeed. Developers with a day job have enough free time to do their own projects for their own reasons.
Ambition, with respect to wealth, can be a double-edged sword. One can imagine being both ambitious and allowing everyone to share in the benefits, treating the act of participating in some advance as its own reward. Those who do gain some advantage are often hiding the tremendous work of others, taking a grotesque share of the fortune for being in the right place at the right time. To paraphrase a scene from Life of Brian, it is the (so-called) ambitious, often, who are the problem.
> Too little inequality means there is nothing to strive for; the ambitious won't have much reward to work for.
That kind of "ambition" is greed. There are other forms of currency that are arguably much more valuable to a healthy society and don't go away in a more equal society: reputation, respect, intellectual authority, etc.
> There are other forms of currency that are arguably much more valuable to a healthy society and don't go away in a more equal society: reputation, respect, intellectual authority, etc.
Those don't scale, though; most jobs don't award you any of those. Money is the only real reward low-status jobs have ever gotten; take that away and why would anyone want to work low-status jobs?
No, but if we build a society where inequality ceases to be an issue, jobs will stop rewarding greed and start rewarding whatever other currencies attract the best talent.
Similarly, entrepreneurship will be based on those currencies.
Money is a human construct. Inequality is a human construct. Neither are a requirement for the existence of humanity or for humans to thrive.
> if we build a society where inequality ceases to be an issue
Looking back through history, there are always jerks who ruin this; being equal isn't enough: either they want to have more money than, and power over, everyone else, or they explicitly want others to suffer, usually people who look different.
Dealing with those personalities has cost us trillions of dollars throughout human history, but it keeps happening, because you can't evolve better brains in the few thousand years humans have been civilized.
It's very interesting to see how people's imaginations are so captured by propaganda that they don't allow themselves to even consider that societies may be based on things other than money, profit and greed.
Talking about the putative dangers of "too little inequality" when the amount we have today is so absurdly high seems kind of disingenuous, even if that's not what's intended.
In order for there to be "too little inequality" to the extent that you describe, it would have to be impossible for anyone to, say, earn more than about $100k/yr (based on current median incomes in the US). So far as I know, no one with an iota of credibility has proposed anything remotely like this.
As it stands, any time any level of increase in taxes on the ultra-wealthy is proposed, people come out of the woodwork claiming that a higher tax rate on those making $100m, $500m, $1b/yr would just destroy all interest in striving and doing well, which is so clearly absurd.
...And, unfortunately, we also know that there are a fair number of people for whom reduced inequality would destroy their ambition, because their ambition has nothing to do with being reasonably wealthy themselves or having financial security: the reward, for them, is very explicitly being able to treat other people like they aren't even human. And that's not something society should ever accept, from anyone.
> To be lost strips you down to just you, in a world you no longer fully understand, and makes clear how fragile your senses of self and place really are.
When you really get into the unknown, you can get lost without leaving your house, but getting literally lost in the woods is probably a good way to stir up that feeling directly.
The most lost I’ve felt is when my cultural metaphysics dropped away, and I realized how much I was holding onto a false sense of certainty so I wouldn’t have to experience how wild this existence might actually be, or how much my sense of self and place in the universe comes from my culture’s current understanding of it.
So to the intrepid travelers, may your journeys be filled with adventures and wonders beyond imagination!
The famous American scout Frederick Russell Burnham once said that spending 10 days by yourself in the wild would teach you more about survival than he could teach you in 6 months of instruction.