I’ve been thinking a lot about the lightweight-VM side lately, but it’s not an area we’re going to attack ourselves. I think there’s a really good pairing between that and what we’re working on.
Agents and AI code execution are a very good use case.
I think anyone looking for infra with the properties below is well served by this project:
1. subsecond vm cold starts
2. kernel isolation (vs containers)
3. consistent local <-> remote environment
4. elastic CPU and memory
5. ease of setup
I am deliberately designing it as an infra primitive for general workloads, as opposed to others in the microVM space — e.g. Firecracker was designed for Lambda/serverless workloads.
Libraries are the single reason I got back into video games after a multi-decade hiatus.
I played very few games from 2002 to 2017. Didn't want to keep buying new computers, and did not want to bother with consoles (graphics were better on a PC than on a non-HD TV).
In 2010 I bought a PS3, but only to watch Blu-Ray, Netflix and stream from my PC to TV. Did not play games on it.
Then in 2016/2017, on a whim, I decided to check out a game from the library. I Googled some good games and picked Telltale's The Walking Dead.
Oh wow. One of the best games I've ever played. For the next 2 months I kept checking out games and playing them.
Then for some reason I stopped. I started again in 2022 and haven't looked back. Seriously cut down my TV watching so I can play the games. I don't use the library any more - I just buy the games.
It's a unique and fulfilling experience to be one with nature. You get to learn that chickens eat almost anything. There's definitely a sense of belonging in nature that I miss.
This is exactly the world I'm working towards: packaging tooling with a virtual machine, i.e. Electron but with virtual machines instead, so the isolation aspect comes by default.
I found his story, and it gave my mom hope when her cancer metastasized to her brain in 2025.
Cancer cells in the brain sit in a nutrient-rich environment for growth, and at the same time the brain is a dangerous place to treat, both for removal and for preventing further growth. The expected 5-year survival rate is less than 5%.
Dr. Richard Scolyer was diagnosed in 2023 and is still with us today. I hope he succeeds in his work.
So... I'm working on an open-source technology to make a literal virtual machine shippable, i.e. freezing everything inside it, isolated by the VM/hypervisor for sandboxing, with support for containers too since it's a real Linux VM.
The problems you mentioned resonated a lot with me and are why I'm building it. Any interest in working to solve them together? https://github.com/smol-machines/smolvm
Thanks for the pointer! Love the premise of the project. Just a few notes:
- a security-focused project should NOT default to training people to install by piping to bash. If I try previewing the install script in the browser, it forces a download instead of showing it as plain text. The first thing I see is an argument
# --prefix DIR Install to DIR (default: ~/.smolvm)
that later in the script gets an rm -rf deleting a lib folder. So if I accidentally pick a folder with ANY lib folder, it will be deleted.
- I'm not sure what the comparison to colima with krunkit machines is, except that you don't use VM images; how this works or how it is better is not 100% clear
- Just a minor thing, but people don't have much attention, and I just saw AWS and fly.io in the description and nearly closed the project. It needs to be easier to see that this is a local sandbox with libkrun, NOT a wrapper for a remote sandbox like so many of the projects out there.
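On the rm -rf point, a minimal sketch of one mitigation (the marker filename and function name are hypothetical, not from the project): have the installer drop a marker file, and have the cleanup step refuse to touch any lib folder under a prefix that lacks it, so an accidental --prefix pointing at an unrelated directory is never wiped.

```shell
# Sketch of a safer cleanup step (marker filename is hypothetical).
# The installer writes the marker at install time; cleanup refuses to
# delete lib/ under any prefix that lacks it.
safe_remove_lib() {
    prefix="$1"
    if [ ! -f "$prefix/.smolvm-install" ]; then
        echo "refusing to remove '$prefix/lib': no install marker" >&2
        return 1
    fi
    rm -rf "$prefix/lib"
}
```

An alternative with the same effect is to only ever rm -rf a directory the script itself created earlier in the same run.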
Will try reaching you on some channel. Would love to collaborate, especially on devX; I would be very interested in something more reliable and a bit more lightweight in place of colima once libkrun can fully replace vz.
Love this feedback, agree with you completely on all of it - I'll be making those changes.
1. In comparison with colima with krunkit: I ship smolvm with a custom-built kernel + rootfs, with a focus on the virtual machine as opposed to running containers (though I enable running containers inside it).
What is the alternative to bash piping? If you don't trust the project install script, why would you trust the project itself? You can put malware in either.
That assumes you even need an install script. 90% of install scripts just check the platform, make the binary executable, and put it in the right place. Just give me links to a GitHub release page with immutable releases enabled and pure binaries. I download the binary, put it in a temporary folder, and run it with a seatbelt profile that logs what it does. Binaries should "just run" and at most access one folder in a place they show you, and that is configurable! Fuck installers.
It turns out that it's possible for the server to detect whether it is running via "| bash" or if it's just being downloaded. Inspecting it via download and then running that specific download is safer than sending it directly to bash, even if you download it and inspect it before redownloading it and piping it to a shell.
The server can also put malware in the .tar.gz. Are you really checking all the files in there, even the binaries? If you don't what's the point of checking only the install script?
The latest Firefox build that Debian did only took just over one hour on amd64/armhf and 1.5 hours on ppc64el, the slowest Debian architecture is riscv64 and the last successful build there took only 17.5h, so definitely not days. Your average modern developer-class laptop is going to take a lot less than riscv64 too.
> If you don't what's the point of checking only the install script?
The .tar.gz can be checksummed and saved (to be sure later on that you install the same .tar.gz and to be sure it's still got the same checksum). Piping to Bash in one go not so much. Once you intercept the .tar.gz, you can both reproduce the exploit if there's any (it's too late for the exploit to hide: you've got the .tar.gz and you may have saved it already to an append-only system, for example) and you can verify the checksum of the .tar.gz with other people.
The point of doing all these verifications is not only to not get an exploit: it's also to be able to reproduce an exploit if there's one.
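The save-and-verify flow described above can be sketched like this (function and path names are illustrative; the expected checksum has to come from somewhere other than the download itself):

```shell
# Verify a saved artifact against a checksum obtained out of band.
# Keeping the .tar.gz around means a suspected exploit can later be
# reproduced, and the checksum compared with other people.
verify_artifact() {
    tarball="$1"
    expected="$2"
    actual=$(sha256sum "$tarball" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $tarball" >&2
        return 1
    fi
}
```

A one-shot `curl | bash`, by contrast, leaves nothing behind to re-check or to compare notes on.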
There's a reason, say, packages in Debian are nearly all both reproducible and signed.
And there's a reason they're not shipped with piping to bash.
Some projects do offer an install script that downloads a file but verifies its checksum. That's the case of the Clojure installer, for example: it verifies the .jar. Now I know what you're going to say: "but the .jar could be backdoored if the site got hacked, for both the checksum in the script and the .jar could have been modified". Yes. But it's also signed with GPG. And I do religiously verify that the "file inside the script" does have a valid signature when it has one. And if suddenly the signing key changed, this rings alarm bells.
Why settle for the lowest common denominator security-wise? Because Anthropic (I pay my subscription btw) gives a very bad example and relies entirely on the security of its website and pipes to Bash? This is high-level suckage. A company should know better and should sign the files it ships, not encourage lame practices.
Once again: all these projects that suck security-wise are systematically built on the shoulders of giants (like Debian) who know what they're doing and who are taking security seriously.
This "malware exists so piping to bash is cromulent" mindset really needs to die. That mentality is the reason we get major security exploits daily.
> And I do religiously verify that the "file inside the script" does have a valid signature when it has one.
If you want to go down this route, there is no need to reinvent the wheel. You can add custom repositories to apt/..., you only need to do this once and verify the repo key, and then you get this automatic verification and installation infrastructure. Of course, not every project has one.
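For reference, a sketch of that route (the repo URL and keyring name are hypothetical; the real ones come from the project, and the key fingerprint should be verified once, out of band):

```shell
# One-time setup for a custom apt repository pinned to its own key.
# The key itself would first be fetched from the project and saved to
# /usr/share/keyrings/example-archive.gpg.
echo "deb [signed-by=/usr/share/keyrings/example-archive.gpg] https://example.com/apt stable main" \
    > example.list
# In real use this file lives at /etc/apt/sources.list.d/example.list;
# after `apt update`, every package from that repo is verified against
# that key automatically.
```

The `signed-by=` option scopes the key to this one repository, so a compromised third-party key can't sign packages for the main archives.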
Probably adjacent to your project, but did you try SmolBSD? <https://smolbsd.org>
It's a meta-OS for microVMs that boots in 10–15 ms.
It can be dedicated to a single service (or a full OS), runs a real BSD kernel, and provides strong isolation.
Overall, it fits into the "VM is the new container" vision.
Disclaimer: I'm following iMil (the developer of smolBSD and a contributor to NetBSD) through his Twitch streams, and I truly love what he is doing. I haven't actually used smolBSD in production myself since I don't have a need for it (though I participated in his live streams by installing and running his previews), and my answer might be somewhat off-topic.
At a glance, it's a matter of compatibility: most software has first-class support for Linux. But it's very interesting work and I'm going to follow it closely.
Runs locally on Macs, is much easier to install/use, and is designed to be "portable", meaning you can package a VM to preserve statefulness and run it somewhere else.
I worked at AWS, specifically with Firecracker in the container space, for 4 years - we had a very long onboarding doc just to develop on Firecracker for containers... So I made sure to focus on ease of use here.
At one meeting to build out a new service as the next generation of a flagship AWS service I worked on, I got to meet all the product leaders and managers.
At that meeting, I realized most of them had never used the product and saw their claim to a leadership role as resting on their ability to manage up and down.
I use the product on my personal projects and I hated it with a passion.
I agree, and sadly I wouldn't hold out hope for actual meaningful changes (granted, the last time I had Windows was Win 7).
My reasoning comes from bitter experience. I've seen too many of these honest talks/commitments - it's always the same pattern when a product/company starts to decline. Suddenly somebody with a technical background shows up and talks about past mistakes and what needs fixing. Sometimes they even hold a discussion, which is usually very reasonable. But as time goes on, there are only cosmetic changes, with excuses like lack of resources, the market wind changing this time, changes being too hard due to politics, etc.
Something that comes to mind for me is the old Bill Gates Trustworthy Computing memo [0], from the era when early Windows XP was getting flak for poor security. That was supposedly the turning point where they started those overhauls towards Service Pack 2 and likewise added a security focus in other products, and they decided they couldn't sneak easter-egg flight simulators into Excel any more because it just added opportunities for flaws.
What stands out to me is that the organization needs to accept that change is needed and 'walk the walk', and also that those efforts take time. I've no idea what things are in motion at MS, but I wonder how quickly they can turn the ship, how much momentum is behind their current direction, and how much force is going into the turn. Moving the taskbar seems like addressing one loud, persistent talking point, but it's one among many. What's the timeline (even though Windows version timing seems like 'when they need branding')? Win12? Win13?
The only thing I'd add is that not only did he tweet the infamous tweet that caused the backlash, Pavan ridiculed those in the backlash (since deleted). Also, Satya still spews the same "agentic OS" narrative as recently as last week.
So, I hope for the best, but I don't plan on taking them at their word.
Everyone at MSFT who is senior is a lying piece of shit these days. I remember on here Satya being treated like the second coming of Jesus due to his promises. Any comments against him were downvoted.
Absolutely nothing wrong with an "agentic OS", agentic UX is the future of personal computing. The ideal is that something intelligent understands what you want to do and gets it done.
Unless you really think we've reached the pinnacle of user interface with repetitive clicking around and menus.
The problem is with shoving AI down users' throats. Make it an option, not the only option.
> The ideal is that something intelligent understands what you want to do and gets it done.
Maybe? For a couple of decades, we believed that computers you can talk to are the future of computing. Every sci-fi show worth a dime perpetuated that trope. And yet, even though the technology is here, we still usually prefer to read and type.
We might find out the same with some of the everyday uses of agentic tech: it may be less work to do something than to express your desires to an agent perfectly well. For example, agentic shopping is a use case some companies are focusing on, but I can't imagine it being easier to describe my sock taste preferences to an agent than to click around for 5 minutes and find the stripe pattern I like.
And that's if we ignore that agents today are basically chaos monkeys that sometimes do what you want, sometimes rm -rf /, and sometimes spend all your money on a cryptocurrency scam. So for the foreseeable future, I most certainly don't want my OS to be "agentic". I want it to be deterministic until you figure out the chaos monkey stuff.
I think your last paragraph is the real issue that will forever crush improvements over clicking on stuff. Once you get to "buy me socks" you're just entering some different advertising domain. We already see it with very simple things like getting Siri to play a song. Two songs with the same name, the more popular one will win, apply that simple logic to everything and put a pay to play model in it and there's your "agentic" OS of the future.
I beg to differ that "the technology is here". Everyone I see who uses voice commands has to speak in a very contrived manner so that the computer can understand them properly. Computer vision systems still run into all sorts of weird edge cases.
We've progressed an impressive lot since, say, the nineties when computers (and the internet) started to spread to the general consumer market but the last 10% or so of the way is what would really be the game changer. And if we believe Pareto, of course that is gonna be 90% of the work. We've barely scratched the surface.
> it may be less work to do something than to express your desires to an agent perfectly well
As I use AI to write code more and more, I find myself just implementing things myself for this reason. By the time I have actually explained what I want in precise detail, it's often faster to have just made the change myself.
Without enough detail SOTA models can often still get something working, but it's usually not the desired approach and causes problems later.
yeah, for me even with other people, the number of times you think "it would be easier for me to just show you" is maybe 30% of interactions with agents currently.
Perplexity keeps trying to get me to use "computer" and for the life of me I can't think of anything I'd actually do with it.
It all depends on where the AI is running. The problem with the idea is that the majority of Windows boxes where it would run do not have the bare-metal hardware to support local models, and thus it would be in the cloud, with all of the privacy/security issues associated with that. It would be neat, given MSFT's footprint, for them to develop small models, running locally, with user transparency when it comes to actions, but that doesn't align with MSFT's core objectives.
AFAIK the existing Copilot features always use the NPU and do not fall back to the cloud. Given that Windows 12 will require an NPU I don't see why it would fall back either.
This is true only for Copilot+ features. The issue MSFT faces, especially as it pushes Copilot EVERYWHERE, is that the majority of the hardware running Windows does not, and will not, have the NPU required for 12, nor is there the consumer purchasing power to upgrade to hardware with an NPU. This is a reality MSFT just does not seem to want to deal with as they push the technology onto consumers: it's not based on the reality of the install base they are dealing with, but rather on trying to justify their strategic investment in AI in the B2C space without doing the product-market-fit work to justify it.
- "summarize the discussions on hacker news of last week based on what I would find interesting".
- "Plan my summer vacation with my family, suggest different options"
- "Look at my household budget and find ways to be more frugal."
There are thousands of things I can think of where an agentic OS would work better than the current screen-and-keyboard paradigm. I mean, all these things I could now do with Claude or Codex, and some of these things I already do with these tools.
>What specifically does an agentic OS UX look like beyond giving claude access to local files and a browser?
Providing the structure of a unified framework: APIs, safeguards, routing to the appropriate model or pipeline, and controlled access to devices and data. The capability is already there. What’s missing is a sane permission system that operates at the level of intent. Having used OpenClaw, that’s IMO the missing piece. It’s a fun experience, but in its current state I would not trust it to autonomously run any meaningful part of my life.
UX-wise, chat is kind of a crutch. It’s slow and inherently limiting. I imagine something closer to a natural, ongoing conversation paired with an execution layer: some sort of approval or review dashboard where planned actions are ready for approval or returned for refinement before they happen. Probably with a conservative moderator agent in the loop that flags things based on preferences and hard-coded policies.
Calling it an OS isn’t accurate, I agree. But that's how people will perceive it. Most people already think of the application layer on Android as "the OS," not the kernel or drivers. This will be the first-class interface on your device, so that’s what it gets called. It doesn’t mean browsers or dedicated applications go away.
Three years ago I would not have thought the IDE would stop being the application I spend most of my time in. Now it’s mostly a passive code viewer and Git browser.
Compare that to everyday workflows. Researching anything still feels incredibly antiquated. Buying a phone, planning a vacation, comparing options means opening dozens of tabs, copy-pasting specs or prices into spreadsheets, reading through fine print, dealing with low-quality or honestly untrustworthy reviews, checking distances manually on maps. It’s boring and tedious work.
Meanwhile, in a professional life, these systems already behave like a team of secretaries: always available, reasonably competent, and scalable. Not perfect, but easily good enough to offload a huge amount of cognitive overhead.
What I'm trying to say is that the long path is "get shit done". No work is completed by reading AI summaries of informative content. It's just productivity porn.
Even theoretical AI still has the other mind problem from economics.
Communicating and predicting desires, preferences, thoughts, feelings from one mind to another is difficult.
Fundamentally the easiest way of getting what you want is to be able to do it yourself.
Introduce an agent, and now you get the same utility issues as trying to guess what gifts to buy someone for their birthday. Sure, every now and then you get the marketer's "surprise and delight", but the main experience is relatively middling, often frustrating and confusing, and if you have any skill or knowledge in the area, or any ability to do it yourself, ultimately frustrating.
We've already been through this when people a decade ago thought voice was the future of the computer.
When that completely didn't work, we thought that augmented reality was the future of the computer, which also didn't work out.
You need a screen to be able to verify what you're doing (try shopping on Amazon without a screen), which means you also need a UI around it, which then means voice (and by extension agents which also function by conversation) is slower and dumber than the UI, every time.
Meanwhile, I have yet to see any brand excited to be integrated with ChatGPT and Claude. Unlike a consumer, a purely "reasoning-based" agent is most likely to ignore everything aesthetic and pick the bottom-of-the-barrel cheapest option in any category. How do you convince an AI to show your specific product to a customer? You don't.
"Agentic typewriters are the future of typewriting. The idea is that something intelligent understands what you want to type and types it for you. Unless you really think we've reached the pinnacle of typewriter interfaces with repetitive key taps and carriage returns."
See how that sounds a bit silly? It's because it presents a false dichotomy. That our choice is between either the current state of interfaces or an agentic system which strips away your autonomy and does it for you.
We’ve had computing technology that clearly understands what the user wants to do. It’s called a command line interface. No guessing, no recommendations, no dark patterns, no bullshit.
free, open source -> https://github.com/smol-machines/smolvm
I worked with Firecracker a lot back in the day and realized it was a pain to use. And containers had a lot of gotchas too.
Since sandboxing is all the rage now, I think it'd be a better infra primitive than Firecracker: one that works locally/remotely, etc.