Hacker News: carderne's comments

If anyone from Astral sees this: at this level of effort, how do you deal with the enormous dependence on Github itself? You maintain social connections with upstream, and with PyPA... what if Github is compromised/buggy and changes the effect of some setting you depend on?

We talk to GitHub as well! You're right that they are an enormous and critical dependency, and we pay close attention to the changes they make to their platform.

> what if Github is compromised/buggy

What if? GitHub is extremely buggy! I'm getting increasingly frustrated with the paper cuts that have become endemic across the entire platform. For example, it's not uncommon for one of our workflows to fail when cloning a branch of the repo it is running in.


I deliberately didn't mention this because I think most of the pain with Github over the last year is probably caused to some degree by their scale, which seems like an unrelated issue. (But maybe not.)

I’m surprised it works for you with such a simple config? I’m the one that added the allowRead option to Claude’s underlying sandbox [0] and had quite a job getting my toolchains and skills to work with it [1].

[0] Fun to see the confusing docs I wrote show up more or less verbatim on Claude’s docs.

[1] My config is here, may be useful to someone: https://github.com/carderne/pi-sandbox/blob/main/sandbox.jso...


How do agents tend to deal with getting blocked? Messing around with sandboxes, I've quite often seen them get blocked, assume something is wrong, and go _crazy_ trying to get around the block, never stopping to ask for user input. It might be good to add to the error message: "This is deliberate, don't try to get around it."
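To make the error-message suggestion concrete, here is a minimal sketch of what such a message could look like. The function name `blocked_message` and its parameters are hypothetical, made up for illustration; this is not taken from any of the sandboxes mentioned in this thread.

```python
# Hypothetical helper: builds the text an agent sees when the sandbox denies
# an action. The names here are illustrative, not from any real sandbox tool.
DELIBERATE_NOTICE = "This is deliberate, don't try to get around it."


def blocked_message(action: str, target: str) -> str:
    """Return an error message that tells the agent the block is intentional."""
    return (
        f"Sandbox blocked {action} on {target}. "
        f"{DELIBERATE_NOTICE} "
        "Ask the user if you need access."
    )


print(blocked_message("read", "/etc/passwd"))
```

The idea is simply that the agent gets an explicit signal that nothing is broken, plus a suggested next step (asking the user) instead of retrying with workarounds.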

For those using pi, I've built something similar[1] that works on macOS+Linux, using sandbox-exec/bubblewrap. Only benefit over OP is that there's some UX for temporarily/permanently bypassing blocks.

[1] https://github.com/carderne/pi-sandbox


Claude Code and Codex quickly figure out they are inside a sandbox-exec environment. Maybe because they know it internally. Other agents often realize they are being blocked, and I haven't seen them go haywire yet.

Big love for Pi - it was the first integration I added to Safehouse. I wanted something that offers strong guarantees across all agents (I test and write them nonstop), has no dependencies (e.g., the Node runtime), and is easy to customize, so I didn't use the Anthropic sandbox-runtime.


Interesting, that's not been my experience! Maybe you've got the list of things to allow/block just right. While testing different policies I've frequently seen Opus 4.6 go absolutely nuts trying to get past a block, unless I made it more clear what was happening.

Yeah I think for general use the transparency of what your thing does is really great compared to a pile of TypeScript and whatnot.


ah I also did my own sandbox and at least twice the agent inside tried really hard to go around the firewall, so I ended up intercepting calls to `connect` to return a message that says "Connection refused by the sandbox, don't try to bypass".

Code here: https://github.com/gbrindisi/agentbox


There is sandbox-runtime [1] from Anthropic that uses bubblewrap to sandbox on Linux (and works the same as OP on macOS). You can look at the code to see how it uses it. Anthropic's tool only supports a read blacklist, not a whitelist, so I forked it yesterday to support that [2].
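As a rough illustration of the difference between a read blacklist and a read whitelist, here is a small sketch of the decision logic. It is not how sandbox-runtime or the fork implements it; the function `read_allowed`, its parameters, and the "deny wins over allow" rule are all assumptions for this example. It also assumes paths are already absolute and resolved (no `..` or symlinks).

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def _is_under(path: str, root: str) -> bool:
    """True if path equals root or lies somewhere below it."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def read_allowed(
    path: str,
    allow_read: Optional[Iterable[str]] = None,  # whitelist; None means "not configured"
    deny_read: Iterable[str] = (),               # blacklist
) -> bool:
    """Hypothetical policy check: deny entries always win (an assumption)."""
    if any(_is_under(path, d) for d in deny_read):
        return False
    if allow_read is None:
        # Blacklist-only mode: anything not denied is readable.
        return True
    # Whitelist mode: only paths under an allowed root are readable.
    return any(_is_under(path, a) for a in allow_read)


# Blacklist-only: everything except ~/.ssh is readable.
print(read_allowed("/etc/hosts", deny_read=["/home/u/.ssh"]))
# Whitelist: only the project directory is readable.
print(read_allowed("/home/u/.ssh/id_rsa", allow_read=["/home/u/project"]))
```

With a blacklist you have to enumerate every sensitive location; with a whitelist the default is closed, which is why toolchains tend to need extra entries before they work.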

[1] https://github.com/anthropic-experimental/sandbox-runtime [2] https://github.com/carderne/sandbox-runtime


How does this work with self-hosting? Is the assumption that self-hosters won’t run into this problem?

For most use-cases I’d probably prefer to just delete the payloads some time after the job completes (persisting that data is a business logic problem). And keep the benefits of “just use Postgres”, which you guys seem to have outgrown.


Candidly we're still trying to figure that out: all of the plumbing is there in the open source, but the actual implementation of writes to S3 is only in the cloud version. This is partially because we're loath to introduce additional dependencies, and partially because this job requires a decent amount of CPU and memory and would have to run separately from the Hatchet engine, which adds complexity to self-hosted setups. That said, we're aware of multi-TB self-hosted instances, and this would be really useful for them - so it's important that we can get this into the open source.

The payloads are time-partitioned (in either case) so we do drop them after the user-defined retention period.
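A minimal sketch of the retention idea described above, assuming daily partitions identified by their start date. This is not Hatchet's implementation; `partitions_to_drop` and the daily granularity are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Iterable, List


def partitions_to_drop(
    partition_days: Iterable[date],  # start date of each daily partition (assumed layout)
    today: date,
    retention_days: int,             # user-defined retention period
) -> List[date]:
    """Return the partitions that lie entirely before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    # A daily partition starting on day d covers [d, d + 1 day), so it is
    # fully older than the cutoff exactly when d < cutoff.
    return sorted(d for d in partition_days if d < cutoff)


existing = [date(2026, 3, 1), date(2026, 3, 2), date(2026, 3, 3), date(2026, 3, 9)]
print(partitions_to_drop(existing, today=date(2026, 3, 10), retention_days=7))
```

The appeal of this layout is that expiring old data becomes dropping whole partitions rather than deleting individual rows.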


I guess you don't get the luxury of being opinionated enough to say: forget your old data.

Anyway great write-up, even though I'm sure it's painful having to run this system on top of your once-elegant Postgres solution.


I got pi to write me a very basic sandbox based on an example from the pi github. Added hooks for read/write/edit/bash, some prompts to temp/perm override. Have a look, copy-paste what you like.

https://github.com/carderne/pi-sandbox


The people pushing oh-my-pi seem to have missed the point of pi... Downloading 200k+ lines of additional code seems completely against the philosophy of building up your harness, letting your agent self-improve, relying on code that you control.

If you want bags of features, rather clone oh-my-pi somewhere, and get your agent to bring in bits of it at a time, checking, reviewing, customising as you go.


Yeah ohmypi is garbage. The point is you have a thin shell and add your own on top by just talking to pi itself, or pull in select extensions.


I'd say it's the idea/fact/feeling that, in 2026, agency matters more than skill/wisdom/intelligence.

Long read on the topic (quite funny, covers Cluely): https://harpers.org/archive/2026/03/childs-play-sam-kriss-ai...


Probably, Roy was born agentic as a part of a package which included a disregard for intellectual growth.

This doesn't mean that being agentic cannot be cultivated by regular people.

In 2026, yes, agency matters more than skill/wisdom/intelligence to get VC funds. But what's the point of agency alone if you are leading such a life?

What gives me hope is that in 2026, skillful people can delegate a lot of their work to LLMs, which gives them time to learn the "agentic" part which is basically marketing and talking with people.

(just thinking out loud)


It means the marginal cost to sell another subscription is lower than what they sell it for. I don't know if that's true, but it seems plausible.


This is super interesting framing. I’m definitely a completer, not that I like much about Slack. Probably useful to have this kind of discussion before/while making knowledge management decisions in startups.

