Seems everyone is working on the same things these days. I built a persistent Python REPL subprocess as an MCP tool for CC, and it worked so insanely well that I decided to go all the way. I already had an agentic framework built around tool calling (agentlib), so I adapted it for this new paradigm and code agent was born.
The agent "boots up" inside the REPL. Here's the beginning of the system prompt:
>>> help(assistant)
You are an interactive coding assistant operating within a Python REPL.
Your responses ARE Python code—no markdown blocks, no prose preamble.
The code you write is executed directly.
>>> how_this_works()
1. You write Python code as your response
2. The code executes in a persistent REPL environment
3. Output is shown back to you IN YOUR NEXT TURN
4. Call `respond(text)` ...
You get the idea. No need for custom file editing tools--Python has all that built in and Claude knows it perfectly. No JSON marshaling or schema overhead. Tools are just Python functions injected into the REPL, zero context bloat.
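To make that concrete, here's a minimal sketch of the pattern, assuming a long-lived `code.InteractiveConsole` and made-up tool names like `read_file` and `respond` (the real framework is more involved):

    import code

    def read_file(path: str) -> str:
        """Tool: return the contents of a file."""
        with open(path) as f:
            return f.read()

    def respond(text: str) -> None:
        """Tool: surface a natural-language answer to the user."""
        print(f"[assistant] {text}")

    # Tools are just names in the REPL namespace: no JSON schemas, no marshaling.
    # The model calls them like any other Python function.
    repl = code.InteractiveConsole(locals={"read_file": read_file, "respond": respond})

    # Each model turn is source code pushed into the same console, so variables
    # and helpers defined in one turn persist into later turns.
    repl.push('notes = read_file("README.md")')
    repl.push('respond("README starts with: " + notes[:40])')

And when Claude needs a tool that doesn't exist yet, it can simply define one in a turn and keep using it afterward.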
I also built a browser control plugin that puts Claude directly into the heart of a live browser session. It can inject element pickers so I can click around and show it what I'm talking about. It can render prototype code before committing to disk, killing the annoying build-fix loop. I can even SSH in from my phone and use TTS instead of typing, surprisingly great for frontend design work. Knocked out a website for my father-in-law's law firm (gresksingleton.com) in a few hours that would've taken 10X that a couple years ago, and it was super fun.
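The picker is less exotic than it sounds. Here's a rough sketch of the idea using Playwright attached to a live browser (not the actual plugin; `picked` is a made-up callback name):

    from playwright.sync_api import sync_playwright

    PICKER_JS = """() => {
      // One-shot, capture-phase click handler: build a rough selector path for
      // whatever the user clicks and hand it back to Python.
      document.addEventListener('click', (e) => {
        e.preventDefault();
        e.stopPropagation();
        const path = [];
        let el = e.target;
        while (el && el.tagName) {
          path.unshift(el.tagName.toLowerCase() + (el.id ? '#' + el.id : ''));
          el = el.parentElement;
        }
        window.picked(path.join(' > '));
      }, { capture: true, once: true });
    }"""

    with sync_playwright() as p:
        # Attach to a browser launched with --remote-debugging-port=9222 so the
        # agent shares the live session instead of spinning up a headless one.
        browser = p.chromium.connect_over_cdp("http://localhost:9222")
        page = browser.contexts[0].pages[0]
        page.expose_function("picked", lambda sel: print("user picked:", sel))
        page.evaluate(PICKER_JS)
        page.wait_for_timeout(30_000)  # give the human time to click something

The selector then goes straight into Claude's next turn, which beats trying to describe "the third card in the hero section" in prose.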
The big win: complexity. CC has been a disaster on my bookkeeping system; there's a threshold past which Claude loses the forest for the trees and makes the same mistakes over and over. Code agent pushes that bar out significantly. Claude can build new tools on the fly when it needs them. Gemini works great too (larger context).
> there's a threshold past which Claude loses the forest for the trees and makes the same mistakes over and over.
Try using something like Beads with Claude Code. Also don't forget to have a .claude/instructions.md file; you can literally ask Claude to make it for you, and it's the file Claude reads every time you make a new prompt. If Claude starts acting "off", you tell it to reread it again. With Beads though, I basically tell Claude to always use it when I pitch anything, to always test that things build after it thinks it's done, and to ask me to confirm before closing a task (they're called beads, but Claude figures out what I mean).
With Beads, the key thing I do once all that is set up is give it my ideas, and it makes a simple ticket or tickets. Then I braindump and/or ask it to do market research on each item with the parameters I want considered, and have it update the task accordingly. I then review them all. Then I can say "work on these in parallel" and it spins up as many agents as there are tasks and goes to work. I don't always make it do the work in parallel if it's a lot of tasks, because Zed has frozen up on me; I read that Claude Code itself is fine and it's just the protocol Zed uses that gets hosed up.
I find that with Beads, because I refine the tickets with Claude, it does way better at tasks this way. It's like if you're hand crafting the "perfect" prompt, but the AI model is writing it, so it will know exactly what you mean because it wrote it in its own verbiage.
Git as db is clever, and the sqlite cache is nice. I'd been sketching sqlite based memory features myself. So much of the current ecosystem is suboptimal just because it's new. The models are trained around immutable conversation ledgers with user/tool/assistant blocks, but there are compelling reasons to manipulate both sides at runtime and add synthetic exchanges. Priming with a synthetic failure and recovery is often more effective than a larger, more explicit system message. Same with memory, we just haven't figured out what works best.
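For anyone wondering what a synthetic exchange looks like in practice, it's roughly this (generic chat-completions message shape; the exact keys vary by provider, and the failure below never actually happened):

    messages = [
        {"role": "system", "content": "You are a coding agent operating inside a Python REPL."},
        # Synthetic failure-and-recovery pair, fabricated at runtime. The model has
        # now "experienced" the rule instead of just being told about it.
        {"role": "assistant", "content": "open('/etc/passwd').read()"},
        {"role": "user", "content": "PermissionError: sandbox is project-only. Stay inside the repo."},
        {"role": "assistant", "content": "Understood, staying inside the project.\nprint(open('src/main.py').read()[:200])"},
        # The real conversation starts here.
        {"role": "user", "content": "Refactor the config loader to support TOML."},
    ]

A couple of fabricated lines like that often stick better than another paragraph of rules in the system message.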
For code agents specifically, I found myself wanting old-style completion models without the structured turn training, which don't exist for frontier models. Talking to Claude about its understanding of the input token stream was fascinating; it compared it to my visual cortex and said it'd be unreasonable to ask me to comment on raw optic nerve data.
"Tell it to reread it again"--exactly the problem. My bookkeeping/decision engine has so much documentation that I'm running out of context, and every bit is necessary due to interconnected dependencies. Telling it to reread content already in the window feels wrong; that's when I refine the docs or adjust the framework. I've also found myself adding way more docstrings and inline docs than I'd ever write for myself. I prefer minimal, self-documenting code, so it's a learning process.
There's definitely an art to prompts, and it varies wildly by model, completely different edge cases across the leading ones. Thanks again for the tip; I suspect we'll see a lot of interesting memory developments this year.
This sounds really cool! I love the idea behind it, the agent having persistent access to a repl session, as I like repl-based workflows in general. Do you have any code public from this by any chance?
See CodeAgent or subrepl.py if you're just interested in the REPL orchestration. I also have a Python REPL MCP server that works with CC. It isn't published, but I could share it by request.
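If you just want the flavor of the REPL orchestration, the core loop is small. Here's a rough sketch (not the real subrepl.py), assuming a child process that exec()s code blocks in one shared namespace and echoes a sentinel after each block:

    import subprocess, sys, textwrap, uuid

    CHILD = textwrap.dedent(r"""
        import sys, traceback
        ns = {}
        buf = []
        for line in sys.stdin:
            if line.startswith("__RUN__"):
                src, buf = "".join(buf), []
                try:
                    exec(compile(src, "<agent>", "exec"), ns)
                except Exception:
                    traceback.print_exc(file=sys.stdout)
                print(line.strip(), flush=True)  # echo the sentinel back to the parent
            else:
                buf.append(line)
    """)

    class SubRepl:
        def __init__(self):
            self.proc = subprocess.Popen(
                [sys.executable, "-u", "-c", CHILD],
                stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT, text=True,
            )

        def run(self, src: str) -> str:
            sentinel = f"__RUN__{uuid.uuid4().hex}"
            self.proc.stdin.write(src + "\n" + sentinel + "\n")
            self.proc.stdin.flush()
            out = []
            for line in self.proc.stdout:
                if line.strip() == sentinel:
                    break
                out.append(line)
            return "".join(out)

    repl = SubRepl()
    repl.run("x = 21")
    print(repl.run("print(x * 2)"), end="")  # -> 42; state persisted across calls

The sentinel is how the parent knows a block's output is complete before handing it back as the model's next observation; everything else is plumbing.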
My favorite part of code agent is the /repl command. I can drop into the REPL mid session and load modules, poke around with APIs and data, or just point Claude in the right direction. Sometimes a snippet of code is worth 1000 words.
I get where you're coming from, especially since role playing was so vital in early models in a way that is no longer necessary, or is even harmful; however, when designing a complex system of interactions, there's really no way around it. And as humans we do this constantly, putting on a different hat for different jobs. When I'm wearing my developer hat, I have to reason about the role of each component in a system, and when I use an agent to serve in that role, by curating its context and designating rules for how I want it to behave, I'm assigning it a persona. What's more, I may prime the context with user and assistant messages as examples of how I want it to respond. That context becomes the agent's personality--its persona.
The framework is all Python, but I used C for this helper. It uses unprivileged user namespaces to mount an overlay and run an arbitrary command; when the command finishes, it writes a tarball of the edits, which I use to create a unified diff. The framework orchestrates it all transparently, but the helper itself could be used standalone. Here's a short document about the sandbox in the context of its use in my project:
I also have a version that uses SUID instead of unprivileged user namespaces, available by request.
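For a rough picture of the mechanism, here's the idea sketched in Python rather than C (assumes a kernel new enough to allow overlayfs mounts inside unprivileged user namespaces; the real helper also handles the tarball, cleanup, and plenty of edge cases):

    import os, subprocess, tempfile

    def run_sandboxed(project_dir: str, command: str) -> str:
        """Run `command` with an overlay over project_dir; its writes land in the upper dir."""
        scratch = tempfile.mkdtemp(prefix="sbx-")
        upper = os.path.join(scratch, "upper")
        work = os.path.join(scratch, "work")
        os.makedirs(upper)
        os.makedirs(work)

        script = (
            f"mount -t overlay overlay "
            f"-o lowerdir={project_dir},upperdir={upper},workdir={work} {project_dir} "
            f"&& cd {project_dir} && {command}"
        )
        # -r maps the caller to root inside a new user namespace; -m gives a
        # private mount namespace, so the overlay never touches the host tree.
        subprocess.run(["unshare", "-r", "-m", "sh", "-c", script], check=True)
        return upper  # every write the command made is here; diff or tar it

After the session, the upper layer is the complete set of writes, so producing the unified diff is just a walk over that directory compared against the project.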
I often use Claude Code with --dangerously-skip-permissions, but every once in a while it bites me. I've learned to use git for everything and to put instructions in CLAUDE.md to always commit BEFORE writes. Claude can go off the rails on harder bug fixes, especially after multiple rounds of context compacting, and it can really screw things up. It usually honors guidance not to modify anything outside the project, but a simple sandbox adds so much: after the session is over you can see what changed and decide what to do with it. It really helps with the problem of unexpected changes to the codebase, which you might not even notice otherwise and which can introduce serious bugs. The permission models of all the coding agents are rough--either you can't get anything done, or you throw caution to the wind. Full sandboxes are quite restrictive, which is why I rolled my own. Honestly, your best option right now is just to have good version control and run coding agents in dedicated environments.
"The global North's carbon problem subsidizes the global South's energy access."
This is problematic. The subsidized economy will grow inefficiently, and the wealth transfer will inevitably produce a corrupt class of bureaucrats who seek to maintain the status quo even when it no longer makes sense. Time will pass and things will get worse until there is political will for change, and that change will result in the suffering of the very people the subsidy was meant to help.
Is it just me or have all mainstream news agencies suffered a significant loss in quality in recent years? It all seems lazy, opinionated, more like social media and less like old school journalism, less trustworthy... and now they're cutting book reviews?! Maybe I'm just getting older.
Journalism was always bad, it just seemed better in the past because people had less to compare it to, less ability to check things out themselves, etc. As for "Old School Journalism", was that the sort that helped George Bush start the Iraq War? Or the sort that started the Spanish-American War? If there was ever a golden age of journalists when people spat straight facts without interjecting their bias, I genuinely have no clue when it was.
You can find an archive of thousands of PBS News Hour episodes online; I've watched dozens of episodes from the 80s and 90s. The show has a tone and air of respectability, a thoughtful program for high-brow people who like to consider the facts. But that's really just the surface aesthetic. Aside from modern news shows being flagrantly tacky, the meat of what they do is the same: repeat some basic 'facts' about the story, many of which will be proven wrong in later years, then have some people selected through mysterious processes come on to talk about how the viewer should feel. In retrospect very little of it was ever accurate, and stories that seemed important then don't seem so now.
Well, not to be too obvious, but people do not pay for news anymore; they expect news to be free on Google or social media. Hence the firing of journalists and the loss of quality everywhere. Less money, less quality.
You're right. It's the collapse of web advertising, I think. News websites can't make money from ads adjacent to articles, so now the article is the ad.
I've been seeing "AI slop" used as an ad hominem. If I'm writing a couple of paragraphs, I'll often run them through a model and ask it to make minimal edits for spelling and grammar. It makes the writing more readable and saves me time editing. If someone doesn't like my thoughts and they see an em dash, they can call it AI slop instead of responding, which is really annoying because the model otherwise does a good job of editing. In some cases I've been accused of AI slop for original, unedited content.
I actually associate it with a younger writer; perhaps it skipped a generation. I can imagine that someone who grew up in a world where typing an em dash meant looking up an alt code would develop a style that avoids them—but it's only a long press of the - away on the device where I do most of my writing by volume, so of course I'm going to use them.
I just use commas or semi-colons where I would have previously used em dashes. It's annoying to have to adapt to avoid triggering people's faulty AI slop pattern matching, but the alternatives are perfectly fine.
"AI slop" is following the same path as "Dunning-Kruger effect," "enshittification," and so many other terms. Someone introduces a term that's useful for describing an actual phenomenon, it rapidly spreads to dominate the discourse because it's topical and punchy, and pretty soon using it is such a strong signal of being one of the "cool people who hate all the correct bad stuff" that people use it to describe things they merely don't like or disagree with. Once everyone's using it, it becomes useless both for its original descriptive purpose and as a social signal, so all the trendy discourse addicts move on to the next linguistic innovation, and you only see random people on Facebook or Reddit who are behind the times using it, usually inaccurately, since they're just following the misuse they learned it from.
It's particularly scary watching "AI slop" follow that path because of the extreme moral polarization associated with using LLMs or generative art. There are people who will see some casual mention of a game or film or app or something "using AI" on social media, without evidence, and immediately blast off into a witch hunt to make sure the whole world knows that whoever was involved with that thing is a Bad Person who needs to be shunned and punished. It has almost immediately become the go-to way to slam someone online because it carries such strong implications, requires little to no evidence, and is almost impossible to fully refute. I think there's a lot to learn from observing this, and it does not bode well for the next few years of discourse.
Love it! It's going on my toolbar. I face the same problem, constantly trying to hunt down the latest pricing, which changes often. I think it's great that you want to add more models and features, but maybe keep the landing page simple, with a default filter that just shows the current content.