Show HN: Stop Claude Code from forgetting everything

ramoz · 2025-12-30T00:14:20 1767053660

I struggle with these abstractions over context windows, esp when anthropic is actively focused on improving things like compaction, and knowing the eventual* goal is for the models to yave real memory layers baked in. Until then we have to optimize with how agents work best and ephemeral context is a part of that (they weren’t RL’d/trained with memory abstractions so we shouldn’t use them at inference either). Constant rediscovery that is task specific has worked well for me, doesn’t suffer from context decay, though it does eat more tokens.

Otherwise the ability to search back through history is a valuable simple git log/diff or (rip)grep/jq combo over the session directory. Simple example of mine: https://github.com/backnotprop/rg_history

AndyNemmity · 2025-12-30T00:19:44 1767053984

There is certainly a level where at any time you could be building some abstraction that is no longer required in a month, or 3.

I feel that way too. I have a lot of these things.

But the reality is, it doesn't really happen that often in my actual experience. Everyone is very slow as a whole to understand what these things mean, so far you get quite a bit of time just with an improved, customized system of your own.

ramoz · 2025-12-30T00:38:16 1767055096

My somewhat naive heuristic would be that memory abstractions are a complete mistep in terms of optimization. There is no "super claude mem" or "continual claude" until there actually is.

https://backnotprop.com/blog/50-first-dates-with-mr-meeseeks...

AndyNemmity · 2025-12-30T00:41:01 1767055261

I tend to agree with you, however compacting has gotten much worse.

So... it's tough. I think memory abstractions are generally a mistake, and generally not needed, however I also think that compacting has gotten so wrong recently that they are also required until Claude Code releases a version with improved compacting.

But I don't do memory abstraction like this at all. I use skills to manage plans, and the plans are the memory abstraction.

But that is more than memory. That is also about having a detailed set of things that must occur.

ramoz · 2025-12-30T00:44:19 1767055459

I’m interested to see your setup.

I think planning is a critical part of the process. I just built https://github.com/backnotprop/plannotator for a simple UX enhancement

Before planning mode I used to write plans to a folder with descriptive file names. A simple ls was a nice memory refresher for the agent.

AndyNemmity · 2025-12-30T01:17:25 1767057445

I understand the use case for plannotator. I understand why you did it that way.

I am working alone. So I am instead having plans automatically update. Same conception, but without a human in the mix.

But I am utilizing skills heavily here. I also have a python script which manages how the LLM calls the plans so it's all deterministic. It happens the same way every time.

That's my big push right now. Every single thing I do, I try to make as much of it as deterministic as possible.

edmundsauto · 2025-12-30T16:49:29 1767113369

Would you share an overview of how it works? Sounds interesting

AndyNemmity · 2025-12-30T17:15:37 1767114937

Perhaps I can release it as a standalone github skill, and then do a blog post on it or something.

I'm just also working on real projects as well, so a lot of my priority is focused on new skills building, and not worrying about managing the current ones I have as github repos.

edmundsauto · 2026-01-01T21:13:07 1767301987

That would probably be a lot of work for little gain. Would you be open to asking Claude to summarize your approach and just put it into a paste? I'm less interested in specific implementations and more about approaches, what the tradeoffs are and where it best applies.

saberience · 2025-12-30T11:05:47 1767092747

Do we really need another vibe-coded LLM context/memory startup?

Do the authors have any benchmarks or test to show that this genuinely improved outputs?

I have tried probably 10-20 other open source projects and closed source projects purporting to improve Claude Code with memory/context, and still to this date, nothing works better than simply keeping my own library of markdown files for each project specification, markdown files for decisions made etc, and then explicitly telling Claude Code to review x,y,z markdown files.

I would also suggest to the founders, don't found a startup based on improving context for Claude Code, why? Because this is the number 1 thing the Claude Code developers are working on too, and it's clearly getting better and better with every release.

So not only are you competing with like 20+ other startups and 20+ other open-source projects, you are competing with Anthropic too.

sdoering · 2025-12-30T11:22:52 1767093772

This. Exactly this. Even relatively well working tools (from my experience and for my project types) like Agent OS are no guarantee, that Claude will not go on a tangent, use the "memory files" the framework tells it to use.

And I agree with your sentiment, that this is a "business field" that will get eaten by the next generations of base models getting better.

christinetyip · 2025-12-30T13:32:28 1767101548

I mostly agree with this, if the goal were “better persistent memory inside Claude Code,” that wouldn’t be very interesting.

For a single agent and a single tool, keeping project specs and decisions in markdown and explicitly pointing the model at them works well. We do that too.

What we’re focused on is a different boundary: memory that isn’t owned by a specific agent or tool.

Once you start switching between tools (Claude, Codex, Cursor, etc.), or running multiple agents in parallel, markdown stops being “the memory” and becomes a coordination mechanism you have to keep in sync manually. Context created in one place doesn’t naturally flow to another, and you end up re-establishing state rather than accumulating it.

That’s why we're not thinking about this as "improving Claude Code”. We’re interested in the layer above that: a shared, external memory that can be plugged into any other model and tools, that any agent can read from or write to, and that can be selectively shared with collaborators. Context created in Claude can be reused in Codex, Manus, Cursor, or other agents from collaborators - and vice versa.

If one already built and is using one agent in one tool and is happy with markdown, they probably don’t need this. The value shows up once agents are treated as interchangeable workers and context needs to move across tools and people without being re-explained each time.

ramoz · 2025-12-30T14:40:20 1767105620

If markdown in a git repository isn’t good enough for collaboration, then why would any plugged in abstraction be better?

You imply you have a solution for current wholistic state. For this you would need a solution for context decay and relevant curation — with benchmarks that prove it is also more valuable than constant rediscovery (for quality and cost).

That narrative becomes harsher once you pivot to “general purpose agents” because you’re then competing with every existing knowledge work platform. So you’ll shift into “unified context for all your KW platforms” - where presumably the agents already have access (Claude today can basically go scrape all knowledge from anywhere).

So then it becomes an offering of “current state” in complex human processes and this is a concept I’m not sure any technology can capture; whether it’s across codebases (which for humans we settled on git) and especially not general working scenarios. And I guess this is where it becomes a unified multi-agent wholistic state capture. Ambitious and fun problem.

austinbaggio · 2025-12-30T22:12:51 1767132771

| need a solution for context decay and relevant curation — with benchmarks that prove it is also more valuable than constant rediscovery (for quality and cost).

I agree. We are looking at some metr benchmarks, not expecting a simple answer to this, but do you have any in mind you find compelling?

ramoz · 2025-12-30T23:28:02 1767137282

Not really. But, You can go viral again with a "Coding Agents with memory build better software using less tokens" showcasing how you benchmarked a "twitter rebuild" -

1. Setup Claude Code to build some layers of the stack

2. Setup Codex to build others.

In one instance equip them both with your product. Maybe bake in some tribal knowledge.

In another instance let them work raw.

In both instances, capture:

     - Time to completion
     - Tokens spent
     - Ability to meet original spec
     - Subjective quality 
     - Number of errors and categorize between the layers, to state something like "raw-claude's backend kept failing with raw-codex's frontend" etc

I imagine this benchmark working well in your favor.

floatrock · 2025-12-30T14:19:06 1767104346

right. I stopped reading at "ENSUE_API_KEY | Required. Get one at [dashboard](link to startup showing this is an ad)"

First thought: why do I need an API key for what can be local markdown files. Make contents of CLAUDE.md be "Refer to ROBOTS.md" and you've got yourself a multi-model solution.

Main objection to corporate AI uptake is what are you gonna do with our data. The value prop over local markdown files here is not at all clear to even begin asking that question.

gbnwl · 2025-12-30T01:22:17 1767057737

I'm not sure how many HN users frequent other places related to agentic coding like the subreddits of particular providers, but this has got to be the 1000th "ultimate memory system"/break-free-of-the-context-limit-tyranny! project I've seen, and like all other similar projects there's never any evidence or even attempt at measuring any metric of performance improved by it. Of course it's hard to measure such a thing, but that's part of exactly why it's hard to build something like this. Here's user #1001 that's been told by Claude "What a fascinating idea! You've identified a real gap in the market for a simple database based memory system to extend agent memory."

beefsack · 2025-12-30T03:34:11 1767065651

I feel like so many of these memory solutions are incredibly over-engineered too.

You can work around a lot of the memory issues for large and complex tasks just by making the agent keep work logs. Critical context to keep throughout large pieces of work include decisions, conversations, investigations, plans and implementations - a normal developer should be tracking these and it's sensible to have the agent track them too in a way that survives compaction.

wfn · 2025-12-30T10:06:49 1767089209

Yes. I have (as part of Claude output) a

- `FEATURE_IMPL_PLAN.md` (master plan; or `NEXT_FEATURES_LIST.md` or somesuch)

- `FEATURE_IMPL_PROMPT_TEMPLATE.md` (where I replace placeholders with next feature to be implemented; prompt includes various points about being thorough, making sure to validate and loop until full test pipeline works, to git version tag upon user confirmation, etc.)

- `feature-impl-plans/` directory where Claude is to keep per-feature detailed docs (with current status) up to date - this is esp. useful for complex features which may require multiple sessions for example

- also instruct it to keep main impl plan doc up to date, but that one is limited in size/depth/scope on purpose, not to overwhelm it

- CLAUDE.md has summary of important code references (paths / modules / classes etc.) for lookup, but is also restricted in size. But it includes full (up-to-date) inventory of all doc files, for itself

- If I end up expanding CLAUDE.md for some reason or temporarily (before I offload some content to separate docs), I will say as part of prompt template to "make sure to read in the whole @CLAUDE.md without skipping any content"

ramoz · 2025-12-30T03:52:40 1767066760

Great advise. For large plans I tell the agent to write to an “implementation_log.md” and make note of it during compaction. Additionally the agent can also just reference the original session logs.

hasperdi · 2025-12-30T08:56:35 1767084995

The problem with this approach, is that the model may forget to update the log... It usually happens when the context window >50% filled

ramoz · 2025-12-30T14:23:41 1767104621

I found this happen less often if the task is a part of the plan. It typically gets in a cycle habit of editing code and updating the doc

hasperdi · 2025-12-30T17:47:47 1767116867

thanks, I'll keep that in mind

ilvez · 2025-12-30T10:10:34 1767089434

.. and not only those, but the baseline as well aka CLAUDE.md.. I've countless of times told it basics, in the same session without compacting etc etc

SkyPuncher · 2025-12-30T06:51:53 1767077513

Yep. I just have my agents write out key details to a markdown file. Doesn’t have to be perfect. Just enough to reorient itself to a problem.

ryanthedev · 2025-12-30T13:51:45 1767102705

I agree. Plan files and I use git for my work logs. Have been successful.

xnx · 2025-12-30T08:31:11 1767083471

Some with a coding background love prompt engineering, contrived supporting systems, json prompting and any other superstition that makes it feel like they're really doing something.

They refuse to believe that it's possible to instruct these tools in terse plain English and get useful results.

austinbaggio · 2025-12-30T01:43:17 1767058997

Which of the 1000 is your favorite? There does seem to be a shallow race to optimizing xyz benchmark for some narrow sliver of the context problem, but you're right, context problem space is big, so I don't think we'll hurry to join that narrow race.

gbnwl · 2025-12-30T01:57:21 1767059841

| Which of the 1000 is your favorite?

None, that's what I'm trying to say. My favorite is just storing project context locally in docs that agents can discover on their own or I can point to if needed. This doesn't require me to upload sensitive code or information to anonymous people's side projects and has and equivalent amount of hard evidence for efficacy (zero), but at least has my own anecdotal evidence of helping and doesn't invite additonal security risk.

People go way overboard with MCPs and armies of subagents built on wishes and unproven memory systems because no one really knows for sure how to get past the spot we all hit where the agentic project that was progressing perfectly hits a sharp downtrend in progress. Doesn't mean it's time to send our data to strangers.

gck1 · 2025-12-30T06:39:51 1767076791

> no one really knows for sure how to get past the spot we all hit where the agentic project that was progressing perfectly hits a sharp downtrend in progress.

FWIW, I find this eventual degradation point comes much later and with fewer consequences when there are strict guardrails inside and outside of the LLM itself.

From what I've seen, most people try to fix only the "inside" part - by tweaking the prompts, installing 500 MCPs (that ironically pollute the context and make problem worse), yell in uppercase in hopes that it will remember etc, and ignore that automated compliance checks existed way before LLMs.

Throw the strictest and most masochistic linting rules at it in a language that is masochistic itself (e.g. rust), add tons of integration tests that encode intent, add a stop hook in CC that runs all these checks and you've got a system that is simply not allowed to silently drift and can put itself back on track with feedback it gets from it.

Basically, rather than trying to hypnotize an agent to remember everything by writing a 5000 line agents.md, just let the code itself scream at it and feed the context.

Davidzheng · 2025-12-30T03:57:21 1767067041

Wait.

christinetyip · 2025-12-30T09:35:19 1767087319

This is fair, many memory projects out there boil down to better summaries or prompt glue without any clear way to measure impact.

One thing I’d clarify about what we’re building is that it’s not meant to be “the best memory for a single agent.”

The core idea is portability and sharing, not just persistence.

Concretely:

- you can give Codex access to memory created while working in Claude

- Claude Code can retrieve context from work done in other tools

- multiple agents can read/write the same memory instead of each carrying their own partial copy

- specific parts of context can be shared with teammates or collaborators

That’s the part that’s hard (or impossible) to do with markdown files or tool-local memory, and it’s also why we don’t frame this as “breaking the context limit.”

Measuring impact here is tricky, but the problem we’re solving shows up as fragmentation rather than forgetting: duplicated explanations, divergent state between agents, and lost context when switching tools or models.

If someone only uses a single agent in a single tool and already are using their customized CLAUDE.md, they probably don’t need this. The value shows up once you treat agents as interchangeable workers rather than a single long-running conversation.

gbnwl · 2025-12-30T12:19:52 1767097192

> That’s the part that’s hard (or impossible) to do with markdown files or tool-local memory.

I'm confused because every single thing in that list is trivial? Why would Codex have trouble reading a markdown file Claude wrote or vice versa? Why would multiple agents need their own copy of the markdown file instead of just referring to it as needed? Why would it be hard to share specific files with teammates or collaborators?

Edit - I realize I could be more helpful if I actually shared how I manage project context:

CLAUDE.md or Agents.md is not the only place to store context for agents in a project, you can just store docs at any layer of granularity you want. What's worked best for me is to:

1. Have a standards doc(s) (you can point the agents to the same standards doc in their respective claude.md/agents.md)

2. Before coding, have the agent create implementation plans that get stored in to tickets (markdown files) for each chunk of work that would take about a context window length (estimated).

3. Work through the tickets and update them as completed. Easy to refer back to when needed.

4. If you want you can ask the agent to contribute to an overall dev log as well, but this gets long fast. Is useful for agents to refer to the last 50 lines or so to immediately get up to speed on "what just happened?", but so could git history.

5. Ultimately the code is going to be the real "memory" of the true state, so try to organize it in a way that's easy for agents to comb through (no 5000 lines files that agents have trouble trying to carefully jump around in to find what they need without eating up their entire context window immediately).

christinetyip · 2025-12-30T13:20:17 1767100817

You’re right that reading the same markdown file is trivial, that’s not the hard part.

Where it stopped being trivial for us was once multiple agents were working at the same time. For example, one agent is deciding on an architecture while another is already generating code. A constraint changes mid-way. With a flat file, both agents can read it, but you’re relying on humans as the coordination layer: deciding which docs are authoritative, when plans are superseded, which tickets are still valid, and how context should be scoped for a given agent.

This gets harder once context is shared across tools or collaborators’ agents. You start running into questions like who can read vs. update which parts of context, how to share only relevant decisions, how agents discover what matters without scanning a growing pile of files, and how updates propagate without state drifting apart.

You can build conventions around this with files, and for many workflows that works well. But once multiple agents are updating state asynchronously, the complexity shifts from storage to coordination. That boundary - sharing and coordinating evolving context across many agents and tools — is what we’re focused on and what an external memory network can solve.

If you’ve found ways to push that boundary further with files alone, I’d genuinely be curious - this still feels like an open design space.

gbnwl · 2025-12-30T13:51:05 1767102665

You're still not closing the gap between the problems you're naming and how your solution solves them?

> With a flat file, both agents can read it, but you’re relying on humans as the coordination layer: deciding which docs are authoritative, when plans are superseded, which tickets are still valid, and how context should be scoped for a given agent.

So the memory system also automates project management by removing "humans as the coordination layer"? From the OP the only details we got were

"What it does: (1) persists context between sessions (2) semantic & temportal search (not just string grep)"

Which are fine, but neither it nor you explain how it can solve any of these broader problems you bring up:

"deciding which docs are authoritative, when plans are superseded, which tickets are still valid, and how context should be scoped for a given agent, questions like who can read vs. update which parts of context, how to share only relevant decisions, how agents discover what matters without scanning a growing pile of files, and how updates propagate without state drifting apart."

You're claiming that semantic and temporal search has solved all of this for free? This project was presented as a memory solution and now it seems like you're saying its actually an agent orchestration framework, but the gap between what you're claiming your system can achieve and how you claim it works seems vast.

stingraycharles · 2025-12-30T07:38:05 1767080285

imho, if it’s not based on a RAG, it’s not a real memory system. the agent often doesn’t know what it doesn’t know, and as such relevant memories must be pushed into the context window by embedding distance, not actively looked up.

AndyNemmity · 2025-12-30T02:02:48 1767060168

The funny part is, the vast majority of them are barely doing anything at all.

All of these systems are for managing context.

You can generally tell which ones are actually doing something if they are using skills, with programs in them.

Because then, you're actually attaching some sort of feature to the system.

Otherwise, you're just feeding in different prompts and steps, which can add some value, but okay, it doesn't take much to do that.

Like adding image generation to claude code with google nano banana, a python script that does it.

That's actually adding something claude code doesn't have, instead of just saying "You are an expert in blah"

austinbaggio · 2025-12-30T02:07:51 1767060471

It sounds like you've used quite a few. What programs are you expecting? Assuming you're talking about doing some inference on the data? Or optimizing for some RAG or something?

AndyNemmity · 2025-12-30T02:09:55 1767060595

An example of a skill i gave, adding image generation to nano banana.

another is one claude code ships with, using rip grep.

Those are actual features. It's adding deterministic programs that the llm calls when it needs something.

austinbaggio · 2025-12-30T02:38:17 1767062297

Oh got it - tool use

AndyNemmity · 2025-12-30T02:41:05 1767062465

Exactly. That adds actual value. Some of the 1000s of projects do this. Those pieces add value, if the tool adds value which also isn’t a given

troupo · 2025-12-30T08:17:16 1767082636

> You can generally tell which ones are actually doing something if they are using skills, with programs in them.

> Otherwise, you're just feeding in different prompts and steps

"skills" are literally just .md files with different prompts and steps.

> That's actually adding something claude code doesn't have, instead of just saying "You are an expert in blah"

It's not adding anything but a prompt saying "when asked to do X invoke script Y or do steps Z"

AndyNemmity · 2025-12-30T16:49:45 1767113385

Skill are md files, but they are not just that. They are also scripts. That's what adding things are. You can make a skill that is just a prompt, but that misses the point of the value.

You're packaging the tool with the skill, or multiple tools to do a single thing.

troupo · 2025-12-30T21:29:30 1767130170

In the end it's still an .md file pointing to a script that ends being just a prompt for the agent that the agent may or may not pick up, may or may not discover, may or may not forget after context compaction etc.

There's no inherent magic to skills, or any fundamental difference between them and "just feeding in different prompts and steps". It literally is just feeding different prompt and steps.

AndyNemmity · 2025-12-31T17:50:05 1767203405

I find in my experience that it's trivial to have the skill systematically call the script, and perform the action correctly. This has not been a challenge to me.

Also, the pick up or not pick up, or discover or may not discover is solved as well. It's handled by my router, which I wrote about here - https://vexjoy.com/posts/the-do-router/

So these are solved problems to me. There are many more problems which are not solved, which are the interesting space to continue with.

Forgeties79 · 2025-12-30T01:56:45 1767059805

Have you tried using it? Not being flippant and annoying. Just curious if you tried it and what the results were

Game_Ender · 2025-12-30T03:22:40 1767064960

Why should he put effort into measuring a tool that the author has not? The point is there are so many of these tools an objective measure that the creators of these tools can compare against each other would be better.

So a better question to ask is - Do you have any ideas for an objective way to a measure a performance of agentic coding tools? So we can truly determine what improves performance or not.

I would hope that internal to OpenAI and Anthropic they use something similar to the harness/test cases they use for training their full models to determine if changes to claude code result in better performance.

morkalork · 2025-12-30T04:49:57 1767070197

Well, if I were Microsoft and training co-pilot, I would log all the <restore checkpoint> user actions and grade the agents on that. At scale across all users, "resets per agent command" should be useful. But then again, publishing the true numbers might be embarrassing..

kuboble · 2025-12-30T07:26:29 1767079589

I'm not sure it's a good signal.

I often use restore conversion checkpoint after successfully completing a side quest.

gbnwl · 2025-12-30T02:03:38 1767060218

Who has time to try this when there's this huge backlog here: https://www.reddit.com/r/ClaudeAI/search/?q=memory

Forgeties79 · 2025-12-30T02:28:27 1767061707

Have you tried any of those?

gbnwl · 2025-12-30T02:46:02 1767062762

Yes, they haven't helped. Have you found one that works for you?

austinbaggio · 2025-12-30T02:59:52 1767063592

What are you both looking for? What is the problem you want solved?

ggm · 2025-12-30T04:10:51 1767067851

Is a series of postings all in the form of questions an indication somebody hooked "eliza" up as an input device?

morkalork · 2025-12-30T04:44:35 1767069875

Nah, just another one of those spam bots on all the small-business, finance and tradies sub-reddits: "Hey fellow users, have you ever suffered from <use case>? What is the problem you want solved? Tell me your honest opinions below!"

troupo · 2025-12-30T08:27:23 1767083243

It does nothing but send a bunch of data to a "alpha use at your own risk" third-party site that may or may not run some LLM on your data: https://ensue-network.ai/login

johnnyfived · 2025-12-30T02:35:16 1767062116

I imagine HN, despite being full of experts and vet devs, also might have a prevalent attitude of looking down on using tools like MCP servers or agentic AI libraries for coding, which might be why something like this advertised seems novel rather than redundant.

troupo · 2025-12-30T08:29:26 1767083366

Because experts snd vets can uduslly quickly disassemble layers of marketing bullshit and see through false promises?

Because experts snd vets often use these tools and find them extremely lacking?

DrewADesign · 2025-12-30T03:39:55 1767065995

> I imagine HN, despite being full of experts and vet devs, also might have a prevalent attitude of looking down on using tools like MCP servers or agentic AI libraries for coding, which might be why something like this advertised seems novel rather than redundant.

I’m not sure where the ‘despite’ comes in. Experts and vets have opinions and this is probably the best online forum to express them. Lots of experts and vets also dislike extremely popular unrelated tools like VB, Windows, “no-code” systems, and Google web search… it’s not a personality flaw. It doesn’t automatically mean they’re right, either, but ‘expert’ and ‘vet’ are earned statuses, and that means something. We’ve seen trends come and go and empires rise and fall, and been repeatedly showered in the related hype/PR/FUD. Not reflexively embracing everything that some critical mass of other people like is totally fine.

gbnwl · 2025-12-30T04:00:54 1767067254

I think maybe the point they were trying to make is that despite people on HN being very technically experienced, skepticism and distrust of LLM-assisted coding tools may have prevented many of them from exploring the space too deeply yet. So a project like this may seem novel to many readers here, when the reality for users who've been using and following tools like Claude Code (and similar) closely for a while now is that claims like the one's this project is making come out multiple times per week.

johnnyfived · 2025-12-30T04:07:40 1767067660

They pretty much perfectly encapsulated the point in their fired up response haha.

DrewADesign · 2025-12-30T14:17:02 1767104222

Ha — so it seems… so it seems…

johnnyfived · 2025-12-30T23:19:37 1767136777

Good sport!

willtemperley · 2025-12-30T06:06:18 1767074778

Replace "Automation" with "Agentic coding" here:

https://xkcd.com/1319/

ossa-ma · 2025-12-30T00:27:51 1767054471

There are a quadrillion startups (mem0, langmem, zep, supermemory), open source repos (claude-mem, beads), and tools that do this.

My approach is literally just a top-level, local, git version controlled memory system with 3 commands:

- /handoff - End of session, capture into an inbox.md

- /sync - Route inbox.md to custom organised markdown files

- /engineering (or /projects, /tasks, /research) - Load context into next session

I didn't want a database or an MCP server or embeddings or auto-indexing when I can build something frictionless that works with git and markdown.

Repo: https://github.com/ossa-ma/double (just published it publicly but its about the idea imo)

Writeup: https://ossa-ma.github.io/blog/double

bl4ckneon · 2025-12-30T01:00:19 1767056419

The extention Cline has a "memory bank" feature. It's just a markdown you add as an instruction. Works well for me. Worked with agents.md as well so not just with the Cline extention. Pretty much the same idea.

fastball · 2025-12-30T02:01:49 1767060109

What is the purpose of a separate /handoff and /sync command? It seems like handoff could just write learnings straight to their final destinations without needing an .inbox.md buffer in-between.

ossa-ma · 2025-12-30T02:31:50 1767061910

I like to read and review what was captured in .inbox.md before it is committed and synced across my knowledge base. Allows me to catch mistakes, tweak preferences, add context and decide whether something is actually worth pushing.

I will typically make multiple '/handoff's per day as I use Claude code whereas I typically use '/sync' at the end of the day to organise them all at once.

AndyNemmity · 2025-12-30T00:43:58 1767055438

Your approach essentially matches mine, but I call them plans. I agree with you that the other tools don't seem to add any value compared to this structure.

I think at this point in time, we both have it right.

kaizenb · 2025-12-30T04:59:07 1767070747

Will explore this idea, thanks.

coffeeboy27 · 2025-12-29T23:18:31 1767050311

What's the data retention/deletion policy and is there a self-hosted option planned? I'd prefer not to send proprietary code to third-party servers.

austinbaggio · 2025-12-29T23:23:18 1767050598

Honestly, very reasonable ask, you're not the first person to ask for a self-hosted version. We have a privacy policy we've drafted that is up-to-date with the current version of the product https://www.ensue-network.ai/privacy-policy.

The project is still in alpha, so you could shape what we build next - what do you need to see, or what gets you comfortable sending proprietary code to other external services?

frumplestlatz · 2025-12-29T23:56:52 1767052612

> what do you need to see, or what gets you comfortable sending proprietary code to other external services?

Honestly? It just has to be local.

At work, we have contracts with OpenAI, Anthropic, and Google with isolated/private hosting requirements, coupled with internal, custom, private API endpoints that enforce our enterprise constraints. Those endpoints perform extensive logging of everything, and reject calls that contain even small portions of code if it's identified as belonging to a secret/critical project.

There's just no way we're going to negotiate, pay for, and build something like that for every possible small AI tooling vendor.

And at home, I feed AI a ton of personal/private information, even when just writing software for my own use. I also give the AI relatively wide latitude to vibe-code and execute things. The level of trust I need in external services that insert themselves in that loop is very high. I'm just not going to insert a hard dependency on an external service like this -- and that's putting aside the whole "could disappear / raise prices / enshittify at any time" aspect of relying on a cloud provider.

austinbaggio · 2025-12-30T02:18:12 1767061092

Yeah I get the dependency concern, and also I think about the trust and pricing challenge a lot. I might be getting ahead of my skis here, but living in a future world, assuming there is a local service, what would you want to see with a context management service for your team to actually use it? Or even better - pay for it?

qudat · 2025-12-29T23:28:20 1767050900

Have you tried https://github.com/steveyegge/beads

rahimnathwani · 2025-12-30T21:01:01 1767128461

Beads is awesome. I've been using it with a greenfield React Native hobby project. I did some work up front on the spec (with help from AI), and started the repo from a boilerplate, but after that every single bead (epic, ticket) and every single line of code has been written by AI (using a mix of claude, codex, cursor-agent/composer-1).

The app works. When I feel like working on it, I just open a CLI coding agent and say 'start working'. Then every so often I say 'commit and push' or 'find opportunities to improve the code base by refactoring, and create an issue for each opportunity'.

(I followed the instructions to add the boilerplate instructions for both bd and bv, to AGENTS.md)

jswny · 2025-12-30T01:16:19 1767057379

Can you give an example of how beads would be used by Claude to do something it otherwise couldn’t? I can’t quite tell what it is useful for

frankc · 2025-12-30T04:16:11 1767068171

Personally, I have been using beads for a few days on a couple of projects. I also like https://github.com/Dicklesworthstone/beads_viewer which is a nice tui for beads (with some additional workflow i haven't tried). I have found its been useful for longer, multi-session implementations. Its easier to get back into the work. I wouldn't go so far as to it couldn't do the work without it, but so far it seems smoother. These things are hard to measure. I think the it's really not that different than how an engineering team would use jira but more hierarchical, which helps preserve context, and with prebuilt instructions for how the agent should use it.

zyan1de · 2025-12-29T23:32:54 1767051174

oh yeah beads is awesome! I'd say this is a bit more general purpose rn especially what is in the skill!

PrayagS · 2025-12-30T12:07:01 1767096421

+1 to beads. Works great

gaigalas · 2025-12-30T02:25:27 1767061527

I like the fact that it forgets.

Each time an LLM looks at my project, it's like a newcomer has arrived. If it keeps repeating mistakes, it's because my project sucks.

It's an unique opportunity. You can have lots of repeated feedback from "infinite newcomers" to a project, each of their failures an opportunity to make things clearer. Better docs (for humans, no machine-specific hacks), better conventions, better examples, more intuitive code.

That, in my opinion, is how markdown (for machines only and not humans) will fall. There will be a breed of projects that thrives with minimal machine-specific context.

For example, if my project uses MIDI, I'm much better doing some specialized tools and examples that introduce MIDI to newcomers (machines and humans alike) than writing extensive "skill documents" that explain what MIDI is and how it works.

Think like a human do. Do you prefer being introduced to a codebase by reading lots of verbose docs or having some ready-to-run examples that can get you going right away? We humans also forget, or ignore, or keep redundant context sources away (for a good reason).

JoshGlazebrook · 2025-12-29T23:36:02 1767051362

Is anyone else just completely overwhelmed with the number of things you _need_ for claude code? Agents, sub agents, skills, claud.md, agents.md, rules, hooks, etc.

We use Cursor where I work and I find it a good medium for still being in control and knowing what is happening with all of the changes being reviewed in an IDE. Claude feels more like a black box, and one with so many options that it's just overwhelming, yet I continue to try and figure out the best way to use it for my personal projects.

Claude code suffers from initial decision fatigue in my opinion.

levocardia · 2025-12-30T02:57:18 1767063438

I just take a grug brain approach. I do touch CLAUDE.md and then just explain how the code/files/project spec work, like I'm writing a slack message or email to a really smart colleague, and then let it rip, always using biggest model with thinking on. If something consistently goes wrong I add more to CLAUDE.md or even better, have Claude Code just update CLAUDE.md itself with the new issue explained. I'm probably 3 months behind what you could get with absolute SOTA practices but it still works so well that I'm amazed and amused on a daily, if not hourly, basis.

dimitri-vs · 2025-12-30T00:08:22 1767053302

I'm in Claude Code 30+ hr/wk and always have a at least three tabs of CC agents open in my terminal.

Agree with the other comments: pretty much running vanilla everything and only the Playwright MCP (IMO way better than the native chrome integration) and ccstatusline (for fun). Subagents can be as simple as saying "do X task(s) with subagent(s)". Skills are just self @-ing markdown files.

Two of the most important things are 1) maintaining a short (<250 lines) CLAUDE.md and 2) having a /scratch directory where the agent can write one-off scripts to do whatever it needs to.

jswny · 2025-12-30T01:11:57 1767057117

I also specifically instruct Claude how to use a globally git ignored scratch folder “tmp” in each repo. Curious what your approach is

austinbaggio · 2025-12-30T02:09:18 1767060558

You store your project context in an ignored tmp folder? Share more plz - what does it look like? What do you store?

jswny · 2025-12-30T02:40:33 1767062433

Not memory, I just instruct it to freely experiment with temporary scripts and artifacts in a specific folder.

This helps it organize temporary things it does like debugging scripts and lets it (or me) reference/build on them later, without filling the context window. Nothing fancy, just a bit of organization that collects in a repo (Git ignored)

brigandish · 2025-12-30T00:34:42 1767054882

How can you - or any human - review that much code?

Normal_gaussian · 2025-12-30T01:54:02 1767059642

When I'm coding I have about 6 instances of VSCode on the go at once; each with their own worktree and the terminal is a dangerous cc in docker. most of the time they are sitting waiting for me. Generally a few are doing spec work/reporting for me to understand something - sometimes with issue context; these are used to plan or redirect my attention if I might've missed something. A few will be just hacking on issues with little to no oversight - I just want it to iterate tests+code+screenshots to come up with a way to do a thing / fix a thing, I'll likely not use the code it generates directly. Then one or two are actually doing work that I'll end up PR'ing or if I'm reviewing they'll be helping me do the review - either mechanically (hey claude, give me a script to launch n instances with a configuration that would show X ... ok, launch them ... ok, change to this ... grab X from the db ... etc.) or insight based (hey claude, check issue X against code Y - does the code reflect their comments; look up the docs for A and compare to the usage in B, give me references).

I've TL'd and PM'd as well as IC'd. Now my IC work feels a lot more like a cross between being a TL and being a senior with a handful of exuberant and reasonably competent juniors. Lots of reviewing, but still having to get into the weeds quickly and then get out of their way.

everfrustrated · 2025-12-30T18:54:50 1767120890

>I've TL'd and PM'd as well as IC'd. Now my IC work feels a lot more like a cross between being a TL

Interesting... I've been in management for a few years now and recently doing some AI coding work. I've found my skills as a manager/TL are far more adaptable to getting the best out of AI agents than my skills as a coder.

austinbaggio · 2025-12-30T22:19:49 1767133189

Same, I was a very average dev coming out of CS, and a PM before this. I find that my product training has been more useful, especially with prototypes, but I do leave nearly all of the hard system, infra, and backend work to my much much more competent engineering teammates.

dimitri-vs · 2025-12-30T12:09:41 1767096581

TBH I'm not building "production grade" apps depended on by hundreds of thousands of users - our clients want to get to a live MVP as fast as possible and love the ability to iterate quickly.

That said, it's well know that Anthropic uses CC for production. You just slow things down a bit, spend more time on the spec/planning stage and manually approve each change. IMO the main hurdle to broader Claude Code adoption isn't a code quality one, it's mostly getting over the "that's not how I would have written it" mindset.

bpolly · 2025-12-30T01:53:16 1767059596

From personal experience, most of my time in Claude Code is spent experimenting, iterating, and refining approaches. The amount of code it produces as it relates to time spent working on it tends to be pretty logarithmic in practice.

blks · 2025-12-30T12:39:36 1767098376

They don’t, they just push garbage, someone else quickly looks over it (or asks another llm to review for him), and merges.

asdev · 2025-12-29T23:41:26 1767051686

you really don't need any of this crap. you just need Claude Code and CLAUDE.MD in directories where you need to direct it. complicated AI set ups are mid curve

parpfish · 2025-12-30T00:29:33 1767054573

I refuse to learn all the complicated configuration because none of it will matter when they drop the next model.

Things that need special settings now won’t in the future and vice versa.

It’s not worth investing a bunch of time into learning features and prompting tricks that will be obsoleted soon

AndyNemmity · 2025-12-30T00:45:51 1767055551

I wish that were true. Models don't feel like they've really had massive leaps.

They do get better, but not enough to change any of the configuration I have.

But you are correct, there is a real possibility that the time invested with be obsolete at some point.

For sure the work towards MCPs are basically obsolete via skills. These things happen.

parpfish · 2025-12-30T01:55:38 1767059738

It doesn’t require any major improvement to the underlying model. As long they tinker with system prompts and builtin tools/settings, the coding agent will evolve in unpredictable ways out of my control

AndyNemmity · 2025-12-30T01:59:07 1767059947

That's a rational argument. In practice, what we're actually doing for the most part is managing context, and creating programs to run parts of tasks, so really the system prompts and builtin tools and settings have very little relevance.

dnautics · 2025-12-30T01:40:16 1767058816

i don't understand this mcp/skill distinction? one of the mcps i use indexes the runtime dependency of code modules so that claude can refactor without just blindly grepping.

how would that be a "skill"? just wrap the mcp in a cli?

fwiw this may be a skill issue, pun intended, but i can't seem to get claude to trigger skills, whereas it reaches for mcps more... i wonder if im missing something. I'm plenty productive in claude though.

AndyNemmity · 2025-12-30T01:49:16 1767059356

So MCPs are a bunch of, essenntially skill type objects. But it has to tell you about all of them, and information about all of them up front.

So a Skill is just a smaller granulatrity level of that concept. It's just one of the individual things an MCP can do.

This is about context management at some level. When you need to do a single thing within that full list of potential things, you don't need the instructions about a ton of other unrelated things in the context.

So it's just not that deep. It would be having a python script or whatever that the skill calls that returns the runtime dependencies and gives them back to the LLM so they can refactor without blindly greping.

Does that make sense?

dnautics · 2025-12-30T10:09:02 1767089342

no that makes no sense. the skill doesn't do anything by itself, the mcp (can be) attached to a deterministic oracle that can return correct information.

AndyNemmity · 2025-12-30T16:51:41 1767113501

But the skill includes the scripts to do things.

So in my nano banana image generation skill, it contains a python script that does all the actual work. The skill just knows how to call the python script.

We're attaching tools to the md files. This is at the granular level of how to hammer a nail, how to use a screw driver, etc. And then the agent, the handyman, has his tool box of skills to call depending on what he needs.

dnautics · 2025-12-31T05:54:07 1767160447

lets say i'm in erlang. you gonna include a script to unpack erlang bytecode across all active modules and look through them for a function call? oorrr... have that code running on localhost:4000 so that its a single invocation away, versus having the llm copypasta the entire script you provided and pray for the best?

AndyNemmity · 2025-12-31T17:52:38 1767203558

The LLM doesn't copy the script, it runs it.

But for sure, there are places it makes sense, and there are places it doesn't. I'm arguing to maximully use it for places that make sense.

People are not doing this. They are leaving the LLM to everything. I am arguing it is better to move everything possible into tools that you can, and have the LLM focus only on the bits that a program doesn't make sense for.

austinbaggio · 2025-12-30T02:04:49 1767060289

In our experience, a lot of it is feel and dev preference. After talking to quite a few developers, we've found the skill was the easiest to get started with, but we also have a CLI tool and an MCP server too. You can check out the docs if you'd prefer to try those - feedback welcome: https://www.ensue-network.ai/docs#cli-tool

dnautics · 2025-12-30T10:13:46 1767089626

yeah but a skill without the mcp server is just going to be super inefficient at certain things.

again going to my example, a skill to do a dependency graph would have to do a complex search. and in some languages the dependency might be hidden by macros/reflection etc which would obscure a result obtained by grep

how would you do this with a skill, which is just a text file nudging the llm whereas the MCP's server goes out and does things.

AndyNemmity · 2025-12-30T16:52:46 1767113566

A skill is not just a text file nudging the llm. You group scripts and programming to the skill, and the skill calls it.

dnautics · 2025-12-31T05:55:32 1767160532

that seems token inefficient. why have the llm do a full round trip. load the skill which contains the potentially hundreds of lines code then copy and paste the code back into the compiler when it could just run it?

not that i care too too much about small amounts of tokens but depleting your context rapidly seems bad. what is the positive tradeoff here?

AndyNemmity · 2025-12-31T17:54:17 1767203657

I don't understand. The Skill runs the tools. In the cases there are problems where you can have programs replace the LLM, I think we should maximully do that.

That uses less tokens. The LLM is just calling the script, and getting the response, and then using that to continue to reason.

So I'm not exactly following.

dnautics · 2026-01-03T02:18:27 1767406707

what you are proposing is functionally equivalent to "wrapping an mcp in a cli" which is what I mentioned in my root comment.

wouldbecouldbe · 2025-12-30T00:06:47 1767053207

It seems to mostly ignore Claude.md

songodongo · 2025-12-30T01:26:25 1767057985

If you can test how often it is being used by having a line in there saying something like “You must start every non-code response with ‘Woohoo!’”

csar · 2025-12-30T03:30:23 1767065423

It’s told to only use it if relevant because most people write bad ones. Someone should write a tool to assess CLAUDE.md quality.

AndyNemmity · 2025-12-30T00:10:33 1767053433

It does, Claude.md is the least effective way to communicate to it.

It's always interesting reading other people's approaches, because I just find them all so very different than my experience.

I need Agents, and Skills to perform well.

_the_inflator · 2025-12-30T00:15:43 1767053743

I like the finetuning aspect to it quite a lot. It makes sense to me. What I achieved now is a very streamlined process of autonomous work of an agent, which can more and more often be simply managed than controlled on a code review level basis for everything.

I agree that this level of finetuning feels overwhelming and might let yourself doubting whether you do utilize Claude to its optimum and the beauty is, that finetunging and macro usage don't interfere, when you stay in your lane.

For example I now don't use the planing agent anymore instead incorporated this process into the normal agents much to the project's advantage. Consistency is key. Anthropic did the right thing.

Codex is quite a different beast and comes from the opposite direction so to say.

I use both, Codex and Claude Opus especially, in my daily work and found them complementary not mutual exclusive. It is like two different evangelists who are on par exercising with different tools to achieve a goal, that both share.

AndyNemmity · 2025-12-30T01:52:15 1767059535

Yeah, at a certainly level, it's just a ton of fun to do. I think that's why so many of us are playing with it.

It's also deeply interesting because it's essentially unsolved space. It's the same excitement as the beginning of the internet.

None of us know what the answers will be.

wouldbecouldbe · 2025-12-30T00:06:08 1767053168

All I use is curse words and it does a damn great job most of the time

lobito25 · 2025-12-30T00:43:09 1767055389

Same here :)))), he's really good at understanding when you're pissed off.

nineteen999 · 2025-12-30T02:26:52 1767061612

I thought I was the only one.

anonzzzies · 2025-12-30T00:30:48 1767054648

Yep, that usually works best.

eterm · 2025-12-29T23:44:24 1767051864

This isn't necessary. Claude will read CLAUDE.md from both:

  1. Current directory ./CLAUDE.md
  2. User directory ~/.claude/CLAUDE.md

I stick general preferences in what it calls "user memory" and stick project specific preferences in the working directory.

austinbaggio · 2025-12-29T23:56:46 1767052606

It feels like Claude is taking more of the Android approach of a less opinionated, but more open stack, so people are bending it to the shape they want to match their workflow. I think of the amnesia problem as pretty agent-agnostic, though, knowing what happens while you're delivering product is more of an agent execution layer problem than a tool problem, and it gets bigger when you have swarms coordinating - Jaya wrote a pretty good article about this https://x.com/AustinBaggio/status/2004599657520123933?s=20

AndyNemmity · 2025-12-30T00:09:16 1767053356

I'm the opposite, I find it straight forward to use all these things, and am surprised people aren't getting it.

I've been trying to write blogs explaining it recently, but I don't think I'm very good at making it sound interesting to people.

What can I explain that you would be interested in?

Here was my latest attempt today.

https://vexjoy.com/posts/everything-that-can-be-deterministi...

majormajor · 2025-12-30T00:13:44 1767053624

You say "My Claude Code Setup" but where is the actual setup there? I generally agree with everything about how LLMs should be called you say, but I don't see any concrete steps of changing Claude Code's settings in there? Where are the "35 agents. 68 skills. 234MB of context."? Is the implementation of the "Layer 4" programs intended to be left to the reader? That's hardly approachable.

AndyNemmity · 2025-12-30T00:16:10 1767053770

I got similar feedback with my first blog post on my do router - https://vexjoy.com/posts/the-do-router/

Here is what I don't get. it's trivial to do this. Mine is of course customized to me and what I do.

The idea is to communicate the ideas, so you can use them in your own setup.

It's trivial to put for example, my do router blog post in claude code and generate one customized for you.

So what does it matter to see my exact version?

These are the type of things I don't get. If I give you my details, it's less approachable for sure.

The most approachable thing I could do would be to release individual skills.

Like I have skills for generating images with google nano banana. That would be approachable and easy.

But it doesn't communicate the why. I'm trying to communicate the why.

majormajor · 2025-12-30T00:31:24 1767054684

I just don't have much faith in "if you're doing it right the results will be magically better than what you get otherwise" anymore. Any single person saying "the problems you run into with using LLMs will be solved if you do it my way" has to really wow me if they want me to put in effort on their tips. I generally agree with your why of why you set up like that. I'm skeptical that it will get over the hump of where I still run into issues.

When you've tried 10 ways of doing it but they all end up getting into a "feed the error back into the LLM and see what it suggests next" you aren't that motivated to put that much effort into trying out an 11th.

The current state of things is extremely useful for a lot of things already.

AndyNemmity · 2025-12-30T00:37:29 1767055049

That's completely fair, I also don't have much faith in that anymore. Very often, the people who make those claims have the most basic implementation that barely is one.

I'm not sure if the problems you run into with using LLMs will be solved if you do it my way. My problems are solved doing it my way. If I heard more about your problems, I would have a specific answer to them.

These are the solutions to where I have run into issues.

For sure, but my solutions are not feed the error back into the LLM. My solutions are varied, but as the blog shows, they are move as much as possible into scripts, and deterministic solutions, and keep the LLM to the smallest possible scope.

The current state of things is extremely useful for a subset of things. That subset of things feels small to me. But it may be every thing a certain person wants to do exists in that subset of things.

It just depends. We're all doing radically different things, and trying very different things.

I certainly understand and appreciate your perspective.

majormajor · 2025-12-30T02:55:04 1767063304

That makes sense.

My basic problem is: "first-run" LLM agent output frequently does one or more of the following: fails to compile/run, fails existing test coverage, or fails manual verification. The first two steps have been pretty well automated by agents: inspect output, try to fix, re-run. IME this works really well for things like Python, less-well for things like certain Rust edge cases around lifetimes and such, or goroutine coordination, which require a different sort of reasoning than "typical" procedural programming.

But let's assume that the agents get even better at figuring out the deal with the more specialized languages/features and are able to iterate w/o interaction to fix things.

If the first-pass output still has issues, I still have concerns. They aren't "I'm not going to use these tools" concerns, because I also sometimes write bugs, and they can write the vast majority of code faster than I can.

But they are "I'm not gonna vibe-code my day job" concerns because the existence of trivially-catchable issues suggests that there's likely harder-to-catch issues that will need manual review to make sure (a) test coverage is sufficient, (b) the mental model being implemented is correct, (c) the outside world is interacted with correctly. And I still find bugs in these areas that I have to fix manually.

This all adds up to "these tools save me 20-30% of my time" (the first-draft coding) vs "these agents save me 90% of my time."

So I'm kinda at a plateau for a few months where it'll be hard to convince me to try new things to try to close that 20-30% -> 90% number.

AndyNemmity · 2025-12-30T03:53:04 1767066784

I experience the same things. What I’ve found is there is no issue I can’t solve so it doesn’t repeat.

The real issue is I don’t know the issues ahead of time. So each experience is an iteration stopping things I didn’t know would happen.

Thankfully, I’m not trying to sell anyone anything. I don’t even want people to use what I use. I only want people to understand the why of what I do, and how it adds me value.

I think it’s important to understand this thing we use as best we can.

The personal value you can get, is entirely up to your tolerance for it.

I just enjoy the process

csar · 2025-12-30T03:36:51 1767065811

For new-ish projects it should give you some crazy speed up out of the box.

For large codebases (my own has 500k lines and my company has a few tens of millions) you need something better like RPI.

If nothing else just being able to understand code questions basically instantly should give you a large speed up, even without any fancy stuff.

ok_dad · 2025-12-30T02:25:16 1767061516

Damn, it really is all just vibes eh? Everyone just vibes their way to coding these days, no proof AI is actually doing anything for you. It's basically just how someone feels now: that's reality.

In some sense, computers and digital things have now just become a part of reality, blending in by force.

AndyNemmity · 2025-12-30T02:37:47 1767062267

I mean, it’s not vibes. I make real projects, and the failures of AI doing it force me to make fixes so that it only ever fails doing that thing once. Then it no longer fails to do that thing.

But the things I am doing might not be the things you are doing.

If you want proof, I intend to release a game to the App Store and steam soon. At that point you can judge if it built a thing adequately.

ok_dad · 2025-12-30T02:50:51 1767063051

No offense intended, I don't even know you at all, but I see people claim things like you did so often these days that I begin to question reality. These claims always have some big disclaimer, as yours does. I still don't know a single personal acquaintance who has claimed even a 2x improvement on general coding efficiency, not even 1.5x in general efficiency. Some of my coworkers say AI is good for this or that, but I literally just waste my time and money when I use it, I've never gotten good results or even adequate results to continue trying. I feel like I am taking crazy pills sometimes with all of the hype!

I hope you're just one of the ones who figured it out early and all the hype isn't fake bullshit. I'd much rather be proven wrong than for humanity to have wasted all this time and resources.

AndyNemmity · 2025-12-30T03:56:22 1767066982

I think the correct approach is to be skeptical. You should push back.

I think of this stuff as trivial to understand from my point of view. I am trying to share that.

I have nothing to sell, I don’t want anyone to use my exact setup.

I just want to communicate the value as I see it, and be understood.

The vast majority of it all is complete bullshit, so of course I am not offended that I may sound like 1000 other people trying to get you to download my awesome Claude Code Plugins repo.

Except I’m not actually providing one lol

ok_dad · 2025-12-30T05:07:31 1767071251

Yea sorry if I did a bit of a rant there.

AndyNemmity · 2025-12-30T06:26:17 1767075977

Nah, you’re good. We’re all working through this craziness together

minimaxir · 2025-12-29T23:41:16 1767051676

With Opus 4.5 in Claude Code, I'm doing fine with just a (very detailed) CLAUDE.md.

austinbaggio · 2025-12-29T23:52:01 1767052321

Do you find you want to share the .md with the teams you work with? Or is it more for your solo coding?

Myrmornis · 2025-12-30T03:11:47 1767064307

Not saying you were suggesting it but people committing AGENTS.md in shared repos is pretty annoying IMO. Those things are personal.

lukev · 2025-12-30T00:00:25 1767052825

A claude.md file will give you 90% of what you need.

Consider more when you're 50+ hours in and understand what more you want.

AndyNemmity · 2025-12-30T00:11:13 1767053473

In my experience, I'm at the most where I entirely ignore Claude.md - so it's very interesting how many people have very different experiences.

austinbaggio · 2025-12-30T00:58:39 1767056319

It is overwhelming. We have support for Cursor mcp as well, but you lose a lot of the auto-magic stuff you get with the Claude Code plugin. Unfortunately, skills are pretty sticky to the Claude Code stack. It is kind of the vim of AI coding agents. . . One of the goals for this tool was to address context management in a single place. i.e instead of setting up all of the rules, claude.md, and skill.md you just semantic query a specific namespace in your knowledge base.

the docs if you are curious: https://www.ensue-network.ai/docs

pigpop · 2025-12-30T00:01:08 1767052868

You don't need all that, just have Claude write the same documentation you would (should) write for any project. I find it best to record things chronologically and then have Claude do periodic reviews of the docs and update key design documents and roadmap milestones. The best part is you get a written record of everything that you can review when you need to remember when and why something changed. They also come in handy for plan mode since they act as a guide to the existing code.

The PMs were right all along!

einsteinx2 · 2025-12-30T01:39:22 1767058762

I use both Cursor and Claude Code in VS Code at work (so I get similar control as Cursor). I don’t really use Claude Code any differently than cursor. People way over complicate it.

csar · 2025-12-30T03:28:14 1767065294

Claude Code is better out of the box, so all that other stuff is orthogonal or optional. If you eg want to give your agent access to your company’s Notion docs you need a skill.

metadat · 2025-12-29T23:53:33 1767052413

Don't forget about the co-agents.. yeah.

animitronix · 2025-12-30T00:22:06 1767054126

Nope, I spend time learning my tools.

zyan1de · 2025-12-29T22:57:48 1767049068

I mostly use it during long Claude Code research sessions so I don’t lose my place between days.

I run it in automatic mode with decent namespacing, so thoughts, notes, and whole conversations just accumulate in a structured way. As I work, it stores the session and builds small semantic, entity-based hypergraphs of what I was thinking about.

Later I’ll come back and ask things like:

what was I actually trying to fix here?

what research threads exist already?

where did my reasoning drift?

Sometimes I’ll even ask Claude to reflect on its own reasoning in a past session and point out where it was being reactive or missed connections.

d4rkp4ttern · 2025-12-30T13:37:57 1767101877

I like that it does not require following any particular "system" or discipline. But having to use a non-local/proprietary memory layer is not ideal.

My own fully-local, minimalistic take on this problem of "session continuation without compaction" is to rely on the session JSONL files directly rather than create separate "memory" artifacts, and seamlessly index them to enable fast full-text search. This is the idea behind the "aichat" command-group + plugin I just added to my claude-code-tools [1] repo. You can quit your Claude-Code/Codex-CLI session S and type

    aichat resume <id-of-session-S-you-just-quit>

It launches a TUI, offering a few ways to continue your work:

- blind trim - clones the session, truncates large tool calls/results and older assistant messages, which can clear up as much as 50% of context depending of course on what's going on; this is a quick hack to continue your work a bit longer

- smart trim - similar but uses headless agent to decide what to truncate

- rollover: the one I use most frequently; it creates a new session S1 (which can optionally be a different CLI agent, allowing cross-agent work continuation), and injects back-pointers to the parent session JSONL file of S, the parent's parent , and so on (what I call session lineage) , into the first user message, and the user can then prompt the agent to use a sub-agent to extract arbitrary context from the ancestor sessions to continue the work.

[1] https://github.com/pchalasani/claude-code-tools?tab=readme-o...

BonoboIO · 2025-12-30T04:02:41 1767067361

Different approach: I continuously refine my global CLAUDE.md (~/.claude/CLAUDE.md) instead of external memory systems.

I work primarily in Python and maintain extensive coding conventions there - patterns allowed/forbidden, preferred libs, error handling, etc. Custom slash commands like `/use-recommended-python` (loads my curated libs: pendulum over datetime, httpx over requests) and `/find-reinvented-the-wheel` to catch when Claude ignored existing utilities.

My use case: multiple smaller Python projects (similar to steipete's workflow https://github.com/steipete), so cross-project consistency matters more than single-codebase context.

Yes, ~15k tokens for CLAUDE.md + rules. I sacrifice context for consistency. Worth it.

Also baked in my dev philosophy: Carmack-style - make it work first, then fast. Otherwise Claude over-optimizes prematurely.

These memory abstractions are too complicated for me and too inconsistent in practice. I'd rather maintain a living document I control and constantly refine.

itissid · 2025-12-30T04:06:39 1767067599

The general process feels very much like having kids over for a birthday party. Except you have to get them all to play nice and you have no idea what this other kid was conditioned on by their parents. Generally it would all work fine, all the kids know how the party progresses and what their roles are — if any.

But imagine how hard it would be if these kids had short term memory only and they would not know what to focus on except what you tell them to. You literally have to tell them "Here is A-Z pay attention to 'X' only and go do your thing". Add in other managers for this party like a caterer, clowns, your spouse and they also have to tell them that and remember, communicate what other managers have done. No one has solved for this, really.

This is what it felt like in 2025 to code with LLMs on non trivial projects, with some what of an improvement as the year went by. But I am not sure much progress was made in fixing the process part of the problem.

photios · 2025-12-30T08:19:26 1767082766

Can it run with a local DB? I have zero interest in another monthly subscription pretending to be "an open source tool".

scubbo · 2025-12-30T01:50:56 1767059456

I've been tinkering with building something similar for myself - though for a generic chatbot, rather than for Claude (not every task is coding, and I'd like to keep !). From other comments (e.g. https://news.ycombinator.com/item?id=46428368, https://news.ycombinator.com/item?id=46427950) suggest that many others are already ahead of me. Any recs for tools, libraries, or approaches that I should learn from or adopt? In particular, I've found that - no matter how direct and clear the system prompt is - models have a tendency to respond verbally as if they've made a tool-call recording some gained-knowledge ("thanks! I'll remember that"), but to not actually return the JSON required to trigger the call by the tool.

austinbaggio · 2025-12-30T02:02:08 1767060128

Since you've already thought about this problem, I'd love to hear your feedback after giving this skill a try. It should speed up at least your basic need of having to trigger the LLM to store the memory. One of our colleagues has found success asking at the end of a research session what he missed, how he could improve, etc.

ec109685 · 2025-12-30T00:59:06 1767056346

This is impressive.

Though I have found repo level claude.md that is updated everytime claude makes a mistake plus using —restore to select a previous relevant session works well.

There is no way for Anthropic to optimize Claude code or the underlying models for these custom setups. So it’s probably better to stick with the patterns Anthropic engineers use internally.

austinbaggio · 2025-12-30T02:31:08 1767061868

If you give it a try, I think that use case should work, but if not, I would be grateful if you told us what broke.

And also - I genuinely worry about vendor lock-in, do you?

austinbaggio · 2025-12-30T01:19:01 1767057541

Do you ever switch tools? I don't love the idea of my context being hostage of whatever LLM I choose first.

edmundsparrow · 2026-01-01T11:02:07 1767265327

I've been manually copying responses between chats when I hit token limits, and I'm wondering - have you considered a multi-AI consensus approach instead of persistent memory?

The idea: multiple AIs (Claude, GPT, Gemini, Grok) brainstorm simultaneously and produce one agreed response. This might solve the context problem more elegantly because:

- No token limit anxiety - you get comprehensive answers upfront - Better quality through AI cross-validation - The consensus answer naturally becomes your context - Simpler to implement - just parallel API calls vs memory tree management

Just curious if you've explored this direction or if there's a reason the memory persistence approach works better for your use case?

austinbaggio · 2025-12-30T02:10:49 1767060649

Thanks everyone for the comments, really, I wasn't expecting this.

Quite a few of you have mentioned that you store a lot of your working context across sessions in some md file - what are you actually storing? What data do you actually go back to and refer to as you're building?

EMM_386 · 2025-12-30T03:43:11 1767066191

> in some md file

1a directly from Anthropic on agentic coding and Claude Code best practices.

"Create CLAUDE.md files"

https://www.anthropic.com/engineering/claude-code-best-pract...

It works great. You can put anything you want in there. Coding style, architecture guidelines, project explanation.

Anything the agent needs to know to work properly with your code base. Similar to an onboarding document.

Tools (Claude Code CLI, extensions) will pick them up hierarchically too if you want to get more specific about one subdirectory in your project.

AGENTS.md is similar for other AI agents (OpenAI Codex is one). It doesn't even have to be those - you can just @ the filename at the start of the chat and that information goes in the context.

The naming scheme just allows for it to be automatic.

CPLX · 2025-12-29T22:59:19 1767049159

I absolutely love this concept! It's like the thing that I've been looking for my whole life. Well, at least since I've been using Claude Code, which is this year.

I'm sold.

With that said, I can't think of a way that this would work. How does this work? I took a very quick glance, and it's not obvious at first glance.

The whole problem is, the AI is short on context, it has limited memory. Of course, you can store lots of memory elsewhere, but how do you solve the problem of having the AI not know what's in the memory as it goes from step to step? How does it sort of find the relevant memory at the time that that relevance is most active?

Could you just walk through the sort of conceptual mechanism of action of this thing?

austinbaggio · 2025-12-29T23:15:08 1767050108

Appreciate it - yeah, you're right, models don't work well when you just give it a giant dump of memory. We store memories in a small DB - think key/value pair with embeddings Every time you ask Claude something, the skill:

1. Embeds the current request.

2. Runs a semantic + timestamp-weighted search over your past sessions. Returns only the top N items that look relevant to this request.

3. Those get injected into the prompt as context (like extra system/user messages), so Claude sees just enough to stay oriented without blowing context limits.

Think of it like: Attention over your historical work, more so than brute force recall. Context on demand basically giving you an infinite context window. Bookmark + semantic grep + temporal rank. It doesn’t “know everything all the time.” It just knows how to ask its own past: “What from memory might matter for this?”

When you try it, I’d love to hear where the mechanism breaks for you.

skuenzli · 2025-12-29T23:14:34 1767050074

It looks to me like the skill sets up a connection to their MCP server at api.ensue-network.ai during Claude session start via https://github.com/mutable-state-inc/ensue-skill/blob/main/s...

Then Claude uses the MCP tools according to the SKILL definition: https://github.com/mutable-state-inc/ensue-skill/blob/main/s...

zyan1de · 2025-12-29T23:08:44 1767049724

yeah so you can run it in automatic mode, or read only mode. In automatic mode it hooks onto the conversation and tool calls so you get the entire conversation stored. If you dont want to get super deep, then read only is safe and only stores what you ask. You could ask it things like "why is my reasoning dumb" by recalling passed conversations, or even give it the claude tool call sequence and ask "how can claude be smarter about next time".

I think of it like a file tree with proper namespacing and keep abstract concepts in separate directories. so like my food preferences will be in like /preferences/sandos. or you can even do things like /system-design preferences and then load them into a relevant conversation for next time.

DANmode · 2025-12-29T23:32:09 1767051129

Total speculation:

Text Index of past conversations, using prompt-like summaries.

altmanaltman · 2025-12-29T22:58:44 1767049124

Thank you for specifying it wasn't magic or AGI.

apublicfrog · 2025-12-29T23:54:27 1767052467

> Not magic. Not AGI. Just state.

Very clearly AI written

fragmede · 2025-12-30T00:04:06 1767053046

You're absolutely right!

amannm · 2025-12-30T00:25:10 1767054310

There's a lot of people interested in forming some sort of memory layer around vendored LLM services. I don't think they realize how much impact a single error that disappears from your immediate attention can have on downstream performance. Now think of the accrual of those errors over time and your lack of ability to discern if it was service degradation or a bad prompt or a bad AGENTS.md OR now this "long term memory" or whatever. If this sort of feature will ever be viable, the service providers will offer the best solution only behind their API, optimized for their models and their infrastructure.

AndyNemmity · 2025-12-30T00:07:49 1767053269

I don't understand the use case. I think if you don't use agents, and skills currently effectively, then perhaps this is useful.

If you're using them though, we no longer have the problem of Claude forgetting things.

christinetyip · 2025-12-30T13:41:38 1767102098

That makes sense, and I agree that for a single agent using skills well, Claude’s native context handling has gotten much better.

This wasn't mentioned in the first post, but the use case we’re focused on isn’t really “Claude forgetting,” but context living beyond a single agent or tool. Even if Claude remembers well within a session, that context is still owned by that agent instance.

The friction shows up when you switch tools or models (Claude → Codex / Cursor / etc.), run multiple agents in parallel, or want context created in one place to be reused elsewhere without re-establishing it.

In those cases, the problem isn’t forgetting so much as fragmentation. If someone is happy with one agent and one tool, there are probably a bunch of memory solutions to choose from. The value of this external memory network that you can plug into any model or agent shows up once context needs to move across tools and people.

lkbm · 2025-12-30T01:40:08 1767058808

I'm curious how those replace this? I've barely used either, and would love to hear more.

AndyNemmity · 2025-12-30T01:56:57 1767059817

Okay, Claude.md is an md file with instructions.

Agents are an md file with instructions.

Skills are an md file with instructions.

Commands are.. you get the point.

We're just dealing with instructions. Claude.md is handled by Claude Code. It is forgotten almost entirely often when the context fills.

Okay, what is an agent? An agent is basically a Claude.md file, but you make it extremely granular. So it only has instructions of let's say, Typescript.

We're all just doing context management here. We're trying to make sure our instructions that matter stay.

To do that, we have to remove all other instructions from the picture.

When you're doing typescript, you only know type script things.

Okay, what's a skill? A skill is doing a single thing with type script. Why? So that the context is even smaller.

Instead of the agent having every single instruction you need about typescript, you put them in skills so they only get put into context when that thing is needed.

But skills are also where you connect deterministic programs. For example, I have a skill for creating images in nano banana.

So when the Typescript Agent needs to create an image, it calls the skill, that calls the python script, to create images in nano banana.

We're managing all the context to only be available when it's needed, keeping all other instructions out.

Does that help?

lkbm · 2025-12-31T18:54:32 1767207272

A little. Thanks for the detailed reply. But I wouldn't expect a smaller, more targeted Claude.md would make a big difference. My impression is that we're filling up the context less with Claude.md than with the session work: back and forth about the specific task, Claude's chain-of-thought as it does that, the relevant sections of code, etc. That's what this is trying to compact and reference, iiuc.

What I'd think would help, would be doing things in smaller chunks: an agent to do this small subtask, another to do that small subtask, and the parent task context only grows with the sub-agents reporting back, not with their chain-of-thought.

Could be I should have a much bigger, more detailed Claude.md. Mine tend to be small project overviews and a list of TODOs, not that different from README.md: https://github.com/lkbm/sideways_math/blob/main/CLAUDE.md

AndyNemmity · 2026-01-01T00:58:27 1767229107

It does make a big difference in the sense that the less you tell it about things that aren't relevant to it's task, the better it performs.

This has been found in all sorts of variations, and is accepted. It's not just my word, it's the standard understanding. But also, it's true in my experience.

You also limit the session work when you are offloading to agents, which are calling skills. So it does do that as well.

But that's what we're talking about, having agents do things in smaller chunks. Maybe I didn't explain that clearly enough?

It's just a complex topic. It's hard to cover everything. But yet, that's part of the idea. We're making context very specific to the task, and cutting things up into smaller chunks with dedicated context windows only for that task.

I almost completely ignore Claude.md - there are very few rules I want every subtask to know about.

the main system is for cordination, of the many agents with their own dedicated Claude.mds which call their own skills with their own dedicated instructions.

It's like Russian dolls. Now, I do use much, much MUCH larger Agent mds than anyone else. I've been doing this for 10 months, and I believe it's the correct way to do it.

I intend to write a blog post on it, it's a topic in it's own part.

Even your small Claude.md - I would have in several files. The Typescript agent is the only one that needs to know about the Typescript details. This is the kind of thing I mean.

linsomniac · 2025-12-30T01:29:44 1767058184

The past few weeks I've been experimenting with using less context and less memory and it's been going really well. Where before I'd try to do a bunch of fairly related things in a single session, experimenting with compacting more or less frequently, now I'm clearing my context or exiting and restarting claude and codex. It seems to help it focus on the task at hand, hasn't tended to go off into the weeds as much, and my token costs have dropped way down.

Combined with a good AGENTS.md, it seems to be working really well.

einsteinx2 · 2025-12-30T01:37:40 1767058660

That’s been my experience as well. I find I usually get better output if I create a new conversation for each thing I need. I’ve found that the only times it’s better to continue an existing conversation is if I want to have it make small improvements or changes to something it just wrote, as it tends to do better with the previous context still there. But even that only goes so far, then the scale tips and it works much better with a clean slate. I especially don’t want totally unrelated conversations polluting the context which is why I have all memory features turned off in all the web chat UI’s for the models I use.

ChicagoDave · 2025-12-30T03:21:17 1767064877

I use 92% of context, have Claude write a “work summary” to a context folder, commit, push, quit, restart, repeat.

I’m never stopped and Claude always remembers what we’re doing.

This pattern has been highly productive for 8 months.

csar · 2025-12-30T03:26:31 1767065191

You should never let context get that high unless you’re doing really basic things. Somewhere 40-60% is generally the time to start thinking about exits for tougher tasks. Get out in the 60s.

ChicagoDave · 2025-12-30T10:01:32 1767088892

I keep work chunks small, which is why I can hit 90%. If I do I have a large task like a big planning effort, yes I’d start fresh.

christinetyip · 2025-12-30T09:57:30 1767088650

A lot of the discussion here is about memory inside a single tool, which makes sense.

I’m curious how people think about portability: e.g. letting Claude Code retrieve context that was created while using Codex, Manus, or Cursor, or sharing specific parts of that context with other people or agents.

At that point, log parsing and summaries become per-tool views of state rather than shared state. Do people think a shared external memory layer is overkill here, or a necessary step once you have multiple agents/tools in play?

d4rkp4ttern · 2025-12-30T14:23:06 1767104586

The aichat tool I mentioned in another comment [1] enables exactly this type of cross-agent work-continuation, specifically between Claude-Code and Codex-CLI.

[1] https://news.ycombinator.com/item?id=46433213

realitydrift · 2025-12-30T14:28:30 1767104910

Most memory tools are really about coordination, not recall. The problem shows up when context splinters across sessions, tools, and parallel agents and there’s no longer a clear source of truth. Retrieval only helps if you can see what was pulled in and why, otherwise hidden context quietly warps the work. The only metric that matters is whether you spend less time re-explaining decisions and more time continuing from where you actually left off.

minikomi · 2025-12-30T09:10:53 1767085853

I use gptel[0] with my denote[1] notes, and a tool that can search/retrieve tags/grep/create notes (in a specific sub folder). It's been good enough as a memory for me.

0: https://github.com/karthink/gptel

1: https://protesilaos.com/emacs/denote

huali · 2025-12-30T08:36:54 1767083814

I've built a lightweight Memory MCP service to efficiently store conversation memories. It only implements essential *CRUD* (Create, Read, Update, Delete) methods, minimizing token usage.

Deploy the service on your cloud server or your local computer, then add the streamable MCP and skill to Claude Code.

To activate in a new conversation, simply reference the skill first: `@~/.claude/skills/mem/SKILL.md`.

If you like this project, please give it a star on GitHub!

rmonvfer · 2025-12-30T06:16:33 1767075393

Looks cool but as others have said, it’s really hard to just try all similar projects because all of them promise the same thing but I haven’t seen any of them provide any benchmarks.

Claude Code keeps all the conversation logs stored on-disk right? Why not parse them asynchronously and then use hooks to enrich the context as the conversation goes? (I mean in the most broad and generic way, I guess we’d have to embed them, do some RAG… the whole thing)

christinetyip · 2025-12-30T09:46:13 1767087973

Yep, parsing logs + async RAG works fine if you’re staying inside a single tool.

The issue we ran into when building agent systems was portability. Once you want multiple agents or models to share the same evolving context, each tool reconstructing its own memory from transcripts stops scaling.

We’re less focused on “making agents smarter” and more on avoiding fragmentation when context needs to move across agents, tools, or people — for example, using context created in Claude from Codex, or sharing specific parts of that context with a friend or a team.

That’s also why benchmarks are tricky here. The gains tend to show up as less duplication and less state drift rather than a single accuracy metric. What would constitute convincing proof in this space for you?

dmos62 · 2025-12-30T07:49:45 1767080985

I consider the "perfect fortgetfulness" of LLMs a great feature, because I can then precisely select what the context is for a given task. Context is additive, so once something's in it, it's doing something: most I could do is try to counteract it, which is like playing jailbreak.

Then again, this might be just me. When there's a task to be done, even without an LLM my thought process is about selecting the relevant parts of my context for solving it. What is relevant? What starting point has the best odds of being good? That translates naturally to tasking an LLM.

Let's say I have a spec I'm working on. It's based off of a requirements document. If I want to think about the spec in isolation (let's say I want to ask the LLM what requirements are actually being fulfilled by the spec), I can just pass the spec, without passing the requirements. Then I'll compare the response against the actual requirements.

At the end of the day, I guess I hate the automagicness of a silent context injection. Like I said, it also negates the perfect forgetfulness of LLMs.

devhouse · 2025-12-30T01:57:04 1767059824

Feels like this is solving a problem that /compact should solve but doesn't. The fact that post-compaction Claude 'feels dumber' suggests the summarization is too aggressive? Would be interesting if Anthropic exposed more control over what gets preserved vs. compressed ... or let users provide their own summary template.

johann8384 · 2025-12-31T03:26:49 1767151609

I have markdown files in ~/.claude/guides that I refer to, my subagents have instructions about, and my claude.md in several projects reference them when relevant.

This seems like it wouldn't accomplish much more than those methods. It knows my stack preferences, what I want commit messages to look like, etc.

kingkongjaffa · 2025-12-30T09:52:40 1767088360

I actively don't want to use LLMs this way.

I use things like claude projects on the web app and skills and stuff, and claude code heavily.

I want to manually curate the context, adding memory is a anti pattern for this, I don't want the LLM grabbing tokens from memory that may or may not be relevant, and most likely will be stale.

AmiteK · 2025-12-30T05:22:59 1767072179

One thing that seems under-discussed is what kind of state is worth persisting. Raw chat logs are cheap; distilled decisions, constraints, and preferences are harder but much more valuable.

Even if most approaches fail, exploring that boundary feels useful - especially if the system is transparent about what it stores and why.

deadeye · 2025-12-30T13:57:30 1767103050

Is this for people who haven't read the docs on how to instruct the agent on common setups and preferences?

jMyles · 2025-12-30T01:40:26 1767058826

We built one too, with a web frontend and a 'spy' viewer in case your team wants to watch your interactions. Also has secret redaction:

https://github.com/jMyles/memory-lane

robertwt7 · 2025-12-30T00:21:30 1767054090

Congrats for this! how does this differs from claude-mem? I've been using claude-mem for a while now

https://github.com/thedotmack/claude-mem

bgilly · 2025-12-30T07:26:47 1767079607

Thanks for mentioning this. I installed claude-mem today and it’s already come in handy. Pretty neat how it can go get individual prompts and replies from previous sessions without consuming a lot of tokens. And I finally have some visibility into what my subagents are doing thanks for the real time feed web dashboard.

EMM_386 · 2025-12-30T00:33:14 1767054794

Just put a claude.md file in your directory. If you want more details about a subdirectory put one in there too.

Claude itself can just update the claude.md file with whatever you might have forgot to put in there.

You can stick it in git and it lives with the project.

sabareesh · 2025-12-30T00:10:47 1767053447

Non starter for us, we cant ship propriety data to a third party servers.

austinbaggio · 2025-12-30T01:20:12 1767057612

I assume this is with work? And also assume you do send data, you just need some service agreement or something like with AWS or Microsoft for GH?