Willing to let them loose is the more salient point. If you let your agents loose on your entire body of output and tools at work, then you'll build that knowledge up pretty quickly.
Tall ask right now, with privacy and agency (no pun intended) concerns
I’d bet that an agent would never act on an email the way I would. It just lacks my worldview. Which raises the question: would it really make sense for it to write emails on my behalf? It will certainly “close the loop” one way or another, but I doubt I would like the outcome.
On the Clawdbot Discord, someone wrote today that, overnight, Claude sent a message to every iMessage thread from 2019 saying it would rather ignore such outdated threads.
Same, I was a very average dev coming out of CS, and a PM before this. I find that my product training has been more useful, especially with prototypes, but I do leave nearly all of the hard system, infra, and backend work to my much much more competent engineering teammates.
I need a solution for context decay and relevance curation, with benchmarks that prove it is also more valuable than constant rediscovery (for quality and cost).
I agree. We are looking at some METR benchmarks, not expecting a simple answer to this, but do you have any in mind you find compelling?
Not really. But you could go viral again with a "Coding agents with memory build better software using fewer tokens" post, showcasing how you benchmarked a "Twitter rebuild":
1. Set up Claude Code to build some layers of the stack
2. Set up Codex to build others.
In one instance equip them both with your product. Maybe bake in some tribal knowledge.
In another instance let them work raw.
In both instances, capture:
- Time to completion
- Tokens spent
- Ability to meet original spec
- Subjective quality
- Number of errors, categorized by layer, so you can state something like "raw-Claude's backend kept failing with raw-Codex's frontend"
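The capture step could be sketched like this. Everything here is illustrative, not a real harness: the class name, fields, and numbers are made up, and you'd wrap each agent run with the timer.

```python
# Hedged sketch: one record per (agent, condition, layer) run.
import time
from dataclasses import dataclass, field


@dataclass
class BenchmarkRun:
    agent: str                 # e.g. "claude-code" or "codex"
    condition: str             # "with-memory" or "raw"
    layer: str                 # which part of the stack it built
    seconds: float = 0.0
    tokens_spent: int = 0
    spec_items_met: int = 0
    spec_items_total: int = 0
    errors: list[str] = field(default_factory=list)

    @property
    def spec_coverage(self) -> float:
        """Fraction of the original spec the run satisfied."""
        if not self.spec_items_total:
            return 0.0
        return self.spec_items_met / self.spec_items_total


start = time.monotonic()
# ... run the agent on its assigned layer here ...
run = BenchmarkRun(
    agent="claude-code", condition="raw", layer="backend",
    seconds=time.monotonic() - start,
    tokens_spent=52_000,               # illustrative number
    spec_items_met=7, spec_items_total=10,
    errors=["backend 500 on /timeline"],
)
print(f"{run.agent}/{run.condition}: {run.spec_coverage:.0%} of spec")
```

Subjective quality would still need a human pass; the point is just that every run emits the same comparable record.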
I imagine this benchmark working well in your favor.
Nah, just another one of those spam bots on all the small-business, finance, and tradie subreddits: "Hey fellow users, have you ever suffered from <use case>? What is the problem you want solved? Tell me your honest opinions below!"
I can't find it now, but there's a good graph showing Google Search's decline in share to ChatGPT; it excludes Gemini, and with Gemini included, Google stays relatively on par. That's pretty much the answer to where one goes. LLMs are higher-intent than search could ever be, and they're closer to you selling to yourself than a store selling to you, since they have all of your user context.
Thanks everyone for the comments, really, I wasn't expecting this.
Quite a few of you have mentioned that you store a lot of your working context across sessions in some md file - what are you actually storing? What data do you actually go back to and refer to as you're building?
It works great. You can put anything you want in there: coding style, architecture guidelines, a project overview.
Anything the agent needs to know to work properly with your codebase, similar to an onboarding document.
Tools (Claude Code CLI, extensions) will pick them up hierarchically too if you want to get more specific about one subdirectory in your project.
AGENTS.md is similar for other AI agents (OpenAI Codex is one). It doesn't even have to be those - you can just @ the filename at the start of the chat and that information goes in the context.
The naming scheme just allows for it to be automatic.
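As a made-up illustration (none of this reflects a real project), such a file might look like:

```markdown
# CLAUDE.md

## Project
Flask API plus React frontend; the service map lives in docs/architecture.md.

## Coding style
- Type hints everywhere; format with black.
- Prefer small pure functions over classes.

## Gotchas
- Never edit migrations/ by hand; use `make migrate`.
```

Short and specific beats long and generic here, since the whole file lands in the context window every session.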
Not memory, I just instruct it to freely experiment with temporary scripts and artifacts in a specific folder.
This helps it organize temporary things it does like debugging scripts and lets it (or me) reference/build on them later, without filling the context window. Nothing fancy, just a bit of organization that collects in a repo (Git ignored)
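A minimal sketch of that convention, assuming the folder is called `scratch/` (any name works):

```shell
# Hypothetical setup for an agent scratch area:
mkdir -p scratch                 # agent drops debug scripts and artifacts here
echo "scratch/" >> .gitignore    # keep the clutter out of version control
```

Then a one-line instruction in your agent config ("put all temporary scripts and artifacts in scratch/") is enough for the agent to follow it.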
It sounds like you've used quite a few. What programs are you expecting? Assuming you're talking about doing some inference on the data? Or optimizing for some RAG or something?
In our experience, a lot of it is feel and dev preference. After talking to quite a few developers, we've found the skill was the easiest to get started with, but we also have a CLI tool and an MCP server too. You can check out the docs if you'd prefer to try those - feedback welcome: https://www.ensue-network.ai/docs#cli-tool
yeah but a skill without the mcp server is just going to be super inefficient at certain things.
again going to my example, a skill to do a dependency graph would have to do a complex search. and in some languages the dependency might be hidden by macros/reflection etc which would obscure a result obtained by grep
how would you do this with a skill, which is just a text file nudging the llm whereas the MCP's server goes out and does things.
that seems token inefficient. why have the llm do a full round trip? load the skill, which potentially contains hundreds of lines of code, then copy and paste the code back into the compiler, when it could just run it?
not that i care too too much about small amounts of tokens but depleting your context rapidly seems bad. what is the positive tradeoff here?
I don't understand. The Skill runs the tools. Where a problem can be handled by a program instead of the LLM, I think we should maximally do that.
That uses fewer tokens: the LLM just calls the script, gets the response, and uses that to continue reasoning.
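To make that concrete with the dependency-graph example from earlier in the thread: the Skill's text file can simply tell the model to run a checked-in script and read its short output, rather than inlining the code. A made-up, stdlib-only sketch (and note that static analysis like this would still miss dependencies hidden behind reflection, as pointed out above):

```python
# deps.py - hypothetical helper a Skill could point the agent at.
# The model runs one command and sees a few edge lines, not this source.
import ast
import sys


def import_edges(source: str, module: str) -> list[tuple[str, str]]:
    """Return (module, imported_name) edges found in Python source."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            edges.extend((module, alias.name) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.append((module, node.module))
    return edges


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            for src, dep in import_edges(f.read(), module=path):
                print(f"{src} -> {dep}")
```

The Skill file would then be a couple of lines ("run `python deps.py <files>` and summarize the output"), which is where the token savings come from.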