Willing to let them loose is the more salient point. If you let your agents loose on your entire body of output and tools at work, then you'll build that knowledge up pretty quickly.
Tall ask right now, with privacy and agency (no pun intended) concerns
I’d bet that an agent would never act on an email the way I would. It just lacks my worldview. Which raises the question: would it really make sense for it to write emails on my behalf? It will certainly “close the loop” one way or another, but I doubt I would like the outcome.
On the Clawdbot Discord, someone wrote today that, overnight, Claude sent a message to every iMessage thread from 2019 saying it would rather ignore such outdated threads.
Same, I was a very average dev coming out of CS, and a PM before this. I find that my product training has been more useful, especially with prototypes, but I do leave nearly all of the hard system, infra, and backend work to my much much more competent engineering teammates.
I need a solution for context decay and relevance curation, with benchmarks that prove it is also more valuable than constant rediscovery (for quality and cost).
I agree. We are looking at some METR benchmarks, not expecting a simple answer to this, but do you have any in mind you find compelling?
Not really. But you could go viral again with a "Coding agents with memory build better software using fewer tokens" post, showcasing how you benchmarked a "Twitter rebuild":
1. Set up Claude Code to build some layers of the stack
2. Set up Codex to build others.
In one instance equip them both with your product. Maybe bake in some tribal knowledge.
In another instance let them work raw.
In both instances, capture:
- Time to completion
- Tokens spent
- Ability to meet original spec
- Subjective quality
- Number of errors, categorized by layer, so you can state something like "raw-Claude's backend kept failing with raw-Codex's frontend"
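The capture step could be sketched like this. Everything here is illustrative, not a real harness: the class name, fields, and numbers are made up, and you'd wrap each agent run with the timer.

```python
# Hedged sketch: one record per (agent, condition, layer) run.
import time
from dataclasses import dataclass, field


@dataclass
class BenchmarkRun:
    agent: str                 # e.g. "claude-code" or "codex"
    condition: str             # "with-memory" or "raw"
    layer: str                 # which part of the stack it built
    seconds: float = 0.0
    tokens_spent: int = 0
    spec_items_met: int = 0
    spec_items_total: int = 0
    errors: list[str] = field(default_factory=list)

    @property
    def spec_coverage(self) -> float:
        """Fraction of the original spec the run satisfied."""
        if not self.spec_items_total:
            return 0.0
        return self.spec_items_met / self.spec_items_total


start = time.monotonic()
# ... run the agent on its assigned layer here ...
run = BenchmarkRun(
    agent="claude-code", condition="raw", layer="backend",
    seconds=time.monotonic() - start,
    tokens_spent=52_000,               # illustrative number
    spec_items_met=7, spec_items_total=10,
    errors=["backend 500 on /timeline"],
)
print(f"{run.agent}/{run.condition}: {run.spec_coverage:.0%} of spec")
```

Subjective quality would still need a human pass; the point is just that every run emits the same comparable record.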
I imagine this benchmark working well in your favor.
Nah, just another one of those spam bots on all the small-business, finance, and tradie subreddits: "Hey fellow users, have you ever suffered from <use case>? What is the problem you want solved? Tell me your honest opinions below!"
I can't find it now, but there's a good graph showing Google Search's decline in share to ChatGPT; it excludes Gemini, and with Gemini included, Google stays relatively on par. That's pretty much the answer to where one goes. LLMs are higher-intent than search could ever be, and they're closer to you selling to yourself than a store selling to you, since they have all of your user context.
Thanks everyone for the comments, really, I wasn't expecting this.
Quite a few of you have mentioned that you store a lot of your working context across sessions in some md file - what are you actually storing? What data do you actually go back to and refer to as you're building?
It works great. You can put anything you want in there: coding style, architecture guidelines, a project overview.
Anything the agent needs to know to work properly with your codebase, similar to an onboarding document.
Tools (Claude Code CLI, extensions) will pick them up hierarchically too if you want to get more specific about one subdirectory in your project.
AGENTS.md is similar for other AI agents (OpenAI Codex is one). It doesn't even have to be those - you can just @ the filename at the start of the chat and that information goes in the context.
The naming scheme just allows for it to be automatic.
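As a made-up illustration (none of this reflects a real project), such a file might look like:

```markdown
# CLAUDE.md

## Project
Flask API plus React frontend; the service map lives in docs/architecture.md.

## Coding style
- Type hints everywhere; format with black.
- Prefer small pure functions over classes.

## Gotchas
- Never edit migrations/ by hand; use `make migrate`.
```

Short and specific beats long and generic here, since the whole file lands in the context window every session.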
Not memory, I just instruct it to freely experiment with temporary scripts and artifacts in a specific folder.
This helps it organize temporary things it does like debugging scripts and lets it (or me) reference/build on them later, without filling the context window. Nothing fancy, just a bit of organization that collects in a repo (Git ignored)
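A minimal sketch of that convention, assuming the folder is called `scratch/` (any name works):

```shell
# Hypothetical setup for an agent scratch area:
mkdir -p scratch                 # agent drops debug scripts and artifacts here
echo "scratch/" >> .gitignore    # keep the clutter out of version control
```

Then a one-line instruction in your agent config ("put all temporary scripts and artifacts in scratch/") is enough for the agent to follow it.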
It sounds like you've used quite a few. What programs are you expecting? Assuming you're talking about doing some inference on the data? Or optimizing for some RAG or something?
In our experience, a lot of it is feel and dev preference. After talking to quite a few developers, we've found the skill was the easiest to get started with, but we also have a CLI tool and an MCP server too. You can check out the docs if you'd prefer to try those - feedback welcome: https://www.ensue-network.ai/docs#cli-tool
yeah but a skill without the mcp server is just going to be super inefficient at certain things.
again going to my example, a skill to do a dependency graph would have to do a complex search. and in some languages the dependency might be hidden by macros/reflection etc which would obscure a result obtained by grep
how would you do this with a skill, which is just a text file nudging the llm whereas the MCP's server goes out and does things.
that seems token inefficient. why have the llm do a full round trip? load the skill, which potentially contains hundreds of lines of code, then copy and paste the code back into the compiler, when it could just run it?
not that i care too too much about small amounts of tokens but depleting your context rapidly seems bad. what is the positive tradeoff here?
I don't understand. The Skill runs the tools. Where a problem can be handled by a program instead of the LLM, I think we should maximally do that.
That uses fewer tokens: the LLM just calls the script, gets the response, and uses that to continue reasoning.
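To make that concrete with the dependency-graph example from earlier in the thread: the Skill's text file can simply tell the model to run a checked-in script and read its short output, rather than inlining the code. A made-up, stdlib-only sketch (and note that static analysis like this would still miss dependencies hidden behind reflection, as pointed out above):

```python
# deps.py - hypothetical helper a Skill could point the agent at.
# The model runs one command and sees a few edge lines, not this source.
import ast
import sys


def import_edges(source: str, module: str) -> list[tuple[str, str]]:
    """Return (module, imported_name) edges found in Python source."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            edges.extend((module, alias.name) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.append((module, node.module))
    return edges


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            for src, dep in import_edges(f.read(), module=path):
                print(f"{src} -> {dep}")
```

The Skill file would then be a couple of lines ("run `python deps.py <files>` and summarize the output"), which is where the token savings come from.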