After reading hundreds of papers on agentic memory and trying out every possible tool, I came to the simple conclusion that maybe we're looking at memory wrong.
Memory is just... learning. Learning about the user, the task at hand, learning insights and patterns, learning from decisions - good and bad, the feedback received. Learning from every interaction. Everything else is integration (how the agent uses these learnings) and curation (decay, pruning, deduplication).
So I built Learning Machines: A system that helps agents continuously learn from every interaction.
I started working on it Dec 31 and got a basic working version yesterday. Here's the PR for those interested: [learning-machine-v0](https://github.com/agno-agi/agno/pull/5897)
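To make the learning/curation split above concrete, here's a minimal sketch. This is illustrative only, not Agno's actual API: `Learning` and `LearningStore` are hypothetical names, and the decay/prune/dedupe logic is one simple way to implement the curation side.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Learning:
    content: str          # e.g. "user prefers concise answers"
    score: float = 1.0    # confidence/recency weight
    created_at: float = field(default_factory=time.time)

class LearningStore:
    def __init__(self, max_items: int = 100, half_life_s: float = 86_400.0):
        self.items: list[Learning] = []
        self.max_items = max_items
        self.half_life_s = half_life_s

    def add(self, content: str) -> None:
        # Deduplication: refresh an existing learning instead of storing a copy.
        for item in self.items:
            if item.content == content:
                item.score += 1.0
                item.created_at = time.time()
                return
        self.items.append(Learning(content))

    def curate(self) -> None:
        # Decay: halve each score once per half-life, then prune the weakest.
        now = time.time()
        for item in self.items:
            item.score *= 0.5 ** ((now - item.created_at) / self.half_life_s)
        self.items.sort(key=lambda i: i.score, reverse=True)
        del self.items[self.max_items:]
```

Integration (how the agent actually uses these learnings at inference time) would sit on top of a store like this.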
In general we instantiate one or even multiple agents per request (to limit data and resource access). At moderate scale, like 10,000 requests per minute, even small delays can impact user experience and resource usage.
Another example: there's a large Fortune 10 company that has built an agentic system to sift through spreadsheet data; they create one agent per row to validate everything in that row. You can see how that scales to thousands of agents per minute.
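The agent-per-row pattern looks roughly like this. The `Agent` class here is a hypothetical stand-in (not Agno's or anyone's real API); the point is that one short-lived, narrowly-scoped agent is created per row, so per-agent instantiation cost is multiplied by row count.

```python
class Agent:
    def __init__(self, instructions: str, context: dict):
        self.instructions = instructions
        self.context = context  # scoped to a single row: limits data access

    def run(self, prompt: str) -> str:
        # Placeholder for an LLM call that validates the row.
        return f"validated row {self.context['row_id']}"

def validate_sheet(rows: list[dict]) -> list[str]:
    results = []
    for i, row in enumerate(rows):
        # One short-lived agent per row -- at thousands of rows per minute,
        # instantiation overhead in the framework adds up quickly.
        agent = Agent("Validate every field in this row.", {"row_id": i, "row": row})
        results.append(agent.run("check"))
    return results
```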
You’re right, inference is typically the bottleneck and it’s reasonable to think the framework’s performance might not be critical. But here’s why we care deeply about it:
- High Performance = Less Bloat: As a software engineer, I value lean, minimal-dependency libraries. A performant framework means the authors have kept the underlying codebase lean and simple. For example: with Agno, the Agent is the base class and lives in one file, whereas with LangChain you'll get 5-7 layers of inheritance. Another example: when you install crewai, it installs the kubernetes library (along with half of PyPI). Agno has a very small footprint (I think <10 required dependencies).
- Parallelism Beyond Inference: While inference is one part of the equation, parallel tool execution, async knowledge search, and async memory updates improve the entire system's performance. Because we're focused on performance, you get a top-of-the-line experience without thinking about it; it's a core part of our philosophy.
- Milliseconds Matter: When deploying agents in production, you’re often instantiating one or even multiple agents per request (to limit data and resource access). At moderate scale, like 10,000 requests per minute, even small delays can impact user experience and resource usage.
- Scalability and Cost Efficiency: High-performance frameworks help reduce infrastructure costs, enabling smoother scaling as your user base grows.
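The parallelism point above can be sketched with plain asyncio (this is illustrative, not Agno's internals): a tool call, a knowledge search, and a memory update are independent I/O-bound steps, so running them concurrently takes roughly the time of the slowest one instead of their sum.

```python
import asyncio
import time

async def call_tool() -> str:
    await asyncio.sleep(0.1)   # stand-in for a network-bound tool call
    return "tool result"

async def search_knowledge() -> str:
    await asyncio.sleep(0.1)   # stand-in for an async vector search
    return "knowledge hits"

async def update_memory() -> str:
    await asyncio.sleep(0.1)   # stand-in for an async memory write
    return "memory updated"

async def run_turn() -> list[str]:
    # All three run concurrently: ~0.1s total instead of ~0.3s sequential.
    return await asyncio.gather(call_tool(), search_knowledge(), update_memory())

start = time.perf_counter()
results = asyncio.run(run_turn())
elapsed = time.perf_counter() - start
```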
I'm not sure why you would NOT want a performant library. Sure, inference is part of it (and isn't in our control), but I'd definitely want to use libraries from engineers who value performance.
Agreed that the cookbooks have gotten messy. Not an excuse, but sharing the root cause: we're building very, very fast and putting examples out for users quickly. We maintain backwards compatibility, so sometimes you'll see two examples doing the same thing.
I'll make it a point to clean up the cookbooks and share more examples under this comment. Here are 2 to get started: