Hacker News | ClintEhrlich's comments

You could definitely build a coding agent that way, and it sounds like you've done it. We store the conversation history because:

1. In our use of coding agents, we find that there are often things referenced earlier in the conversation (API keys, endpoint addresses, feedback to the agent, etc.) that it's useful to have persist.

2. This is a general-purpose LLM memory system, which we've just used here to build a coding agent. But it is also designed for personal assistants, legal LLMs, etc.


By construction, individual summaries are not typically large enough to overload the context window when expanded.

The reason that the volume is potentially arbitrarily large is that one sub-agent can call lcm_expand multiple times - either vertically or horizontally. But that's a process that occurs gradually as the tool is used repeatedly.

This has not been a problem in our testing, but if it were a problem it would be easy to prevent sub-agents from invoking lcm_expand once their context buffer has reached a specified threshold.
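In pseudocode, the guard described above might look something like the following sketch. All names (`ContextBuffer`, `lcm_expand`, the token budget) are illustrative assumptions, not the system's real API:

```python
# Hypothetical guard: refuse further lcm_expand calls once a sub-agent's
# context buffer passes a specified threshold. Purely illustrative.

MAX_CONTEXT_TOKENS = 100_000  # assumed per-sub-agent budget

class ContextBuffer:
    def __init__(self):
        self.tokens = 0

    def add(self, text: str) -> None:
        self.tokens += len(text.split())  # crude token estimate

def lcm_expand(summary_id: str, store: dict, buf: ContextBuffer) -> str:
    """Expand a summary into its full text, unless the buffer is near full."""
    expansion = store[summary_id]
    if buf.tokens + len(expansion.split()) > MAX_CONTEXT_TOKENS:
        return f"[expansion of {summary_id} refused: context budget exceeded]"
    buf.add(expansion)
    return expansion
```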


Suppose there's the following situation:

    Summary A = summarise(message 1 to P)
    Summary B = summarise(Summary A, message P+1 to Q)
    Summary C = summarise(Summary B, message Q+1 to R)
What does calling lcm_expand(Summary C) do? Does it unroll all messages from message 1 to message R or does it unroll to Summary B and message Q+1 to R?

> volume is potentially arbitrarily large is that one sub-agent can call lcm_expand multiple times - either vertically or horizontally

I'm assuming from this that it's the latter? In that case, that addresses my concern about not blowing up the context window immediately.
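Assuming the one-level reading (Summary C expands to Summary B plus messages Q+1 to R, rather than all the way down to message 1), the behavior could be modeled like this. The node structure is purely illustrative, not the paper's actual data model:

```python
# Toy model of the summary hierarchy above, assuming lcm_expand
# unrolls exactly one level at a time.

summaries = {
    "A": {"children": ["msg1..P"]},
    "B": {"children": ["A", "msgP+1..Q"]},
    "C": {"children": ["B", "msgQ+1..R"]},
}

def lcm_expand(node: str) -> list[str]:
    """Return the immediate constituents of a summary (one level only)."""
    return summaries[node]["children"]
```

Under this reading, fully unrolling to raw messages requires repeated calls (C, then B, then A), which is consistent with the volume growing gradually as the tool is used.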


Thanks for the kind words! Looks cool!


Hi NWU,

We don't have any other materials yet, but let's see if this lands for you. I can run you through a couple simpler versions of the system, why they don't work, and how that informs our ultimate design.

The most basic part of the system is "two layers". Layer 1 is the "ground truth" of the conversation - the whole text the user sees. Layer 2 is what the model sees, i.e., the active context window.

In a perfect world, those would be the same thing. But, as you know, context lengths aren't long enough for that, so we can't fit everything from Layer 1 into Layer 2.

So instead we keep a "pointer" to the appropriate part of Layer 1 in Layer 2. That pointer takes the form of a summary. But it's not a summary designed to contain all information. It's more like a "label" that makes sure the model knows where to look.

The naive version of the system would allow the main model to expand Layer 2 summaries by importing all of the underlying data from Layer 1. But this doesn't work well, because then you just end up re-filling the Layer 2 context window.

So instead you let the main model clone itself. The clone expands the summary in its own context (and can do this for multiple summaries, transforming each into the original uncompressed text), and then returns only whatever information the main thread requires.

Where this system would not fully match the capabilities of RLMs is that, by writing a script that calls itself e.g. thousands of times, an RLM has the ability to make many more recursive tool calls than can fit in a context window. So we fix that using operator-level recursion, i.e., we give the LLM a tool, map, that executes arbitrary recursion, without the LLM having to write a custom script to accomplish that.
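The clone-and-expand pattern can be sketched roughly as follows. Everything here is an illustrative stand-in (`answer_with_subagent`, the `store` dict, the `ask_llm` callable); the real system's interfaces will differ:

```python
# Minimal sketch of the clone-and-expand pattern: only the sub-agent
# sees the full expansion; the main thread receives a short answer.

def answer_with_subagent(question: str, summary_ids: list[str],
                         store: dict, ask_llm) -> str:
    """Spawn a 'clone' whose context holds the expanded summaries,
    then return only its answer to the main thread."""
    expanded = "\n".join(store[sid] for sid in summary_ids)  # full text
    # The expansion lives in the clone's prompt; the main context
    # receives just the answer string, so it never fills up.
    return ask_llm(f"{expanded}\n\nQuestion: {question}")
```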

Hope this helps!

- Clint


I am in the process of trying to integrate LCM into my own personal assistant agent for its context management system. The main human-facing agent will not be a coding agent, so I'll be modifying the system prompt and some other things quite heavily, but the core concepts of the system will serve as the backbone. Now that I am playing around with it, I am hoping you can answer some questions. I notice that the system prompt of the agent mutates as local time is injected into the system prompt itself. If that's what's happening, you are destroying any hope of caching from the provider, are you not? Am I reading this correctly, or was this a deliberate choice for some reason... instead of appending at the end of the user's turn as system metadata, which would preserve the cached head? Thanks.
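The cache-friendly alternative being suggested here could look something like this sketch: keep the system prompt byte-stable and inject local time as trailing metadata on the user turn, so the provider's prefix cache stays valid. The message shapes assume a common OpenAI-style chat format:

```python
# Hedged sketch: a byte-stable system prompt preserves the cached prefix;
# the timestamp rides along at the tail of the user turn instead.
from datetime import datetime

SYSTEM_PROMPT = "You are a personal assistant."  # never mutated

def build_messages(history: list[dict], user_input: str) -> list[dict]:
    stamp = datetime.now().isoformat(timespec="minutes")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # stable cache head
        *history,
        {"role": "user",
         "content": f"{user_input}\n\n[local time: {stamp}]"},
    ]
```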


Thanks for the reply. That does help.


Our system uses sub-agents as a core part of its architecture.

That terminology can be confusing, because in other cases (and sometimes in our own architecture, like when executing thousands of operations via MAP) a sub-agent may be a smaller model given less complex individual tasks.

But the core mechanism we use for simulating unlimited context is to allow the main model to spin up instances of itself (sub-agents) with the previously summarized portion of the context expanded into its full, uncompressed state.

Expanding summaries into full text in sub-agents rather than the main thread is a critical part of our architecture, because it prevents the main context window from filling up.


Hi, I'm Clint, one of the co-authors of this paper.

I'd like to quickly summarize what is different about our approach and why it matters.

Our work was inspired by brilliant research done at MIT CSAIL on "Recursive Language Models" (RLMs). One of the controversies has been whether these models are just a formalization of what agents like Claude Code already do vs. whether they bring new capabilities to the table.

By outperforming Claude on the major long-context benchmark, we provide a strong signal that something fundamentally new is happening. (In other words, it's not "just Claude Code" because it demonstrably outperforms Claude Code in the long-context regime.)

Where our contribution, LCM, differs from RLMs is how we handle recursion. RLMs use "symbolic recursion" -- i.e., they have an LLM write a script to recursively call itself in order to manipulate the context, which is stored in a REPL. This provides maximum flexibility... but it often goes wrong, since the LLM may write imperfect scripts.

LCM attempts to decompose the recursion from RLMs into deterministic primitives so that the control flow can be managed by an engine rather than left to the whims of the LLM. In practice, this means we replace bespoke scripts with two mechanisms: (1) A DAG-based context management system that works like paged virtual memory, except for managing conversations and files; and (2) Operator-level recursion, like "Map" for LLMs, which lets one tool call process thousands of tasks.

An analogy we draw in the paper is the evolution from GO-TO statements (of Dijkstra's "Considered Harmful" fame) to structured programming. RLMs are maximally expressive, but all of that power comes with the risk of things going awry. We have built a more mechanistic system, which can provide stronger guarantees when deployed in production with today's models.
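The operator-level recursion idea can be illustrated with a toy `llm_map`: one tool call fans out many tasks, and the engine, not the model, owns the control flow. The function name, signature, and worker count here are illustrative, not the paper's actual implementation:

```python
# Toy illustration of "operator-level recursion": a single map tool call
# applies an LLM call to every item, with no model-written recursion.
from concurrent.futures import ThreadPoolExecutor

def llm_map(prompt_template: str, items: list, call_llm, workers: int = 8):
    """Apply call_llm(prompt_template.format(item=x)) to every item,
    in parallel, returning results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda x: call_llm(prompt_template.format(item=x)), items))
```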

Happy to answer any questions! Thanks for taking a look at the paper!


Thank you so much for your work!

I've echoed the sentiment here on HN (and elsewhere) that these kinds of mechanisms seem to be a pathway to extending context longer and longer and longer and I wish I could toy around with this technology right now (can I?). I'm so excited!!

Your work is the shoulders-built-on-shoulders upon which other giants shall keep on building. Thank you so much.


Thanks for the kind words.

Yes, we think there is a ton of low-hanging fruit from taking lessons from OS/PL theory and applying them to LLM tooling.

This is our first contribution in that direction. There will be more!


Oh and to be clear YES you can try it!!!

Just bring an API key. :)

github.com/voltropy/volt


This looks super useful! And it’s intellectually appealing to think that the LLM will have the ability to think back precisely and we can rely on DAG tooling to reason about and keep track of history (and correct history).

Have you considered making an openclaw plugin/PR for it? I understand you have your own coding CLI tool, but I don’t think this looks so hard to implement that it can’t be implemented elsewhere.

Either way, thanks for sharing this.


Yes, that is actually the next thing we are shipping!

We have heard from a ton of OpenClaw users that the biggest barrier to them getting everything they want out of their agents is that memory is not a solved problem.

LCM could be a great solution to that. Stay tuned -- will ship it ASAP.


Riffing on this a little, there are a few things that would be useful:

1 - global namespace - for the gateway agent/coordinator - it would make inspecting the results of subagent tasks much safer and more efficient, with all the benefits of precision across compaction boundaries for the main chat thread. I could see giving the subagents access to it, or just prompting them fresh and storing results in the global memory - the second is probably better.

2 - permissioned memory spaces - stuff that a given subagent should know without giving them global memory access. Then a gateway could mark some stuff ‘available’ as part of prompting.

This would be a super useful set of primitives. From reading the paper, I think you could do this relatively cheaply - maybe a tagging system for branches/nodes in the DAG. openclaw already keeps some sort of track of what subagents should have access to in the form of skills, but I haven't looked into the actual permissions architecture.
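The tagging idea could be sketched like this: each DAG node carries a set of tags, and a sub-agent may read a node only if it holds a matching tag (or the node is public). The node shape and function name are hypothetical:

```python
# Hypothetical permissioned-memory sketch: tag-based visibility over
# DAG nodes. Not part of the paper; purely illustrative.

def visible_nodes(dag: dict, agent_tags: set) -> list[str]:
    """Return ids of nodes whose tag set intersects the agent's tags,
    plus any node explicitly marked public."""
    return [nid for nid, node in dag.items()
            if node["tags"] & agent_tags or "public" in node["tags"]]
```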


Did somebody say 'global namespace'? I spent years working on one of those as part of Urbit... In general, I think you're right. Each conversation is an append-only log at the lowest layer, and I see no reason not to expose that fact as a global namespace, as long as permissions are handled gracefully.

Of course getting permissions to work well might be easier said than done, but I like this direction.


Just passed this on to my co-author who is working on the plug-in. Really appreciate the suggestions!

We will probably ship a fairly basic version to start, but I think there are a lot of cool things that can be added.


Awesome! Looking forward to it


Love it. Yes, compaction is a huge pain point in openclaw, and it EATS tokens.


Cool. I agree (consistent with your GOTO analogy) that imposing structure on the model (or a human) can constrain the search space and lead to better choosing given a fixed decision budget.

> deterministic primitives

Are agent-map and LLM-map the only two options you've given the model for recursive invocations? No higher-level, er, reduction operators to augment the map primitives?


Hi, I'm the other author on this paper. You've asked a good question. I had originally planned on writing an agentic_reduce operator to complement the agentic_map operator, but the more I thought about it, the more I realized I couldn't come up with a use case for it that wasn't contrived. Instead, having the main agent write scripts that perform aggregations on the result of an agentic_map or llm_map call made a lot more sense.

It's quite possible that's wrong. If so, I would write llm_reduce like this: it would spawn a sub-task for every pair of elements in the list, which would call an LLM with a prompt telling it how to combine the two elements into one. The output type of the reduce operation would need to be the same as the input type, just like in normal map/reduce. This allows for a tree of operations to be performed, where the reduction is run log(n) times, resulting in a single value.

That value should probably be loaded into the LCM database by default, rather than putting it directly into the model's context, to protect the invariant that the model should be able to string together arbitrarily long sequences of maps and reduces without filling up its own context.

I don't think this would be hard to write. It would reuse the same database and parallelism machinery that llm_map and agentic_map use.
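The hypothetical llm_reduce described above (pairwise combination, log(n) rounds, output type equal to input type) could be sketched like this, with `combine` standing in for an LLM call carrying a "merge these two elements" prompt:

```python
# Sketch of the hypothetical llm_reduce: rounds of pairwise combination
# form a reduction tree of depth ~log2(n), ending in a single value.

def llm_reduce(items: list, combine) -> object:
    """Reduce a list to one value via rounds of pairwise combination."""
    while len(items) > 1:
        nxt = []
        for i in range(0, len(items) - 1, 2):
            nxt.append(combine(items[i], items[i + 1]))  # one LLM call
        if len(items) % 2:          # odd element carries to next round
            nxt.append(items[-1])
        items = nxt
    return items[0]
```

As noted above, the final value would by default be loaded into the LCM database rather than the model's own context.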


Cool! It'll be interesting to follow your work. I've been thinking, as well, about quorum and voting systems that might benefit from some structure. The primitives you've described are great for the "do N things one time each" case, but sometimes I (and the AI) want "do one thing N times: pick the best somehow". (I mean, you can express that with map/reduce over small integers or something, but still: different flavor.) You can even bring public choice theory into it.
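A rough sketch of that "do one thing N times, pick the best" flavor, using majority vote as the selection rule. This is purely illustrative and not one of the paper's primitives:

```python
# Best-of-N sampling with majority vote: sample N candidate answers
# for the same prompt and return the most common one. Illustrative only.
from collections import Counter

def best_of_n(prompt: str, call_llm, n: int = 5) -> str:
    candidates = [call_llm(prompt) for _ in range(n)]
    return Counter(candidates).most_common(1)[0][0]
```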


Clint thanks heaps for this. Really good to see a lot of old school CS/Graph theory applied in a nice way.

On an unrelated note - I did notice you were also a lawyer - so, um, what is next for you re this? What should we gear up for :)


Thanks for the questions. I'll make sure to expand the FAQ when I get a chance.

1. The USD that you can use to unlock USDf (forked dollars/digital gold) is limited to deposits at commercial banks and credit unions. The public does not have access to electronic base money, so it's not included.

2. The quantity of USDf you can unlock is based on your historic bank balances, as verified during a defined window in the past. New credit money created after that point in time doesn't affect those past balances. To the extent that the owners of new USD wish to acquire weight for them, they'd need to purchase USDf, increasing its value. That is sort of the whole point: if governments keep inflating their money, the "forked" version with guaranteed scarcity will gradually increase in value.

3. USDf will trade at a different value than USD. At first, a much lower value. The idea is for them to be employed in a hybrid unit of account, USDw, where 1 USDw = 1 USD + 1 USDf.

A crypto-weighted dollar (i.e., USDw) will trade at a premium over a USD, since it is a USD + cryptographic weight. Think of it like a stablecoin that also comes with Bitcoin-like inflation protection. However, it's also possible to use USDf as an independent asset, and that will be convenient in use cases where transferring USD on existing payment rails isn't practical.


I would argue that smart savers at the moment are saving in cash in a safe. They would not get any USDf, right? So USDf is for the less informed. Are you expecting them to actively go through an unlocking process? Or will their banks do it based on public demand somehow? How is demand going to happen without people using it? Why would people use it if they don't have an advantage?


That is precisely the problem we're trying to solve: allowing the public to take advantage of blockchain-based digital scarcity without anyone being forced to invest in speculative assets.


I'm actually trying to convince people that, if we accept the premise that blockchain technology can create digital gold (as the market has, to the tune of hundreds of billions of dollars) then we should harness that digital gold to protect the value of the money that everyone already owns, rather than setting the world on fire by launching new currencies that function like pyramid schemes.


While noble, your digital gold is worse than USD not being fully backed by gold. There is no intrinsic value in a database of hashes in the event that a currency fails. This solution is as much a fiat solution as USD, if not more so. The gold standard was as much, if not more, about securing the value of money as it was about controlling inflation from over-printing.


There are two nuances I would respectfully suggest you're overlooking.

First, KRNC is designed to be employed as a supplement to USD. Most transactions would be executed with both USD and a corresponding blockchain asset. Technically, this is a digital analogue of the "symmetallic standard", in which base money is composed of both gold and silver in a specified ratio. The point is risk diversification: if fiat money implodes, or if crypto fails, you aren't wiped out.

Second, the concept of "intrinsic value" is misleading/confused when it comes to money. Things that trade at their consumption/production value are not monetized. Treating something as money involves attaching symbolic value to it: accepting it as proof of goods or services rendered in the past, and as a token that can be used to acquire goods or services in the future. Even gold would lose most of its value if it were suddenly priced based only on demand for use in industrial applications.

Money has always been valuable because everyone else treats it as money, whatever it is. It's a Schelling point that enables abstracted barter. Nothing less, nothing more.


I've been waiting for Scott Aaronson to put all of this into perspective since the first leaks about Google's quantum supremacy started appearing in popular media.

He has exceeded my expectations with this post, which cuts through all the hype to communicate exactly what the results of this experiment mean for the field. It's worth reading and sharing.


On second reading, I have but one trivial gripe:

"Enormity" implies moral reprobation, so it's a poor way of describing the significance of a computational discovery.


Isn't it just

Enormous : enormity :: huge : hugeness?


Both enormous and enormity come from the same etymological roots, enormis "irregular, huge", carrying a connotation of abnormal or irregular, in a negative or bad sense. Enormity specifically came to mean "extreme wickedness" in English, though that meaning is increasingly obscured by usage to mean simply "very big".

The most discerning usage would be that both mean "very large scale", but that enormity preserves the sense of bad at a very large scale.

That's become something of a losing battle as the synonymous usage has become an enormity.

https://www.etymonline.com/word/enormous

https://grammarist.com/usage/enormity-enormousness/


Not quite.

Enormous : enormousness :: huge : hugeness.

Enormity = Immense scale of evil (e.g., the "enormity of the holocaust")


It can be used in that way, but the neutral usage is valid too. I'd even argue that the "evil" undertones of the usage you describe border on archaic.

"Enormousness" isn't a word in common usage that I'm aware of.


I have no dog in the fight, and wish "enormity" had never developed a normative undertone, but I strongly disagree that said usage is anything close to archaic.

If you Google "enormity," the dictionary definitions it displays before the results are: 1. the great or extreme scale, seriousness, or extent of something perceived as bad or morally wrong. "a thorough search disclosed the full enormity of the crime" 2. a grave crime or sin. "the enormities of the regime"

Merriam-Webster claims this is not the exclusive usage, and that enormity can mean "immensity" without normative implications when the size is unexpected. But the very example it cites, from Steinbeck, involves the "enormity" of a situation in which a fire was started.

That said, I agree that "enormousness" is an awkward word, which I do not use. I'm left to ponder the enormity of my own pedantry.


As a native English speaker, I can't say I've found this to be the case. "Enormity" does tend to be used for dramatic effect, most often on moral issues, but I don't think that makes Scott wrong to use it here.

I don't know if I've seen "enormousness" before this thread.


Since enormous is from Latin, its stems tend to be Latin. `ness` is generally only morphologically productive with Germanic roots: kindness, happiness, etc.

When I visited Iceland, I remember a sign in English that said a cliff was insafe [sic]. `in` being a Latin morpheme, and safe being Germanic.


Ah, I never realized in/un would be used with corresponding Latin/Germanic words.

But then, it seems there are quite a few un+latin (unreal, unbalanced, unadulterated, uncertain etc), even if I can't think of in+Germanic.


Good counter examples. Etymonline will break roots down for you.

https://www.etymonline.com/word/unbalanced#etymonline_v_2490...

Like most things in linguistics, the 'rules' are more a rule of thumb than the mathematical sort.

Germanic prefixes and suffixes seem to stick better to pre-inkhorn roots.

https://en.wikipedia.org/wiki/Inkhorn_term


Thanks for the enlightening nitpick, this is one of those terrible/terrific things about English that I love/hate.

