This is also the idea behind sub-agents. Claude Code answers questions about things like "where is the code that does X" by firing up a separate LLM running in a fresh context, posing it the question and having it answer back when it finds the answer. https://simonwillison.net/2025/Jun/2/claude-trace/
I'm playing with that too (everyone should write an agent; basic sub-agents are incredibly simple --- just tool calls that can make their own LLM calls, or even just a tool call that runs in its own context window). What I like about Eternal Sunshine is that the LLM can just make decisions about which parts of the context matter and which don't, which is a problem that comes up a lot when you're looking at telemetry data.
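Here's roughly what I mean, as a sketch (the `complete` callable and the names are made up; swap in whatever model API you actually use):

```python
# Minimal sketch: a sub-agent is just a tool the parent agent can call.
# `complete` is a stand-in for whatever chat-completion API you use;
# it takes a message list and returns the model's text reply.
from typing import Callable, Dict, List

Message = Dict[str, str]
CompleteFn = Callable[[List[Message]], str]

def make_subagent_tool(complete: CompleteFn, system_prompt: str):
    """Wrap a fresh-context LLM loop as an ordinary tool call.

    The parent agent never sees the sub-agent's working context;
    it only gets the final answer string back, so the parent's
    context window stays small.
    """
    def run_subagent(question: str) -> str:
        # Fresh context every call: only the system prompt and the question.
        messages: List[Message] = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ]
        # In a real agent this would be a loop that lets the sub-agent
        # call its own tools (grep, read_file, ...) until it's done.
        return complete(messages)

    return run_subagent

# Usage: the parent registers this like any other tool.
find_code = make_subagent_tool(
    complete=lambda msgs: "stubbed answer",  # swap in a real API call
    system_prompt="You answer 'where is the code that does X' questions.",
)
print(find_code("Where is the retry logic for HTTP requests?"))
```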
I keep wondering if we're forgetting the fundamentals:
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
Also: recursion and memoization as a general approach to solving "large" problems.
I really want to paraphrase Kernighan's law as applied to LLMs: "If you use your whole context window to code a solution to a problem, how are you going to debug it?"
By checkpointing once the agent loop has decided it's ready to hand off a solution, generating a structured summary of all the prior elements in the context, writing that to a file, and then marking all those prior context elements as dead so they don't occupy context window space.
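Something like this, to make it concrete (a rough sketch; `ContextItem` and `summarize` are invented stand-ins for whatever your agent loop actually keeps per turn and for the LLM call that writes the summary):

```python
# Rough sketch of the checkpoint-and-compact step described above.
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List

@dataclass
class ContextItem:
    role: str           # "user", "assistant", "tool", ...
    content: str
    dead: bool = False  # dead items are skipped when building the prompt

def checkpoint(items: List[ContextItem],
               summarize: Callable[[str], str],
               path: Path) -> ContextItem:
    """Summarize everything so far, persist it, and tombstone the originals."""
    transcript = "\n".join(f"{i.role}: {i.content}" for i in items if not i.dead)
    summary = summarize(transcript)          # one LLM call over the old context
    path.write_text(json.dumps({"summary": summary}))  # full record kept on disk
    for item in items:
        item.dead = True                     # no longer occupies window space
    # The summary re-enters the context as a single compact item.
    return ContextItem(role="assistant", content=summary)

def live_prompt(items: List[ContextItem]) -> List[dict]:
    """Only non-dead items are sent to the model on the next turn."""
    return [{"role": i.role, "content": i.content} for i in items if not i.dead]
```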
Look carefully at a context window after solving a large problem, and I think in most cases you'll see even the 90th percentile token --- to say nothing of the median --- isn't valuable.
However large we're allowing frontier model context windows to get, we've got an integer multiple more semantic space to allocate if we're even just a little bit smart about managing that resource. And again, this is assuming you don't recurse or divide the problem into multiple context windows.
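Back-of-envelope version of that claim (every number here is an assumption, not a measurement):

```python
# If only a small fraction of old tokens stays valuable and the rest can be
# replaced by a compact summary, a fixed window does several windows' worth
# of work across repeated fill-then-compact cycles.
window = 200_000        # nominal context window, in tokens (assumed)
keep_fraction = 0.10    # fraction of old context worth keeping verbatim (assumed)
summary_tokens = 2_000  # size of the structured summary replacing the rest (assumed)
cycles = 5              # number of fill-then-compact rounds (assumed)

freed_per_cycle = int(window * (1 - keep_fraction)) - summary_tokens
total_work = window + (cycles - 1) * freed_per_cycle   # tokens actually reasoned over
print(f"{total_work:,} tokens of work in a {window:,}-token window "
      f"(~{total_work / window:.1f}x), without recursing into sub-agents")
```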