Memory will be the defining factor of agentic systems

Andrej Karpathy wrote an article about his LLM wiki, which exploded in popularity. Nate B Jones created Open Brain to solve a similar problem. The first puts the LLM to work on write, the second on read. Nate has written about the differences and tradeoffs between the two approaches. I'm building Munin - my take on the second brain concept - using an LLM on both ingestion and extraction. And in between.

Nate's post is an interesting discussion of two novel implementations, each trying to solve what I think will be the ultimate differentiating factor in harnessing AI in the years to come: memory.

What they are building

Nate started building Open Brain a couple of months ago:

A personal database for thoughts, notes, and observations, designed to be queried by AI agents. You store raw captures; structure and metadata make them searchable and relatable. Intelligence is applied at the moment you ask a question, synthesizing fresh answers from whatever is relevant in the database at that point in time.

Andrej, meanwhile, recently wrote about the LLM Wiki he had built:

A personal knowledge base where an AI does the writing. You supply raw material; the AI synthesizes, organizes, and maintains a set of interlinked prose pages that represent your accumulated understanding of a topic. Intelligence is applied at the moment information comes in, so retrieval requires no AI at all.

The notion of self-organizing knowledge with easy retrieval is immediately compelling to most people, so both have triggered a lot of interest. Thousands of people have built Nate's Open Brain. Me included.

Solving memory will be enormously important in every AI-related domain. At their core, LLMs do not have memory. Every time you query one, it has no recollection of earlier conversations, or even of earlier messages in the current conversation; those messages only reach it because they are sent along again as part of the context. The model depends on the context to provide everything it needs. Every time.

For most people, this context consists of .md files containing the important details you don't want to repeat in every message. (If you're using something like Claude AI, there is an additional memory layer that is automatic and stored on Anthropic's servers.)

But this does not scale, and keeping lots of .md files updated and relevant is a lot of work. And you cannot just include everything at all times. The context should be as lean as possible, so you only want to include things that are relevant.

Solving this is hard. But LLM providers have a strong incentive to build great solutions for it: keeping you in their ecosystem. They will provide this seamlessly, and "for free". The more an agent learns about you, the better results you get from using it. And the more painful it becomes to switch providers.

Unless you own the memory.

The birth of Munin

For the last few weeks, I've been working on Munin: my take on a second brain. It was born after I built Nate's Open Brain and found that my ADHD brain struggled with it on a few levels. It showed so much promise (and I was quite hyped, to be honest), but the bar for capturing "thoughts" into it was too high for me. I would forget to have Claude capture important things. The cognitive load of making sure whatever I captured was sufficient was too taxing. Looking up links or references to include was too much work. It was simply growing far too slowly to become what I wanted it to be. So I started expanding on Nate's idea to make the concept genuinely useful to me personally.

The fork, as Nate frames it

Every knowledge system with an AI at its core has to answer one question:
when does the AI do the hard thinking - when information comes in, or when you ask about it?
This is the fork. Everything else follows from it. - Nate B Jones

Nate argues for the kinds of cases where AI on ingestion (Andrej's model) shines, and those where AI on extraction (Nate's model) shines. One isn't necessarily better than the other. Both have strengths and weaknesses, and which one fits depends on the context in which it is meant to be used.

AI on ingestion means, especially the way Andrej does it, that the AI is making editorial choices on what to include and what to leave out. What ends up being stored is what you receive every time you want to extract something from the wiki.

AI on extraction means that whatever is stored must first be curated by you at capture time, which greatly increases the friction of ingestion. But whenever you want to extract from this source, the AI does the thinking.

Andrej's wiki is a write-time system. When a new source arrives, the AI reads it, synthesizes what matters, and updates a set of organized prose pages. By the time you ask a question, the thinking has already been done. The wiki is a compiled artifact of understanding. You browse it, follow links, read summaries that already connect last week's reading to this week's.

Nate's Open Brain is a query-time system. New information is stored faithfully - tagged, embedded, made searchable - but nobody synthesizes anything yet. When you ask a question, the AI does the thinking fresh, from the raw material. The database is pristine. The understanding is always current. And the cost of that understanding is paid every time.

Munin was conceived as I struggled with Open Brain not working the way I had hoped, and it has grown into something else. But it retains Open Brain's central aspect: query-time behaviour.

Write-time, but not editorial

Unlike Open Brain, however, Munin uses an LLM at write time as well. Just not the way the LLM Wiki does.

The distinction I have arrived at is that write-time intelligence has two very different jobs, and conflating them causes confusion.

In Andrej's LLM wiki, the AI at write-time is a writer. It makes editorial choices. It decides what matters in a source, how to frame the connections between ideas, and what synthesis to produce. The output is prose that a human can read and trust - or subtly misread and over-trust, which is Nate's most pointed critique.

In Munin, the AI at write time is closer to a cartographer. When a new thought arrives, the pipeline's job isn't to synthesize it into prose. It's to place it on the map: embed it, find which existing thoughts it should link to, assign it to the right conceptual cluster. Instead of the output being a readable document, it's a richer row in a database, with relationships to other rows and a position in the semantic space of everything already stored. The editorial judgment stays mine, but the structural work happens automatically.

This matters because the failure mode of editorial write-time systems - I think Nate is right about this - is that the AI's framing becomes invisible. You stop questioning the wiki because it reads like something you already know. Structural write-time work doesn't have this problem. There's no prose to over-trust. The thought you captured is the thought that's stored. The AI's contribution is in the connections and positions it assigns, and those are queryable, auditable, correctable.
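To make the cartographer role concrete, here is a minimal Python sketch of capture-time placement. It is an illustration under stated assumptions, not Munin's code: embeddings are assumed to be computed already by some embedding model, clusters are assumed to be represented by centroid vectors, and the names `Thought` and `place`, along with the `k` and `min_sim` values, are hypothetical. Note that the captured text is never rewritten; only its links and cluster change.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Thought:
    # Hypothetical record: the captured text is stored verbatim.
    id: str
    text: str
    embedding: list[float]  # assumed precomputed by an embedding model
    links: set[str] = field(default_factory=set)
    cluster: Optional[str] = None


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def place(new: Thought, corpus: list[Thought],
          centroids: dict[str, list[float]],
          k: int = 3, min_sim: float = 0.75) -> Thought:
    """Structural placement only: link to close neighbours, pick a cluster.

    k and min_sim are illustrative thresholds, not tuned values.
    """
    ranked = sorted(corpus, key=lambda t: cosine(new.embedding, t.embedding),
                    reverse=True)
    for other in ranked[:k]:
        if cosine(new.embedding, other.embedding) >= min_sim:
            # Store the link in both directions so either side can be audited.
            new.links.add(other.id)
            other.links.add(new.id)
    if centroids:
        # Assign the cluster whose centroid is most similar.
        new.cluster = max(centroids,
                          key=lambda name: cosine(new.embedding, centroids[name]))
    return new


corpus = [Thought("a", "Open Brain stores raw captures", [1.0, 0.0]),
          Thought("b", "Sourdough needs a warm kitchen", [0.0, 1.0])]
new = place(Thought("c", "Munin started out as Open Brain", [0.9, 0.1]),
            corpus, {"memory": [1.0, 0.0], "cooking": [0.0, 1.0]})
print(new.links, new.cluster)  # {'a'} memory
```

The output is a richer row (links plus a cluster) rather than a document, which is what keeps the AI's contribution inspectable.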

The third layer

Munin also has a third layer of intelligence, one that neither system in Nate's comparison has: a deliberate synthesis pass that runs off the hot path. It is similar to, but not quite like, Andrej's ingestion intelligence.

Nate proposes a compilation agent that runs periodically and generates wiki pages. What Munin has instead is a nightly expansion pass that enriches new thoughts in place - deepening context by strengthening or creating links between related thoughts, and updating confidence weights. The output isn't a separate prose artifact. It's the same rows, but richer. The synthesis stays in the schema.

The reason this matters architecturally is that it separates three distinct moments of intelligence:
- capture time (structural placement)
- expansion time (deliberate off-path enrichment)
- query time (fresh synthesis for the specific question being asked)

These have different cost profiles, different latency tolerances, and different purposes.

Collapsing them - doing everything at write time like the wiki, or everything at query time like the original Open Brain - means every operation is paying for work that doesn't belong to it.

I'll be honest: this three-stage pipeline is more complex, and complexity has its own costs. Whether the added richness justifies the added surface area is something I'm continuously assessing.
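The expansion stage can be sketched the same way. The version below is a hypothetical heuristic, not necessarily what Munin's nightly pass does: it treats two thoughts that share at least `min_shared` neighbours as related, then creates an edge between them or strengthens the existing one by `boost`, capped at 1.0. Edges are assumed to be stored as a dict from an unordered pair (`frozenset`) to a weight. Because the pass only reads and rewrites the same edge table, its output stays in the schema instead of becoming a separate prose artifact.

```python
def expand(new_ids: list[str], edges: dict[frozenset, float],
           min_shared: int = 2, boost: float = 0.1) -> dict[frozenset, float]:
    """Hypothetical off-path pass: strengthen or create links for new thoughts.

    edges maps frozenset({a, b}) -> weight in [0, 1]. Thresholds are illustrative.
    """
    # Build an adjacency view from the current edges before changing anything.
    neighbours: dict[str, set[str]] = {}
    for pair in edges:
        a, b = tuple(pair)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Collect candidate pairs first, so a pair of two new thoughts is boosted once.
    candidates: set[frozenset] = set()
    for n in new_ids:
        mine = neighbours.get(n, set())
        for other, theirs in neighbours.items():
            if other != n and len(mine & theirs) >= min_shared:
                candidates.add(frozenset((n, other)))

    for pair in candidates:
        # Creates the edge if missing (starting from 0.0), otherwise strengthens it.
        edges[pair] = min(1.0, edges.get(pair, 0.0) + boost)
    return edges


# "x" and "y" both link to "a" and "b", but not yet to each other.
edges = {frozenset(p): 0.5 for p in [("x", "a"), ("x", "b"), ("y", "a"), ("y", "b")]}
expand(["y"], edges)
print(edges[frozenset(("x", "y"))])  # 0.1
```

Running the pass again on later nights would keep nudging that weight upward while the shared structure persists, which is one simple way for context to deepen without any new prose being written.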

The browseability problem

The thing Munin shares with Open Brain, and where both fall short compared to a wiki, is the lack of browseability. There's no artifact you can open and wander through. The knowledge is there. The connections are there. But to surface them you have to ask a question, which means you have to already know what you're looking for. Discovery - the kind that comes from following a link somewhere unexpected - isn't naturally supported.

I've been thinking about this for some time because it is something I've been missing. The solution that feels most right to me is one I haven't seen described elsewhere:

Just-in-time prose generation as a navigation interface

The idea is simple. You open a discovery page and enter a topic. Munin runs a search against the corpus and generates a prose document from the results - something readable, with links embedded in the text. Not links to other stored documents, because most of the time those do not exist. Links that are themselves queries. You click the phrase "Munin started out as Open Brain", and Munin runs a new search on that phrase (enriched with extra context carried in the underlying link), finds what the corpus knows about that thread, and generates a new document for you to read.

You follow the semantic shape of the material rather than a pre-built document structure. So the knowledge base isn't something you browse. It's something you walk through, one generated step at a time.

Several things about this appeal to me. One is that it never drifts. Because each page is generated fresh from the living corpus, it reflects whatever knowledge is contained in Munin right now, not a compiled snapshot from last Tuesday. And because nothing is pre-built, there's no token cost for artifacts you never read. You pay for what you actually explore.
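One way to carry a query inside a link is to encode it in the link target. The sketch below assumes the generated page is Markdown and invents a `munin://search` URL scheme purely for illustration; `query_link` and `on_click` are hypothetical names. The visible text is the phrase the reader clicks, while the encoded query can carry extra context terms chosen at generation time.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def query_link(phrase: str, context_terms: list[str]) -> str:
    """Render a Markdown link whose target is a search, not a stored document."""
    query = " ".join([phrase, *context_terms])
    # urlencode escapes spaces and parentheses, so the Markdown stays well-formed.
    return f"[{phrase}](munin://search?{urlencode({'q': query})})"


def on_click(href: str) -> str:
    """Recover the full (enriched) query from a clicked link target."""
    return parse_qs(urlparse(href).query)["q"][0]


link = query_link("Munin started out as Open Brain", ["ADHD", "capture friction"])
href = link[link.index("](") + 2:-1]  # strip the Markdown wrapper around the target
print(on_click(href))  # Munin started out as Open Brain ADHD capture friction
```

Following a link then just means feeding the recovered query back into the same search-and-generate step that produced the page, so navigation needs no stored documents at all.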

Not without caveats

There are caveats, of course. Consistency across sessions is gone: ask the same question twice and you may get two different documents. For discovery, I think that's acceptable, good even. For anything you want to share or cite, it's a problem. And the experience lives or dies on the quality of the links embedded in the generated prose. If those links are shallow keyword queries, the navigation quickly runs out of interesting places to go. The model has to know, while generating the page, which adjacent searches would be worth following. That's a design problem I don't have a clean answer to yet. I believe these issues are solvable; whether I can solve them in a satisfying way remains to be seen.

Some pages will also be generated from thin corpus coverage - areas where Munin doesn't have a lot stored - but the prose will still read fluently. This is the exact failure mode Nate identifies in wiki systems. I think the right answer is to surface that in the generated text rather than hide it: a document that says "Munin has little on this" is more useful than one that fills the gap with confident paraphrase.
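Surfacing thin coverage could be as simple as checking the retrieval results before generating the page. In the sketch below, the function names, the thresholds (`min_hits`, `min_score`) and the prompt wording are assumptions for illustration. The point is only that the coverage check runs on the search scores, before any prose is written, so fluent text cannot paper over it.

```python
from typing import Optional

THIN_NOTE = "Munin has little on this."


def coverage_note(scores: list[float], min_hits: int = 3,
                  min_score: float = 0.7) -> Optional[str]:
    """Return a warning when too few search hits are strong enough."""
    strong = [s for s in scores if s >= min_score]
    return THIN_NOTE if len(strong) < min_hits else None


def page_instructions(topic: str, scores: list[float]) -> str:
    """Build generation instructions for a discovery page (hypothetical prompt)."""
    lines = [f"Write a short discovery page about: {topic}."]
    note = coverage_note(scores)
    if note:
        # Make the gap explicit instead of letting the model fill it.
        lines.append(f"Coverage is thin. Open the page with: {note}")
    return "\n".join(lines)


print(page_instructions("sourdough", [0.92, 0.41]))
```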

On saving generated pages: my instinct is that the right default is not to store them. They're views, not documents, and storing them creates a secondary corpus with no provenance guarantees. But that's not an absolute principle. A particularly useful page could be saved, the same way you'd clip something you found while reading the web. The same could apply to things you want to share. The staleness objection applies here, but staleness applies to almost anything. Blog posts like this are stale artifacts and we treat them as perfectly valid. The design of where and how such artifacts might be saved is something to work out when it actually matters.

But also so much more

There are many details I did not discuss in this post, particularly around artifact linking and corpus quality. I have taken inspiration from other projects like agentmemory, and have included thought evolution through confidence weighting and thought-linking through relationship graphs (where a thought can be PART_OF, DERIVED_FROM or RELATED_TO other thoughts, SUPPORT or CONTRADICT them, etc.).
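A typed relationship graph like this can be sketched with an enum and an edge list. The relation names below mirror the ones listed above, but the exact identifiers and storage in Munin may differ, and `ThoughtGraph` is a hypothetical in-memory stand-in for what would realistically be a database table. What it shows is that a question like "what contradicts this thought?" becomes a direct lookup rather than a semantic search.

```python
from collections import defaultdict
from enum import Enum


class Relation(Enum):
    # Names mirror the relationship types described in the text.
    PART_OF = "PART_OF"
    DERIVED_FROM = "DERIVED_FROM"
    RELATED_TO = "RELATED_TO"
    SUPPORT = "SUPPORT"
    CONTRADICT = "CONTRADICT"


class ThoughtGraph:
    """Hypothetical in-memory graph of typed, directed edges between thoughts."""

    def __init__(self) -> None:
        # source thought id -> list of (relation, target thought id)
        self._out: dict[str, list[tuple[Relation, str]]] = defaultdict(list)

    def link(self, source: str, relation: Relation, target: str) -> None:
        self._out[source].append((relation, target))

    def related(self, source: str, relation: Relation) -> list[str]:
        """Targets reached from `source` via `relation`."""
        return [t for r, t in self._out.get(source, []) if r is relation]

    def incoming(self, target: str, relation: Relation) -> list[str]:
        """Sources pointing at `target` via `relation`, e.g. what contradicts it."""
        return [s for s, out in self._out.items()
                for r, t in out if r is relation and t == target]


g = ThoughtGraph()
g.link("t2", Relation.SUPPORT, "t1")
g.link("t3", Relation.CONTRADICT, "t1")
g.link("t2", Relation.PART_OF, "project-munin")
print(g.incoming("t1", Relation.CONTRADICT))  # ['t3']
```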

I'm also clustering thoughts through dynamic topics (or tags) and thought types (FACT, DECISION, OBSERVATION, HYPOTHESIS, etc.), all of which contribute to the confidence weights. The confidence score is a constantly evolving property of each thought, rising and falling with the appearance of corroborating or contradicting artifacts, and even with the passage of time. The corpus is continuously changing, slowly hiding thoughts with decreasing confidence while thoughts with increasing confidence bubble up to the surface.
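The confidence mechanics could take many forms. The sketch below is one assumed model, not Munin's actual formula: each thought type gets a hypothetical prior in log-odds, every corroborating artifact adds `weight` and every contradicting one subtracts it, the result passes through a logistic function, and the score then halves every `half_life_days`. `surface` hides anything below a threshold and sorts the rest, which is the bubbling-up behaviour in miniature. All numbers are illustrative.

```python
import math
from dataclasses import dataclass

# Hypothetical per-type starting points, in log-odds (0.0 means 50%).
PRIOR_LOGIT = {"FACT": 1.0, "DECISION": 0.5, "OBSERVATION": 0.0, "HYPOTHESIS": -0.5}


@dataclass
class Evidence:
    supports: int = 0     # corroborating artifacts seen so far
    contradicts: int = 0  # contradicting artifacts seen so far


def confidence(kind: str, ev: Evidence, age_days: float,
               weight: float = 0.7, half_life_days: float = 180.0) -> float:
    """Logistic of prior plus net evidence, decayed by age."""
    logit = PRIOR_LOGIT[kind] + weight * (ev.supports - ev.contradicts)
    base = 1.0 / (1.0 + math.exp(-logit))
    return base * 0.5 ** (age_days / half_life_days)


def surface(thoughts: list[tuple[str, str, Evidence, float]],
            threshold: float = 0.3) -> list[str]:
    """Ids of visible thoughts, most confident first; low scorers are hidden."""
    scored = [(confidence(kind, ev, age), tid) for tid, kind, ev, age in thoughts]
    return [tid for score, tid in sorted(scored, reverse=True) if score >= threshold]


thoughts = [
    ("a", "FACT", Evidence(supports=2), 0.0),
    ("b", "HYPOTHESIS", Evidence(contradicts=3), 0.0),
    ("c", "OBSERVATION", Evidence(), 0.0),
]
print(surface(thoughts))  # ['a', 'c']
```

Nothing is deleted here: a hidden thought can resurface if new corroborating artifacts push its score back over the threshold.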

Relying only on embeddings does not scale. Thoughts can be closely connected semantically without being related in context. The more Munin grows, the more noise semantic searches will produce.
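One way to temper that noise is to rerank semantic hits with structural signals. The sketch below is a hypothetical example of the idea, not a description of Munin's retrieval: cosine similarity is scaled by each thought's confidence, and hits already linked to the thoughts anchoring the current query get a fixed `graph_bonus`. The function name, the default confidence of 0.5 and the bonus value are all illustrative assumptions.

```python
def rerank(hits: list[tuple[str, float]],
           links: dict[str, set[str]],
           anchor_ids: set[str],
           confidence: dict[str, float],
           graph_bonus: float = 0.2) -> list[str]:
    """Order semantic hits using confidence and graph proximity, not similarity alone.

    hits: (thought id, cosine similarity) pairs from an embedding search.
    """
    def score(hit: tuple[str, float]) -> float:
        tid, sim = hit
        linked = bool(links.get(tid, set()) & anchor_ids)
        return sim * confidence.get(tid, 0.5) + (graph_bonus if linked else 0.0)

    return [tid for tid, _ in sorted(hits, key=score, reverse=True)]


# "noise" is semantically closer, but only "real" is linked to the query's anchor.
print(rerank([("noise", 0.9), ("real", 0.8)],
             links={"real": {"munin"}},
             anchor_ids={"munin"},
             confidence={"noise": 0.5, "real": 0.5}))  # ['real', 'noise']
```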

Where Munin currently stands

This post is a snapshot of an evolving build, not a position paper. Nate's post bumped it up my priority list, so I chose to finish writing it now rather than after the third post in the Comparing AI models series. The three moments of intelligence feel right to me, but Munin is some way from being finished, and some of these ideas haven't been tested against real use. The browseability interface is currently a concept, not an implementation.

What Nate's piece clarified for me is that the write-time vs query-time question isn't really about when computation happens. It's about what kind of trust you're placing in the AI, and at what moment in your relationship with the knowledge that trust is hardest to revoke. Editorial trust at write time is the most invisible kind. Structural trust at write time is auditable. Query-time trust is always fresh, but always costly.

Munin is a bet that structural write-time intelligence, combined with deliberate periodic synthesis and on-demand query-time generation, gives better guarantees than any single-mode architecture. I think that's right, so I'm building toward it.

Whether it is the right bet or not, my findings will end up in a blog post.