
Your Agent Remembers Everything Except What Matters

ICLR 2026 MemAgents research reveals when AI agents need episodic memory (what happened) vs semantic memory (what's true). Covers MAGMA, Mem0, AdaMem papers, comparison of Mem0 vs Letta vs Zep, and architecture patterns with TypeScript examples.

Dean Grover, Co-founder
March 20, 2026
18 min read
[Hero image: abstract neural pathways splitting into two branches, representing episodic and semantic memory systems]

Our agent remembered that the customer called twice last week. It had perfect episodic recall: timestamps, transcript snippets, the exact tool calls from each session. But when the customer called a third time and said "I'm the one who always has billing problems," the agent drew a blank. It had the episodes. It never distilled the pattern.

Two types of memory. One was missing. And that gap cost us a fifteen-minute call that should have taken three. Conventional wisdom says "more memory is better." The research says the type of memory matters more than the amount.

This isn't a theoretical problem. It's the central architectural question facing every team building customer-facing AI agents in 2026. The ICLR 2026 MemAgents workshop in Rio de Janeiro, the first major venue dedicated entirely to agent memory, put it in formal terms: episodic memory stores what happened; semantic memory stores what's true. Most production agents have one or the other. The research says you need both, and more importantly, you need to know when each one matters.


Two memory systems, one agent

The distinction comes from cognitive science. Endel Tulving proposed it in 1972, and it has held up through fifty years of neuroscience research. The MemAgents workshop proposal explicitly bridges these perspectives, calling out episodic, semantic, and working memory as the three architectural pillars, alongside neuroscience-inspired consolidation as a design pattern.

Episodic memory records specific events with temporal context. "On March 12 at 2:47 PM, the customer called about a double charge on invoice #8291. They were frustrated. The agent issued a $47.50 refund and the customer accepted."

Semantic memory stores distilled facts without event context. "This customer prefers email follow-ups. They have a history of billing disputes. They're on the Enterprise plan."

The difference isn't just academic. It determines what your agent can do:

| Capability | Episodic | Semantic |
| --- | --- | --- |
| "What happened last time?" | Direct recall of the event | Cannot answer |
| "What do we know about this customer?" | Must search all past episodes | Direct lookup |
| "Why did we give them a refund?" | Full context with reasoning chain | Cannot answer |
| "Does this customer prefer phone or email?" | Must infer from multiple episodes | Direct answer |
| Audit trail and compliance | Complete event log | Insufficient |
| Personalization at scale | Too slow to search everything | Fast, pre-computed |

Human brains run both systems simultaneously. The hippocampus rapidly encodes episodes. The neocortex slowly consolidates patterns into semantic knowledge, mostly during sleep. The MemAgents workshop calls this complementary learning systems theory, and it's now a first-class design pattern for agents.

What the research says

The first quarter of 2026 produced more agent memory research than all of 2024 combined. Here are the papers that matter for practitioners.

MAGMA: four graphs, one memory

MAGMA (Multi-Graph based Agentic Memory Architecture) treats each memory item as a node that lives simultaneously in four orthogonal graphs: semantic, temporal, causal, and entity. When retrieving, a policy-guided traversal walks across whichever graph dimensions the query needs.

Ask "what happened after the customer complained?" and the traversal follows temporal and causal edges. Ask "what do we know about this customer's preferences?" and it follows semantic and entity edges. Same memory store, different retrieval paths.

The results on the LoCoMo benchmark are striking: a judge score of 0.70, outperforming full-context baselines (0.481) by 45.5% and beating prior memory systems like A-MEM (0.58) and Nemori (0.59) by 18-20%. The insight isn't just that multi-graph works. It's that explicitly separating the semantic and temporal views of the same memory enables fundamentally better retrieval.

Mem0: production-scale memory with a 26% accuracy edge

Mem0's paper on the LoCoMo benchmark demonstrated that combining episodic and semantic extraction with graph-based representations delivers a 26% relative improvement in accuracy over OpenAI's built-in memory (66.9% vs 52.9% overall LLM-as-Judge score). The graph-enhanced variant adds another 2% on top.

Beyond accuracy, the production numbers matter: 91% lower p95 latency and 90% token savings compared to stuffing full conversation history into context. That's the difference between a memory system that works in a demo and one that works at scale.

Mem0 explicitly categorizes memories into episodic (interaction-specific events with temporal markers) and semantic (extracted knowledge without event context). The system scores each memory using a composite of semantic similarity (60%), extraction confidence (20%), recency (10%), and access frequency (10%).
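That composite can be sketched as a simple weighted sum. Here's a minimal TypeScript version using the weights from the paper; the `ScoredMemory` shape and the normalization of the recency and frequency signals are our own illustrative assumptions, not Mem0's actual API.

```typescript
// Hypothetical memory shape for illustration -- not Mem0's API.
interface ScoredMemory {
  similarity: number;           // cosine similarity to the query, 0..1
  extractionConfidence: number; // 0..1
  ageDays: number;              // days since the memory was written
  accessCount: number;          // how often it has been recalled
}

// Weights from the Mem0 paper: 60/20/10/10.
// The decay constant and access cap are illustrative assumptions.
function compositeScore(m: ScoredMemory): number {
  const recency = Math.exp(-m.ageDays / 30);         // decays over ~a month
  const frequency = Math.min(m.accessCount / 10, 1); // saturates at 10 recalls
  return (
    0.6 * m.similarity +
    0.2 * m.extractionConfidence +
    0.1 * recency +
    0.1 * frequency
  );
}
```

The weighting biases retrieval the way you'd want: a highly relevant but stale memory still outranks a fresh but off-topic one, because similarity carries six times the weight of recency.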

E-mem: episodic context without compression loss

E-mem took a different angle on episodic memory. Instead of compressing episodes into summaries (which loses detail), it uses a hierarchical multi-agent architecture where subordinate agents each maintain uncompressed episode windows while a master agent orchestrates retrieval across them.

The result: 54%+ F1 on LoCoMo (7.75% above the previous state of the art) while reducing token cost by 70%. The lesson for practitioners: if your use case requires high-fidelity episodic recall (compliance, legal, healthcare), compression-based approaches may sacrifice too much detail.
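The hierarchical idea can be sketched in a few lines: subordinate agents each hold an uncompressed window of episodes, and a master fans the query out and merges the best hits. Every name below is illustrative, not E-mem's actual interface, and the keyword scoring stands in for the LLM-based relevance judgments the paper uses.

```typescript
// Sketch of the E-mem pattern under stated assumptions.
interface Episode { text: string; timestamp: number }

class WindowAgent {
  constructor(private episodes: Episode[]) {}

  // Score by naive keyword overlap; a real subordinate agent
  // would judge relevance with an LLM over its full window.
  search(query: string, k: number): Episode[] {
    const terms = query.toLowerCase().split(/\s+/);
    return [...this.episodes]
      .map(e => ({
        e,
        score: terms.filter(t => e.text.toLowerCase().includes(t)).length,
      }))
      .filter(({ score }) => score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, k)
      .map(({ e }) => e);
  }
}

class MasterAgent {
  constructor(private workers: WindowAgent[]) {}

  // Fan out to every window, then merge newest-first.
  retrieve(query: string, k = 3): Episode[] {
    return this.workers
      .flatMap(w => w.search(query, k))
      .sort((a, b) => b.timestamp - a.timestamp)
      .slice(0, k);
  }
}
```

The point of the structure is that no episode is ever summarized away: each window stays verbatim, and only the retrieval is coordinated.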

MEM-alpha: teaching agents to manage their own memory

MEM-alpha, under review at ICLR 2026, frames memory construction as a reinforcement learning problem. The agent learns when to store, update, summarize, or discard memories through interaction and feedback, using separate episodic, semantic, and core memory components with specialized tools for each.

The most impressive finding: despite training on sequences of only 30K tokens, the agents generalize to sequences exceeding 400K tokens, over 13x the training length. The memory management policy transfers because the episodic-semantic distinction itself is generalizable. An agent that learns "distill repeated patterns into semantic facts" applies that strategy regardless of conversation length.

AdaMem: four memory types in concert

AdaMem, published March 2026 from Tsinghua and Tencent, organizes dialogue history into working memory, episodic memory, persona memory, and graph-based memory. The key innovation is question-conditioned retrieval: the system examines each query and decides which memory types to activate. A "what happened" question triggers episodic retrieval. A "what kind of person" question triggers persona memory. A "how are X and Y related" question triggers graph expansion.

This achieved state-of-the-art results on both LoCoMo and PERSONAMEM benchmarks, confirming that the right memory type for a query is as important as the quality of any single memory type.

The consolidation problem

Here's where the storyline from our opening comes back. Our agent had excellent episodic memory. It stored every interaction faithfully. But it never consolidated those episodes into semantic facts.

The customer called three times about billing issues. Each episode was stored independently. When the customer said "I always have billing problems," the agent couldn't confirm or deny. It would need to search all episodes, find the pattern, and synthesize it in real-time. That's expensive, slow, and often inaccurate.

Memory consolidation is the process of converting episodic memories into semantic knowledge. In neuroscience, this happens during sleep: the hippocampus replays recent experiences for the neocortex. In AI agents, it's a background process that periodically scans recent episodes and extracts durable facts.

[Diagram: Memory consolidation — episodes become facts. Three episodes (March 5: billing dispute, frustrated; March 9: billing error, requested email; March 14: refund request, billing again) feed a consolidation process that yields three semantic facts: recurring billing issues, prefers email follow-up, frustration escalation risk.]

The A-MAC paper (Adaptive Memory Admission Control) from the MemAgents workshop formalizes this with five factors for deciding what to consolidate: future utility, factual confidence, semantic novelty, temporal recency, and content type prior. Not every episode deserves to become a semantic fact. The customer mentioning the weather doesn't consolidate. The customer mentioning they're switching to a competitor does.

Without consolidation, episodic memory grows unbounded and retrieval quality degrades. With naive consolidation, you lose the source episodes and can't answer "when did this happen?" or "who told us this?" The production answer is both: consolidate into semantic facts while retaining episodic sources as references.

Architecture patterns

Three architectural approaches have emerged from the 2026 research. Each makes a different trade-off between episodic fidelity and semantic efficiency.

Pattern 1: Dual-store with consolidation

Separate stores for episodes and facts, with a background consolidation process. This is closest to the biological model and is what Mem0 and AWS AgentCore implement.

typescript
interface EpisodicMemory {
  id: string;
  timestamp: Date;
  // Full event context -- who, what, when, why
  event: string;
  participants: string[];
  sentiment: 'positive' | 'neutral' | 'negative';
  embedding: number[];
}
 
interface SemanticMemory {
  id: string;
  // Distilled fact without temporal context
  fact: string;
  confidence: number;
  // Link back to supporting episodes for auditability
  sourceEpisodeIds: string[];
  lastUpdated: Date;
  embedding: number[];
}
 
async function consolidate(
  episodes: EpisodicMemory[],
  existingFacts: SemanticMemory[]
): Promise<SemanticMemory[]> {
  // Ask the LLM to extract durable facts from recent episodes
  // that aren't already captured in existing semantic memory
  const prompt = `Given these recent interactions:\n${
    episodes.map(e => `[${e.timestamp}] ${e.event}`).join('\n')
  }\n\nAnd these known facts:\n${
    existingFacts.map(f => f.fact).join('\n')
  }\n\nExtract NEW durable facts about this customer.
  Only include facts likely to be true in future interactions.
  Do not include one-time events or transient states.`;
 
  const response = await llm.generate(prompt);
  return parseFactsFromResponse(response, episodes);
}

When to use: Customer-facing agents where you need both personalization (semantic) and accountability (episodic). Most production use cases land here.

Pattern 2: Multi-graph unified store

MAGMA's approach: one memory store, multiple graph views. Every memory has semantic, temporal, causal, and entity edges. Retrieval traverses whichever dimensions the query requires.

typescript
interface UnifiedMemoryNode {
  id: string;
  content: string;
  // Semantic edges: "similar to" relationships
  semanticNeighbors: { nodeId: string; similarity: number }[];
  // Temporal edges: "happened before/after"
  temporalEdges: { nodeId: string; relation: 'before' | 'after' }[];
  // Causal edges: "caused by" / "resulted in"
  causalEdges: { nodeId: string; relation: 'cause' | 'effect' }[];
  // Entity edges: "involves" specific people, products, accounts
  entityEdges: { entityId: string; role: string }[];
}
 
// Query routing decides which graph dimensions to traverse
type GraphDimension = 'semantic' | 'temporal' | 'causal' | 'entity';
 
function routeQuery(query: string): GraphDimension[] {
  if (query.includes('what happened') || query.includes('when'))
    return ['temporal', 'causal'];
  if (query.includes('who') || query.includes('customer'))
    return ['entity', 'semantic'];
  if (query.includes('why') || query.includes('because'))
    return ['causal', 'semantic'];
  // Default: semantic similarity
  return ['semantic'];
}

When to use: Complex reasoning tasks where queries require navigating relationships between events. Think enterprise CRM, legal research, medical history analysis.

Pattern 3: Self-editing memory blocks

Letta's approach, inspired by MemGPT: the agent maintains structured memory blocks inside its context window and directly edits them during conversation. No separate retrieval step. Memory is always present.

typescript
// Memory lives in the system prompt, updated by tool calls
const coreMemory = {
  // Agent edits these blocks directly during conversation
  userProfile: "Name: Sarah Chen. Plan: Enterprise. Preference: email.",
  conversationState: "Third call this month. All about billing.",
  knownIssues: "Recurring billing discrepancies. Last refund: $47.50.",
};
 
// Agent uses tool calls to update memory in real-time
const memoryTools = [
  {
    name: "update_memory",
    description: "Update a memory block with new information",
    // Agent decides what to remember and how to phrase it
    parameters: { block: "string", content: "string" }
  },
  {
    name: "search_archival",
    description: "Search long-term storage for older memories",
    parameters: { query: "string" }
  }
];

When to use: Agents that need to actively reason about their own knowledge state. Personal assistants, tutoring systems, agents that explain their reasoning. The trade-off is smaller total memory capacity (limited by context window) but zero retrieval latency for core facts.
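The tool definitions above still need a handler on the application side that applies the agent's edits and rebuilds the prompt. A minimal sketch of that wiring follows; it's our own illustration of the loop, not Letta's implementation, which also enforces block size limits and archival eviction.

```typescript
type CoreMemory = Record<string, string>;

// Apply an agent-issued update_memory tool call to the in-context blocks.
// Returns a new object so the caller can rebuild the system prompt.
function applyMemoryTool(
  memory: CoreMemory,
  call: { name: string; args: { block: string; content: string } }
): CoreMemory {
  if (call.name !== "update_memory") return memory;
  if (!(call.args.block in memory)) {
    throw new Error(`Unknown memory block: ${call.args.block}`);
  }
  return { ...memory, [call.args.block]: call.args.content };
}

// Re-render the memory section of the system prompt after each edit.
function renderCoreMemory(memory: CoreMemory): string {
  return Object.entries(memory)
    .map(([block, content]) => `<${block}>\n${content}\n</${block}>`)
    .join("\n");
}
```

After every tool call the re-rendered blocks go back into the system prompt, which is what gives this pattern its zero retrieval latency: the memory is simply always in context.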

Platform comparison

The memory platform landscape in 2026 has consolidated around three major open-source options plus managed services. Here's how they map to the episodic-semantic decision.

| Feature | Mem0 | Zep (Graphiti) | Letta | AWS AgentCore |
| --- | --- | --- | --- | --- |
| Memory model | Hybrid vector + graph | Temporal knowledge graph | Self-editing blocks + archival | Managed extraction |
| Episodic support | Events with temporal markers | Full temporal graph with validity windows | Recall memory (conversation history) | Conversation summaries |
| Semantic support | Extracted facts, composite scoring | Entity facts with temporal evolution | Core memory blocks, agent-edited | User preferences and facts |
| Consolidation | Automatic extraction + graph linking | Autonomous graph updates with temporal tracking | Agent-driven via tool calls | Automatic extraction |
| Temporal reasoning | Basic recency scoring | Native (tracks when facts became true/false) | Limited to agent's own edits | Basic |
| LoCoMo score | 66.9% (LLM-as-Judge) | N/A (different benchmark: 94.8% DMR) | N/A (benchmark requested via GitHub issue) | N/A |
| Latency | 91% lower p95 vs full-context | Sub-100ms retrieval | Zero for core memory (in-context) | AWS-managed |
| Self-hosted | Yes (open source) | Yes (Graphiti is open source) | Yes (open source) | No (AWS only) |
| Best for | General-purpose agents | Enterprise with evolving relationships | Agents that reason about knowledge | AWS-native stacks |
Choosing between them: For a deeper build-vs-buy analysis of these platforms, see our memory platform comparison. The short version: if your agent handles customer support or sales where facts about customers matter most, Mem0's composite scoring gives you the best accuracy-latency trade-off. If your agent operates in domains where facts change over time (insurance, healthcare, finance), Zep's temporal knowledge graph tracks when facts were true and when they were superseded. If your agent needs to actively manage what it knows (tutoring, personal assistants), Letta's self-editing model gives the agent direct control.

For teams already on AWS, AgentCore Memory provides a managed path. The March 2026 streaming notifications feature pushes memory changes to Kinesis, enabling downstream workflows without polling. It's available in 15 regions and handles both episodic (conversation summaries) and semantic (extracted preferences and facts) memory automatically.
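A consumer of those streamed notifications can stay simple: decode each Kinesis record and route the change to a downstream workflow. The event payload shape below is a guess for illustration only; check the AgentCore documentation for the real schema.

```typescript
// Hypothetical memory-change event shape -- not the documented
// AgentCore schema, just an illustration of the consumer pattern.
interface MemoryChangeEvent {
  memoryType: 'episodic' | 'semantic';
  action: 'created' | 'updated' | 'deleted';
  memoryId: string;
}

// Kinesis delivers record data base64-encoded.
function decodeRecord(data: string): MemoryChangeEvent {
  return JSON.parse(Buffer.from(data, 'base64').toString('utf8'));
}

// Route changes to downstream workflows without polling.
// Workflow names here are illustrative.
function routeChange(event: MemoryChangeEvent): string {
  if (event.memoryType === 'semantic' && event.action === 'updated')
    return 'refresh-personalization-cache';
  if (event.action === 'deleted')
    return 'audit-log';
  return 'noop';
}
```

The design point is push instead of pull: a semantic-fact update can invalidate a personalization cache within seconds, rather than whenever the next polling cycle happens to run.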

When each type matters

Not every agent needs both memory types equally. The 2026 research points to clear guidelines based on your use case.

Episodic-heavy use cases need high-fidelity event recall. Compliance and audit requirements. Customer support where "what happened last time" is the most common question. Legal or medical agents where the source and context of information matters as much as the information itself. E-mem's uncompressed episodic architecture specifically targets these, preserving full event detail instead of summarizing it away.

Semantic-heavy use cases need fast personalization at scale. Sales agents that need to know preferences instantly. Recommendation engines that synthesize patterns across hundreds of interactions. Onboarding flows that adapt to what the system already knows. Here, Mem0's composite scoring and 90% token savings matter most. You need the answer, not the audit trail.

Both equally is the most common production scenario. Customer-facing agents that personalize (semantic) while maintaining accountability (episodic). Contact center agents that greet returning customers by name and preference (semantic) but can explain exactly when and why a previous decision was made (episodic). This is where the dual-store architecture from Pattern 1 delivers, and it's what most of the 2026 research optimizes for.

The AdaMem paper gives a practical heuristic: let the query decide. Build both memory types, then route each retrieval to the appropriate store based on what the question is actually asking. A "what happened" question goes to episodic. A "what kind of person" question goes to semantic. A "how are X and Y related" question goes to the graph layer.
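That heuristic can be made concrete with a tiny store-level router. The keyword lists below are our own illustrative stand-in; production systems (and AdaMem itself) would use a learned classifier rather than regexes.

```typescript
type MemoryRoute = 'episodic' | 'semantic' | 'graph';

// Route a query to the store that can actually answer it.
// Regex heuristics are a placeholder for a trained classifier.
function routeToStore(query: string): MemoryRoute {
  const q = query.toLowerCase();
  // Relationship questions go to the graph layer
  if (/\b(related|connection|between|linked)\b/.test(q)) return 'graph';
  // Event and timeline questions go to episodic memory
  if (/\b(when|happened|last time|previously|timeline)\b/.test(q))
    return 'episodic';
  // Preferences and standing facts default to semantic memory
  return 'semantic';
}
```

Routing before retrieval is cheap and pays twice: the right store answers faster, and the wrong store never pollutes the context with irrelevant memories.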

Implementation guide

Here's a minimal dual-store implementation that handles both memory types with consolidation. This connects to any vector store and any LLM. The pattern is what matters.

typescript
import { openai } from './clients';
 
// Store an episode after each conversation turn
async function storeEpisode(
  customerId: string,
  event: string,
  metadata: Record<string, unknown>
): Promise<EpisodicMemory> {
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: event,
  });
 
  return db.episodicMemories.insertOne({
    customerId,
    timestamp: new Date(),
    event,
    metadata,
    embedding: embedding.data[0].embedding,
  });
}
 
// Query the right memory type based on the question
async function recall(
  customerId: string,
  query: string
): Promise<{ episodic: EpisodicMemory[]; semantic: SemanticMemory[] }> {
  const queryEmbedding = await embed(query);
 
  // Always retrieve relevant semantic facts -- cheap, fast
  const semanticResults = await db.semanticMemories.vectorSearch({
    filter: { customerId },
    vector: queryEmbedding,
    limit: 5,
    minScore: 0.3,
  });
 
  // Retrieve episodic memories for temporal/causal queries
  // or when semantic results are insufficient
  const needsEpisodic = isTemporalQuery(query)
    || semanticResults.length < 2;
 
  const episodicResults = needsEpisodic
    ? await db.episodicMemories.vectorSearch({
        filter: { customerId },
        vector: queryEmbedding,
        limit: 10,
        minScore: 0.25,
      })
    : [];
 
  return { episodic: episodicResults, semantic: semanticResults };
}
 
// Simple heuristic -- production systems use a classifier
function isTemporalQuery(query: string): boolean {
  const temporalSignals = [
    'when', 'last time', 'previously', 'before',
    'after', 'history', 'what happened', 'timeline',
  ];
  return temporalSignals.some(s =>
    query.toLowerCase().includes(s)
  );
}

The consolidation process runs on a schedule: hourly, daily, or triggered by a threshold of new episodes.

typescript
async function runConsolidation(customerId: string) {
  // Get episodes not yet consolidated
  const recentEpisodes = await db.episodicMemories.find({
    customerId,
    consolidated: { $ne: true },
    timestamp: { $gte: daysAgo(7) },
  });
 
  if (recentEpisodes.length < 3) return; // Not enough signal
 
  const existingFacts = await db.semanticMemories.find({ customerId });
 
  const newFacts = await consolidate(recentEpisodes, existingFacts);
 
  // Store new semantic facts with source references
  for (const fact of newFacts) {
    await db.semanticMemories.insertOne({
      ...fact,
      customerId,
      // Retain provenance -- which episodes support this fact
      sourceEpisodeIds: recentEpisodes.map(e => e.id),
    });
  }
 
  // Mark episodes as consolidated (don't delete them)
  await db.episodicMemories.updateMany(
    { _id: { $in: recentEpisodes.map(e => e._id) } },
    { $set: { consolidated: true } }
  );
}

The key detail: marking episodes as consolidated, not deleting them. You need both the distilled fact ("customer has recurring billing issues") and the source episodes ("specifically, calls on March 5, 9, and 14") for the system to be auditable. If you've worked through building a memory system from scratch, this consolidation layer is what sits on top of the persistent + semantic stores from that tutorial.
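Provenance makes audit answers cheap. A sketch of the lookup as a pure function; in production the episodes would come from `db.episodicMemories` as in the snippets above, and the interface names here are illustrative.

```typescript
// Illustrative shapes -- slimmed-down versions of the stored records.
interface ProvenancedFact { fact: string; sourceEpisodeIds: string[] }
interface SourceEpisode { id: string; timestamp: string; event: string }

// Answer "why do we believe this?" by resolving the provenance links
// from a semantic fact back to its supporting episodes.
function explainFact(
  fact: ProvenancedFact,
  episodes: SourceEpisode[]
): string {
  const sources = episodes.filter(e => fact.sourceEpisodeIds.includes(e.id));
  const lines = sources.map(e => `- [${e.timestamp}] ${e.event}`);
  return `${fact.fact}\nSupported by ${sources.length} episode(s):\n${lines.join('\n')}`;
}
```

This is the payoff of marking episodes as consolidated instead of deleting them: the distilled fact and its evidence trail stay one join apart.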

The admission control problem

Even with both memory types working, you face a harder question: what deserves to be remembered at all?

The A-MAC paper from the MemAgents workshop frames this as memory admission control. Current systems either store everything (expensive, noisy) or use opaque LLM-driven policies (costly, hard to audit). A-MAC decomposes memory value into five interpretable factors:

  1. Future utility: will this information matter in future interactions?
  2. Factual confidence: is this information reliable?
  3. Semantic novelty: do we already know this?
  4. Temporal recency: is this current?
  5. Content type prior: does this category of information tend to be useful?

This maps directly to the episodic-semantic decision. High future utility + high confidence + low novelty = don't store (we already know it). High future utility + high confidence + high novelty = store as semantic fact. Moderate utility + high temporal relevance = store as episode (might matter in context, not worth promoting to a fact yet).

The practical implementation is a scoring function that runs before every memory write:

typescript
function shouldAdmit(candidate: MemoryCandidate): {
  admit: boolean;
  type: 'episodic' | 'semantic' | 'discard';
} {
  const score = (
    candidate.futureUtility * 0.35 +
    candidate.factualConfidence * 0.25 +
    candidate.semanticNovelty * 0.20 +
    candidate.temporalRecency * 0.10 +
    candidate.contentTypePrior * 0.10
  );
 
  if (score < 0.3) return { admit: false, type: 'discard' };
 
  // High-confidence, durable facts go to semantic store
  if (candidate.factualConfidence > 0.8 && candidate.futureUtility > 0.7)
    return { admit: true, type: 'semantic' };
 
  // Everything else starts as an episode
  // (may consolidate to semantic later)
  return { admit: true, type: 'episodic' };
}

Without admission control, agents accumulate hallucinated facts, obsolete preferences, and conversational noise. With it, memory stays focused on what actually matters for future interactions. This is especially critical for privacy-first memory design. Admission control is your first line of defense against storing information you shouldn't.

What this means for production

The 2026 research converges on a clear message: the episodic-semantic split isn't optional. It's the foundation that every other memory capability builds on. MAGMA's four-graph architecture works because it explicitly separates temporal and semantic views. Mem0's 26% accuracy gain works because it extracts and scores both memory types. AdaMem's query routing works because different questions need different memory stores.

If you're building memory into an agent today, start with the dual-store pattern. Store episodes from every conversation. Run consolidation to extract semantic facts. Route queries to the appropriate store based on what's being asked. That's the architecture that 2026's best-performing systems all share.

The agent from our opening? We added a nightly consolidation job. Three episodes about billing disputes became one semantic fact: "Customer has recurring billing issues, prefers email resolution, escalation risk." The next time they called, the agent knew them before they said a word. The call lasted four minutes.

Two types of memory. Both present. That's what makes an agent feel like it actually knows you.

Give your agents memory that learns

Chanl's memory system stores episodic events and distills semantic facts automatically. Your agents remember what happened and what it means.
