
Graph memory for AI agents: when vector search isn't enough

Build graph memory for AI agents in TypeScript and Python. Extract entities, track relationships over time, and compare Mem0, Zep, and Letta in production.

Dean Grover, Co-founder
April 1, 2026
22 min read
Person drawing a web of connected nodes on a glass wall with colorful sticky notes around the edges

Your agent knows the customer likes coffee. It also knows they like tea. It has no idea they switched from coffee to tea three weeks ago, or that the switch happened because they're pregnant and avoiding caffeine.

Vector search retrieved both facts. Both scored above the similarity threshold. Both got injected into the context window. The agent, unable to tell which is current, hedged: "I see you enjoy both coffee and tea!" The customer, who specifically told the agent about the switch last month, hung up.

This is the wall you hit when vector memory is your only memory. In the previous tutorial, we built a complete memory system: session context, persistent storage, semantic search. That system works well for agents handling simple recall. But the moment your agent needs to understand how facts relate to each other, how they change over time, or why one piece of context invalidates another, flat vector embeddings fall apart.

The fix isn't more embeddings. It's a different data structure. Knowledge graphs store entities and the relationships between them, with timestamps on every edge. Your agent doesn't just know "coffee" and "tea." It knows Sarah prefers tea (as of March 12) and Sarah formerly preferred coffee (January through February) and the preference change is linked to a health reason. That's graph memory.

In this tutorial, you'll build it from scratch. Entity extraction from conversations, relationship storage with temporal tracking, graph-aware retrieval that resolves contradictions. Then we'll integrate Mem0's graph memory API and compare it against Zep and Letta. By the end, you'll know exactly when vector search is enough and when you need the graph.

What does graph memory actually look like?

Graph memory represents knowledge as a network of entities connected by typed, timestamped relationships. Instead of storing flat text like "Sarah prefers tea and works at Acme Corp," you decompose it into nodes and edges that the agent can traverse and reason about.
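Concretely, the decomposition can be sketched with plain tuples before any storage layer is involved. This is a minimal sketch with illustrative dates, not a real schema:

```python
# The flat sentence "Sarah prefers tea and works at Acme Corp" decomposes
# into (subject, predicate, object) triples, each with a validity interval.
triples = [
    ("Sarah", "prefers", "Tea", {"valid_from": "2026-03-12", "valid_until": None}),
    ("Sarah", "previously_preferred", "Coffee", {"valid_from": "2026-01-01", "valid_until": "2026-03-12"}),
    ("Sarah", "works_at", "Acme Corp", {"valid_from": "2026-01-01", "valid_until": None}),
]

# Active facts are those whose validity interval is still open.
active = [(s, p, o) for s, p, o, t in triples if t["valid_until"] is None]
```

The coffee triple is retained with a closed interval rather than deleted, which is the core move the rest of this tutorial builds on.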

Here's the difference visually. Vector memory stores text chunks as points in embedding space. Graph memory stores the same information as a structured network.

Vector memory vs graph memory: same information, different structure

The vector side has three isolated chunks. A similarity search for "what does Sarah drink?" returns both the coffee and tea chunks with similar scores. The agent has to guess which is current.

The graph side has four entities and five relationships. A query for "what does Sarah drink?" traverses from Sarah to her active prefers edge, finds Tea, and can even explain why through the reason for switch edge. The invalidated coffee edge is still there for history, but it's clearly marked as superseded.

This isn't theoretical. Mem0's research on the LOCOMO benchmark showed a 26% relative accuracy improvement when combining graph memory with vector search. The accuracy gain comes specifically from queries that require relationship understanding, temporal reasoning, or contradiction resolution. For simple "find similar text" queries, vector search alone is fine.

Let's define the data structures and build it.

How do you extract entities from conversations?

Entity extraction converts unstructured conversation text into structured graph triples (subject, predicate, object). You send conversation turns to an LLM with a structured output schema, and it identifies entities, their types, and the relationships between them. The quality of your extraction prompt determines the quality of your entire graph.

First, set up both TypeScript and Python projects. You'll need an LLM for extraction and an embedding model for hybrid search later.

bash
# TypeScript
npm install openai @anthropic-ai/sdk
 
# Python
pip install openai anthropic pydantic

Here's the core extraction in TypeScript. The key is the structured output schema that forces the LLM to produce clean triples instead of free text.

typescript
import OpenAI from 'openai';
 
const openai = new OpenAI();
 
// The shape of a single extracted relationship
interface ExtractedTriple {
  subject: string;
  subjectType: 'person' | 'organization' | 'product' | 'preference' | 'event' | 'location';
  predicate: string;
  object: string;
  objectType: 'person' | 'organization' | 'product' | 'preference' | 'event' | 'location';
  confidence: number;
  temporal?: string; // "currently", "previously", "since March 2026"
}
 
interface ExtractionResult {
  entities: { name: string; type: string }[];
  relationships: ExtractedTriple[];
}
 
async function extractEntities(conversationText: string): Promise<ExtractionResult> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4.1-mini',
    temperature: 0.1,
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content: `Extract entities and relationships from this conversation.
Return JSON with:
- entities: array of {name, type} where type is one of: person, organization, product, preference, event, location
- relationships: array of {subject, subjectType, predicate, object, objectType, confidence, temporal}
 
Rules:
- Normalize entity names (e.g., "Sarah" and "sarah" become "Sarah")
- Use specific predicates: "prefers", "works_at", "purchased", "switched_from", "complained_about"
- Include temporal markers when the conversation implies time ("just started", "used to", "since last month")
- Confidence is 0.0-1.0 based on how explicitly the relationship is stated
- Extract implicit relationships too: "I switched to tea" implies both "prefers tea" and "previously preferred" something else`,
      },
      {
        role: 'user',
        content: conversationText,
      },
    ],
  });
 
  const content = response.choices[0].message.content;
  return JSON.parse(content!) as ExtractionResult;
}

The extraction prompt does the heavy lifting. Notice the rules section. Without explicit instructions to normalize entity names, you'll end up with separate nodes for "Sarah", "sarah", and "Sarah M." in your graph. Without temporal markers, you lose the time dimension that makes graph memory valuable. Without implicit relationship extraction, you miss the fact that "I switched to tea" also tells you something about a previous preference.
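As a safety net on top of the prompt, you can also normalize names in code before upserting nodes. A minimal sketch (`normalize_entity` is a hypothetical helper, not part of the pipeline above):

```python
def normalize_entity(name: str) -> str:
    """Collapse whitespace and case so "sarah" and " Sarah " map to one node.

    This only catches trivial variants; true alias resolution
    ("Sarah M." vs "Sarah") still needs the LLM or a lookup table.
    """
    return " ".join(name.split()).title()

print(normalize_entity("  sarah  "))  # -> Sarah
print(normalize_entity("acme corp"))  # -> Acme Corp
```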

Here's the same extraction in Python, using Pydantic for schema validation.

python
from openai import OpenAI
from pydantic import BaseModel
 
client = OpenAI()
 
 
class Entity(BaseModel):
    name: str
    type: str  # person, organization, product, preference, event, location
 
 
class Triple(BaseModel):
    subject: str
    subject_type: str
    predicate: str
    object: str
    object_type: str
    confidence: float
    temporal: str | None = None
 
 
class ExtractionResult(BaseModel):
    entities: list[Entity]
    relationships: list[Triple]
 
 
def extract_entities(conversation_text: str) -> ExtractionResult:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        temperature=0.1,
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract entities and relationships from this conversation. "
                    "Return JSON with: "
                    "- entities: [{name, type}] "
                    "- relationships: [{subject, subject_type, predicate, object, "
                    "object_type, confidence, temporal}] "
                    "Normalize names. Use specific predicates. Include temporal markers."
                ),
            },
            {"role": "user", "content": conversation_text},
        ],
    )
 
    content = response.choices[0].message.content
    return ExtractionResult.model_validate_json(content)

Let's test extraction with a realistic conversation snippet. This is the kind of input that breaks vector memory because it contains both a current fact and an implicit historical one.

typescript
const conversation = `
Customer: Hi, I'm Sarah from Acme Corp. I wanted to update my preferences.
Agent: Of course, Sarah. What would you like to change?
Customer: I've switched from coffee to tea for my morning meetings. Health reasons.
Agent: Got it. I'll update that. Anything else?
Customer: Also, we're moving our team from Slack to Microsoft Teams next month.
`;
 
const result = await extractEntities(conversation);
console.log(JSON.stringify(result, null, 2));

The extraction produces something like this.

json
{
  "entities": [
    { "name": "Sarah", "type": "person" },
    { "name": "Acme Corp", "type": "organization" },
    { "name": "Coffee", "type": "preference" },
    { "name": "Tea", "type": "preference" },
    { "name": "Slack", "type": "product" },
    { "name": "Microsoft Teams", "type": "product" }
  ],
  "relationships": [
    {
      "subject": "Sarah",
      "subjectType": "person",
      "predicate": "works_at",
      "object": "Acme Corp",
      "objectType": "organization",
      "confidence": 1.0,
      "temporal": "currently"
    },
    {
      "subject": "Sarah",
      "subjectType": "person",
      "predicate": "prefers",
      "object": "Tea",
      "objectType": "preference",
      "confidence": 0.95,
      "temporal": "since recently"
    },
    {
      "subject": "Sarah",
      "subjectType": "person",
      "predicate": "previously_preferred",
      "object": "Coffee",
      "objectType": "preference",
      "confidence": 0.9,
      "temporal": "previously"
    },
    {
      "subject": "Acme Corp",
      "subjectType": "organization",
      "predicate": "switching_to",
      "object": "Microsoft Teams",
      "objectType": "product",
      "confidence": 0.9,
      "temporal": "next month"
    },
    {
      "subject": "Acme Corp",
      "subjectType": "organization",
      "predicate": "currently_uses",
      "object": "Slack",
      "objectType": "product",
      "confidence": 0.85,
      "temporal": "currently, until next month"
    }
  ]
}

Five relationships from five sentences of conversation. The LLM caught the implicit "previously preferred coffee" relationship and the temporal nuance that the Slack-to-Teams migration is planned for next month, not yet complete. A vector embedding of this conversation would have captured none of that structure.

How do you store relationships with timestamps?

A graph memory store needs two collections: nodes (entities) and edges (relationships). Each edge carries timestamps that record when the relationship was first observed, last confirmed, and optionally when it was invalidated. This temporal layer is what separates a useful knowledge graph from a static one.

Here's the storage layer in TypeScript. We'll use a simple in-memory implementation first, then discuss production storage options.

typescript
interface GraphNode {
  id: string;
  name: string;
  type: string;
  properties: Record<string, unknown>;
  createdAt: Date;
  updatedAt: Date;
}
 
interface GraphEdge {
  id: string;
  sourceId: string;
  targetId: string;
  predicate: string;
  confidence: number;
  temporal: string | null;
  validFrom: Date;
  validUntil: Date | null; // null means currently active
  lastConfirmedAt: Date;
  source: string; // which conversation produced this
  properties: Record<string, unknown>;
}
 
class GraphMemoryStore {
  nodes: Map<string, GraphNode> = new Map();
  private edges: Map<string, GraphEdge> = new Map();
  private edgeIndex: Map<string, Set<string>> = new Map(); // nodeId -> edgeIds
 
  // Upsert a node. If it exists, update the timestamp. If not, create it.
  upsertNode(name: string, type: string, properties: Record<string, unknown> = {}): GraphNode {
    const id = this.normalizeId(name);
    const existing = this.nodes.get(id);
 
    if (existing) {
      existing.updatedAt = new Date();
      existing.properties = { ...existing.properties, ...properties };
      return existing;
    }
 
    const node: GraphNode = {
      id,
      name,
      type,
      properties,
      createdAt: new Date(),
      updatedAt: new Date(),
    };
 
    this.nodes.set(id, node);
    return node;
  }
 
  // Add a relationship between two entities.
  // If a conflicting active edge exists, invalidate it first.
  addEdge(
    sourceName: string,
    predicate: string,
    targetName: string,
    options: {
      confidence?: number;
      temporal?: string | null;
      source?: string;
    } = {},
  ): GraphEdge {
    const sourceId = this.normalizeId(sourceName);
    const targetId = this.normalizeId(targetName);
 
    // Check for conflicting active edges with the same predicate
    this.resolveConflicts(sourceId, predicate, targetId);
 
    const edge: GraphEdge = {
      id: `${sourceId}-${predicate}-${targetId}-${Date.now()}`,
      sourceId,
      targetId,
      predicate,
      confidence: options.confidence ?? 0.8,
      temporal: options.temporal ?? null,
      validFrom: new Date(),
      validUntil: null,
      lastConfirmedAt: new Date(),
      source: options.source ?? 'extraction',
      properties: {},
    };
 
    this.edges.set(edge.id, edge);
 
    // Index for fast lookup
    if (!this.edgeIndex.has(sourceId)) this.edgeIndex.set(sourceId, new Set());
    if (!this.edgeIndex.has(targetId)) this.edgeIndex.set(targetId, new Set());
    this.edgeIndex.get(sourceId)!.add(edge.id);
    this.edgeIndex.get(targetId)!.add(edge.id);
 
    return edge;
  }
 
  // Find all active relationships for an entity
  getActiveEdges(entityName: string): GraphEdge[] {
    const entityId = this.normalizeId(entityName);
    const edgeIds = this.edgeIndex.get(entityId) || new Set();
 
    return Array.from(edgeIds)
      .map((id) => this.edges.get(id)!)
      .filter((edge) => edge.validUntil === null);
  }
 
  // Find relationships by predicate type, e.g., all "prefers" edges for a person
  queryByPredicate(entityName: string, predicate: string): GraphEdge[] {
    return this.getActiveEdges(entityName).filter((e) => e.predicate === predicate);
  }
 
  // When new information contradicts an existing relationship,
  // invalidate the old edge instead of deleting it
  private resolveConflicts(sourceId: string, predicate: string, newTargetId: string): void {
    const edgeIds = this.edgeIndex.get(sourceId) || new Set();
 
    for (const edgeId of edgeIds) {
      const edge = this.edges.get(edgeId)!;
 
      // Same source, same predicate, different target, still active?
      // That's a contradiction. The old fact is now outdated.
      if (
        edge.sourceId === sourceId &&
        edge.predicate === predicate &&
        edge.targetId !== newTargetId &&
        edge.validUntil === null
      ) {
        edge.validUntil = new Date();
      }
    }
  }
 
  private normalizeId(name: string): string {
    return name.toLowerCase().replace(/\s+/g, '_');
  }
}

The critical design choice here is resolveConflicts. When Sarah switches from coffee to tea, we don't delete the coffee edge. We set its validUntil timestamp and create a new tea edge. This means the graph always has a complete history. You can answer "what does Sarah prefer now?" (follow active edges) and "what did Sarah prefer before?" (follow invalidated edges) from the same data structure.
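Because invalidated edges are retained, one interval test answers both "now" and point-in-time questions. A minimal standalone sketch over plain dicts, with field names mirroring the edge model:

```python
from datetime import datetime, timedelta

def edges_active_at(edges: list[dict], when: datetime) -> list[dict]:
    # An edge is active at `when` if its validity interval contains it:
    # valid_from <= when, and valid_until is either open (None) or later.
    return [
        e for e in edges
        if e["valid_from"] <= when
        and (e["valid_until"] is None or when < e["valid_until"])
    ]

now = datetime(2026, 4, 1)
history = [
    {"target": "Coffee", "valid_from": now - timedelta(days=80),
     "valid_until": now - timedelta(days=20)},
    {"target": "Tea", "valid_from": now - timedelta(days=20),
     "valid_until": None},
]

current = [e["target"] for e in edges_active_at(history, now)]
past = [e["target"] for e in edges_active_at(history, now - timedelta(days=50))]
# current -> ['Tea']; fifty days ago -> ['Coffee']
```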

Here's the equivalent Python implementation with dataclasses.

python
from dataclasses import dataclass, field
from datetime import datetime
 
 
@dataclass
class GraphNode:
    id: str
    name: str
    type: str
    properties: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)
 
 
@dataclass
class GraphEdge:
    id: str
    source_id: str
    target_id: str
    predicate: str
    confidence: float = 0.8
    temporal: str | None = None
    valid_from: datetime = field(default_factory=datetime.now)
    valid_until: datetime | None = None  # None = currently active
    last_confirmed_at: datetime = field(default_factory=datetime.now)
    source: str = "extraction"
    properties: dict = field(default_factory=dict)
 
 
class GraphMemoryStore:
    def __init__(self):
        self.nodes: dict[str, GraphNode] = {}
        self.edges: dict[str, GraphEdge] = {}
        self.edge_index: dict[str, set[str]] = {}
 
    def upsert_node(self, name: str, node_type: str, **properties) -> GraphNode:
        node_id = self._normalize_id(name)
        if node_id in self.nodes:
            self.nodes[node_id].updated_at = datetime.now()
            self.nodes[node_id].properties.update(properties)
            return self.nodes[node_id]
 
        node = GraphNode(id=node_id, name=name, type=node_type, properties=properties)
        self.nodes[node_id] = node
        return node
 
    def add_edge(
        self,
        source_name: str,
        predicate: str,
        target_name: str,
        confidence: float = 0.8,
        temporal: str | None = None,
        source: str = "extraction",
    ) -> GraphEdge:
        source_id = self._normalize_id(source_name)
        target_id = self._normalize_id(target_name)
        self._resolve_conflicts(source_id, predicate, target_id)
 
        edge_id = f"{source_id}-{predicate}-{target_id}-{datetime.now().timestamp()}"
        edge = GraphEdge(
            id=edge_id,
            source_id=source_id,
            target_id=target_id,
            predicate=predicate,
            confidence=confidence,
            temporal=temporal,
            source=source,
        )
        self.edges[edge_id] = edge
        self.edge_index.setdefault(source_id, set()).add(edge_id)
        self.edge_index.setdefault(target_id, set()).add(edge_id)
        return edge
 
    def get_active_edges(self, entity_name: str) -> list[GraphEdge]:
        entity_id = self._normalize_id(entity_name)
        edge_ids = self.edge_index.get(entity_id, set())
        return [self.edges[eid] for eid in edge_ids if self.edges[eid].valid_until is None]
 
    def _resolve_conflicts(self, source_id: str, predicate: str, new_target_id: str):
        for edge_id in self.edge_index.get(source_id, set()):
            edge = self.edges[edge_id]
            if (
                edge.source_id == source_id
                and edge.predicate == predicate
                and edge.target_id != new_target_id
                and edge.valid_until is None
            ):
                edge.valid_until = datetime.now()
 
    def _normalize_id(self, name: str) -> str:
        return name.lower().replace(" ", "_")

Now let's wire extraction into storage. This is the ingestion pipeline that runs after each conversation.

typescript
async function ingestConversation(
  store: GraphMemoryStore,
  conversationText: string,
  conversationId: string,
): Promise<void> {
  // Step 1: Extract entities and relationships from the conversation
  const extraction = await extractEntities(conversationText);
 
  // Step 2: Upsert all entities as nodes
  for (const entity of extraction.entities) {
    store.upsertNode(entity.name, entity.type);
  }
 
  // Step 3: Add all relationships as edges with temporal context
  for (const rel of extraction.relationships) {
    // If the relationship is explicitly about the past, mark it as invalidated
    const isPastRelationship = rel.temporal?.includes('previously') ||
      rel.temporal?.includes('formerly') ||
      rel.temporal?.includes('used to');
 
    const edge = store.addEdge(rel.subject, rel.predicate, rel.object, {
      confidence: rel.confidence,
      temporal: rel.temporal,
      source: conversationId,
    });
 
    // Historical relationships get invalidated immediately
    if (isPastRelationship) {
      edge.validUntil = edge.validFrom;
    }
  }
}

Test it with our Sarah conversation.

typescript
const store = new GraphMemoryStore();
await ingestConversation(store, conversation, 'conv_001');
 
// What does Sarah currently prefer?
const preferences = store.queryByPredicate('Sarah', 'prefers');
console.log('Current preferences:', preferences.map((e) => ({
  item: store.nodes.get(e.targetId)?.name,
  since: e.validFrom,
})));
// Output: [{ item: "Tea", since: "2026-04-01T..." }]
 
// What's Sarah's full relationship history?
const allEdges = store.getActiveEdges('Sarah');
console.log('Active relationships:', allEdges.map((e) => ({
  predicate: e.predicate,
  target: store.nodes.get(e.targetId)?.name,
  temporal: e.temporal,
})));

The store correctly tracks that Sarah's tea preference is active while the coffee preference is invalidated. Two mechanisms work together here: the resolveConflicts method catches cases where the same predicate points to a new target, and the isPastRelationship check in the ingestion function catches edges that the LLM explicitly marked as historical (like "previously_preferred"). If you had stored these as vector embeddings instead, both "Sarah prefers coffee" and "Sarah prefers tea" would sit in your vector database with similar similarity scores, and your agent would have no way to know which is current.

How do you query a graph at retrieval time?

Storing entities and edges is half the problem. The other half is retrieving the right subgraph when your agent needs context for a conversation. You traverse relationships, follow chains of connections, and filter by temporal validity, all before the information hits the LLM's context window. This is where graph memory pulls ahead of vector search.

Graph retrieval works in three steps. First, identify which entities in the user's current message map to nodes in the graph. Second, traverse outward from those nodes to collect relevant relationships. Third, format the subgraph as natural language context for the LLM.

typescript
async function retrieveContext(
  store: GraphMemoryStore,
  userMessage: string,
  options: { maxHops?: number; maxEdges?: number } = {},
): Promise<string> {
  const maxHops = options.maxHops ?? 2;
  const maxEdges = options.maxEdges ?? 15;
 
  // Step 1: Extract entity mentions from the current message.
  // Reuse the extraction function from earlier, but only pull out entity names.
  const extraction = await extractEntities(userMessage);
  const mentioned = extraction.entities.map((e) => e.name);
 
  // Step 2: Traverse the graph outward from mentioned entities
  const relevantEdges: GraphEdge[] = [];
  const visited = new Set<string>();
 
  for (const entityName of mentioned) {
    traverseFrom(store, entityName, maxHops, relevantEdges, visited);
  }
 
  // Step 3: Deduplicate and rank by confidence + recency
  const ranked = relevantEdges
    .filter((e, i, arr) => arr.findIndex((x) => x.id === e.id) === i)
    .sort((a, b) => {
      // Active edges first, then by confidence, then by recency
      if (a.validUntil === null && b.validUntil !== null) return -1;
      if (a.validUntil !== null && b.validUntil === null) return 1;
      if (b.confidence !== a.confidence) return b.confidence - a.confidence;
      return b.lastConfirmedAt.getTime() - a.lastConfirmedAt.getTime();
    })
    .slice(0, maxEdges);
 
  // Step 4: Format as natural language context
  return formatGraphContext(store, ranked);
}
 
function traverseFrom(
  store: GraphMemoryStore,
  entityName: string,
  hopsRemaining: number,
  collected: GraphEdge[],
  visited: Set<string>,
): void {
  const entityId = entityName.toLowerCase().replace(/\s+/g, '_');
  if (visited.has(entityId) || hopsRemaining <= 0) return;
  visited.add(entityId);
 
  const edges = store.getActiveEdges(entityName);
  collected.push(...edges);
 
  // Continue traversal to connected entities
  for (const edge of edges) {
    const nextId = edge.sourceId === entityId ? edge.targetId : edge.sourceId;
    const nextNode = store.nodes.get(nextId);
    if (nextNode) {
      traverseFrom(store, nextNode.name, hopsRemaining - 1, collected, visited);
    }
  }
}
 
function formatGraphContext(store: GraphMemoryStore, edges: GraphEdge[]): string {
  const lines: string[] = ['Known facts about this customer:'];
 
  for (const edge of edges) {
    const source = store.nodes.get(edge.sourceId);
    const target = store.nodes.get(edge.targetId);
    if (!source || !target) continue;
 
    const status = edge.validUntil ? '[outdated]' : '[current]';
    const temporal = edge.temporal ? ` (${edge.temporal})` : '';
    lines.push(`- ${status} ${source.name} ${edge.predicate} ${target.name}${temporal}`);
  }
 
  return lines.join('\n');
}

The traverseFrom function does a breadth-limited walk from each mentioned entity. With maxHops: 2, if the user mentions "Sarah," the retrieval finds Sarah's direct relationships (prefers tea, works at Acme Corp) and also the relationships one hop further out (Acme Corp is switching to Microsoft Teams). This multi-hop traversal is something vector search simply cannot do. You'd need to embed every possible combination of related facts, which doesn't scale.

Here's a simpler Python version of the retrieval.

python
def retrieve_context(
    store: GraphMemoryStore,
    entity_names: list[str],
    max_hops: int = 2,
) -> str:
    """Traverse graph from mentioned entities and format as context."""
    relevant_edges: list[GraphEdge] = []
    visited: set[str] = set()
 
    def traverse(name: str, hops: int):
        node_id = store._normalize_id(name)
        if node_id in visited or hops <= 0:
            return
        visited.add(node_id)
        edges = store.get_active_edges(name)
        relevant_edges.extend(edges)
        for edge in edges:
            next_id = edge.target_id if edge.source_id == node_id else edge.source_id
            next_node = store.nodes.get(next_id)
            if next_node:
                traverse(next_node.name, hops - 1)
 
    for name in entity_names:
        traverse(name, max_hops)
 
    # Format as readable context
    lines = ["Known facts about this customer:"]
    seen_ids = set()
    for edge in relevant_edges:
        if edge.id in seen_ids:
            continue
        seen_ids.add(edge.id)
        src = store.nodes.get(edge.source_id)
        tgt = store.nodes.get(edge.target_id)
        if src and tgt:
            status = "[current]" if edge.valid_until is None else "[outdated]"
            lines.append(f"- {status} {src.name} {edge.predicate} {tgt.name}")
 
    return "\n".join(lines)

The formatted context that gets injected into the agent's system prompt looks like this.

text
Known facts about this customer:
- [current] Sarah prefers Tea (since recently)
- [current] Sarah works_at Acme Corp (currently)
- [current] Acme Corp switching_to Microsoft Teams (next month)
- [current] Acme Corp currently_uses Slack (currently, until next month)

Compare that to what vector search would produce: two separate chunks about coffee and tea, both with high similarity scores, no indication of which is current, and no connection between Sarah's employer and their upcoming platform migration. The graph context gives the agent structured, time-aware, relationship-rich information that directly improves response quality.

How do you handle temporal knowledge?

Temporal memory tracks when facts were true, not just what the facts are. This is the piece most memory systems miss entirely, and it's where Zep's architecture is particularly strong. A customer's preferences, account status, team composition, and tool choices all change over time. An agent that treats every stored fact as equally current will contradict itself.

The temporal layer we built into our edge model already handles the basics. Every edge has validFrom, validUntil, and lastConfirmedAt timestamps. But there are three temporal patterns that need explicit handling in production.

Pattern 1: Scheduled future changes. "We're switching to Teams next month" is a fact about the future. The relationship isn't active yet, but the agent should know about it. Add a scheduledFor field to edges.

typescript
interface TemporalEdge extends GraphEdge {
  scheduledFor: Date | null; // When this relationship will become active
  expiresAt: Date | null;    // When this relationship automatically invalidates
}
 
function addScheduledChange(
  store: GraphMemoryStore,
  sourceName: string,
  predicate: string,
  targetName: string,
  scheduledDate: Date,
): GraphEdge {
  // Create the edge but mark it as not yet active
  const edge = store.addEdge(sourceName, predicate, targetName, {
    temporal: `scheduled for ${scheduledDate.toISOString().split('T')[0]}`,
    confidence: 0.85,
  });
 
  // The edge exists in the graph but retrieval can filter by scheduledFor
  (edge as TemporalEdge).scheduledFor = scheduledDate;
  return edge;
}

Pattern 2: Confidence decay. Facts get stale. A preference stated six months ago is less reliable than one stated yesterday. Instead of hard-coding expiration rules, apply a decay function during retrieval.

typescript
function decayedConfidence(edge: GraphEdge, now: Date = new Date()): number {
  const ageMs = now.getTime() - edge.lastConfirmedAt.getTime();
  const ageDays = ageMs / (1000 * 60 * 60 * 24);
 
  // Half-life of 90 days: confidence halves every 3 months
  const halfLife = 90;
  const decay = Math.pow(0.5, ageDays / halfLife);
 
  return edge.confidence * decay;
}

With a 90-day half-life, a preference stated with 0.95 confidence three months ago decays to 0.475. Six months out, it's 0.237. The agent still knows the fact exists, but it's weighted lower in retrieval ranking. If the customer confirms the preference again, lastConfirmedAt resets and the decayed confidence jumps back up.
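The arithmetic is easy to verify with a standalone Python version of the same decay function:

```python
def decayed_confidence(confidence: float, age_days: float,
                       half_life_days: float = 90.0) -> float:
    # Exponential decay: confidence halves every half_life_days.
    return confidence * 0.5 ** (age_days / half_life_days)

print(round(decayed_confidence(0.95, 0), 4))    # 0.95   (just confirmed)
print(round(decayed_confidence(0.95, 90), 4))   # 0.475  (one half-life)
print(round(decayed_confidence(0.95, 180), 4))  # 0.2375 (two half-lives)
```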

Pattern 3: Contradiction detection. When an ingested conversation contains information that contradicts an existing active edge, you need to decide what to do. Our resolveConflicts method handles simple cases (same predicate, different target), but real conversations produce subtler contradictions.

python
def detect_contradictions(
    store: GraphMemoryStore,
    new_triples: list[Triple],
) -> list[dict]:
    """Find new triples that conflict with existing active edges."""
    contradictions = []
 
    # Define predicate groups where only one can be active
    exclusive_predicates = {
        "prefers": ["prefers", "switched_to", "chose"],
        "works_at": ["works_at", "employed_by", "joined"],
        "uses": ["uses", "currently_uses", "adopted"],
    }
 
    for triple in new_triples:
        active_edges = store.get_active_edges(triple.subject)

        for edge in active_edges:
            # Check if the new triple's predicate conflicts with an existing one
            for predicates in exclusive_predicates.values():
                if triple.predicate in predicates and edge.predicate in predicates:
                    if edge.target_id != store._normalize_id(triple.object):
                        contradictions.append({
                            "existing": {
                                "predicate": edge.predicate,
                                "target": store.nodes[edge.target_id].name,
                                "since": edge.valid_from,
                            },
                            "new": {
                                "predicate": triple.predicate,
                                "target": triple.object,
                            },
                            "resolution": "invalidate_existing",
                        })
 
    return contradictions

This contradiction detector groups related predicates together. "prefers," "switched_to," and "chose" are all in the same group, so if an active "prefers Coffee" edge exists and a new "switched_to Tea" triple arrives, the system flags it as a contradiction and invalidates the old edge. This is the temporal reasoning that Zep's architecture handles natively with its temporal knowledge graph.
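Acting on a flagged contradiction then comes down to closing out the old edge. A minimal sketch of that invalidation step, using plain dicts in place of the store's edge objects (the field shapes here are illustrative):

python
from datetime import datetime, timezone

def apply_resolution(edges: list[dict], contradiction: dict,
                     now: datetime | None = None) -> None:
    """Invalidate the contradicted edge by stamping its valid_until."""
    now = now or datetime.now(timezone.utc)
    existing = contradiction["existing"]
    for edge in edges:
        if (edge["predicate"] == existing["predicate"]
                and edge["target"] == existing["target"]
                and edge["valid_until"] is None):
            edge["valid_until"] = now  # edge stays queryable as history

edges = [{"predicate": "prefers", "target": "Coffee", "valid_until": None}]
apply_resolution(edges, {
    "existing": {"predicate": "prefers", "target": "Coffee"},
    "new": {"predicate": "switched_to", "target": "Tea"},
    "resolution": "invalidate_existing",
})

Note that the edge is never deleted: setting valid_until is what preserves the "formerly preferred coffee" history that pure vector stores throw away.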

How does Mem0's graph memory work in practice?

Building from scratch teaches you the mechanics. For production, you'll probably want a battle-tested implementation. Mem0's graph memory layer combines vector search with automatic entity extraction and relationship tracking. Their benchmarks show a 26% accuracy improvement over vector-only approaches on the LOCOMO benchmark.

Here's how to integrate Mem0's graph memory in both languages. You'll need a Mem0 API key, which you can get from their dashboard.

typescript
import { MemoryClient } from 'mem0ai';
 
const mem0 = new MemoryClient({ apiKey: process.env.MEM0_API_KEY! });
 
// Add a conversation to memory.
// Mem0 automatically extracts entities, builds the graph,
// and creates vector embeddings for hybrid search.
async function addToMemory(
  userId: string,
  conversation: { role: string; content: string }[],
): Promise<void> {
  await mem0.add(conversation, {
    user_id: userId,
    // Mem0 graph memory is enabled via project config or the API version flag.
    // Check docs.mem0.ai for the latest parameter names.
    version: 'v2',
  });
}
 
// Search memory using hybrid vector + graph retrieval.
// Mem0 automatically traverses the knowledge graph
// and combines results with vector similarity scores.
async function searchMemory(userId: string, query: string) {
  const results = await mem0.search(query, {
    user_id: userId,
    version: 'v2',
  });
 
  return results;
}
 
// Example usage
await addToMemory('sarah_acme', [
  { role: 'user', content: "I've switched from coffee to tea. Health reasons." },
  { role: 'assistant', content: "Updated your preference to tea. Is there anything else?" },
  { role: 'user', content: "We're moving from Slack to Microsoft Teams next month." },
]);
 
// Later, in a new conversation:
const context = await searchMemory('sarah_acme', 'What does this customer drink?');
// Returns: tea preference (current), with graph-derived context about the switch from coffee

The Python integration follows the same pattern.

python
import os

from mem0 import MemoryClient
 
mem0 = MemoryClient(api_key=os.environ["MEM0_API_KEY"])
 
# Add conversation with automatic graph extraction
mem0.add(
    messages=[
        {"role": "user", "content": "I switched from coffee to tea. Health reasons."},
        {"role": "assistant", "content": "Updated. Anything else?"},
        {"role": "user", "content": "Moving from Slack to Teams next month."},
    ],
    user_id="sarah_acme",
    version="v2",  # hybrid graph + vector
)
 
# Search with graph-aware retrieval
results = mem0.search(
    query="What does this customer prefer to drink?",
    user_id="sarah_acme",
    version="v2",
)
 
for memory in results["results"]:
    print(f"Memory: {memory['memory']}")
    print(f"Score: {memory['score']}")

Mem0's v2 API does three things under the hood that our scratch implementation handles manually. First, it runs entity extraction on every add call, identifying people, products, and preferences. Second, it maintains a knowledge graph alongside the vector store, with automatic conflict resolution when new facts contradict old ones. Third, it combines graph traversal with vector similarity during search, returning results ranked by both semantic relevance and relationship proximity.

The key difference from our scratch implementation is scale. Mem0 handles deduplication across thousands of conversations, graph merging when the same entity appears in different contexts, and efficient storage with their hosted infrastructure. For a prototype with one agent and a hundred memories, the scratch version works fine. For a production system processing thousands of conversations per day across multiple agents, you want the managed version.

If you've already built the memory system from the previous tutorial, adding Mem0's graph layer is an upgrade path, not a replacement. Your existing vector memories continue to work. The graph layer adds relationship awareness on top.

Mem0 vs Zep vs Letta: which one fits?

All three platforms solve agent memory, but they approach it from different architectural directions. The right choice depends on which memory problem is your biggest bottleneck. Here's a comparison based on what each system actually does well.

| Capability | Mem0 | Zep | Letta |
| --- | --- | --- | --- |
| Core architecture | Hybrid vector + graph | Temporal knowledge graph | Self-editing memory blocks |
| Entity extraction | Automatic on ingest | Automatic with temporal tracking | Agent-controlled |
| Contradiction handling | Graph-based resolution | Temporal versioning (when facts were true) | Agent decides what to keep |
| Temporal reasoning | Timestamps on relationships | First-class temporal edges with validity periods | Manual, agent-managed |
| Multi-hop queries | Yes, via graph traversal | Yes, via graph traversal | No, memory is block-structured |
| Token efficiency | 90% reduction vs full context (claimed) | Significant reduction via graph summarization | Agent controls context assembly |
| Accuracy improvement | 26% over vector-only (LOCOMO) | Strong on temporal queries | Depends on agent's memory strategy |
| Self-hosted option | Yes (open source) | Yes (open source) | Yes (open source) |
| Best for | General-purpose agent memory | Enterprise agents needing audit trails | Agents that reason about their own knowledge |

Choose Mem0 when you need a drop-in upgrade from vector-only memory and want the broadest accuracy improvement. Mem0's hybrid approach works well across question types: simple recall, preference tracking, relationship queries, and temporal questions. The 26% accuracy improvement on LOCOMO isn't specific to one query type. It's a general lift. If you followed the memory tutorial and want to add graph capabilities without rebuilding, Mem0 is the most straightforward path.

Choose Zep when your agents operate in domains where facts change frequently and you need to track the full history of those changes. Think enterprise account management, healthcare, financial services. The question "what was the customer's plan last quarter?" is just as important as "what's their plan now?" Zep's temporal knowledge graph stores validity periods natively. Every edge has explicit valid_from and valid_to fields, not as an afterthought (like our scratch implementation) but as a core storage primitive. If your evaluation framework includes temporal accuracy as a criterion, Zep will score highest.

Choose Letta when you're building agents that need to actively reason about what they know and don't know. Letta (formerly MemGPT) gives agents direct read/write access to their own memory blocks. The agent can decide "this fact is outdated, I should update it" or "I don't have enough information about this customer's technical setup, I should note that as a gap." This is powerful for agents that handle complex, evolving situations, like a technical support agent managing an enterprise deployment over weeks. The tradeoff is that the agent's memory quality depends on how well the agent manages its own memory, which is a prompt engineering challenge on top of the memory engineering challenge.

For most teams starting out, the progression is: vector-only (the previous tutorial), then Mem0 for graph augmentation, then Zep or Letta if you have specific temporal or agent-autonomy requirements.

What does the retrieval pipeline look like end to end?

The pipeline runs four stages per message: retrieve graph context, build the system prompt, call the LLM, then ingest new facts asynchronously. Here's the complete flow from user message to context-enriched response, combining everything we've built into a single function you can drop into an existing agent.

typescript
import Anthropic from '@anthropic-ai/sdk';
 
const anthropic = new Anthropic();
 
async function handleMessage(
  store: GraphMemoryStore,
  customerId: string,
  userMessage: string,
  conversationHistory: { role: 'user' | 'assistant'; content: string }[],
): Promise<string> {
  // 1. Retrieve graph context for the current message
  const graphContext = await retrieveContext(store, userMessage, {
    maxHops: 2,
    maxEdges: 15,
  });
 
  // 2. Build the system prompt with injected memory context
  const systemPrompt = `You are a helpful customer service agent.
 
${graphContext}
 
Use the known facts above to personalize your response. If a fact is marked [outdated],
do not reference it unless the customer asks about history. Prefer [current] facts.
If you learn new information from the customer, acknowledge it naturally.`;
 
  // 3. Call the LLM with context-enriched prompt
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-5-20250514',
    max_tokens: 1024,
    system: systemPrompt,
    messages: [
      ...conversationHistory.map((m) => ({
        role: m.role as 'user' | 'assistant',
        content: m.content,
      })),
      { role: 'user', content: userMessage },
    ],
  });
 
  const assistantMessage =
    response.content[0].type === 'text' ? response.content[0].text : '';
 
  // 4. After responding, ingest the conversation turn into the graph
  //    This runs asynchronously so it doesn't block the response
  const turnText = `Customer: ${userMessage}\nAgent: ${assistantMessage}`;
  ingestConversation(store, turnText, `${customerId}_${Date.now()}`).catch(
    (err) => console.error('Graph ingestion failed:', err),
  );
 
  return assistantMessage;
}

Walking through those stages: retrieval happens before the LLM call, giving the agent structured context about the customer's relationships, preferences, and history. The LLM then generates a response using that context. Finally, ingestion runs asynchronously after the response is sent, extracting any new entities or relationships from the conversation turn and updating the graph.

The ingestion step is fire-and-forget on purpose. If extraction fails, the customer still gets their response. The graph misses one update, and the same facts will likely be captured in the next conversation. For production systems, you'd want a retry queue for failed extractions, but the agent's response latency should never depend on graph writes.
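That retry queue doesn't need to be elaborate to start. A minimal in-process sketch (a real deployment would use a durable queue; the function and variable names are illustrative):

python
from collections import deque
from typing import Callable

def drain_ingestion_queue(
    queue: deque,  # items are (turn_text, attempts) tuples
    ingest: Callable[[str], None],
    max_attempts: int = 3,
) -> list[str]:
    """One pass over the queue; failures are re-queued until max_attempts."""
    dead_letter = []
    for _ in range(len(queue)):
        turn_text, attempts = queue.popleft()
        try:
            ingest(turn_text)
        except Exception:
            if attempts + 1 < max_attempts:
                queue.append((turn_text, attempts + 1))
            else:
                dead_letter.append(turn_text)  # surface for manual review
    return dead_letter

# Usage: on extraction failure, enqueue the turn instead of dropping it
pending: deque = deque([("Customer: I switched to tea.\nAgent: Noted.", 0)])

Run the drain on a timer or a background worker; anything that exhausts its attempts lands in the dead-letter list for inspection rather than silently disappearing.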

Here's how the same pipeline looks in Python.

python
from anthropic import Anthropic
 
anthropic_client = Anthropic()
 
 
def handle_message(
    store: GraphMemoryStore,
    customer_id: str,
    user_message: str,
    conversation_history: list[dict],
) -> str:
    # 1. Retrieve graph context
    # In production, use entity extraction on user_message to find mentioned entities.
    # For simplicity, we search for the customer directly.
    entity_names = [customer_id]  # Expand this with NER on user_message
    graph_context = retrieve_context(store, entity_names, max_hops=2)
 
    # 2. Build system prompt with memory context
    system_prompt = f"""You are a helpful customer service agent.
 
{graph_context}
 
Use the known facts above to personalize your response. Prefer [current] facts.
Do not reference [outdated] facts unless the customer asks about history."""
 
    # 3. Call LLM
    response = anthropic_client.messages.create(
        model="claude-sonnet-4-5-20250514",
        max_tokens=1024,
        system=system_prompt,
        messages=[
            *conversation_history,
            {"role": "user", "content": user_message},
        ],
    )
 
    assistant_message = response.content[0].text
 
    # 4. Async ingestion (simplified, synchronous here)
    turn_text = f"Customer: {user_message}\nAgent: {assistant_message}"
    try:
        extraction = extract_entities(turn_text)
        for entity in extraction.entities:
            store.upsert_node(entity.name, entity.type)
        for rel in extraction.relationships:
            store.add_edge(
                rel.subject, rel.predicate, rel.object,
                confidence=rel.confidence, temporal=rel.temporal,
            )
    except Exception as e:
        print(f"Ingestion failed: {e}")
 
    return assistant_message

When this pipeline handles the "what does Sarah drink?" question, the agent gets structured context showing that Sarah currently prefers tea (with temporal context), that the switch was recent, and that it was health-related. The response comes back as something like "I have you down for tea at your morning meetings, Sarah" instead of the confused "I see you enjoy both coffee and tea!" that vector-only memory produces.

When is vector search enough, and when do you need the graph?

Graph memory isn't always worth the added complexity. If your agent handles short-lived conversations, stores facts that don't change, or operates at a small scale, vector search is simpler, cheaper, and good enough. Here's the breakdown of when each approach earns its place.

Vector search is enough when:

  • Your agent handles single-session interactions where cross-conversation recall isn't critical. A customer support chatbot that resolves tickets in one conversation doesn't need a knowledge graph.
  • Memories are self-contained facts that don't relate to each other. "Customer prefers email" and "Customer is on Enterprise plan" are independent facts that vector search retrieves well.
  • Your dataset is small enough (under 1,000 memories per entity) that retrieval precision isn't a bottleneck. At that scale, top-5 vector results usually contain what you need.
  • Temporal reasoning isn't required. If all your stored facts are permanent truths (company name, account type, product purchased), timestamps don't add value.

You need graph memory when:

  • Facts contradict each other over time and the agent needs to know which is current. Any preference, status, or relationship that changes requires temporal tracking.
  • Your agent needs multi-hop reasoning. "What communication tool does Sarah's company use?" requires traversing Sarah to Acme Corp to Slack/Teams. Vector search on "Sarah's communication tool" won't find that connection.
  • You're building agents for enterprise accounts where relationship mapping matters. Who reports to whom, which team owns which product, what vendor relationships exist. These are inherently graph problems.
  • You need audit trails and explainability. Graph memory with temporal edges gives you a complete history of what the agent knew and when, making it possible to explain why the agent said something in a particular conversation.
  • Your evaluation framework reveals that your agent frequently fails on questions involving preferences that changed, relationships between entities, or historical context.
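The multi-hop case is just a bounded breadth-first traversal over the edge set. A minimal sketch over a plain adjacency dict (the entity IDs are illustrative):

python
from collections import deque

def multi_hop(graph: dict[str, list[tuple[str, str]]],
              start: str, max_hops: int = 2) -> list[tuple]:
    """Collect (subject, predicate, object, hop) facts within max_hops of start."""
    facts = []
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for predicate, target in graph.get(node, []):
            facts.append((node, predicate, target, depth + 1))
            if target not in visited:
                visited.add(target)
                frontier.append((target, depth + 1))
    return facts

graph = {
    "sarah": [("works_at", "acme_corp"), ("prefers", "tea")],
    "acme_corp": [("uses", "microsoft_teams")],
}
# Hop 2 connects the dots: sarah -> acme_corp -> microsoft_teams

With max_hops=1 the Teams fact is unreachable, which is exactly the failure mode of vector search on "Sarah's communication tool": the connection only exists through the intermediate entity.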

The cost difference is one LLM extraction call per conversation turn (roughly $0.0001 at GPT-4.1-mini pricing) plus the graph storage overhead. For customer-facing agents where accuracy directly impacts satisfaction, that's a rounding error. For internal tools processing thousands of low-stakes conversations, the extraction cost might not justify the accuracy gain.

If you've already built the RAG pipeline from the earlier tutorial, graph memory is a complementary layer, not a replacement. Vector search over your knowledge base documents (product docs, policies, FAQs) handles factual retrieval. Graph memory over customer conversations handles relationship and preference retrieval. Run both, merge the contexts, and your agent gets the best of both.

What changes when you take graph memory to production?

The in-memory implementation above works for prototyping. Production graph memory needs five additional pieces.

Storage backend. Our in-memory Map doesn't survive process restarts. For production, store nodes and edges in MongoDB (with indexes on sourceId, targetId, and validUntil) or PostgreSQL with JSONB columns. You only need Neo4j or a dedicated graph database when you're doing complex multi-hop traversals across millions of edges. For most agent memory graphs (thousands of entities, tens of thousands of edges), document databases handle the query patterns just fine.
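To make the storage point concrete, here's a sketch of the edge table with those indexes, using SQLite as a stand-in for Postgres or MongoDB (the column names mirror our scratch implementation; adapt the DDL to your actual database):

python
import sqlite3

def init_edge_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the edges table with indexes on the three hot query columns."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS edges (
            id          INTEGER PRIMARY KEY,
            source_id   TEXT NOT NULL,
            predicate   TEXT NOT NULL,
            target_id   TEXT NOT NULL,
            confidence  REAL NOT NULL,
            valid_from  TEXT NOT NULL,   -- ISO 8601 timestamps
            valid_until TEXT             -- NULL while the edge is current
        );
        CREATE INDEX IF NOT EXISTS idx_edges_source ON edges (source_id);
        CREATE INDEX IF NOT EXISTS idx_edges_target ON edges (target_id);
        CREATE INDEX IF NOT EXISTS idx_edges_valid  ON edges (valid_until);
    """)
    return conn

# "Active edges for an entity" becomes a single indexed query:
# SELECT * FROM edges WHERE source_id = ? AND valid_until IS NULL

At agent-memory scale, every query pattern in this tutorial (active edges, history, one-hop expansion) is an indexed lookup on this one table, which is why a document or relational store is usually enough.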

Deduplication. Entity extraction from different conversations will produce slight variations: "Acme Corp," "Acme," "ACME Corporation." Our normalizeId function handles case and whitespace, but production systems need fuzzy matching, potentially backed by embeddings of entity names. Mem0 handles this automatically.
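A rough sketch of that fuzzy matching using only the standard library (a production system would more likely compare name embeddings; the 0.85 threshold is an assumption to tune, not a recommendation):

python
from difflib import SequenceMatcher

def canonical_entity(name: str, known: list[str], threshold: float = 0.85) -> str:
    """Map a freshly extracted name onto an existing entity when similar enough."""
    normalized = " ".join(name.lower().split())
    best, best_score = None, 0.0
    for candidate in known:
        score = SequenceMatcher(None, normalized, candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    # Below the threshold, treat it as a genuinely new entity
    return best if best is not None and best_score >= threshold else name

String similarity catches casing and whitespace variants ("ACME Corp" vs. "Acme Corp") but not abbreviations ("Acme" vs. "Acme Corporation"), which is where embedding-based matching earns its cost.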

Access control. In multi-agent systems, not every agent should see every relationship. A support agent shouldn't access sales pipeline relationships, even if they're about the same customer. Scope your graph queries by agent role, and use tool-level access control to enforce boundaries.

Graph pruning. Knowledge graphs grow without bound. Old, low-confidence, never-accessed edges accumulate. Implement a pruning job that removes edges older than your retention period, edges with decayed confidence below a threshold, and edges that have never been accessed (which means they were extracted but never useful). This is the graph equivalent of the TTL and access-count patterns from the memory tutorial.
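The pruning decision can be written as a pure predicate over each edge, which keeps the job itself trivial. A sketch (field names mirror our edge shape; the thresholds are illustrative starting points):

python
from datetime import datetime, timedelta, timezone

def should_prune(edge: dict, now: datetime,
                 retention_days: int = 365,
                 min_confidence: float = 0.1,
                 grace_days: int = 30) -> bool:
    """True if an edge is past retention, too weak, or extracted-but-never-used."""
    age_days = (now - edge["created_at"]).days
    if age_days > retention_days:
        return True
    if edge["confidence"] < min_confidence:
        return True
    # Extracted but never retrieved after a grace period: probably noise
    if edge["access_count"] == 0 and age_days > grace_days:
        return True
    return False

The grace period matters: a never-accessed edge that is only a few days old may simply not have come up yet, so the "never useful" signal only becomes trustworthy with age.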

Evaluation. How do you know your graph memory is actually improving agent quality? Track two metrics. First, retrieval precision: when the agent uses graph context in a response, was that context relevant? You can measure this with scenario testing where you script conversations that require temporal reasoning and check whether the agent handles them correctly. Second, contradiction rate: how often does the agent surface outdated information? Graph memory should drive this toward zero.
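The scenario check can start as a plain string assertion before you invest in LLM-as-judge scoring. A crude but useful heuristic, sketched with illustrative names:

python
def score_temporal_scenario(response_text: str,
                            current_fact: str,
                            outdated_fact: str) -> dict:
    """Pass if the reply surfaces the current fact and avoids the outdated one."""
    text = response_text.lower()
    mentions_current = current_fact.lower() in text
    mentions_outdated = outdated_fact.lower() in text
    return {
        "mentions_current": mentions_current,
        "mentions_outdated": mentions_outdated,  # feeds the contradiction rate
        "pass": mentions_current and not mentions_outdated,
    }

Aggregate mentions_outdated across your scripted scenarios and you have the contradiction rate; graph memory should drive it toward zero while the pass rate climbs.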

Wrapping up

Vector memory gives your agent recall. Graph memory gives it understanding.

The progression goes like this: start with session and persistent memory. Add vector search for semantic retrieval using the RAG patterns. Then, when your agents start failing on temporal questions or relationship reasoning, add graph memory with the patterns from this tutorial. Mem0 makes that last step a single API integration instead of a custom build.

The data model is the same everywhere: nodes, edges, timestamps. The extraction pipeline is the same: LLM structured output producing triples. The retrieval logic is the same: traverse from mentioned entities, rank by recency and confidence, inject as context. Whether you build it yourself or use Mem0, Zep, or Letta, you now understand what's happening underneath.

Remember Sarah? Your agent shouldn't just know she likes coffee. It should know she switched to tea three weeks ago because she's pregnant. And it definitely shouldn't say "I see you enjoy both coffee and tea!"

Give your agents memory that understands relationships

Chanl handles entity extraction, graph storage, temporal tracking, and privacy controls. Your agents remember what matters, forget what they should, and never confuse yesterday's facts with today's.

Explore Chanl Memory
