A customer calls your support line on Monday to discuss a billing dispute. They spend twelve minutes explaining their account history, the charge they're contesting, and their preferred resolution. The agent resolves it — partially. On Wednesday, the customer reaches out again — this time via chat. A different agent picks up. "Can you start by telling me what this is about?"
Sound familiar? It should. Research shows that 84% of consumers report repeating themselves "often" or "all the time" when contacting a brand, and roughly 32% of all inbound contacts come from repeat customers who reach out an average of 3.4 times per month — often across different channels. That's not just annoying — it's expensive. Every minute spent re-establishing context is a minute not spent solving the actual problem.
The fix isn't better call notes or smarter CRM lookups. It's memory — persistent, semantic memory baked into the AI agent itself. And in 2026, you've got real options for how to get there. The question is whether you should grab something off the shelf or build your own.
The Memory Gold Rush
Here's some context for how fast this space is moving. The agentic AI memory market hit $6.27 billion in 2025 and is projected to reach $28.45 billion by 2030 — a 35% compound annual growth rate. Frameworks that didn't exist two years ago now have tens of thousands of developers building on them.
But here's the catch: Gartner projects that more than 40% of agentic AI projects will be canceled by the end of 2027. Many of those failures will trace back to a foundational architecture decision made too early or too casually — and memory is often that decision. Pick the wrong approach and you're either locked into a vendor that doesn't fit your latency budget, or you're maintaining infrastructure that drains engineering bandwidth for years.
So let's look at what's actually out there, what it costs you (in money and in flexibility), and when it makes sense to roll your own.
The Memory Landscape in 2026
Four distinct approaches have emerged, each with a different philosophy about what memory should be and who should control it.
Mem0 is the most popular purpose-built option, with over 50,000 developers on the platform. It takes a hybrid approach — vector-based semantic search combined with an optional graph layer for entity relationships. The headline feature is an intelligent compression engine that claims up to 80% prompt token reduction by distilling raw conversation history into optimized memory representations. You can self-host the open-source version or use their managed cloud.
Zep goes deeper on temporal reasoning. Instead of treating memories as static facts, Zep stores them in a temporal knowledge graph that tracks how information changes over time. When a customer says "I moved from Austin to Seattle," Zep doesn't just record the new city — it knows the old one, when it changed, and can reason about the transition. That's powerful for enterprise scenarios where relationship modeling matters. Zep claims sub-100ms retrieval times and offers Python, TypeScript, and Go SDKs.
Letta (formerly MemGPT) pioneered the idea that agents should actively edit their own memory. Rather than passively logging facts, Letta agents maintain structured memory blocks within their context window and update them in real-time. When a user mentions switching from Python to TypeScript, the agent doesn't append a log entry — it rewrites its understanding of that user's tech stack. It's a fundamentally different mental model: memory as living state, not append-only storage.
Build-your-own is the fourth path, and it's more viable than you might think. MongoDB's vector search, Postgres with pgvector, or even pure cosine similarity over embeddings stored alongside your existing data. No new infrastructure, no new vendor, full control.
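For small memory counts, that last option can be as simple as brute-force cosine similarity in application code. A sketch, with `StoredMemory` as an illustrative shape, not a real schema; swap in pgvector or your database's vector index once volumes grow:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredMemory {
  content: string;
  embedding: number[];
}

// Brute-force nearest-neighbor search over memories already in your database.
// Fine for thousands of memories; beyond that, use a proper vector index.
function topK(query: number[], memories: StoredMemory[], k: number): StoredMemory[] {
  return [...memories]
    .sort((x, y) =>
      cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}
```

The point isn't that you'd ship this loop at scale; it's that the retrieval primitive is small enough that the vendor isn't selling you the math.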
Off-the-Shelf: What You Get and What You Give Up
The appeal is obvious. Drop in an SDK, call mem0.add() or zep.memory.add(), and you've got persistent memory in an afternoon. Mem0's compression engine genuinely reduces token costs. Zep's temporal graph catches nuances that simple vector search misses. Letta's self-editing blocks feel almost magical when you first see an agent rewrite its own understanding of a user.
But let's talk about what you're trading away.
Scoring opacity. When Mem0 ranks memories by relevance, you're trusting their algorithm. You can't tune the weighting between semantic similarity, recency, and confidence. For a simple chatbot, that's probably fine. For an agent handling insurance claims, medical intake, or financial advisory — "probably fine" isn't good enough. You need to know exactly why memory A was injected over memory B, and you need to adjust those weights when your domain demands it.
Latency budgets. Every agent has a latency budget, but the constraints vary wildly by channel. Voice agents have roughly 300 milliseconds before the silence feels awkward. Chat agents can tolerate a second. But even for text-based agents, memory retrieval that adds 200ms to every response compounds fast when customers are used to instant replies. Zep's sub-100ms retrieval sounds great until you add the round trip to their cloud, the embedding generation, and the prompt assembly.
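One defensive pattern is to race retrieval against the channel's budget and fall back to an empty context when it loses. A sketch; `fetchMemories` stands in for whatever retrieval call you actually use:

```typescript
interface Memory { content: string }

// Race memory retrieval against the channel's latency budget. An agent that
// answers without memories beats one that answers late.
async function retrieveWithinBudget(
  fetchMemories: () => Promise<Memory[]>,
  budgetMs: number,
): Promise<Memory[]> {
  const deadline = new Promise<Memory[]>(resolve =>
    setTimeout(() => resolve([]), budgetMs),
  );
  return Promise.race([fetchMemories(), deadline]);
}
```

For a voice agent with a 300ms budget, you might set `budgetMs` to 150 to leave headroom for embedding generation and prompt assembly.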
Data residency. Where do those memories live? If you're in healthcare (HIPAA), finance (SOX/PCI), or serving European customers (GDPR), the answer matters enormously. Mem0's cloud stores data on their infrastructure. Zep offers a cloud product too. Self-hosting either one is possible but adds operational overhead that partially defeats the purpose of buying. The EU's AI Act and GDPR now converge on AI systems handling personal data — organizations must conduct Data Protection Impact Assessments for memory systems that store customer information, and the "right to be forgotten" applies to every memory your system creates.
Vendor dependency at scale. Memory is not a commodity service you can swap easily. Once your agents are tuned to a specific memory format, scoring algorithm, and retrieval pattern, migration is painful. Cognee's benchmarks showed that different memory frameworks produce meaningfully different results on multi-hop reasoning, with the leader outperforming competitors by 15-20% on exact-match scores. That means switching frameworks doesn't just require code changes; it changes what your agents remember and how well they reason.
Building Your Own: The Full Picture
Now here's where it gets interesting for teams that need tight control.
Building a memory layer from scratch sounds intimidating, but the core architecture is surprisingly straightforward if you already have a database and an embedding model. You need four things: a way to store memories with embeddings, a similarity search function, a scoring algorithm, and an injection mechanism that gets memories into your agent's prompt.
Here's what a production-grade memory injection looks like. The agent's system prompt gets enriched with customer context before the conversation begins:
```typescript
// Memory injection at session start
const memories = await memoryService.search(workspaceId, {
  entityType: 'customer',
  entityId: customerId,
  query: 'customer context preferences history',
  limit: 10,
  minScore: 0.3,
});

if (memories.length > 0) {
  const bullets = memories.map(m => `- ${m.content}`).join('\n');
  // XML-fenced to reduce prompt injection risk
  const context = `<customer_memories>\n${bullets}\n</customer_memories>`;
  if (systemPrompt.includes('{{customerMemories}}')) {
    systemPrompt = systemPrompt.replace(/\{\{customerMemories\}\}/g, context);
  } else {
    systemPrompt += `\n\nCustomer Context:\n${context}`;
  }
}
```

The key design decisions in a custom build aren't about the vector math — that's commoditized. They're about scoring, scoping, and lifecycle.
Composite scoring lets you control exactly how memories are ranked. A typical production setup weights semantic similarity at 60%, extraction confidence at 20%, recency at 10%, and access frequency at 10%. Those numbers aren't arbitrary — they reflect the reality that a highly relevant memory from last week matters more than a vaguely relevant memory from yesterday, but a memory the agent keeps pulling up is probably important regardless of when it was created. Try getting that level of control from Mem0's API.
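Expressed as code, that weighting is a one-line formula plus two normalizations. A sketch using the example weights from this section; the recency half-life and frequency cap are illustrative choices, not fixed constants:

```typescript
interface ScoredMemory {
  similarity: number;   // 0-1, from vector search
  confidence: number;   // 0-1, assigned at extraction time
  ageDays: number;      // days since the memory was created
  accessCount: number;  // times the memory has been injected
}

// Composite score: 60% similarity, 20% confidence, 10% recency, 10% frequency.
function compositeScore(m: ScoredMemory): number {
  const recency = Math.exp(-m.ageDays / 30);         // decays over roughly a month
  const frequency = Math.min(m.accessCount / 10, 1); // saturates at 10 accesses
  return 0.6 * m.similarity + 0.2 * m.confidence + 0.1 * recency + 0.1 * frequency;
}
```

With these weights, a week-old memory at 0.9 similarity outranks a day-old memory at 0.5 similarity, exactly the behavior described above, and every coefficient is yours to tune.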
Entity scoping means different types of memories have different lifetimes. Customer memories (preferences, account details) persist forever. Session memories (current issue context) expire in 24 hours. Conversation summaries stick around for 90 days. Agent-learned patterns (escalation thresholds, common resolutions) persist indefinitely. This isn't just organization — it's how you prevent memory bloat from degrading search quality over time.
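Those lifetimes map naturally onto a TTL table keyed by entity type. A minimal sketch using the durations above; the type names are illustrative:

```typescript
// Time-to-live per memory scope; null means the memory never expires.
const MEMORY_TTL_MS: Record<string, number | null> = {
  customer: null,                                   // preferences, account details
  session: 24 * 60 * 60 * 1000,                     // current-issue context: 24 hours
  conversation_summary: 90 * 24 * 60 * 60 * 1000,   // summaries: 90 days
  agent_pattern: null,                              // learned thresholds, resolutions
};

function isExpired(entityType: string, createdAt: Date, now: Date = new Date()): boolean {
  const ttl = MEMORY_TTL_MS[entityType];
  // Unknown or null TTL: treat as non-expiring (a conservative default).
  if (ttl === null || ttl === undefined) return false;
  return now.getTime() - createdAt.getTime() > ttl;
}
```

Running a sweep over `isExpired` on a schedule (or filtering at query time) is what keeps stale session context from polluting search results months later.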
Security fencing is critical and often overlooked. When you inject memories into a prompt, you're potentially giving the LLM content that originated from previous conversations — some of which might contain adversarial input. XML-fencing the memory block (wrapping it in <customer_memories> tags) and setting confidence floors for sensitive use cases are production necessities, not nice-to-haves.
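A minimal sketch of both safeguards, assuming each memory carries an extraction confidence score. Stripping angle brackets is one blunt but effective way to keep memory content from closing the fence; production systems may want fuller sanitization:

```typescript
interface FencedMemory { content: string; confidence: number }

// Wrap memories in an XML fence so the model can treat them as data rather
// than instructions. Strip angle brackets so a malicious memory can't close
// the fence and smuggle in its own directives.
function fenceMemories(memories: FencedMemory[], minConfidence: number): string {
  const safe = memories
    .filter(m => m.confidence >= minConfidence)   // confidence floor
    .map(m => `- ${m.content.replace(/[<>]/g, '')}`)
    .join('\n');
  return `<customer_memories>\n${safe}\n</customer_memories>`;
}
```

A billing or medical-intake agent might run with `minConfidence` at 0.8; a general support agent can afford 0.3.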
The Risk Matrix
Let's put the trade-offs side by side. This isn't theoretical — these are the actual pain points teams report after 6-12 months in production.
| Risk | Off-the-Shelf (Mem0/Zep/Letta) | Build Your Own |
|---|---|---|
| Time to first memory | Hours to days | Weeks to months |
| Compliance control | Limited — data on vendor infra (self-host possible but complex) | Full — your database, your rules, your DPIAs |
| Latency budget | Extra network hop adds 50-200ms | Co-located with your agent, minimal overhead |
| Scoring tunability | Black box or limited params | Full control over every weight |
| Maintenance burden | Vendor handles infra, you handle integration | You own everything — updates, scaling, monitoring |
| Migration risk | High — memories encoded in vendor-specific format | Low — it's your schema, your embeddings |
| Graph/relationship modeling | Zep and Mem0 have it built in | You'd build it yourself (significant effort) |
| Team size needed | 1-2 developers | 2-4 developers for initial build, 1 for ongoing |
The pattern is clear: off-the-shelf wins on speed, custom wins on control. But "control" isn't abstract — it translates to concrete latency savings, compliance guarantees, and scoring precision that directly affect the quality of every conversation your agents have.
What catches teams off guard is the middle ground. You can start with Mem0's open-source self-hosted version and get 80% of the way there quickly. But the moment you need custom scoring weights, channel-specific latency optimizations, or compliance certifications, you're forking their codebase — and now you're maintaining a custom build anyway, just one that started from someone else's architecture.
A Decision Framework
Here's a practical way to think about the choice. Don't start with the technology — start with your constraints.
- Define your latency budget per channel: voice (under 300ms), chat (under 1s), async (flexible)
- Map your compliance requirements: HIPAA, SOX, GDPR, state privacy laws
- Estimate your memory volume: under 10K memories = any solution works, over 100K = architecture matters
- Count your channels: do agents need shared memory across voice, chat, email, SMS?
- Assess your team: do you have 2+ engineers who understand embeddings and vector search?
- Check your database: MongoDB, Postgres with pgvector, or will you need a new data store?
- Identify your scoring needs: generic relevance ok, or domain-specific weighting required?
- Plan for deletion: can you honor right-to-be-forgotten requests with your chosen approach?
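If you want a crude starting point, the checklist collapses into a handful of hard constraints. The thresholds and field names below are illustrative, not a rule:

```typescript
interface Constraints {
  regulated: boolean;         // HIPAA, SOX, GDPR, state privacy laws
  crossChannel: boolean;      // shared memory across voice/chat/email/SMS
  latencyBudgetMs: number;    // tightest channel budget
  memoryCount: number;        // expected memory volume
  engineersAvailable: number; // engineers comfortable with embeddings and vector search
}

// Crude heuristic: any hard constraint pushes toward building, provided the
// team can staff it; otherwise ship faster with an off-the-shelf framework.
function recommend(c: Constraints): 'build' | 'buy' {
  const hardConstraint =
    c.regulated || c.crossChannel || c.latencyBudgetMs < 300 || c.memoryCount > 100_000;
  return hardConstraint && c.engineersAvailable >= 2 ? 'build' : 'buy';
}
```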
Choose Mem0 if you're building a text-based agent, need to ship fast, don't have strict compliance requirements, and want the largest community and ecosystem. Their compression engine is genuinely useful for reducing costs.
Choose Zep if you're in an enterprise environment where temporal reasoning matters — financial services, healthcare, or any domain where knowing how facts change over time is as important as knowing the facts themselves. The Go SDK is a nice bonus for backend teams.
Choose Letta if your agents need to actively manage their own understanding — think long-running agents that evolve their behavior over weeks or months. The self-editing memory model is uniquely powerful for agents that need to develop "personality" or domain expertise.
Build your own if you need agents that share memory across channels, operate in regulated industries, require strict latency control, or already have a MongoDB/Postgres infrastructure you don't want to leave. The upfront investment pays off in operational simplicity and the ability to tune every parameter to your specific use case.
How Chanl Approaches Memory
We went the custom route — not because the off-the-shelf options are bad, but because we were already managing the full conversation lifecycle. When you're handling call scoring, conversation analytics, and scenario testing, adding a memory layer that extracts insights from those same conversations is a natural extension, not a separate infrastructure problem.
That's the key advantage of building memory into a platform that already understands conversations: you don't need a separate pipeline to extract facts. Every interaction your agents handle — whether it's a phone call, a chat message, or an email thread — is already flowing through systems that analyze intent, track outcomes, and score quality. Memory extraction rides on top of that, pulling out the preferences, context, and commitments that matter for the next conversation regardless of which channel it happens on.
Four design principles shaped our approach:
Memory flows across channels. Your customers don't think in channels — they think in relationships. A customer who explains their setup over a phone call shouldn't have to repeat it when they follow up via chat the next day. Because Chanl manages conversations across voice, chat, and messaging, memories extracted from one channel are automatically available to agents on every other channel. That's the kind of continuity that turns a collection of disconnected interactions into an actual relationship.
Memory has a lifecycle. Not all memories should live forever. Customer preferences persist indefinitely — if someone says they prefer email, that's relevant next year. But session context ("currently discussing billing issue #12345") should expire in hours, not clutter up future conversations. Conversation summaries are useful for weeks, then fade. We built configurable expiry into every memory type because stale context is worse than no context.
Every agent is different. A billing agent handling sensitive financial data needs different memory behavior than a general support agent. Confidence thresholds, how many memories get injected, what kind of context gets surfaced — all of that should be tunable per agent, not locked into a one-size-fits-all configuration. That flexibility means teams can start conservative and dial up memory as they build trust in the system.
Conversations feed each other. Here's what gets really interesting: memory from one conversation can inform the next, even across completely different agents. A sales agent learns that a prospect is evaluating three competitors. When that prospect later talks to a technical support agent, that context is already there — no handoff notes, no CRM lookup, no "let me check your account." The memory layer connects the dots automatically because it sits underneath every conversation in the platform.
What excites us most is where this goes next. When memory lives inside a platform that already handles prompt management, agent tools, and live monitoring, the possibilities compound. Imagine agents that automatically adjust their approach based on patterns across hundreds of conversations, or memory that feeds back into scenario testing to validate how well agents use what they remember. That's the kind of integration you can't bolt on — it has to be built in from the start.
Making the Call
The memory layer you choose today will shape your AI's personality for years. That's not hyperbole — it's the nature of persistent state. Memories accumulate, scoring algorithms create implicit biases in what gets recalled, and your agents develop conversational patterns based on what context they're given.
If you're early stage and iterating fast, grab Mem0 or Zep and ship. You can always migrate later (it'll hurt, but it's possible). If you're building production agents in a regulated industry — especially ones that need memory across multiple channels — seriously consider building your own. The ecosystem is mature enough now — embeddings are cheap, vector search is built into every major database, and the architecture patterns are well-documented.
Either way, don't skip this decision. The difference between an agent that says "Can you start by telling me what this is about?" and one that says "Welcome back — how are you settling in to Seattle?" isn't just a feature. It's the entire customer relationship.
Sources
- Mem0 vs Zep vs LangMem vs MemoClaw: AI Agent Memory Comparison 2026 — DEV Community
- Survey of AI Agent Memory Frameworks — Graphlit Blog
- From Models to Memory: The Next Big Leap in AI Agents — ASAPP
- Memory for AI Agents: A New Paradigm of Context Engineering — The New Stack
- Top 10 AI Memory Products 2026 — Medium
- Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory — arXiv
- Agent Memory: Letta vs Mem0 vs Zep vs Cognee — Letta Forum
- ODEI vs Mem0 vs Zep: Choosing Agent Memory Architecture — DEV Community
- AI Privacy Rules: GDPR, EU AI Act, and U.S. Law — Parloa
- Memory Becomes a Meter: Why Memory Is Now First-Class Infrastructure — GenAI Tech
- 50+ Conversational AI Statistics for 2026 — Nextiva
- Customer Experience Predictions for 2026 — CX Today
- How to Reduce Repeat Calls Fast — Scorebuddy
- Personalized Shopping Experience Statistics 2026 — Envive AI
- AI Agents 2025: Expectations vs. Reality — IBM
- Voice AI's Big Year — Scalable CX in 2026 — Zendesk
- State of Conversational AI: Trends and Statistics — Master of Code
- AI Agent Memory: Why 2026 Is the Year of Persistent Context — Serenities AI
Dean Grover
Co-founder
Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.