A customer calls your AI agent for the third time this week. "Hi, how can I help you today?" Again. No memory of the billing dispute from Monday. No recollection of the discount your agent promised on Wednesday. The customer sighs, starts from the top, and wonders why they're talking to a machine that can't remember a three-day-old conversation.
This isn't a hypothetical edge case. It's the default behavior of every voice agent built on Pipecat and LiveKit. Both frameworks are excellent at what they do — real-time audio processing, LLM orchestration, natural-sounding speech. But when the call ends, everything evaporates. The next session starts from absolute zero.
You've probably already noticed this gap. Maybe you've started looking at Mem0, Zep, or Supermemory for a fix. They solve the memory problem. But memory is just the beginning of what a production voice agent needs — and assembling five separate services to cover the rest creates its own mess.
Here's how to give your Pipecat or LiveKit agent persistent memory, knowledge retrieval, managed tools, and post-call analysis with a single SDK.
Why Are Voice Agents Stateless by Default?
Voice agents are stateless because the frameworks that power them are designed for real-time audio processing, not data persistence. Pipecat pipelines and LiveKit agent sessions handle speech-to-text, LLM orchestration, and text-to-speech within a single session — but nothing survives after the session ends.
This is a deliberate design choice, not a bug. Pipecat's pipeline architecture processes audio frames through a chain of processors: STT captures speech, the LLM generates a response, TTS speaks it back. Each frame is handled and discarded. LiveKit's Agent framework follows the same pattern — your Agent class receives room events, processes them, and the session state lives only in memory.
The LLM itself compounds the problem. Every API call to GPT-4o, Claude, or Gemini is independent. The model has no awareness of prior turns unless you explicitly pass them in the messages array. A voice agent that handled 50 calls today has exactly the same knowledge at call 51 as it did at call 1: none.
The OpenAI Chat Completions API requires resending the full conversation history with every request. For a single session, this works fine — you accumulate messages and the context window holds them. But across sessions? You'd need to store every conversation transcript and somehow decide which ones to include in the next call. Token costs grow with every turn, because each request re-pays for all prior turns. Context windows fill up. And the lost-in-the-middle problem means the model starts ignoring information buried deep in long contexts anyway.
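That accumulation pattern can be sketched in a few lines of plain Python. The token counts below are crude character-based estimates, not real tokenizer output:

```python
# Sketch: why resending full history gets expensive across turns.
# Token counts are rough illustrative estimates (~1 token per 4 chars).

def request_payload(history: list[dict], new_user_msg: str) -> list[dict]:
    """Chat-completion APIs are stateless: every request must include
    the entire prior conversation plus the new user turn."""
    return history + [{"role": "user", "content": new_user_msg}]

def approx_tokens(messages: list[dict]) -> int:
    # crude estimate: ~1 token per 4 characters
    return sum(len(m["content"]) for m in messages) // 4

history: list[dict] = [{"role": "system", "content": "You are a support agent."}]
costs = []
for turn in range(1, 6):
    payload = request_payload(history, f"User question number {turn}, with some detail.")
    costs.append(approx_tokens(payload))
    # append both sides of the exchange so the next request carries them too
    history = payload + [{"role": "assistant", "content": f"Answer to question {turn}."}]

print(costs)  # strictly increasing: each request pays for all prior turns
```

Every element of `costs` is larger than the one before it, even though each turn only added two short messages — and that's within a single session, before any cross-session storage enters the picture.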
This is the gap. Your framework handles the conversation. Nobody handles the continuity.
What's Wrong with the Mem0/Zep/Supermemory Fix?
Memory-specific tools like Mem0, Zep, and Supermemory solve the recall problem — storing facts across sessions and retrieving them by semantic similarity. If memory is your only gap, they work. But production voice agents have more gaps than memory, and each one requires another service.
Mem0's Pipecat integration adds a MemoryManager that stores and retrieves facts between sessions. Zep's LiveKit integration does something similar with their knowledge graph. These are genuine solutions to a real problem.
But here's what happens once you ship memory and start getting real traffic:
You need a knowledge base. Customers ask about your return policy, product specs, pricing tiers. Memory stores what you've learned about the customer. Knowledge stores what the agent needs to know about your business. That's a RAG pipeline — embeddings, vector search, chunking strategy, freshness management. So you add Pinecone or Weaviate.
You need managed tools. Your agent needs to check order status, schedule appointments, look up account balances. Each tool needs authentication, error handling, retries, and timeout management. The Model Context Protocol is emerging as the standard for this, but you still need a server to host and execute your MCP tools. So you build a custom tool server.
You need prompt management. A "small wording tweak" to your system prompt causes a spike in confused responses. There's no version history, no rollback, no staging environment. Prompt versioning matters in production. So you build or buy a prompt management layer.
You need post-call analysis. Is the agent hallucinating more this week? Are tool calls failing silently? Which customer segments are having bad experiences? You need scorecards, transcripts, and trend analysis. So you add a monitoring service.
Before you know it, your "simple" voice agent has this behind it:
```
# The 5-service stack              # What you actually wanted
├── Mem0 (memory)                  ├── One SDK
├── Pinecone (RAG / knowledge)     ├── One API key
├── Custom MCP tool server         ├── One billing account
├── .env with 9 API keys           └── Everything works together
└── Custom monitoring scripts
```

Five services, five API keys, five billing accounts, five sets of docs. Each service works fine in isolation, but they don't share context. Your memory system doesn't know what your knowledge base contains. Your monitoring can't see your tool execution logs. Your prompt management doesn't know which memories were injected.
The problem isn't that Mem0 or Zep are bad — they're focused and effective at what they do. The problem is that memory alone isn't enough for production, and the integration tax of assembling the rest adds up fast.
How Does Memory Actually Work for Voice Agents?
Production voice agent memory operates in three tiers — session memory for the current conversation, entity memory for persistent facts about a customer, and context memory for optimized injection into the LLM prompt. Each tier has different storage characteristics, access patterns, and failure modes.
Tier 1: Session Memory
Session memory is what the LLM context window already provides — the rolling history of the current conversation. Within a single call, your Pipecat or LiveKit agent maintains this automatically through the message array.
The challenge is scale. A 30-minute support call generates 8,000-15,000 tokens of conversation history. If your context window is 128K tokens and you're also injecting a system prompt, tool definitions, and knowledge context, that leaves surprisingly little room. Long calls need summary compression — condensing older messages into a summary while keeping recent turns verbatim.
Session memory is ephemeral by design. When the call ends, it's either discarded or compressed into entity memory.
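A minimal sketch of that summary compression, assuming a `summarize` helper that stands in for a real LLM summarization call:

```python
# Sketch of summary compression for long sessions: keep the most recent
# turns verbatim and collapse everything older into a single summary
# message. `summarize` is a placeholder for a real LLM call.

def summarize(messages: list[dict]) -> str:
    # placeholder: a real implementation would call an LLM here
    return f"[summary of {len(messages)} earlier turns]"

def compress_history(messages: list[dict], keep_recent: int = 6) -> list[dict]:
    system, turns = messages[0], messages[1:]
    if len(turns) <= keep_recent:
        return messages  # short call: nothing to compress
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = {"role": "system", "content": summarize(older)}
    return [system, summary] + recent

# 1 system message + 20 turns → 1 system + 1 summary + 6 recent turns
msgs = [{"role": "system", "content": "base prompt"}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(20)
]
compressed = compress_history(msgs)
print(len(compressed))  # 8
```

The trade-off to tune is `keep_recent`: too small and the agent loses the thread of the current exchange; too large and compression barely helps the token budget.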
Tier 2: Entity Memory
Entity memory stores persistent facts about a specific customer — preferences, account details, past issues, interaction history. This is the tier that solves the "your agent has amnesia" problem.
Each memory entry is:
- Scoped to an entity (customer ID, phone number, email)
- Embedded as a vector for semantic search
- Stored with metadata (source, confidence, timestamp)
- Subject to TTL expiration and deduplication
When a customer calls, the agent searches entity memory for relevant facts and injects them into the system prompt. "This customer prefers Spanish-speaking agents, has an enterprise account, and called twice last week about invoice #4821."
The hard part isn't storage — it's retrieval quality and deduplication. Without deduplication, you end up with 15 copies of "customer lives in Austin" wasting context window space. Without confidence scoring, a misheard fact from one call permanently corrupts the agent's understanding of the customer.
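A sketch of what key-based deduplication with confidence scoring can look like; the `MemoryEntry` shape and the merge policy here are illustrative assumptions, not Chanl's actual implementation:

```python
# Sketch: one entry per key, and a low-confidence (possibly misheard)
# fact never overwrites a higher-confidence one.

from dataclasses import dataclass

@dataclass
class MemoryEntry:
    key: str           # e.g. "home_city"
    value: str         # e.g. "Austin"
    confidence: float  # 0.0-1.0, e.g. from the extraction model

def upsert(store: dict[str, MemoryEntry], entry: MemoryEntry) -> None:
    """Keep one entry per key; only overwrite when the new fact is at
    least as confident as what we already have."""
    existing = store.get(entry.key)
    if existing is None or entry.confidence >= existing.confidence:
        store[entry.key] = entry

store: dict[str, MemoryEntry] = {}
upsert(store, MemoryEntry("home_city", "Austin", 0.9))
upsert(store, MemoryEntry("home_city", "Austin", 0.9))   # duplicate: still one entry
upsert(store, MemoryEntry("home_city", "Boston", 0.4))   # misheard, low confidence: ignored
print(len(store), store["home_city"].value)  # 1 Austin
```

Real systems also need semantic deduplication (catching "lives in Austin" vs "based in Austin, TX"), which is where the embedding layer earns its keep.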
Tier 3: Context Memory
Context memory is the orchestration layer — it decides what to inject and how much room it has. Given a system prompt, a set of relevant memories, knowledge base results, and a token budget, context memory assembles the optimal prompt payload.
This is where the agents.context() pattern comes in. Instead of manually loading memories, searching the knowledge base, and concatenating strings, a single call returns the assembled context with token-aware truncation:
```python
# One call: prompt + memories + KB + tools → ready-to-use context
context = await client.agents.context(
    agent_id="agent_support",
    customer_id="cust_123",
    max_tokens=4096,
)
# context["systemPrompt"] is the full, assembled prompt
# context["tools"] lists available tool definitions
# context["tokenUsage"] shows how the budget was allocated
```

The three tiers work together: session memory handles the current call, entity memory persists facts between calls, and context memory optimizes what gets injected into each new session.

How Do You Add Memory to a Pipecat Agent?
You add memory to a Pipecat agent by initializing a ChanMemoryProcessor with a customer ID, calling build_system_prompt() before creating your pipeline, and calling extract_and_save() after the conversation ends. Three lines of integration code replace the manual memory management loop.
Here's a complete Pipecat voice agent with persistent memory. The Chanl-specific code is highlighted — everything else is standard Pipecat:
```python
from chanl import Chanl
from chanl.integrations.pipecat import ChanMemoryProcessor
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport

# Initialize Chanl + memory processor
chanl = Chanl(api_key="ak_...")
memory = ChanMemoryProcessor(chanl, customer_id="cust_123")

BASE_PROMPT = """You are a customer support agent for Acme Corp.
Be friendly, concise, and professional."""


async def main():
    # One call loads memories + KB → builds complete system prompt
    system_prompt = await memory.build_system_prompt(BASE_PROMPT)

    # Standard Pipecat pipeline setup
    transport = DailyTransport(
        ROOM_URL, None, "Support Agent",
        DailyParams(audio_out_enabled=True, vad_enabled=True),
    )
    stt = DeepgramSTTService(api_key=DEEPGRAM_KEY)
    llm = OpenAILLMService(api_key=OPENAI_KEY, model="gpt-4o")
    tts = CartesiaTTSService(
        api_key=CARTESIA_KEY,
        voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",
    )

    # Context starts with the memory-enriched prompt
    messages = [{"role": "system", "content": system_prompt}]
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline([
        transport.input(), stt,
        context_aggregator.user(), llm, tts,
        transport.output(), context_aggregator.assistant(),
    ])
    task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))

    @transport.event_handler("on_participant_left")
    async def on_left(transport, participant, reason):
        # After call: extract facts from transcript, save to memory
        transcript = "\n".join(
            f"{m['role']}: {m['content']}" for m in messages[1:]
        )
        await memory.extract_and_save(transcript)

    runner = PipelineRunner()
    await runner.run(task)
    await chanl.close()
```

What's happening under the hood with build_system_prompt():
- Memory search: Queries entity memory for cust_123, retrieves facts ranked by semantic relevance
- Knowledge search: Searches the workspace knowledge base for context matching the agent's domain
- Prompt assembly: Appends memory and KB sections to your base prompt, formatted as markdown headers
The assembled prompt looks something like:
```
You are a customer support agent for Acme Corp.
Be friendly, concise, and professional.

## Known facts about this customer
- Customer has an enterprise account (50+ seats)
- Prefers email follow-ups over phone callbacks
- Had a billing dispute on invoice #4821, resolved with a $50 credit on March 10
- Primary contact: Sarah Chen, VP of Operations

## Reference knowledge
- Enterprise accounts have dedicated support with 4-hour SLA
- Billing disputes require manager approval above $100
- Current promotion: 20% off annual plans through March 31
```

When the call ends, extract_and_save() sends the full transcript to Chanl's extraction endpoint. An LLM analyzes the conversation and identifies structured facts — new preferences, account changes, decisions made, action items. Each fact is saved as an entity memory with a confidence score, key-value pair, and automatic deduplication against existing memories.
How Do You Add Memory to a LiveKit Agent?
The LiveKit integration follows the same pattern with LiveKit-native conventions — ChanMemoryPlugin instead of ChanMemoryProcessor, entity_id instead of customer_id, and lifecycle hooks via the Agent class's on_enter and on_exit methods.
```python
from chanl import Chanl
from chanl.integrations.livekit import ChanMemoryPlugin
from livekit.agents import (
    Agent, AgentSession, AutoSubscribe,
    JobContext, WorkerOptions, cli, RoomInputOptions,
)
from livekit.plugins import cartesia, deepgram, openai

chanl = Chanl(api_key="ak_...")

BASE_PROMPT = """You are a customer support agent for Acme Corp.
Be friendly, concise, and professional."""


class SupportAgent(Agent):
    def __init__(self, *, entity_id: str):
        super().__init__(instructions=BASE_PROMPT)
        self.chanl_memory = ChanMemoryPlugin(chanl, entity_id=entity_id)

    async def on_enter(self):
        # Load memories + KB → update system prompt
        self.instructions = await self.chanl_memory.build_system_prompt(
            self.instructions
        )
        loaded = self.chanl_memory.loaded_memories
        if loaded:
            print(f"[chanl] Loaded {len(loaded)} memories for session")

    async def on_exit(self):
        # Extract and persist facts from conversation
        # In production: build transcript from session context
        # await self.chanl_memory.extract_and_save(transcript)
        await chanl.close()


async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Map room name → customer entity for memory scoping
    entity_id = ctx.room.name or "unknown"
    agent = SupportAgent(entity_id=entity_id)

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
    )
    await session.start(
        room=ctx.room, agent=agent,
        room_input_options=RoomInputOptions(),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

The entity_id mapping is worth highlighting. In the example, we use the LiveKit room name, but in production you'd map this to your actual customer identifier. If the caller is identified via phone number lookup, CRM match, or auth token, that ID becomes the entity_id — and all memories accumulate against that customer across every future session, regardless of which room or agent handles the call.
Both integrations — Pipecat and LiveKit — use the same underlying Chanl client. The memory storage, knowledge base, and extraction logic are identical. Only the framework-specific wrappers differ. If you later switch from Pipecat to LiveKit (or vice versa), your memory data carries over unchanged.
What Else Does the SDK Handle Beyond Memory?
Memory is the most visible gap, but a production voice agent needs six capabilities that typically require six separate services. The chanl Python SDK provides all of them through a single client with one API key.
Knowledge Base (RAG)
Your agent needs to answer questions about your products, policies, and procedures. That's retrieval-augmented generation — and it's harder than it looks for voice, where the latency budget is tighter and transcribed queries are noisier than typed search terms.
```python
# Search knowledge base — hybrid vector + keyword search
results = await client.knowledge.search(
    query="enterprise account SLA and support hours",
    limit=5,
)

# Upload new knowledge
await client.knowledge.create(
    title="Q1 2026 Pricing Update",
    content="Enterprise plans now include dedicated support...",
    source="text",
)
```

No vector database to provision. No embedding pipeline to maintain. The SDK handles chunking, embedding, and retrieval server-side. Your agent gets relevant knowledge context without you managing Pinecone, Weaviate, or Qdrant.
Tool Execution (MCP)
Production agents need to do things — check order status, schedule appointments, process refunds. Each tool requires auth, error handling, retries, and structured input/output.
```python
# Execute a registered tool
result = await client.tools.execute(
    "tool_check_order_status",
    arguments={"order_id": "ORD-7821"},
)
# result: {"status": "shipped", "tracking": "1Z999...", "eta": "March 17"}

# List available tools
tools = await client.tools.list(is_enabled=True)
```

Tools are registered once in the Chanl dashboard with their schemas, auth credentials, and error handling rules. The SDK executes them with automatic retries and structured responses. Your voice agent calls tools.execute() instead of managing webhook endpoints, API keys, and retry logic per integration.
For agents using the Model Context Protocol, Chanl provides an MCP gateway — your agent connects to a single MCP endpoint and gets access to all registered tools without running your own MCP server.
Prompt Management
Your agent's behavior lives in its system prompt. Changing it without version control is how "small tweaks" cause production incidents.
```python
# Resolve the active prompt for an agent (respects versioning + overrides)
prompt = await client.prompts.resolve(agent_id="agent_support")

# List prompt versions
versions = await client.prompts.list(agent_id="agent_support")
```

Prompts are versioned, staged, and rollback-capable. When you update a prompt, the previous version is preserved. If something breaks, you roll back in seconds instead of trying to remember what the prompt said yesterday.
Context Builder
The agents.context() method is the aggregation layer — one call that assembles everything your agent needs:
```python
# One call: system prompt + memories + KB + tools + token budget
context = await client.agents.context(
    agent_id="agent_support",
    customer_id="cust_123",
    max_tokens=4096,
    include=["memory", "knowledge", "prompt", "tools"],
)
system_prompt = context["data"]["systemPrompt"]
available_tools = context["data"]["tools"]
token_usage = context["data"]["tokenUsage"]
```

Instead of making three separate calls (memory search, KB search, prompt resolve) and manually concatenating results, agents.context() handles the orchestration. It respects your token budget, prioritizes recent memories over old ones, and assembles the prompt in a single round-trip.
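The budget-aware part of that orchestration can be sketched client-side. The chars/4 token estimate and the newest-first policy below are assumptions for illustration; the real work happens server-side:

```python
# Sketch: add memories newest-first until the token budget is exhausted,
# then append the picked facts to the prompt as a markdown section.

def approx_tokens(text: str) -> int:
    # crude estimate: ~1 token per 4 characters
    return len(text) // 4

def assemble_context(prompt: str, memories: list[str], max_tokens: int) -> str:
    """`memories` is assumed pre-sorted newest-first."""
    budget = max_tokens - approx_tokens(prompt)
    picked: list[str] = []
    for fact in memories:
        cost = approx_tokens(fact)
        if cost > budget:
            break  # budget exhausted: older facts are dropped
        picked.append(fact)
        budget -= cost
    if not picked:
        return prompt
    return prompt + "\n\n## Known facts\n" + "\n".join(f"- {m}" for m in picked)

ctx = assemble_context(
    "You are a support agent.",
    ["Prefers morning callbacks", "Enterprise account with 50 seats", "Lives in Austin"],
    max_tokens=20,
)
print(ctx.count("- "))  # 2: the third (oldest) fact did not fit the budget
```

Even this toy version shows why the orchestration belongs in one place: the budget decision depends on the prompt, the memories, and the token counter all at once.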
Testing and Quality Scoring
After you ship, you need to know if your agent is actually performing well. Are responses accurate? Is the agent using tools correctly? Are customers satisfied?
```python
# Run a test scenario against your agent
result = await client.scenarios.run("scenario_billing_dispute")

# Get quality scores for a specific interaction
scores = await client.scorecards.evaluate(
    interaction_id="int_abc",
    scorecard_id="sc_support_quality",
)
```

Automated testing with AI-powered personas catches regressions before they reach customers. Scorecards grade every interaction on accuracy, tone, compliance, and task completion — automatically, at scale. These aren't separate services with separate dashboards. They're part of the same SDK, using the same data, sharing the same context.
How Do You Get Started?
Five lines to persistent memory. Everything else is optional and additive.
```bash
pip install chanl
```

```python
from chanl import Chanl

client = Chanl(api_key="ak_...")

# Store a fact
await client.memory.create(
    entity_type="customer",
    entity_id="cust_123",
    content="Customer prefers morning callbacks before 10am",
    key="contact_preference",
    value="morning",
)

# Search memories
results = await client.memory.search(
    entity_type="customer",
    entity_id="cust_123",
    query="when does the customer prefer to be contacted?",
)
```

For Pipecat: from chanl.integrations.pipecat import ChanMemoryProcessor
For LiveKit: from chanl.integrations.livekit import ChanMemoryPlugin
Both integrations follow the same lifecycle — load context before the session, save facts after. The full examples are on GitHub, including complete Pipecat and LiveKit agent implementations you can clone and run.
The SDK is framework-agnostic at its core. If you're building with a different voice framework, VAPI webhooks, or even a text-based chat agent, the Chanl class works the same way — memory, knowledge, tools, and prompts are all accessible through the same async client.
Where Does This Leave Your Architecture?
Your voice agent framework — Pipecat, LiveKit, whatever comes next — handles the conversation. The audio pipeline, the real-time STT/TTS, the LLM orchestration. That's genuinely hard engineering and these frameworks do it well.
Everything behind the conversation — memory that persists between sessions, knowledge that grounds responses in facts, tools that let the agent take action, prompts that version-control behavior, testing that catches regressions, analytics that surface quality trends — that's a different problem. It's the agent backend.
You can build it yourself. We've written about what that takes — roughly 22-39 weeks of engineering across six capabilities. Or you can treat it like you treat your database, your auth provider, or your payment processor: use infrastructure that already exists so you can focus on the part that's actually unique to your product.
The voice frameworks will keep evolving. New models will drop. New STT and TTS providers will emerge. The infrastructure behind the agent — the memory, the tools, the knowledge, the quality framework — stays the same regardless of what voice layer sits on top. Build the foundation right, and switching frameworks is a weekend project instead of a rewrite.