A customer calls your AI agent for the third time this week. "Hi, how can I help you today?" Again. No memory of the billing dispute from Monday. No recollection of the discount your agent promised on Wednesday. The customer sighs, starts from the top, and wonders why they're talking to a machine that can't remember a three-day-old conversation.
This isn't a hypothetical edge case. It's the default behavior of every voice agent built on Pipecat and LiveKit. Both frameworks are excellent at what they do — real-time audio processing, LLM orchestration, natural-sounding speech. But when the call ends, everything evaporates. The next session starts from absolute zero.
You've probably already noticed this gap. Maybe you've started looking at Mem0, Zep, or Supermemory for a fix. They solve the memory problem. But memory is just the beginning of what a production voice agent needs — and assembling five separate services to cover the rest creates its own mess.
Here's how to give your Pipecat or LiveKit agent persistent memory, knowledge retrieval, managed tools, and post-call analysis with a single SDK.
Why Are Voice Agents Stateless by Default?
Voice agents are stateless because the frameworks that power them are designed for real-time audio processing, not data persistence. Pipecat pipelines and LiveKit agent sessions handle speech-to-text, LLM orchestration, and text-to-speech within a single session — but nothing survives after the session ends.
This is a deliberate design choice, not a bug. Pipecat's pipeline architecture processes audio frames through a chain of processors: STT captures speech, the LLM generates a response, TTS speaks it back. Each frame is handled and discarded. LiveKit's Agent framework follows the same pattern — your Agent class receives room events, processes them, and the session state lives only in memory.
The LLM itself compounds the problem. Every API call to GPT-4o, Claude, or Gemini is independent. The model has no awareness of prior turns unless you explicitly pass them in the messages array. A voice agent that handled 50 calls today has exactly the same knowledge at call 51 as it did at call 1: none.
The OpenAI Chat Completions API requires resending the full conversation history with every request. For a single session, this works fine — you accumulate messages and the context window holds them. But across sessions? You'd need to store every conversation transcript and somehow decide which ones to include in the next call. Token costs grow with every turn, because each request re-pays for all prior turns. Context windows fill up. And the lost-in-the-middle problem means the model starts ignoring information buried deep in long contexts anyway.
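That accumulation pattern can be sketched in a few lines of plain Python. The token counts below are crude character-based estimates, not real tokenizer output:

```python
# Sketch: why resending full history gets expensive across turns.
# Token counts are rough illustrative estimates (~1 token per 4 chars).

def request_payload(history: list[dict], new_user_msg: str) -> list[dict]:
    """Chat-completion APIs are stateless: every request must include
    the entire prior conversation plus the new user turn."""
    return history + [{"role": "user", "content": new_user_msg}]

def approx_tokens(messages: list[dict]) -> int:
    # crude estimate: ~1 token per 4 characters
    return sum(len(m["content"]) for m in messages) // 4

history: list[dict] = [{"role": "system", "content": "You are a support agent."}]
costs = []
for turn in range(1, 6):
    payload = request_payload(history, f"User question number {turn}, with some detail.")
    costs.append(approx_tokens(payload))
    # append both sides of the exchange so the next request carries them too
    history = payload + [{"role": "assistant", "content": f"Answer to question {turn}."}]

print(costs)  # strictly increasing: each request pays for all prior turns
```

Every element of `costs` is larger than the one before it, even though each turn only added two short messages — and that's within a single session, before any cross-session storage enters the picture.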
This is the gap. Your framework handles the conversation. Nobody handles the continuity.
What's Wrong with the Mem0/Zep/Supermemory Fix?
Memory-specific tools like Mem0, Zep, and Supermemory solve the recall problem — storing facts across sessions and retrieving them by semantic similarity. If memory is your only gap, they work. But production voice agents have more gaps than memory, and each one requires another service.
Mem0's Pipecat integration adds a MemoryManager that stores and retrieves facts between sessions. Zep's LiveKit integration does something similar with their knowledge graph. These are genuine solutions to a real problem.
But here's what happens once you ship memory and start getting real traffic:
You need a knowledge base. Customers ask about your return policy, product specs, pricing tiers. Memory stores what you've learned about the customer. Knowledge stores what the agent needs to know about your business. That's a RAG pipeline — embeddings, vector search, chunking strategy, freshness management. So you add Pinecone or Weaviate.
You need managed tools. Your agent needs to check order status, schedule appointments, look up account balances. Each tool needs authentication, error handling, retries, and timeout management. The Model Context Protocol is emerging as the standard for this, but you still need a server to host and execute your MCP tools. So you build a custom tool server.
You need prompt management. A "small wording tweak" to your system prompt causes a spike in confused responses. There's no version history, no rollback, no staging environment. Prompt versioning matters in production. So you build or buy a prompt management layer.
You need post-call analysis. Is the agent hallucinating more this week? Are tool calls failing silently? Which customer segments are having bad experiences? You need scorecards, transcripts, and trend analysis. So you add a monitoring service.
Before you know it, your "simple" voice agent has this behind it:
```
# The 5-service stack              # What you actually wanted
├── Mem0 (memory)                  ├── One SDK
├── Pinecone (RAG / knowledge)     ├── One API key
├── Custom MCP tool server         ├── One billing account
├── .env with 9 API keys           └── Everything works together
└── Custom monitoring scripts
```

Five services, five API keys, five billing accounts, five sets of docs. Each service works fine in isolation, but they don't share context. Your memory system doesn't know what your knowledge base contains. Your monitoring can't see your tool execution logs. Your prompt management doesn't know which memories were injected.
The problem isn't that Mem0 or Zep are bad — they're focused and effective at what they do. The problem is that memory alone isn't enough for production, and the integration tax of assembling the rest adds up fast.
How Does Memory Actually Work for Voice Agents?
Production voice agent memory operates in three tiers — session memory for the current conversation, entity memory for persistent facts about a customer, and context memory for optimized injection into the LLM prompt. Each tier has different storage characteristics, access patterns, and failure modes.
Tier 1: Session Memory
Session memory is what the LLM context window already provides — the rolling history of the current conversation. Within a single call, your Pipecat or LiveKit agent maintains this automatically through the message array.
The challenge is scale. A 30-minute support call generates 8,000-15,000 tokens of conversation history. If your context window is 128K tokens and you're also injecting a system prompt, tool definitions, and knowledge context, that leaves surprisingly little room. Long calls need summary compression — condensing older messages into a summary while keeping recent turns verbatim.
Session memory is ephemeral by design. When the call ends, it's either discarded or compressed into entity memory.
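A minimal sketch of that summary compression, assuming a `summarize` helper that stands in for a real LLM summarization call:

```python
# Sketch of summary compression for long sessions: keep the most recent
# turns verbatim and collapse everything older into a single summary
# message. `summarize` is a placeholder for a real LLM call.

def summarize(messages: list[dict]) -> str:
    # placeholder: a real implementation would call an LLM here
    return f"[summary of {len(messages)} earlier turns]"

def compress_history(messages: list[dict], keep_recent: int = 6) -> list[dict]:
    system, turns = messages[0], messages[1:]
    if len(turns) <= keep_recent:
        return messages  # short call: nothing to compress
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = {"role": "system", "content": summarize(older)}
    return [system, summary] + recent

# 1 system message + 20 turns → 1 system + 1 summary + 6 recent turns
msgs = [{"role": "system", "content": "base prompt"}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(20)
]
compressed = compress_history(msgs)
print(len(compressed))  # 8
```

The trade-off to tune is `keep_recent`: too small and the agent loses the thread of the current exchange; too large and compression barely helps the token budget.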
Tier 2: Entity Memory
Entity memory stores persistent facts about a specific customer — preferences, account details, past issues, interaction history. This is the tier that solves the "your agent has amnesia" problem.
Each memory entry is:
- Scoped to an entity (customer ID, phone number, email)
- Embedded as a vector for semantic search
- Stored with metadata (source, confidence, timestamp)
- Subject to TTL expiration and deduplication
When a customer calls, the agent searches entity memory for relevant facts and injects them into the system prompt. "This customer prefers Spanish-speaking agents, has an enterprise account, and called twice last week about invoice #4821."
The hard part isn't storage — it's retrieval quality and deduplication. Without deduplication, you end up with 15 copies of "customer lives in Austin" wasting context window space. Without confidence scoring, a misheard fact from one call permanently corrupts the agent's understanding of the customer.
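A sketch of what key-based deduplication with confidence scoring can look like; the `MemoryEntry` shape and the merge policy here are illustrative assumptions, not Chanl's actual implementation:

```python
# Sketch: one entry per key, and a low-confidence (possibly misheard)
# fact never overwrites a higher-confidence one.

from dataclasses import dataclass

@dataclass
class MemoryEntry:
    key: str           # e.g. "home_city"
    value: str         # e.g. "Austin"
    confidence: float  # 0.0-1.0, e.g. from the extraction model

def upsert(store: dict[str, MemoryEntry], entry: MemoryEntry) -> None:
    """Keep one entry per key; only overwrite when the new fact is at
    least as confident as what we already have."""
    existing = store.get(entry.key)
    if existing is None or entry.confidence >= existing.confidence:
        store[entry.key] = entry

store: dict[str, MemoryEntry] = {}
upsert(store, MemoryEntry("home_city", "Austin", 0.9))
upsert(store, MemoryEntry("home_city", "Austin", 0.9))   # duplicate: still one entry
upsert(store, MemoryEntry("home_city", "Boston", 0.4))   # misheard, low confidence: ignored
print(len(store), store["home_city"].value)  # 1 Austin
```

Real systems also need semantic deduplication (catching "lives in Austin" vs "based in Austin, TX"), which is where the embedding layer earns its keep.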
Tier 3: Context Memory
Context memory is the orchestration layer — it decides what to inject and how much room it has. Given a system prompt, a set of relevant memories, knowledge base results, and a token budget, context memory assembles the optimal prompt payload.
This is where the agents.context() pattern comes in. Instead of manually loading memories, searching the knowledge base, and concatenating strings, a single call returns the assembled context with token-aware truncation:
```python
# One call: prompt + memories + KB + tools → ready-to-use context
context = await client.agents.context(
    agent_id="agent_support",
    customer_id="cust_123",
    max_tokens=4096,
)
# context["systemPrompt"] is the full, assembled prompt
# context["tools"] lists available tool definitions
# context["tokenUsage"] shows how the budget was allocated
```

The three tiers work together: session memory handles the current call, entity memory persists facts between calls, and context memory optimizes what gets injected into each new session.

How Do You Add Memory to a Pipecat Agent?
You add memory to a Pipecat agent by initializing a ChanMemoryProcessor with a customer ID, calling build_system_prompt() before creating your pipeline, and calling extract_and_save() after the conversation ends. Three lines of integration code replace the manual memory management loop.
Here's a complete Pipecat voice agent with persistent memory. The Chanl-specific code is highlighted — everything else is standard Pipecat:
```python
from chanl import Chanl
from chanl.integrations.pipecat import ChanMemoryProcessor
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport

# Initialize Chanl + memory processor
chanl = Chanl(api_key="ak_...")
memory = ChanMemoryProcessor(chanl, customer_id="cust_123")

BASE_PROMPT = """You are a customer support agent for Acme Corp.
Be friendly, concise, and professional."""


async def main():
    # One call loads memories + KB → builds complete system prompt
    system_prompt = await memory.build_system_prompt(BASE_PROMPT)

    # Standard Pipecat pipeline setup
    transport = DailyTransport(
        ROOM_URL, None, "Support Agent",
        DailyParams(audio_out_enabled=True, vad_enabled=True),
    )
    stt = DeepgramSTTService(api_key=DEEPGRAM_KEY)
    llm = OpenAILLMService(api_key=OPENAI_KEY, model="gpt-4o")
    tts = CartesiaTTSService(
        api_key=CARTESIA_KEY,
        voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",
    )

    # Context starts with the memory-enriched prompt
    messages = [{"role": "system", "content": system_prompt}]
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline([
        transport.input(), stt,
        context_aggregator.user(), llm, tts,
        transport.output(), context_aggregator.assistant(),
    ])
    task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))

    @transport.event_handler("on_participant_left")
    async def on_left(transport, participant, reason):
        # After call: extract facts from transcript, save to memory
        transcript = "\n".join(
            f"{m['role']}: {m['content']}" for m in messages[1:]
        )
        await memory.extract_and_save(transcript)

    runner = PipelineRunner()
    await runner.run(task)
    await chanl.close()
```

What's happening under the hood with build_system_prompt():
- Memory search: Queries entity memory for cust_123, retrieves facts ranked by semantic relevance
- Knowledge search: Searches the workspace knowledge base for context matching the agent's domain
- Prompt assembly: Appends memory and KB sections to your base prompt, formatted as markdown headers
The assembled prompt looks something like:
```
You are a customer support agent for Acme Corp.
Be friendly, concise, and professional.

## Known facts about this customer
- Customer has an enterprise account (50+ seats)
- Prefers email follow-ups over phone callbacks
- Had a billing dispute on invoice #4821, resolved with a $50 credit on March 10
- Primary contact: Sarah Chen, VP of Operations

## Reference knowledge
- Enterprise accounts have dedicated support with 4-hour SLA
- Billing disputes require manager approval above $100
- Current promotion: 20% off annual plans through March 31
```

When the call ends, extract_and_save() sends the full transcript to Chanl's extraction endpoint. An LLM analyzes the conversation and identifies structured facts — new preferences, account changes, decisions made, action items. Each fact is saved as an entity memory with a confidence score, key-value pair, and automatic deduplication against existing memories.
How Do You Add Memory to a LiveKit Agent?
The LiveKit integration follows the same pattern with LiveKit-native conventions — ChanMemoryPlugin instead of ChanMemoryProcessor, entity_id instead of customer_id, and lifecycle hooks via the Agent class's on_enter and on_exit methods.
```python
from chanl import Chanl
from chanl.integrations.livekit import ChanMemoryPlugin
from livekit.agents import (
    Agent, AgentSession, AutoSubscribe,
    JobContext, WorkerOptions, cli, RoomInputOptions,
)
from livekit.plugins import cartesia, deepgram, openai

chanl = Chanl(api_key="ak_...")

BASE_PROMPT = """You are a customer support agent for Acme Corp.
Be friendly, concise, and professional."""


class SupportAgent(Agent):
    def __init__(self, *, entity_id: str):
        super().__init__(instructions=BASE_PROMPT)
        self.chanl_memory = ChanMemoryPlugin(chanl, entity_id=entity_id)

    async def on_enter(self):
        # Load memories + KB → update system prompt
        self.instructions = await self.chanl_memory.build_system_prompt(
            self.instructions
        )
        loaded = self.chanl_memory.loaded_memories
        if loaded:
            print(f"[chanl] Loaded {len(loaded)} memories for session")

    async def on_exit(self):
        # Extract and persist facts from conversation
        # In production: build transcript from session context
        # await self.chanl_memory.extract_and_save(transcript)
        await chanl.close()


async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Map room name → customer entity for memory scoping
    entity_id = ctx.room.name or "unknown"
    agent = SupportAgent(entity_id=entity_id)

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
    )
    await session.start(
        room=ctx.room, agent=agent,
        room_input_options=RoomInputOptions(),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

The entity_id mapping is worth highlighting. In the example, we use the LiveKit room name, but in production you'd map this to your actual customer identifier. If the caller is identified via phone number lookup, CRM match, or auth token, that ID becomes the entity_id — and all memories accumulate against that customer across every future session, regardless of which room or agent handles the call.
Both integrations — Pipecat and LiveKit — use the same underlying Chanl client. The memory storage, knowledge base, and extraction logic are identical. Only the framework-specific wrappers differ. If you later switch from Pipecat to LiveKit (or vice versa), your memory data carries over unchanged.
What Else Does the SDK Handle Beyond Memory?
Memory is the most visible gap, but a production voice agent needs six capabilities that typically require six separate services. The chanl Python SDK provides all of them through a single client with one API key.
Knowledge Base (RAG)
Your agent needs to answer questions about your products, policies, and procedures. That's retrieval-augmented generation — and it's harder than it looks for voice, where the latency budget is tighter and transcribed queries are noisier than typed search terms.
```python
# Search knowledge base — hybrid vector + keyword search
results = await client.knowledge.search(
    query="enterprise account SLA and support hours",
    limit=5,
)

# Upload new knowledge
await client.knowledge.create(
    title="Q1 2026 Pricing Update",
    content="Enterprise plans now include dedicated support...",
    source="text",
)
```

No vector database to provision. No embedding pipeline to maintain. The SDK handles chunking, embedding, and retrieval server-side. Your agent gets relevant knowledge context without you managing Pinecone, Weaviate, or Qdrant.
Tool Execution (MCP)
Production agents need to do things — check order status, schedule appointments, process refunds. Each tool requires auth, error handling, retries, and structured input/output.
```python
# Execute a registered tool
result = await client.tools.execute(
    "tool_check_order_status",
    arguments={"order_id": "ORD-7821"},
)
# result: {"status": "shipped", "tracking": "1Z999...", "eta": "March 17"}

# List available tools
tools = await client.tools.list(is_enabled=True)
```

Tools are registered once in the Chanl dashboard with their schemas, auth credentials, and error handling rules. The SDK executes them with automatic retries and structured responses. Your voice agent calls tools.execute() instead of managing webhook endpoints, API keys, and retry logic per integration.
For agents using the Model Context Protocol, Chanl provides an MCP gateway — your agent connects to a single MCP endpoint and gets access to all registered tools without running your own MCP server.
Prompt Management
Your agent's behavior lives in its system prompt. Changing it without version control is how "small tweaks" cause production incidents.
```python
# Resolve the active prompt for an agent (respects versioning + overrides)
prompt = await client.prompts.resolve(agent_id="agent_support")

# List prompt versions
versions = await client.prompts.list(agent_id="agent_support")
```

Prompts are versioned, staged, and rollback-capable. When you update a prompt, the previous version is preserved. If something breaks, you roll back in seconds instead of trying to remember what the prompt said yesterday.
Context Builder
The agents.context() method is the aggregation layer — one call that assembles everything your agent needs:
```python
# One call: system prompt + memories + KB + tools + token budget
context = await client.agents.context(
    agent_id="agent_support",
    customer_id="cust_123",
    max_tokens=4096,
    include=["memory", "knowledge", "prompt", "tools"],
)
system_prompt = context["data"]["systemPrompt"]
available_tools = context["data"]["tools"]
token_usage = context["data"]["tokenUsage"]
```

Instead of making three separate calls (memory search, KB search, prompt resolve) and manually concatenating results, agents.context() handles the orchestration. It respects your token budget, prioritizes recent memories over old ones, and assembles the prompt in a single round-trip.
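The budget-aware part of that orchestration can be sketched client-side. The chars/4 token estimate and the newest-first policy below are assumptions for illustration; the real work happens server-side:

```python
# Sketch: add memories newest-first until the token budget is exhausted,
# then append the picked facts to the prompt as a markdown section.

def approx_tokens(text: str) -> int:
    # crude estimate: ~1 token per 4 characters
    return len(text) // 4

def assemble_context(prompt: str, memories: list[str], max_tokens: int) -> str:
    """`memories` is assumed pre-sorted newest-first."""
    budget = max_tokens - approx_tokens(prompt)
    picked: list[str] = []
    for fact in memories:
        cost = approx_tokens(fact)
        if cost > budget:
            break  # budget exhausted: older facts are dropped
        picked.append(fact)
        budget -= cost
    if not picked:
        return prompt
    return prompt + "\n\n## Known facts\n" + "\n".join(f"- {m}" for m in picked)

ctx = assemble_context(
    "You are a support agent.",
    ["Prefers morning callbacks", "Enterprise account with 50 seats", "Lives in Austin"],
    max_tokens=20,
)
print(ctx.count("- "))  # 2: the third (oldest) fact did not fit the budget
```

Even this toy version shows why the orchestration belongs in one place: the budget decision depends on the prompt, the memories, and the token counter all at once.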
Testing and Quality Scoring
After you ship, you need to know if your agent is actually performing well. Are responses accurate? Is the agent using tools correctly? Are customers satisfied?
```python
# Run a test scenario against your agent
result = await client.scenarios.run("scenario_billing_dispute")

# Get quality scores for a specific interaction
scores = await client.scorecards.evaluate(
    interaction_id="int_abc",
    scorecard_id="sc_support_quality",
)
```

Automated testing with AI-powered personas catches regressions before they reach customers. Scorecards grade every interaction on accuracy, tone, compliance, and task completion — automatically, at scale. These aren't separate services with separate dashboards. They're part of the same SDK, using the same data, sharing the same context.
How Do You Get Started?
Five lines to persistent memory. Everything else is optional and additive.
```bash
pip install chanl
```

```python
from chanl import Chanl

client = Chanl(api_key="ak_...")

# Store a fact
await client.memory.create(
    entity_type="customer",
    entity_id="cust_123",
    content="Customer prefers morning callbacks before 10am",
    key="contact_preference",
    value="morning",
)

# Search memories
results = await client.memory.search(
    entity_type="customer",
    entity_id="cust_123",
    query="when does the customer prefer to be contacted?",
)
```

For Pipecat: from chanl.integrations.pipecat import ChanMemoryProcessor
For LiveKit: from chanl.integrations.livekit import ChanMemoryPlugin
Both integrations follow the same lifecycle — load context before the session, save facts after. The full examples are on GitHub, including complete Pipecat and LiveKit agent implementations you can clone and run.
The SDK is framework-agnostic at its core. If you're building with a different voice framework, VAPI webhooks, or even a text-based chat agent, the Chanl class works the same way — memory, knowledge, tools, and prompts are all accessible through the same async client.
Where Does This Leave Your Architecture?
Your voice agent framework — Pipecat, LiveKit, whatever comes next — handles the conversation. The audio pipeline, the real-time STT/TTS, the LLM orchestration. That's genuinely hard engineering and these frameworks do it well.
Everything behind the conversation — memory that persists between sessions, knowledge that grounds responses in facts, tools that let the agent take action, prompts that version-control behavior, testing that catches regressions, analytics that surface quality trends — that's a different problem. It's the agent backend.
You can build it yourself. We've written about what that takes — roughly 22-39 weeks of engineering across six capabilities. Or you can treat it like you treat your database, your auth provider, or your payment processor: use infrastructure that already exists so you can focus on the part that's actually unique to your product.
The voice frameworks will keep evolving. New models will drop. New STT and TTS providers will emerge. The infrastructure behind the agent — the memory, the tools, the knowledge, the quality framework — stays the same regardless of what voice layer sits on top. Build the foundation right, and switching frameworks is a weekend project instead of a rewrite.