Chanl
Industry & Strategy

Banks Trust AI With Transactions. Why Not Customer Calls?

How a mid-size bank deploys AI agents for customer service with identity verification, PCI compliance, fraud detection, and regulatory scorecards.

Dean Grover, Co-founder
March 27, 2026
14 min read
Modern bank lobby with digital screens and a customer speaking on the phone, soft lighting and glass walls

A bank's fraud detection system processes 50 million transactions per day. It evaluates each one in under 100 milliseconds. It flags suspicious patterns, blocks unauthorized charges, and adapts to new attack vectors without human intervention. Nobody questions whether AI should handle this. The system is faster, more consistent, and more accurate than any team of human analysts could be.

The same bank's customer service line has a 12-minute average wait time. A customer calls to check the status of a wire transfer. Another calls to report a suspicious charge on their debit card. A third wants to know if their card replacement shipped. All three sit on hold, listening to the same music, waiting for the same overtaxed agents.

The AI that protects the bank's money is state of the art. The AI that talks to the bank's customers is a phone tree from 2004.


The gap between back office and front office

Banks have been deploying AI for over a decade. Fraud detection, credit scoring, anti-money laundering, algorithmic trading. These systems handle billions of dollars in decisions every day. The technology isn't experimental. It's load-bearing infrastructure.

But customer-facing AI at most banks still means an IVR that recognizes "balance" and "representative" as keywords. Press 1 for your balance. Press 2 for recent transactions. Anything more complex? Please hold for an agent.

The reason isn't technical capability. The same language models that power consumer chatbots can absolutely handle "What's the status of my wire transfer?" The reason is risk. A bad recommendation from a retail chatbot costs a customer some inconvenience. A bad answer from a banking agent could expose account data, mishandle a dispute, or violate federal regulations.

That risk calculus kept banks on the sidelines while every other industry adopted conversational AI. But the calculus has changed, because the tools to manage that risk now exist.

Consider a mid-size bank: 500,000 customers, 150 branch locations, 8,000 customer service calls per day. Their call center runs 200 agents across three shifts. Most calls are routine. Balance checks. Transaction questions. Card replacements. Branch hours. Wire transfer status. The IVR deflects the simplest queries, but anything that requires context or judgment goes to a human. Wait times average 12 minutes. Customer satisfaction scores have been falling for three consecutive quarters.

They want AI to handle tier-1 calls. Not all calls. Not complex disputes or investment advice or loan origination. The routine 60% that don't require human judgment but currently require a human body in a seat.

Here's how that actually works when you build it with the right constraints.

Why banking took longer

Retail, healthcare, travel, e-commerce. They all adopted AI customer service before banking did. The delay wasn't about capability. It was about the specific intersection of constraints that banking imposes.

PCI DSS governs how payment card data is handled. An AI agent that reads a full card number aloud during a call is a compliance violation. An agent that stores unmasked account numbers in conversation logs is a compliance violation. The AI needs to handle payment data without ever exposing it in a way that violates PCI scope.

Regulation E covers electronic fund transfer disputes. When a customer reports an unauthorized transaction, the bank has specific obligations: acknowledge the dispute in writing within a defined timeline, investigate within a defined timeline, provide provisional credit under defined circumstances, and inform the customer of their rights at each stage. An AI agent that handles dispute intake but skips the required disclosures creates regulatory exposure.

KYC and identity verification require that account access is gated behind verified identity. A human agent verifies identity by asking questions and checking answers against the system. An AI agent needs to do the same thing, but the verification has to happen through a secure tool call, not through the language model "deciding" whether the answers sound right.

Audit requirements mean every customer interaction needs a complete, retrievable record. Not just a transcript. A record of what account data was accessed, what actions were taken, what disclosures were made, and what the customer consented to.

Each of these constraints is individually solvable. The challenge was solving all of them at once in a system that still sounds natural, handles edge cases gracefully, and actually reduces hold times instead of replacing one frustration with another.

Identity first, everything else second

Before a banking AI agent can do anything useful, it needs to know who it's talking to. This is the hard gate that separates banking from most other customer service deployments.

The verification flow works like this: the caller provides identifying information, the agent passes it to the bank's identity verification API through a tool call, and the tool returns a binary result. Verified or not verified. The agent never evaluates identity on its own. It never decides that the caller "sounds right" or "probably" is who they say they are. The tool makes the determination.

```typescript
import { Chanl } from '@chanl/sdk'

const chanl = new Chanl({ apiKey: process.env.CHANL_API_KEY })

// Identity verification tool
await chanl.tools.create({
  name: 'verify_customer_identity',
  description: 'Verify caller identity using knowledge-based authentication. Must be called before any account access.',
  type: 'http',
  inputSchema: {
    type: 'object',
    properties: {
      phoneNumber: {
        type: 'string',
        description: 'Caller phone number from ANI'
      },
      lastFourSSN: {
        type: 'string',
        description: 'Last 4 digits of Social Security Number'
      },
      dateOfBirth: {
        type: 'string',
        description: 'Date of birth in YYYY-MM-DD format'
      },
      recentTransactionAmount: {
        type: 'number',
        description: 'Amount of a recent transaction for additional verification'
      }
    },
    required: ['phoneNumber', 'lastFourSSN', 'dateOfBirth']
  },
  configuration: {
    http: { method: 'POST', url: 'https://api.westfieldbank.internal/verify-identity' }
  }
})
```

The configured endpoint hits the bank's core identity system and returns a response the agent can act on:

```typescript
// Response from verification tool
{
  verified: true,
  customerId: 'WB-882491',
  accountIds: ['CHK-4491', 'SAV-4492'],
  name: 'Margaret Chen',
  verificationLevel: 'full',   // full, partial, failed
  restrictions: []             // any account-level holds
}
```

If verification fails, the conversation shifts. The agent doesn't hang up. It doesn't get confused. It moves to the unverified tier, where it can still be useful, just not with account data.

The key architectural decision: verification status is enforced at the tool level, not the prompt level. The account lookup tool checks verification status before returning data. Even if a prompt injection somehow told the agent to skip verification, the tool itself would refuse to return account information for an unverified session.
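That enforcement pattern can be sketched in plain TypeScript. This is an illustrative sketch, not SDK code; the `Session` shape and function name are invented for the example:

```typescript
// Illustrative sketch: the account tool's backend refuses unverified sessions,
// regardless of what the language model asked for.
type Session = { verified: boolean; customerId?: string }
type AccountSummary = { accountId: string; balance: number }

function getAccountSummary(session: Session, accountId: string): AccountSummary {
  // The hard gate lives server-side, in the tool, not in the prompt.
  if (!session.verified) {
    throw new Error('IDENTITY_NOT_VERIFIED: account access denied')
  }
  // Core banking lookup stubbed for illustration.
  return { accountId, balance: 1024.5 }
}
```

Because the check runs in the tool's backend, an injected "skip verification" instruction never reaches data: the model can ask, but the endpoint refuses.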

Two tiers: verified and unverified

This is the boundary that makes banking AI safe to deploy. Every caller falls into one of two categories, and the agent behaves differently for each.

Verified callers have passed identity verification. They can access account balances, transaction history, and recent statements. They can take actions: lock a card, update an address, submit a dispute, request a card replacement. They can ask about the status of pending transactions, wire transfers, and previous service requests.

Unverified callers get general information only. Branch hours. Current rates on savings accounts, CDs, and loans. Product information. General process explanations. Anything that lives in the knowledge base and doesn't require account access.

```typescript
// Knowledge base: general banking information (no auth required)
await chanl.knowledge.create({
  title: 'CD Rates - March 2026',
  source: 'text',
  content: `
    Current Certificate of Deposit rates effective March 15, 2026:
    6-month: 4.75% APY ($1,000 minimum)
    12-month: 4.90% APY ($1,000 minimum)
    24-month: 4.65% APY ($2,500 minimum)
    36-month: 4.50% APY ($2,500 minimum)

    Early withdrawal penalty: 90 days interest (6-12 month),
    180 days interest (24-36 month).
    Rates subject to change. FDIC insured up to $250,000.
  `,
  metadata: {
    category: 'rates',
    tags: ['cd', 'deposit-rates'],
  }
})
```

The knowledge base holds everything a customer might need that isn't account-specific: fee schedules, process guides, branch locations, ATM networks, wire transfer cutoff times, and regulatory disclosures. This content makes the unverified tier genuinely useful. A caller asking "What time does the downtown branch close on Saturdays?" doesn't need identity verification, and they shouldn't have to wait 12 minutes for a human to answer it.

The verified tier adds account-specific tools:

```typescript
// Account lookup tool (requires prior verification)
await chanl.tools.create({
  name: 'get_account_summary',
  description: 'Retrieve account balances and recent transactions. Only callable after verify_customer_identity succeeds.',
  type: 'http',
  inputSchema: {
    type: 'object',
    properties: {
      customerId: {
        type: 'string',
        description: 'Customer ID from verification result'
      },
      accountId: {
        type: 'string',
        description: 'Specific account ID, or omit for all accounts'
      },
      includeTransactions: {
        type: 'boolean',
        description: 'Include recent transactions (last 10)'
      }
    },
    required: ['customerId']
  },
  configuration: {
    http: { method: 'GET', url: 'https://api.westfieldbank.internal/account-summary' }
  }
})
```

Notice what the tool returns and what it doesn't. Account balances use masked identifiers. Transaction history shows merchant names and amounts but never full card numbers. The API itself enforces PCI boundaries. The AI agent never sees, processes, or speaks a full card number because the data source never provides one.
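A masking helper of the kind such an API layer might apply is easy to sketch. This function is illustrative, not part of any SDK:

```typescript
// Illustrative: reduce any card or account identifier to its last four digits
// before it ever reaches the conversation layer.
function maskIdentifier(id: string): string {
  const lastFour = id.replace(/\D/g, '').slice(-4)
  return `****${lastFour}`
}
```

The point of doing this at the data source is that no prompt, log, or transcript downstream can leak what was never delivered.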

Fraud detection as structured intake

When a customer calls to report an unauthorized transaction, that's not a simple question-and-answer. It's a structured intake process with specific data requirements, similar to a first notice of loss in insurance.

The AI agent needs to collect: which transaction, what amount, the date, whether the card was in the customer's possession, whether anyone else is authorized on the account, and whether the customer has recently traveled. This data feeds the bank's fraud investigation system.

```typescript
// Fraud report intake tool
await chanl.tools.create({
  name: 'submit_fraud_report',
  description: 'Collect and submit fraud report details for unauthorized transaction',
  type: 'http',
  inputSchema: {
    type: 'object',
    properties: {
      customerId: { type: 'string' },
      accountId: { type: 'string' },
      transactionDate: { type: 'string', description: 'Date of unauthorized transaction' },
      transactionAmount: { type: 'number' },
      merchantName: { type: 'string', description: 'Merchant name as it appears on statement' },
      cardInPossession: { type: 'boolean', description: 'Does customer still have the physical card' },
      additionalAuthorizedUsers: { type: 'boolean' },
      recentTravel: { type: 'boolean', description: 'Has customer traveled recently' },
      customerStatement: { type: 'string', description: 'Customer description of the situation' }
    },
    required: ['customerId', 'accountId', 'transactionDate', 'transactionAmount', 'merchantName', 'cardInPossession']
  },
  configuration: {
    http: { method: 'POST', url: 'https://api.westfieldbank.internal/fraud/report' }
  }
})
```

The structured intake is the baseline. Where it gets more interesting is pattern detection.

The fraud report tool's backend doesn't just file a report. It checks the customer's recent activity against fraud indicators. Multiple card replacement requests in 30 days. Disputes filed on transactions made after a recent address change. A pattern of small-amount disputes that individually look like noise but collectively look like friendly fraud.

When the backend detects a pattern, it returns a flag that tells the agent to escalate immediately:

```typescript
// Fraud tool response with escalation flag
{
  reportId: 'FR-2026-88412',
  status: 'submitted',
  escalationRequired: true,
  escalationReason: 'Third dispute in 45 days, address changed 2 weeks ago',
  provisionalCreditEligible: true,
  provisionalCreditAmount: 247.50
}
```

When escalationRequired comes back true, the agent doesn't try to resolve it. It tells the customer that their report has been filed, provides the report reference number, and transfers to the fraud investigation team with full context attached. The fraud team gets the structured data from the intake, the pattern flags from the system, and the conversation transcript from the call. No one starts from scratch.
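The agent's branching on that flag can be sketched as follows. Field names mirror the example response; the routing function itself is illustrative, not SDK code:

```typescript
// Illustrative routing on the fraud tool's response.
type FraudToolResponse = {
  reportId: string
  escalationRequired: boolean
  escalationReason?: string
}

type NextStep =
  | { kind: 'escalate'; reportId: string; reason: string }
  | { kind: 'standard'; reportId: string; deliverRegEDisclosures: true }

function routeFraudReport(res: FraudToolResponse): NextStep {
  if (res.escalationRequired) {
    // Hand off to the fraud team with the reference number and reason attached.
    return { kind: 'escalate', reportId: res.reportId, reason: res.escalationReason ?? 'unspecified' }
  }
  // Standard path: confirm the filing and deliver the required disclosures.
  return { kind: 'standard', reportId: res.reportId, deliverRegEDisclosures: true }
}
```

Making the decision a structured branch on a backend flag, rather than a model judgment, keeps escalation deterministic and auditable.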

The cases that don't trigger escalation follow the standard path: report filed, reference number provided, Reg E disclosures delivered, and the customer goes on their way. Which brings us to the regulatory side.

Regulation E and the disclosure problem

Regulation E isn't optional. When a customer reports an unauthorized electronic fund transfer, the bank must inform them of specific rights and timelines. The investigation window. The conditions for provisional credit. The customer's right to request documentation. The process for resolving the dispute.

Human agents are trained to deliver these disclosures. They have scripts. They have checklists. And they still miss them. Compliance teams audit a sample of calls every quarter and consistently find gaps. An agent who handled 200 calls that week forgot the provisional credit disclosure on call 147 because they were rushing through a queue.

AI doesn't forget. Once the disclosure logic is built into the conversation flow, it delivers it every time. The challenge is making sure the logic is correct and verifiable.

This is where scorecards become essential. A compliance scorecard evaluates every dispute call against the required disclosures:

```typescript
// Create the scorecard
const { data: scorecard } = await chanl.scorecards.create({
  name: 'Banking Compliance - Dispute Handling',
  description: 'Evaluates Reg E disclosure compliance on every dispute call',
  status: 'active',
  passingThreshold: 80,
  scoringAlgorithm: 'weighted_average',
  industry: 'financial',
  useCase: 'compliance',
})

// Add categories and criteria via follow-up calls
// Security category: Identity Verification, PCI Data Handling
// Compliance category: Reg E Timeline, Provisional Credit, Customer Rights
//
// Each criterion is graded per call:
//   - Identity Verification: verified caller before accessing account data
//   - Reg E Timeline Disclosure: informed customer of 10-business-day
//     investigation window (extended to 45 for certain transactions)
//   - Provisional Credit Disclosure: informed customer about eligibility
//   - Customer Rights Notification: informed customer of right to request
//     copies of documents relied upon in investigation
//   - PCI Data Handling: never read full card numbers, account numbers,
//     or SSN aloud during the call
```

Every dispute call gets graded. Not a sample. Not 10% pulled for quarterly review. Every single one. The scorecard result is attached to the interaction record, creating an audit trail that compliance teams can review by date range, by score, by specific criterion.
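As a rough sketch of what weighted-average scoring against an 80-point threshold looks like, consider the following. The criteria and weights here are invented for illustration; only the threshold comes from the scorecard configuration:

```typescript
// Illustrative weighted-average scoring: each criterion contributes
// weight * (passed ? 100 : 0), normalized by total weight.
type Criterion = { name: string; weight: number; passed: boolean }

function scoreCall(criteria: Criterion[], passingThreshold = 80) {
  const totalWeight = criteria.reduce((sum, c) => sum + c.weight, 0)
  const score =
    criteria.reduce((sum, c) => sum + c.weight * (c.passed ? 100 : 0), 0) / totalWeight
  return { score, passed: score >= passingThreshold }
}
```

With this shape, missing a single heavily weighted criterion (say, identity verification) is enough to fail the call, which is exactly the behavior a compliance team wants.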

When a scorecard flags a call that scored below threshold on Reg E disclosures, it means one of two things: the conversation flow has a gap that needs fixing, or an edge case surfaced that the flow doesn't handle yet. Either way, it's caught the same day, not three months later in an audit.

The difference between human compliance and AI compliance isn't that AI is perfect. It's that AI failures are detectable and fixable at the system level. A human agent who forgets a disclosure has a training problem that takes weeks to identify and months to remediate. An AI agent that misses a disclosure has a configuration problem that gets fixed once and applies to every future call.

Memory across channels

Margaret Chen called on Monday to report a suspicious charge. The agent verified her identity, collected the fraud report details, filed the report (FR-2026-88412), and informed her of the investigation timeline. On Wednesday, Margaret opens the bank's chat to ask about the status.

Without memory, this is a blank slate. The chat agent has no idea Margaret called Monday. It asks her to verify her identity again, explain her issue again, and provide the reference number she may or may not have written down. Margaret has now spent 25 minutes across two interactions repeating the same information.

With memory, the chat agent loads Margaret's recent interaction history at the start of the session:

```typescript
// Memory auto-loads at conversation start
// Agent receives context about Margaret's recent interactions:
//
// - March 24: Fraud report filed (FR-2026-88412)
//   Unauthorized charge $247.50 at "ELECTRNX DEPOT"
//   Card locked, replacement ordered
//   Provisional credit: pending (eligible day 10)
//   Investigation status: open
```

The chat agent greets Margaret and immediately references the open fraud report. "I can see your fraud report from Monday. The investigation is still within the 10-business-day window. Would you like me to check the current status?"

That's the experience that retains customers. Not because the technology is impressive, but because it's respectful of their time.

Memory also handles the less dramatic cases. A customer who called last week to ask about CD rates and mentioned they have $50,000 to invest. When they call back this week, the agent remembers the context. "Last time we spoke, you were looking at CD options for about $50,000. Our 12-month rate is still 4.90% APY. Would you like to proceed with opening one?"

The memory lifecycle for banking follows a specific pattern:

At call start: auto-load the customer's memory profile. Previous interactions, open cases, preferences, account notes.

During the call: store relevant facts as they come up. New preferences, new service requests, important context the customer shares.

```typescript
// During conversation, the agent stores relevant facts
await chanl.memory.create({
  agentId: 'westfield-bank-cs',
  entityType: 'customer',
  entityId: 'WB-882491',
  content: 'Customer prefers text message alerts over email for fraud notifications. Updated preference during dispute follow-up call.',
  metadata: {
    source: 'conversation',
    interactionId: 'INT-2026-03-26-4491'
  }
})
```

After the call: extract structured facts from the conversation. The system pulls out action items, preferences, and commitments made during the call and stores them as searchable memory entries. This happens automatically. The agent doesn't need to remember to take notes.

Across channels: memory is channel-agnostic. Facts stored during a phone call are available in chat, and vice versa. The customer's experience is unified regardless of how they contact the bank.

The security constraint on memory is straightforward: memory entries never contain full account numbers, card numbers, or SSN. They reference accounts by masked identifiers. The memory system stores "Customer disputed $247.50 charge at ELECTRNX DEPOT on account ending 4491" rather than the full account number. PCI compliance applies to stored memory the same way it applies to conversation logs.

Testing before a single customer hears it

A banking AI agent that hasn't been tested against edge cases is a liability, not an asset. Scenario testing simulates realistic customer interactions before the agent handles real calls.

The test suite for a banking agent covers the paths that matter most:

Identity verification edge cases. A caller who gets the last four of their SSN wrong on the first attempt. A caller who provides a date of birth that's off by one day (common with date format confusion). A caller whose phone number doesn't match the account record because they're calling from a different phone.

Compliance-critical conversations. A fraud dispute where the customer is agitated and wants to skip straight to getting their money back. The agent needs to collect the required information and deliver disclosures even when the caller is impatient and trying to rush through the conversation.

Escalation triggers. A customer whose request exceeds the AI agent's authority, such as a wire transfer reversal or a joint account dispute involving a domestic situation. The agent needs to recognize the boundary and transfer cleanly, with full context.

Social engineering attempts. A caller who claims to be calling on behalf of the account holder. A caller who provides partial information and tries to piece together enough to bypass verification. A caller who says "the other agent I spoke with already verified me."

Each scenario runs through the full conversation flow and is graded against scorecards. A scenario that passes the flow but fails on compliance disclosures is a failed test. A scenario that delivers disclosures but leaks PCI data in the conversation is a failed test. Both dimensions matter.
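Both dimensions can be expressed as a single pass condition. The result shape below is illustrative, not SDK code:

```typescript
// Illustrative: a scenario passes only when the flow completes, the compliance
// scorecard clears threshold, AND nothing sensitive leaked into the transcript.
type ScenarioResult = {
  flowCompleted: boolean     // did the conversation reach its goal
  complianceScore: number    // scorecard result, 0-100
  pciLeakDetected: boolean   // any unmasked card/account/SSN in the transcript
}

function scenarioPassed(r: ScenarioResult, threshold = 80): boolean {
  return r.flowCompleted && r.complianceScore >= threshold && !r.pciLeakDetected
}
```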

The testing loop is continuous. Every time the agent handles a real call that results in an escalation, a low scorecard score, or negative customer feedback, that interaction becomes a new test scenario. The test suite grows with the agent's experience. Edge cases that were missed in initial testing get captured from production and added to the regression suite.

This is the part that makes compliance teams comfortable. It's not "we tested it once and deployed." It's "every interaction is scored, every failure becomes a test, and the test suite never shrinks."

The architecture in full

Putting the pieces together, a production banking AI agent runs on five interconnected layers:

Knowledge base. Product information, rates, fee schedules, branch details, process guides, regulatory disclosures. Updated regularly. Accessible to all callers regardless of verification status.

Tools. Identity verification, account lookup, card management (lock/unlock/replace), dispute submission, fraud reporting, transfer routing to specialized teams. Each tool enforces its own authorization requirements.

Memory. Customer interaction history, preferences, open cases, and commitments. Auto-loaded at call start, updated during conversation, extracted after call. PCI-compliant storage with masked identifiers.

Scorecards. Compliance (Reg E disclosures, PCI data handling), accuracy (correct account information, accurate rate quotes), security (identity verification before access, social engineering resistance), and resolution quality.

Scenarios. Identity verification edge cases, dispute handling, fraud reporting, escalation triggers, social engineering attempts. Each graded against scorecards. Growing continuously from production data.

[Diagram: Banking AI Agent Architecture. A customer call enters identity verification. Verified callers reach the account access tier (account lookup, card management, dispute intake, transfer routing); failed verification routes to the general info tier backed by the knowledge base (rates and products, branch info, process guides). Fraud reports pass through a pattern check that branches to immediate escalation or standard processing. Every path feeds scorecard evaluation and the audit trail, and memory auto-loads at call start and updates throughout.]

The monitoring layer sits on top of everything. Real-time dashboards show call volume, verification success rates, deflection rates, scorecard scores, and escalation patterns. When the average compliance score drops below threshold, someone investigates that day, not next quarter.
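The alerting rule behind that is simple to state. A minimal sketch, assuming daily average scores are already aggregated (the function and threshold are illustrative):

```typescript
// Illustrative alerting rule: surface any day whose average compliance score
// dips below threshold, so it gets investigated that day, not next quarter.
function daysNeedingReview(dailyAvgScores: Record<string, number>, threshold = 80): string[] {
  return Object.entries(dailyAvgScores)
    .filter(([, score]) => score < threshold)
    .map(([day]) => day)
}
```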

What the numbers look like

The economics of banking AI customer service are not subtle. They're the reason every mid-size bank is evaluating this right now.

Call volume. 8,000 calls per day, 60% tier-1 (routine, no human judgment required). At a 50% AI deflection rate on tier-1 calls, that's 2,400 calls per day handled without a human agent. At an average handling time of 6 minutes per call and a fully loaded agent cost of $22/hour, that's roughly $5,280 per day in direct labor savings.
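The arithmetic behind that figure, using only the numbers stated above:

```typescript
// Deflection math from the call-volume paragraph.
const callsPerDay = 8000
const tier1Share = 0.6
const aiDeflectionRate = 0.5
const handleTimeMinutes = 6
const loadedAgentCostPerHour = 22

const deflectedCallsPerDay = callsPerDay * tier1Share * aiDeflectionRate // 2,400
const agentHoursSaved = (deflectedCallsPerDay * handleTimeMinutes) / 60  // 240
const dailyLaborSavings = agentHoursSaved * loadedAgentCostPerHour       // $5,280
```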

Wait times. With 2,400 fewer calls hitting the human queue, average wait times drop from 12 minutes to under 3 minutes. Customers who do need a human reach one faster. Customer satisfaction improves not just for AI-handled calls, but for human-handled calls too.

Compliance. 100% of calls graded versus the industry standard of 2-5% audit sampling. Compliance gaps identified in hours instead of quarters. The regulatory risk reduction is harder to quantify in dollars, but every compliance team understands the cost of a finding.

After-hours coverage. The AI agent handles calls 24/7. A customer who discovers a fraudulent charge at 11 PM on a Saturday can report it immediately, get their card locked, and receive a reference number. They don't have to wait until Monday morning. For fraud specifically, response time directly correlates with loss prevention.

Memory-driven retention. Customers who reach resolution without repeating themselves have measurably higher satisfaction scores. The memory system doesn't just improve individual interactions. It changes the customer's perception of the bank. The bank that remembers feels like a bank that cares.

The implementation isn't overnight. A realistic timeline is 8 to 12 weeks from kickoff to production for a Phase 1 deployment covering the top 5 call types: balance inquiries, transaction questions, card lock/replacement, general product information, and dispute intake. Phase 2 adds more complex scenarios over the following 8 weeks. Phase 3 brings multi-channel integration (chat, messaging) and cross-channel memory.

The banks that are deploying this now aren't the largest institutions. They're the mid-size banks with 200,000 to 2,000,000 customers who can't afford to staff a call center that matches their largest competitors, but can afford to build an AI layer that exceeds what those competitors offer.

The ones waiting for the technology to mature are going to discover that their competitors weren't waiting. The question stopped being "can AI handle banking customer service?" sometime in 2025. The question now is whether you build the compliance, security, and memory infrastructure to do it right.

Secure AI Conversations With a Complete Audit Trail

Identity verification, compliance scorecards, and fraud detection. Every call graded, every interaction stored.

See How It Works

Dean Grover, Co-founder — building the platform for AI agents at Chanl: tools, testing, and observability for customer experience.

