A private equity firm in New York needs to know whether a $400 million acquisition target actually has the market position its management team claims. Normally, they'd hire McKinsey or BCG, wait six to eight weeks, and write a check for somewhere north of $500,000. Instead, they're using AI voice agents to call hundreds of the target's customers, suppliers, and industry experts — in multiple languages, across time zones — and getting a completed due diligence report for around $50,000.
That's not a hypothetical. That's DiligenceSquared, a YC-backed startup that just raised $5 million to do exactly this. And it's one of the clearest signals yet that voice AI has left the building — the call center building, specifically — and is showing up in places nobody expected.
The Market Shifted While Everyone Watched Customer Service
For years, "voice AI" meant one thing: deflecting support calls. IVR trees. Hold music. "Press 1 for billing." The technology got better, the conversations got more natural, but the use case stayed stubbornly narrow.
That's not the story anymore. VC investment in voice AI jumped from roughly $315 million in 2022 to $2.1 billion by 2024, according to CB Insights data cited by AgentVoice. The voice AI agents market is projected to grow from $2.4 billion in 2024 to $47.5 billion by 2034 — a 34.8% compound annual growth rate. And while a chunk of that money is still flowing toward contact centers, the fastest growth is happening in verticals that weren't on anyone's roadmap three years ago.
[Chart: Voice AI VC investment growth, 2022–2024]
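Those two endpoints do imply the stated growth rate. A quick back-of-envelope check, in plain Python:

```python
# Sanity check on the projection above: $2.4B (2024) growing to
# $47.5B (2034) implies roughly a 34.8% compound annual growth rate.
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate from start to end over N years."""
    return (end / start) ** (1 / years) - 1

print(round(cagr(2.4, 47.5, 10) * 100, 1))  # 34.8
```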
What changed? Two things happened simultaneously. First, the underlying models got good enough — low-latency, multilingual, emotionally aware — that voice agents could handle conversations more complex than password resets. Second, developers started asking a different question. Not "can we automate this call?" but "where else do humans talk to gather, verify, or transact information?"
The answers turned out to be everywhere. And that's reshaping not just who buys voice AI, but what it means to build it.
Billion-Dollar Deals, $50K Voice Agents
DiligenceSquared's story is worth unpacking because it represents a category of voice AI application that barely existed eighteen months ago: agents that don't serve customers, but interview them.
The company was founded by Frederik Hansen, a former principal at Blackstone, and Soren Biltoft-Knudsen, a former principal at BCG who led over 40 commercial due diligence projects for the firm's private equity practice. They'd sat on both sides of the table — commissioning million-dollar research reports and producing them — and realized the core workflow was fundamentally a phone-based research exercise. Talk to customers. Talk to industry insiders. Synthesize what you hear. Write a report.
Voice agents, it turns out, are terrifyingly good at that workflow. DiligenceSquared's system generates a research blueprint, identifies relevant interview targets, and then deploys multilingual AI agents to conduct structured conversations with customers and market experts. Human analysts handle quality assurance and synthesis, but the data collection — traditionally the most expensive and time-consuming part — is automated.
The result? Work that McKinsey bills at roughly $1 million, delivered for $50,000. And they're not the only ones. Bridgetown Research raised a $19 million Series A co-led by Accel and Lightspeed in early 2025, using a similar playbook. Their system can produce an initial due diligence analysis in 24 hours with inputs from hundreds of respondents.
Think about what's happening here. Voice agents aren't replacing customer service reps. They're replacing management consultants. That's a $400 billion-plus market built on the assumption that human conversations require human interviewers. DiligenceSquared has already completed engagements with PE firms representing more than $2 trillion in combined assets under management. The assumption is cracking.
9 Million Burger Orders and Counting
While Wall Street firms are deploying voice agents to evaluate acquisition targets, fast food chains are deploying them to take your order. And the scale is staggering.
SoundHound's voice AI platform now powers over 10,000 restaurant locations across the United States, handling ordering for brands you'd recognize: Panda Express, Jersey Mike's, IHOP, Chipotle, White Castle. Restaurant order activity crossed 9 million calls in Q4 2025 alone, up "strong double digits" year-over-year.
But this isn't just phone answering with extra steps. SoundHound's system is agentic — it understands what customers are ordering and acts on those requests automatically. It handles modifications ("no onions, extra cheese"), upsells ("would you like to add a drink?"), and routes orders directly into the POS system. The company has expanded beyond drive-thru into Call-to-Order, Text-to-Order, Scan-to-Order, and even In-Car Voice Ordering.
Jersey Mike's started with 50 locations and expanded from there. Panda Express rolled out to dozens more locations in Q4 2025. What's notable isn't just the adoption — it's the retention. Restaurants aren't piloting and abandoning. They're piloting and scaling.
Why does a sub sandwich shop need voice AI? Because phone orders are a terrible experience for everyone involved. The employee juggling a phone while making sandwiches. The customer repeating "TURKEY" three times over background noise. The lost orders, the wrong toppings, the people who just hang up. SoundHound estimated its system had processed 100 million customer interactions by late 2024. That's 100 million conversations that used to require a human holding a phone in one hand and a pen in the other.
The Vertical Explosion Nobody Predicted
M&A research and restaurant ordering are just two data points in a much broader pattern. Voice AI is colonizing verticals at a pace that's hard to track.
Healthcare is arguably the furthest along. Hippocratic AI raised $141 million at a $1.64 billion valuation to build patient-facing voice agents. Their system has completed millions of patient calls with average satisfaction ratings near 9 out of 10. Medical model usage grew 15x year-over-year as developers moved from demos into live clinical workflows — patient intake, clinical documentation, insurance verification. Nearly half of US hospitals plan to implement some form of voice AI by 2026. Healthcare customers using these systems reported returning over 30 million minutes to their workforce and hitting 21x ROI on documentation workflows.
Education and language learning found a natural fit. Apps like Speak replaced the most expensive part of language education — a live conversation partner — with an always-available AI tutor. a16z's 2025 voice AI update specifically flagged this category, noting that voice agents "democratize services like language learning that were previously inaccessible." It's not a stretch to say voice AI tutors will be how most people learn their second language within the next five years.
Insurance is getting quietly overhauled. Voice agents now handle first notice of loss intake, policy questions, claims status updates, and follow-ups — with insurers seeing 35% reductions in average call duration and 28% increases in first-call resolution. Fraud detection systems using voice AI identify fraudulent claims 50% faster, reducing fraudulent payouts by up to 40%.
Real estate is another sleeper. If you've ever tried to reach a real estate agent on the weekend, you know the problem. Voice agents now handle lead qualification, property information, and showing scheduling for firms like CBRE and Greystar. With 62% of calls to small businesses going unanswered and 85% of those callers never calling back, the ROI case is straightforward.
And then there are the verticals that feel genuinely surprising. Voice agents as sales coaches and corporate trainers, running realistic role-play simulations. Voice agents conducting market research surveys. Voice agents doing debt collection — FDCPA and TCPA compliant, calling borrowers about upcoming payments 24/7. Manufacturing suppliers using voice agents for purchase order confirmations and logistics coordination.
The pattern? Anywhere humans use phone calls to gather, verify, or transact — voice AI can probably do it cheaper, faster, and at scale. And the verticals keep multiplying. Hospitality chains are deploying voice agents for reservation management and concierge services. Automotive dealerships use them for service appointment scheduling and recall notifications. Government agencies are piloting them for benefits enrollment and permit inquiries. Each new vertical validates the same thesis: voice was never just a customer service channel. It was the default interface for half the world's business processes. We just didn't have the technology to automate it until now.
Why General-Purpose Voice Agents Break in Specialized Domains
Here's where it gets tricky for developers. The same voice agent that handles "I'd like to cancel my subscription" falls apart when a cardiologist asks about contraindications for beta-blockers, or when a PE analyst needs to probe a customer about competitive switching behavior in the Nordic market.
Domain specialization isn't optional anymore. Speechmatics reported that medical models trained on 16 billion words of clinical conversations delivered keyword error rates 70% lower than general-purpose alternatives. That's not a marginal improvement — it's the difference between "metoprolol" and "metropolis" in a patient consultation.
The challenges stack up fast when you move beyond customer service:
Vocabulary and accuracy. Every vertical has its own jargon. Legal contracts, medical terminology, financial instruments, restaurant menus with 47 modifications. General-purpose speech models guess. Domain-specific models know. And in regulated industries, a guess can be a compliance violation.
Latency requirements vary wildly. A drive-thru order needs sub-300ms response times or the conversation feels broken. A due diligence interview can tolerate longer pauses because the context is more formal. A patient intake call falls somewhere in between. You can't tune once and deploy everywhere.
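That tuning can be made concrete as a per-vertical latency budget. A minimal sketch follows; the budget values are illustrative assumptions, not vendor benchmarks:

```python
# Illustrative per-vertical latency budgets (assumed values, not specs):
# total round-trip = STT + LLM + TTS + network overhead.
LATENCY_BUDGETS_MS = {
    "drive_thru": 300,           # beyond this, the conversation feels broken
    "patient_intake": 700,
    "diligence_interview": 1200, # formal context tolerates longer pauses
}

def within_budget(vertical: str, stt_ms: int, llm_ms: int,
                  tts_ms: int, network_ms: int = 50) -> bool:
    """Check whether a measured pipeline fits the vertical's budget."""
    total = stt_ms + llm_ms + tts_ms + network_ms
    return total <= LATENCY_BUDGETS_MS[vertical]

# The same pipeline can pass one vertical and fail another:
pipeline = dict(stt_ms=80, llm_ms=250, tts_ms=90)
print(within_budget("drive_thru", **pipeline))           # False: 470ms > 300ms
print(within_budget("diligence_interview", **pipeline))  # True: 470ms <= 1200ms
```

The point of the sketch is the last two lines: "tune once, deploy everywhere" fails as soon as the budget table has more than one row.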
Compliance isn't one-size-fits-all. Healthcare needs HIPAA. Financial services need SOX and FINRA compliance. Insurance claims need audit trails. Debt collection has FDCPA and TCPA requirements. Each vertical brings its own regulatory framework, and "we'll add compliance later" is how companies get sued.
Multilingual complexity. DiligenceSquared's agents interview people across Europe and Asia. SoundHound handles orders in multiple languages at the same location. Code-switching mid-sentence — jumping between languages in a single utterance — is becoming the norm in production, not an edge case. A restaurant in Miami might need English-Spanish switching within a single order. A due diligence call with a Nordic executive might drift between English and Swedish. If your system can't handle that gracefully, you're losing data — or worse, misinterpreting it.
The upshot? Testing a voice agent against "how do I reset my password" scenarios tells you nothing about whether it'll survive a 20-minute structured interview with a CFO in German. You need scenario-based testing that mirrors the actual conversations your agent will face in its specific vertical, with domain-appropriate personas and evaluation criteria.
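What such a scenario definition might look like, sketched with illustrative dataclasses (the `Scenario` and `Persona` names are assumptions for this example, not any specific testing framework's API):

```python
# A minimal sketch of domain-specific, scenario-based test definitions
# for a vertical voice agent. All names here are illustrative.
from dataclasses import dataclass, field

@dataclass
class Persona:
    role: str
    language: str
    traits: list  # e.g. ["formal", "gives vague answers"]

@dataclass
class Scenario:
    name: str
    persona: Persona
    opening_utterance: str
    success_criteria: list = field(default_factory=list)

# A due diligence interview looks nothing like a password reset:
cfo_interview = Scenario(
    name="nordic_cfo_switching_behavior",
    persona=Persona(role="CFO", language="de",
                    traits=["formal", "gives vague answers"]),
    opening_utterance="We evaluate vendors every year, but switching is complicated.",
    success_criteria=[
        "agent probes for the concrete switching cost",
        "agent does not accept the vague answer at face value",
    ],
)

print(cfo_interview.name, len(cfo_interview.success_criteria))
```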
Building Voice Agents for Verticals You Haven't Touched Yet
If you're a developer looking at these new verticals, the playbook looks different from building a customer support bot. Here's what teams shipping vertical-specific voice agents are learning the hard way.
Start with conversation design, not model selection. The biggest mistake is jumping straight to "which LLM" or "which TTS provider." Map the actual conversations first. What does a 15-minute due diligence interview look like? What's the decision tree for a complex insurance claim? How does a restaurant order flow handle a customer who changes their mind three times? The conversation architecture determines everything downstream.
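One way to make that mapping explicit before touching a model is a plain state graph. The states and transitions below are illustrative assumptions for a restaurant flow, not a production design:

```python
# Sketch: a restaurant order flow as an explicit state graph,
# mapped before any model selection. States/events are assumptions.
ORDER_FLOW = {
    "greet":       {"order_item": "build_order"},
    "build_order": {"modify_item": "build_order",  # "no onions, extra cheese"
                    "change_mind": "build_order",  # customer restarts an item
                    "done": "confirm"},
    "confirm":     {"yes": "submit", "no": "build_order"},
    "submit":      {},
}

def walk(flow, start, events):
    """Replay a sequence of customer intents through the flow."""
    state = start
    for event in events:
        state = flow[state][event]
    return state

# A customer who changes their mind three times still reaches submit:
events = ["order_item", "change_mind", "change_mind", "change_mind", "done", "yes"]
print(walk(ORDER_FLOW, "greet", events))  # submit
```

Only once a graph like this survives contact with real call transcripts does the "which LLM, which TTS" question become answerable.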
Build domain-specific evaluation. Generic metrics like word error rate don't capture what matters in specialized verticals. A medical voice agent needs scorecards that measure clinical accuracy, appropriate escalation, and regulatory compliance — not just "did it understand the words." A restaurant ordering agent needs to be evaluated on order accuracy, upsell success rate, and average handling time. Define your quality rubric before you write a line of code.
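A scorecard like that can start as a simple weighted rubric. The dimensions and weights below are illustrative assumptions for a medical agent, not a validated clinical instrument:

```python
# Sketch: a weighted, domain-specific scorecard instead of a single
# word-error-rate number. Dimensions and weights are assumptions.
def score_call(ratings: dict, rubric: dict) -> float:
    """Weighted average of per-dimension ratings (each 0.0-1.0)."""
    total_weight = sum(rubric.values())
    return sum(ratings[d] * w for d, w in rubric.items()) / total_weight

MEDICAL_RUBRIC = {
    "clinical_accuracy": 0.5,    # did it get the terminology right?
    "escalation": 0.3,           # did it hand off when it should have?
    "compliance_language": 0.2,  # required disclosures present?
}

ratings = {"clinical_accuracy": 1.0, "escalation": 0.5, "compliance_language": 1.0}
print(round(score_call(ratings, MEDICAL_RUBRIC), 2))  # 0.85
```

Note that a call can understand every word perfectly and still score poorly here, which is exactly the failure mode generic metrics miss.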
Invest in vertical-specific tool integration. A voice agent that can have a great conversation but can't actually do anything is a parlor trick. Restaurant agents need POS integration. Healthcare agents need EHR connectivity. Financial agents need CRM and compliance system hooks. The voice layer is the interface; the tool layer is where value gets created.
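A minimal sketch of that separation, using a hypothetical tool registry (the registry pattern and the POS stub are assumptions for illustration, not any vendor's integration API):

```python
# Sketch: the voice layer emits structured intents; a tool layer maps
# them to backend actions (POS, EHR, CRM). Registry is illustrative.
TOOL_REGISTRY = {}

def tool(name):
    """Decorator registering a backend action under an intent name."""
    def register(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return register

@tool("submit_order")
def submit_order(items):
    # In production this would call the POS API; here we just echo.
    return {"status": "accepted", "item_count": len(items)}

def dispatch(intent: dict):
    """Route an intent from the conversation layer to the right tool."""
    handler = TOOL_REGISTRY.get(intent["tool"])
    if handler is None:
        return {"status": "unsupported_tool"}  # escalate to a human
    return handler(**intent["args"])

result = dispatch({"tool": "submit_order",
                   "args": {"items": ["turkey sub", "drink"]}})
print(result)  # {'status': 'accepted', 'item_count': 2}
```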
Memory matters more than you think. In customer service, most calls are one-shot interactions. In specialized verticals, conversations compound. A due diligence agent needs to remember what a previous interviewee said about market dynamics. A patient-facing agent needs to recall medication history across visits. Persistent memory that carries context across conversations is what separates a useful agent from a frustrating one.
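A sketch of what cross-conversation memory might look like, with an in-memory dict standing in for the database a production system would use:

```python
# Sketch: memory keyed by engagement, so a later interview can surface
# what earlier interviewees said. In-memory store is a stand-in.
from collections import defaultdict

class EngagementMemory:
    def __init__(self):
        self._facts = defaultdict(list)  # engagement_id -> list of facts

    def remember(self, engagement_id: str, source: str, fact: str):
        self._facts[engagement_id].append({"source": source, "fact": fact})

    def recall(self, engagement_id: str, keyword: str):
        """Facts from earlier calls mentioning the keyword."""
        return [f for f in self._facts[engagement_id]
                if keyword.lower() in f["fact"].lower()]

memory = EngagementMemory()
memory.remember("deal-42", "customer_3",
                "Pricing drove us to evaluate competitors last year.")
memory.remember("deal-42", "expert_1",
                "Nordic market dynamics favor local vendors.")

# A later interview's agent can probe what a previous interviewee said:
print(memory.recall("deal-42", "nordic"))
```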
Test against realistic chaos. Production conversations in specialized domains are messy. People ramble. They use idioms the model hasn't seen. They get emotional — a patient describing symptoms, a business owner worried about an acquisition, a restaurant customer who's hangry and impatient. Your test scenarios need to reflect this. Controlled lab environments produce controlled lab results, and those results lie.
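One cheap way to approximate that messiness is to perturb clean test utterances before replaying them against the agent. The disfluency list below is an illustrative assumption, not a full taxonomy:

```python
# Sketch: injecting the fillers and self-corrections real callers
# produce into clean test utterances. List is illustrative only.
import random

DISFLUENCIES = ["uh,", "um,", "sorry,", "wait, no,"]

def add_chaos(utterance: str, rng: random.Random) -> str:
    """Insert one filler or self-correction at a random position."""
    words = utterance.split()
    filler = rng.choice(DISFLUENCIES)
    pos = rng.randrange(len(words))
    words.insert(pos, filler)
    return " ".join(words)

rng = random.Random(7)  # seeded so test runs are reproducible
clean = "I need to reschedule my appointment for Tuesday"
print(add_chaos(clean, rng))
```

It's a crude perturbation, but an agent that breaks on seeded disfluencies will certainly break on real emotion and real background noise.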
Plan for regulatory audit from day one. In healthcare, financial services, and insurance, you don't just need your agent to work — you need to prove it works. That means conversation logging, decision tracing, and the ability to show an auditor exactly why your agent said what it said. Teams that bolt on compliance after launch end up rebuilding half their stack. Teams that bake it in from the start ship faster because they're not afraid of what production will reveal.
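A sketch of the kind of trail that makes "prove it works" possible: an append-only log with a hash chain, so any after-the-fact edit is detectable. The field names are assumptions about what an auditor would want, not a regulatory schema:

```python
# Sketch: append-only audit trail for agent turns. Each entry records
# what was said and why, chained by hash so tampering breaks the chain.
import hashlib
import json

class AuditTrail:
    def __init__(self):
        self.entries = []
        self._prev_hash = "genesis"

    def log_turn(self, call_id: str, utterance: str, reason: str):
        entry = {"call_id": call_id, "utterance": utterance,
                 "reason": reason, "prev": self._prev_hash}
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self._prev_hash = digest
        self.entries.append(entry)

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks it."""
        prev = "genesis"
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True

trail = AuditTrail()
trail.log_turn("call-1", "Your claim is under review.",
               "status lookup returned PENDING")
print(trail.verify())  # True
```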
The Backend Problem Nobody's Talking About
The voice layer — speech-to-text, language models, text-to-speech — gets all the attention. And it should; the improvements over the past two years are genuinely remarkable. But as voice AI expands into high-stakes verticals, a different bottleneck is emerging.
It's the infrastructure beneath the voice.
How do you test whether your M&A research agent asks the right follow-up questions when an interviewee gives a vague answer? How do you measure whether your healthcare agent correctly identifies when to escalate versus when to provide information? How do you ensure your restaurant ordering agent handles a customer with severe allergies without error, every single time?
These aren't voice problems. They're quality assurance, testing, and observability problems. And they get harder as the stakes go up. A misrouted support ticket is annoying. A misunderstood medication is dangerous. A botched due diligence interview could influence a billion-dollar investment decision.
The teams succeeding in these new verticals aren't the ones with the best voice models. They're the ones with the best analytics and observability — the ability to understand what their agents are actually doing in production, catch failures before they compound, and continuously improve against domain-specific quality benchmarks.
Voice AI's escape from the call center is the easy part. Building the backend infrastructure that lets these agents operate reliably in high-stakes, specialized domains — that's what separates the demos from the deployments. Platforms like Chanl are built for exactly this problem: giving teams the testing, scoring, and monitoring infrastructure they need regardless of which vertical their voice agents serve.
The next wave of voice AI won't be defined by which company has the most natural-sounding voice. It'll be defined by who can ship agents that actually work — reliably, compliantly, at scale — in domains where getting it wrong has real consequences.
The call center was just the tutorial level. The real game starts now.
Chanl Team
AI Agent Testing Platform
Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.