I upgraded a production agent from Claude Sonnet 4.5 to Claude Opus 4.6 on a Tuesday morning. By Tuesday afternoon, every request was returning 400 errors. Not some requests — all of them. The agent had been prefilling assistant messages to steer responses, and that pattern silently broke in 4.6. No deprecation warning in the 4.5 response. No heads-up in the migration guide I skimmed. Just a hard 400.
I wasn't alone. In February 2026, LiveKit filed GitHub issue #4907 — Claude 4.6's prefilling removal immediately broke their entire Claude integration for voice and video agent pipelines. The prefilling removal wasn't a deprecation with a grace period. It was a hard 400 error that crashed production agents on day one. If one of the largest real-time communication platforms got caught off guard, you can assume plenty of smaller teams did too.
That's the kind of thing this article exists for. Anthropic released Claude 4.6 with a list of features that sounds transformative — adaptive thinking, a million-token context window, automatic compaction, tool search — and most of it genuinely is. But if you're building AI agents in production, what matters is knowing what breaks, what costs what, and where the new features actually change your code.
Prerequisites and setup
You'll need Node.js 18+ or Python 3.10+, an Anthropic API key, and a terminal. Install the SDK for your language:
# TypeScript
npm install @anthropic-ai/sdk
# Python
pip install anthropic

Create a .env file with your API key:

ANTHROPIC_API_KEY=sk-ant-...

The model IDs you'll use throughout this article:
| Model | ID | Release Date |
|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6-20260205 | Feb 5, 2026 |
| Claude Sonnet 4.6 | claude-sonnet-4-6-20260217 | Feb 17, 2026 |
If you're new to Claude's tool use API, the tool system deep dive covers the execution loop mechanics.
What changed in Claude 4.6 (the TL;DR)
Claude 4.6 is the largest single-release API surface change since Claude 3 introduced tool use. Nine features shipped across Opus and Sonnet, three things broke, and one pricing decision eliminated the main objection to large context windows.
Here's the full feature matrix — Opus 4.6 vs Sonnet 4.6 vs the 4.5 models they replace:
| Feature | Opus 4.6 | Sonnet 4.6 | Opus 4.5 | Sonnet 4.5 |
|---|---|---|---|---|
| Context window | 1M (GA) | 1M (GA) | 200K | 200K |
| Max output tokens | 128K | 64K | 64K | 16K |
| Adaptive thinking | Yes | Yes | No | No |
| Compaction API | Yes | Yes | No | No |
| Tool search | Yes | Yes | Yes | Yes |
| Structured outputs | GA | GA | Beta | Beta |
| Web search tool | v20260209 | v20260209 | v20250305 | v20250305 |
| Fast mode | Preview | No | No | No |
| Data residency | Yes | Yes | No | No |
| Agent Teams | Preview | No | No | No |
What broke
Three changes will bite you if you upgrade without reading the docs:
- Prefilling assistant messages — Returns a 400 error. No fallback, no flag to re-enable.
- thinking: {type: "enabled"} with budget_tokens — Deprecated. Still works today, will be removed.
- output_format — Moved to output_config.format. The old key still works but logs a deprecation warning.
If you're running agents in production, check your code for all three before upgrading. The prefill one is the silent killer — it worked fine on 4.5.
Adaptive thinking: the end of budget_tokens
Adaptive thinking is the single most important change for agent developers. It replaces the guesswork of setting a fixed thinking budget with a system that scales reasoning depth automatically based on what Claude is actually being asked to do.
The old way (deprecated)
Previously, you had to guess how many tokens Claude should spend thinking:
// OLD — deprecated on 4.6
const response = await anthropic.messages.create({
model: "claude-opus-4-5-20250129",
max_tokens: 16000,
thinking: {
type: "enabled",
budget_tokens: 10000,
},
messages: [{ role: "user", content: "Look up order #ORD-48291" }],
});

The problem: a simple order lookup doesn't need 10,000 tokens of reasoning. But a complex multi-tool workflow — "compare this customer's purchase history against our return policy and recommend the best resolution" — might need every token. With a fixed budget, you either overspend on simple queries or underthink complex ones.
The new way: effort levels
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic();
// Simple tool call — low effort
const simpleResponse = await anthropic.messages.create({
model: "claude-opus-4-6-20260205",
max_tokens: 16000,
thinking: {
type: "adaptive",
effort: "low",
},
messages: [{ role: "user", content: "What time is it in Tokyo?" }],
});
// Complex reasoning — high effort (default)
const complexResponse = await anthropic.messages.create({
model: "claude-opus-4-6-20260205",
max_tokens: 16000,
thinking: {
type: "adaptive",
effort: "high",
},
messages: [
{
role: "user",
content:
"Review this customer's last 5 interactions, identify the recurring issue, and draft a resolution plan that addresses the root cause.",
},
],
});

The Python equivalent:
import anthropic
client = anthropic.Anthropic()
# Adaptive thinking with effort control
response = client.messages.create(
model="claude-opus-4-6-20260205",
max_tokens=16000,
thinking={
"type": "adaptive",
"effort": "high",
},
messages=[
{
"role": "user",
"content": "Analyze this support transcript and identify where the agent should have escalated.",
}
],
)
# Access thinking content
for block in response.content:
    if block.type == "thinking":
        print(f"Reasoning: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")

When to use each effort level
| Effort | Use Case | Agent Example |
|---|---|---|
| low | Simple lookups, classification | "What's this customer's plan tier?" |
| medium | Standard tool selection, single-step tasks | "Cancel this order and send confirmation" |
| high | Multi-step reasoning, policy interpretation | "This customer wants a refund but is outside the window — what are our options?" |
| max | Complex analysis, multi-tool orchestration | "Review all interactions from this week, identify systemic issues, and draft a report" |
For most agent workloads, high (the default) handles the sweet spot: Claude thinks deeply when the query is complex and skips unnecessary reasoning for straightforward requests. Set low for high-throughput, latency-sensitive operations like real-time classification. Use max sparingly — it burns through output tokens for analysis tasks that genuinely need deep reasoning.
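The table maps cleanly to a routing helper. A minimal sketch, assuming a task classification step you'd run upstream; the category names are mine, only the effort values come from the API:

```typescript
type Effort = "low" | "medium" | "high" | "max";

// Hypothetical task categories from your own request classifier.
type TaskKind = "lookup" | "single_step" | "policy" | "analysis";

// Map a classified task to an adaptive-thinking effort level, per the table above.
const effortFor: Record<TaskKind, Effort> = {
  lookup: "low", // simple lookups, classification
  single_step: "medium", // standard tool selection
  policy: "high", // multi-step reasoning (the default)
  analysis: "max", // deep multi-tool orchestration
};

function chooseEffort(kind: TaskKind): Effort {
  return effortFor[kind];
}
```

The result slots directly into the request: `thinking: { type: "adaptive", effort: chooseEffort(kind) }`.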
1M tokens at standard pricing
Claude 4.6 ships with a 1-million-token context window — and unlike previous long-context options, there's no premium pricing. You pay the same rate per token whether you're using 10K tokens or 900K.
What this means for agent architecture
The 1M context window changes three architectural patterns:
Full conversation history. Instead of summarizing or truncating old messages, agents can maintain the complete conversation — including all tool calls, results, and reasoning — for sessions that run into the hundreds of turns. This is the difference between an agent that "remembers" what happened ten minutes ago and one that actually has the full record.
RAG with less chunking pressure. With 200K tokens, you had to aggressively chunk and rank documents before injecting them. With 1M, you can include entire documents, full policy manuals, or complete knowledge base sections. The RAG architecture patterns still apply — you still want retrieval over brute-force context stuffing — but the ceiling is dramatically higher.
Multi-agent context sharing. When one agent hands off to another, the receiving agent can ingest the full history of the previous conversation without lossy summarization. The customer doesn't have to repeat anything.
Pricing comparison
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|
| Opus 4.6 | $5 | $25 | 1M |
| Sonnet 4.6 | $3 | $15 | 1M |
| GPT-4o | $2.50 | $10 | 128K |
| Gemini 1.5 Pro | $1.25 | $5 | 2M |
Cost analysis: worked example
Instead of saying "the math works," here's the actual math for a customer support agent:
Average session: 15,000 input tokens + 3,000 output tokens
Volume: 1,000 sessions/day, 30 days/month
Sonnet 4.6:
Input: $3/1M × 15,000 × 1,000 × 30 = $1,350/month
Output: $15/1M × 3,000 × 1,000 × 30 = $1,350/month
Total: $2,700/month
Opus 4.6:
Input: $5/1M × 15,000 × 1,000 × 30 = $2,250/month
Output: $25/1M × 3,000 × 1,000 × 30 = $2,250/month
Total: $4,500/month
With adaptive thinking on Sonnet (est. 40% of sessions skip extended thinking):
Output savings: ~$540/month
Effective total: ~$2,160/month

Sonnet at $2,700/month handles 30,000 customer conversations. That's $0.09 per conversation. For most support operations, that's a fraction of what a human agent costs per ticket. The no-premium pricing on the 1M context window is what makes this viable — previously, long-context requests came with a multiplier that broke the economics at high volume.
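The same arithmetic works for any volume. A small calculator, with rates taken from the pricing table above (function and parameter names are mine):

```typescript
interface Rates {
  inputPerMTok: number; // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

// Monthly cost for a steady per-session token profile.
function monthlyCost(
  rates: Rates,
  inputTokens: number,
  outputTokens: number,
  sessionsPerDay: number,
  days = 30
): number {
  const sessions = sessionsPerDay * days;
  const input = (rates.inputPerMTok * inputTokens * sessions) / 1_000_000;
  const output = (rates.outputPerMTok * outputTokens * sessions) / 1_000_000;
  return input + output;
}

const sonnet: Rates = { inputPerMTok: 3, outputPerMTok: 15 };
const opus: Rates = { inputPerMTok: 5, outputPerMTok: 25 };

monthlyCost(sonnet, 15_000, 3_000, 1_000); // 2700
monthlyCost(opus, 15_000, 3_000, 1_000); // 4500
```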
The compaction API: infinite conversations for agents
Even with 1M tokens, long-running agent sessions will eventually hit the ceiling. The compaction API solves this by automatically summarizing older context when you approach the limit — enabling conversations that run indefinitely.
How it works

Compaction is server-side and automatic: as the conversation approaches the context limit, the API summarizes older turns and keeps recent turns verbatim. You enable it once and Claude handles the summarization.

Code example
Compaction works with the standard messages API — you enable it and the API handles the rest:
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic();
// Long-running agent conversation with compaction
async function runAgentLoop(
conversationHistory: Anthropic.MessageParam[]
) {
const response = await anthropic.messages.create({
model: "claude-opus-4-6-20260205",
max_tokens: 8192,
thinking: { type: "adaptive", effort: "high" },
// Enable compaction for long-running sessions
compaction: { enabled: true },
system:
"You are a customer support agent with access to order, billing, and account tools.",
messages: conversationHistory,
tools: supportTools,
});
return response;
}

Before compaction, the standard approach was to manually summarize conversations on the client side — writing your own summarization prompts, deciding what to keep, managing the context budget yourself. That pattern still works if you need fine-grained control over what gets preserved. But for most agent workflows, server-side compaction is less code and better results.
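For reference, the client-side pattern looks roughly like this. It's a sketch: the summarize callback stands in for whatever summarization call you'd make, typically its own model request:

```typescript
interface Msg {
  role: "user" | "assistant";
  content: string;
}

// Replace all but the last `keepRecent` messages with a single summary turn.
// `summarize` is a placeholder for your own summarization step.
function compactClientSide(
  history: Msg[],
  keepRecent: number,
  summarize: (older: Msg[]) => string
): Msg[] {
  if (history.length <= keepRecent) return history;
  const older = history.slice(0, history.length - keepRecent);
  const recent = history.slice(history.length - keepRecent);
  const summary: Msg = {
    role: "user",
    content: `[Summary of earlier conversation] ${summarize(older)}`,
  };
  return [summary, ...recent];
}
```

Notice how much policy lives in your code here (what to keep, when to trigger, how to phrase the summary); that's the control you give up, and the work you shed, with server-side compaction.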
If you've built a custom memory system for your agents, compaction complements it. Compaction handles the in-conversation context window. Your persistent memory system handles cross-conversation recall — customer preferences, resolution history, learned facts. They solve different problems.
Tool search: 85% fewer tokens for large tool libraries
If your agent has more than a handful of tools, you've felt this problem: every tool definition eats context tokens. An agent with 30 tools might burn 15K-20K tokens on tool definitions alone before the conversation even starts.
Tool search fixes this by letting you defer tool loading. Instead of dumping all 30 tool definitions into the context, you mark most of them as deferred. Claude gets a single "tool search" tool plus your critical, always-needed tools. When Claude needs a deferred tool, it searches dynamically.
The defer_loading pattern
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic();
// Define tools — most are deferred
const tools: Anthropic.Tool[] = [
// Always loaded — core tools used in every conversation
{
name: "lookup_customer",
description: "Look up a customer by email, phone, or account ID",
input_schema: {
type: "object" as const,
properties: {
identifier: {
type: "string",
description: "Email, phone number, or account ID",
},
},
required: ["identifier"],
},
},
// Deferred — only loaded when Claude searches for them
{
name: "process_refund",
description:
"Process a refund for a specific order. Requires order ID and reason.",
input_schema: {
type: "object" as const,
properties: {
orderId: { type: "string" },
reason: { type: "string" },
amount: { type: "number" },
},
required: ["orderId", "reason"],
},
// @ts-expect-error — defer_loading is a new field
defer_loading: true,
},
{
name: "schedule_callback",
description:
"Schedule a callback for a customer at a specific time.",
input_schema: {
type: "object" as const,
properties: {
customerId: { type: "string" },
scheduledTime: { type: "string", format: "date-time" },
reason: { type: "string" },
},
required: ["customerId", "scheduledTime"],
},
// @ts-expect-error — defer_loading is a new field
defer_loading: true,
},
// ... 25 more deferred tools
];
const response = await anthropic.messages.create({
model: "claude-opus-4-6-20260205",
max_tokens: 4096,
thinking: { type: "adaptive" },
tools,
messages: [
{
role: "user",
content: "I need to return order #ORD-77123",
},
],
});

Claude sees lookup_customer (always loaded) and the tool search capability. When the user mentions a return, Claude searches for refund-related tools, discovers process_refund, and uses it — without the other 25+ tool definitions ever entering the context.
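You can estimate what deferral saves before turning it on. This sketch uses the rough four-characters-per-token heuristic rather than a real tokenizer, so treat the numbers as ballpark:

```typescript
interface ToolDef {
  name: string;
  description: string;
  input_schema: object;
  defer_loading?: boolean;
}

// Rough token estimate: serialized JSON length / ~4 chars per token.
// An approximation, not the model's tokenizer.
function estimateTokens(tool: ToolDef): number {
  return Math.ceil(JSON.stringify(tool).length / 4);
}

// Tokens kept out of the initial context by deferring tools.
function deferredSavings(tools: ToolDef[]): number {
  return tools
    .filter((t) => t.defer_loading)
    .reduce((sum, t) => sum + estimateTokens(t), 0);
}
```

Run it over your real tool array before and after marking tools deferred; if the savings are a few hundred tokens, deferral isn't worth the indirection, but at 10K+ it usually is.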
The numbers
Anthropic's internal testing showed accuracy improved with tool search, not just token efficiency:
| Model | Without tool search | With tool search |
|---|---|---|
| Opus 4 | 49% | 74% |
| Opus 4.5 | 79.5% | 88.1% |
Note: Most benchmarks cited in this article come from Anthropic's own documentation. Independent third-party benchmarks for Claude 4.6's agent features are still limited as of March 2026.
Fewer tools in context means less confusion about which tool to pick. For agents managing complex tool libraries — especially those using MCP to expose tools from multiple servers — this is a meaningful architectural improvement.
Structured outputs: finally GA
Structured outputs — getting Claude to return valid JSON matching a specific schema — graduated from beta to GA on Claude 4.6. The API change is small but matters: output_format moved to output_config.format.
Before (beta)
// OLD — beta header required, output_format at top level
const response = await anthropic.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 1024,
// Required beta header
betas: ["structured-outputs-2025-01-24"],
messages: [{ role: "user", content: "Extract the customer issue" }],
// Old location
output_format: {
type: "json_schema",
json_schema: {
name: "customer_issue",
schema: issueSchema,
},
},
});After (GA)
// NEW — no beta header, output_config.format
const response = await anthropic.messages.create({
model: "claude-opus-4-6-20260205",
max_tokens: 1024,
messages: [
{
role: "user",
content: "Extract the customer issue from this transcript",
},
],
output_config: {
format: {
type: "json_schema",
json_schema: {
name: "customer_issue",
schema: {
type: "object",
properties: {
category: {
type: "string",
enum: [
"billing",
"shipping",
"product",
"account",
"other",
],
},
severity: {
type: "string",
enum: ["low", "medium", "high", "critical"],
},
summary: { type: "string" },
actionRequired: { type: "boolean" },
},
required: [
"category",
"severity",
"summary",
"actionRequired",
],
},
},
},
},
});

The Python equivalent:
response = client.messages.create(
model="claude-opus-4-6-20260205",
max_tokens=1024,
messages=[
{"role": "user", "content": "Extract the customer issue from this transcript"}
],
output_config={
"format": {
"type": "json_schema",
"json_schema": {
"name": "customer_issue",
"schema": {
"type": "object",
"properties": {
"category": {
"type": "string",
"enum": ["billing", "shipping", "product", "account", "other"],
},
"severity": {
"type": "string",
"enum": ["low", "medium", "high", "critical"],
},
"summary": {"type": "string"},
"actionRequired": {"type": "boolean"},
},
"required": ["category", "severity", "summary", "actionRequired"],
},
},
}
},
)

For agent builders, structured outputs eliminate the "parse and pray" pattern. When your agent needs to return structured tool results to a downstream system — a CRM update, a ticket creation, an analytics event — you get guaranteed schema compliance instead of hoping the JSON is valid.
Web search and code execution
Claude 4.6 adds native web search and code execution as first-party tools. Combined with Claude's multimodal capabilities for processing images, PDFs, and documents, this makes Claude 4.6 agents capable of research-driven workflows.
This matters for agents that need real-time data — stock prices, shipping status from carrier APIs, current product availability — that can't be pre-loaded into the agent's knowledge base.
const response = await anthropic.messages.create({
model: "claude-opus-4-6-20260205",
max_tokens: 4096,
thinking: { type: "adaptive" },
tools: [
{
type: "web_search_20260209",
name: "web_search",
// Dynamic filtering enabled by default on 4.6
},
],
messages: [
{
role: "user",
content:
"What are the current shipping rates for FedEx Ground from New York to Los Angeles?",
},
],
});

The code execution layer runs for free when paired with web search or web fetch — Claude filters results programmatically before they consume context tokens, improving both accuracy and cost efficiency.
Opus 4.6 vs Sonnet 4.6: which one for your agent?
Sonnet 4.6 dropped twelve days after Opus 4.6, and the benchmarks tell a story most developers don't expect: Sonnet is close enough to Opus that the default choice should be Sonnet.
| Benchmark | Opus 4.6 | Sonnet 4.6 | Gap |
|---|---|---|---|
| SWE-bench Verified | 80.8% | 79.6% | 1.2% |
| OSWorld (GUI automation) | 72.7% | 72.5% | 0.2% |
| Math | — | 89% (vs 62% on Sonnet 4.5) | — |
| Speed | 20-30 t/s | 40-60 t/s | 2x faster |
| Cost (input/output) | $5 / $25 | $3 / $15 | ~40% cheaper |
| Max output tokens | 128K | 64K | 2x on Opus |
Decision framework
Start with Sonnet 4.6 when:
- Your agent handles high-volume, latency-sensitive conversations
- Tool selection and basic reasoning are the primary tasks
- You need to keep costs predictable at scale
- 64K output tokens is sufficient (it usually is)
Escalate to Opus 4.6 when:
- You need deep multi-step reasoning across many documents
- 128K output tokens matters (long analysis, large code generation)
- You're using Agent Teams for coordinated multi-agent work
- Complex policy interpretation or nuanced decision-making
The practical pattern: run Sonnet as your default, route complex requests to Opus. Most agent platforms support model routing — classify the incoming request, pick the model. Simple order lookup? Sonnet. "Review my entire account history and recommend a plan change"? Opus.
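A minimal version of that router: the keyword heuristic below is a stand-in for whatever classifier you already run, and the length threshold is illustrative. The model IDs come from the table at the top of the article.

```typescript
const SONNET = "claude-sonnet-4-6-20260217";
const OPUS = "claude-opus-4-6-20260205";

// Crude complexity signal: long requests, or ones asking for cross-cutting
// analysis, route to Opus; everything else defaults to Sonnet.
function pickModel(userMessage: string): string {
  const complex = /\b(review|analyze|compare|entire|history|recommend)\b/i;
  if (userMessage.length > 500 || complex.test(userMessage)) return OPUS;
  return SONNET;
}
```

In production you'd likely replace the regex with a cheap classification call (Sonnet at effort "low" works well for this), but the shape is the same: classify first, then pick the model.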
Breaking changes and migration
Three things will break your code if you upgrade to 4.6 model IDs without changes. Here's each one with the fix.
1. Prefilling assistant messages (hard break)
If you've been using assistant message prefilling to guide Claude's output format or behavior, it will not work on 4.6. The API returns a 400 error.
This is more widespread than it sounds. Prefilling was a common workaround for several patterns that now have first-class solutions:
JSON mode workaround. Before structured outputs existed, the standard way to get JSON from Claude was to prefill the opening brace:
// BREAKS on 4.6 — returns 400
messages: [
{ role: "user", content: "Classify this ticket" },
{ role: "assistant", content: '{"category": "' }, // prefill to force JSON
]

Persona injection. Prefilling was used to force Claude into a specific voice or persona from the first token:
// BREAKS on 4.6 — returns 400
messages: [
{ role: "user", content: "Help me with my order" },
{ role: "assistant", content: "Hi! I'm Alex from support. " }, // prefill persona
]
// FIX: System prompt handles persona
messages: [
{ role: "user", content: "Help me with my order" }
],
system: "You are Alex from support. Always introduce yourself by name."

Format enforcement in streaming. Some implementations prefilled format markers to ensure streaming responses started with the expected structure — a header, a particular greeting, or a structured preamble.
Function calling setup. Before native tool use was mature, prefilling was used to steer Claude toward calling specific functions by starting the response with a function call pattern.
Where to look in your codebase: Search for any role: "assistant" message that appears as the last element in the messages array. That's a prefill. Also check for helper functions that append assistant messages before API calls — these are often buried in utility layers.
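A runtime guard catches the pattern before a request goes out, which is useful mid-migration. The Msg shape here is simplified, not the SDK's own type:

```typescript
interface Msg {
  role: "user" | "assistant";
  content: unknown;
}

// True if the request ends with an assistant message, i.e. a prefill,
// which returns a 400 on 4.6 models.
function hasPrefill(messages: Msg[]): boolean {
  const last = messages[messages.length - 1];
  return last !== undefined && last.role === "assistant";
}
```

Call it (or throw on it) in the wrapper where your code builds the messages array; it's also a cheap assertion to add to your agent test suite.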
Fix for JSON mode: Use structured outputs (now GA):
// WORKS on 4.6 — structured outputs replace prefilling for JSON
const response = await anthropic.messages.create({
model: "claude-opus-4-6-20260205",
max_tokens: 1024,
messages: [{ role: "user", content: "Classify this ticket" }],
output_config: {
format: {
type: "json_schema",
json_schema: {
name: "ticket_classification",
schema: {
type: "object",
properties: {
category: {
type: "string",
enum: ["billing", "shipping", "product", "account"],
},
},
required: ["category"],
},
},
},
},
});

Fix for persona injection: Move to system prompts. They're more reliable than prefilling ever was, and they work across turns without needing to re-inject.
Fix for format enforcement: Structured outputs handle this natively. If you need free-form text with a specific structure, describe the format in your system prompt — Claude 4.6 follows formatting instructions more reliably than 4.5 did.
2. budget_tokens deprecation (soft break)
Your code won't break today, but it will when Anthropic removes support:
// DEPRECATED — still works, but switch now
thinking: { type: "enabled", budget_tokens: 10000 }
// REPLACEMENT
thinking: { type: "adaptive", effort: "high" }

3. output_format rename (soft break)
// DEPRECATED
output_format: { type: "json_schema", ... }
// REPLACEMENT
output_config: { format: { type: "json_schema", ... } }

Migration checklist
- Search codebase for prefilled assistant messages (role: assistant with partial content)
- Replace all assistant prefills with structured outputs or system prompts
- Replace thinking.type enabled + budget_tokens with adaptive thinking
- Move output_format to output_config.format
- Update model IDs to claude-opus-4-6-20260205 or claude-sonnet-4-6-20260217
- Test all tool definitions still work (no schema changes, but verify)
- Run full agent test suite against new model before deploying
What can go wrong
Every feature above has edge cases and unknowns that are worth considering before you build your architecture around them.
Autonomous agents with unrestricted tool access. The stakes of getting agent architecture wrong extend beyond bad responses. In December 2025, as reported by the Financial Times in February 2026, Amazon's Kiro AI agent was given a straightforward task to fix a minor issue in AWS Cost Explorer. With operator-level permissions and no mandatory peer review for AI-initiated changes, Kiro autonomously deleted and recreated an AWS production environment, triggering a 13-hour outage. This is the extreme version of every tool-use failure mode in this article — an agent with the right tools, the right permissions, and no behavioral guardrails to prevent catastrophic actions. As Claude 4.6 makes agents more capable, the gap between what an agent can do and what it should do gets more consequential.
Compaction: what gets lost? When compaction summarizes older turns, you lose the verbatim content. For most support conversations, this is fine — the summary preserves intent and key facts. But if your agent needs exact quotes, specific numbers from earlier in the conversation, or auditability of what was said in turn 12, you can't rely on compaction to preserve it. There's no API to inspect what the compacted summary contains or to control what gets prioritized during summarization. If you need that level of control, client-side summarization with your own prompts is still the right approach.
Adaptive thinking: debugging the black box. Adaptive thinking decides how much to reason based on query complexity, but you can't see the heuristic. When Claude under-thinks a complex query (produces a shallow answer with effort: "high"), or over-thinks a simple lookup (burns tokens reasoning about a straightforward classification), your only lever is changing the effort level. There's no way to inspect why Claude chose a particular thinking depth, which makes debugging inconsistent behavior harder than it was with explicit budget_tokens.
Web search: accuracy and freshness. The web search tool is useful but not infallible. Search results can be stale, inaccurate, or misleading — the same problems any search engine has. Claude doesn't verify the accuracy of search results before incorporating them into responses. For agents that make decisions based on web data (current prices, policy changes, regulatory information), you should validate critical facts through your own APIs rather than trusting web search alone. Rate limits on the web search tool are also not well documented.
Tool search accuracy. The 49% to 74% improvement on MCP evaluations is significant, but those are Anthropic's own benchmarks tested under their conditions. Your tool library's descriptions, naming conventions, and overlap between tools all affect how well tool search works in practice. Poorly described tools won't be found when needed. Ambiguously named tools may be found when they shouldn't be.
The 1M context window isn't free to fill. No pricing premium doesn't mean no cost. A 500K-token context at Sonnet rates costs $1.50 per request in input tokens alone. At 1,000 requests/day, that's $45,000/month just in input costs. The context window is a ceiling, not a target — retrieval and chunking are still the right default for most workloads.
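That back-of-the-envelope math is worth encoding once rather than redoing by hand. The function name is mine; the rates are from the pricing table above:

```typescript
// Monthly input-token cost in USD for a steady request profile.
function monthlyInputCost(
  ratePerMTok: number,
  contextTokens: number,
  requestsPerDay: number,
  days = 30
): number {
  return (ratePerMTok * contextTokens * requestsPerDay * days) / 1_000_000;
}

monthlyInputCost(3, 500_000, 1_000); // Sonnet, 500K context: 45000
```

Running this against your projected context sizes is a good pre-deployment sanity check; it makes the "ceiling, not a target" point concrete.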
Building a Claude 4.6 agent: putting it all together
Here's a complete working agent that combines adaptive thinking, tool use, and structured outputs — the patterns you'd use in a production customer support agent:
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic();
// Define tools with deferred loading for non-critical ones
const tools: Anthropic.Tool[] = [
{
name: "lookup_customer",
description:
"Look up customer details by email, phone, or account ID. Returns name, plan, account status, and recent activity.",
input_schema: {
type: "object" as const,
properties: {
identifier: {
type: "string",
description: "Customer email, phone, or account ID",
},
},
required: ["identifier"],
},
},
{
name: "search_orders",
description:
"Search orders by customer ID, order number, or date range. Returns order details including status and tracking.",
input_schema: {
type: "object" as const,
properties: {
customerId: { type: "string" },
orderId: { type: "string" },
status: {
type: "string",
enum: [
"pending",
"shipped",
"delivered",
"returned",
"cancelled",
],
},
},
required: [],
},
},
];
// Simulated tool execution
async function executeTool(
name: string,
input: Record<string, unknown>
): Promise<string> {
switch (name) {
case "lookup_customer":
return JSON.stringify({
id: "cust_8291",
name: "Sarah Chen",
email: "sarah@example.com",
plan: "business",
accountStatus: "active",
memberSince: "2024-08-15",
});
case "search_orders":
return JSON.stringify({
orders: [
{
id: "ORD-77123",
status: "delivered",
items: ["Widget Pro X2"],
deliveredAt: "2026-03-10",
total: 149.99,
},
],
});
default:
return JSON.stringify({ error: `Unknown tool: ${name}` });
}
}
// Agent loop with adaptive thinking
async function runAgent(userMessage: string) {
const messages: Anthropic.MessageParam[] = [
{ role: "user", content: userMessage },
];
console.log(`\nUser: ${userMessage}\n`);
// Agent loop — handle tool use
while (true) {
const response = await anthropic.messages.create({
model: "claude-sonnet-4-6-20260217",
max_tokens: 4096,
thinking: { type: "adaptive", effort: "high" },
system:
"You are a customer support agent. Use tools to look up real data before answering. Be specific and helpful.",
tools,
messages,
});
// Check for tool use
const toolBlocks = response.content.filter(
(b) => b.type === "tool_use"
);
if (toolBlocks.length === 0) {
// No tools — extract text response
const textBlock = response.content.find(
(b) => b.type === "text"
);
if (textBlock && textBlock.type === "text") {
console.log(`Agent: ${textBlock.text}`);
}
break;
}
// Execute tools and continue
messages.push({ role: "assistant", content: response.content });
const toolResults: Anthropic.ToolResultBlockParam[] = [];
for (const tool of toolBlocks) {
if (tool.type === "tool_use") {
console.log(
` [Tool] ${tool.name}(${JSON.stringify(tool.input)})`
);
const result = await executeTool(
tool.name,
tool.input as Record<string, unknown>
);
console.log(` [Result] ${result.substring(0, 100)}...`);
toolResults.push({
type: "tool_result",
tool_use_id: tool.id,
content: result,
});
}
}
messages.push({ role: "user", content: toolResults });
}
}
// Run it
runAgent(
"Hi, I'm sarah@example.com. I received order ORD-77123 but the product is damaged. What are my options?"
);

This agent uses Sonnet 4.6 with adaptive thinking — it thinks deeply about the customer's situation (damaged product, needs options) without burning tokens on the simple tool lookups. In production, you'd add error handling, timeouts, and the compaction API for long sessions. You'd also route complex cases to Opus 4.6 using a classifier.
What's coming next
Three features from the 4.6 release are in research preview, which means they work but the API will change:
Agent Teams. Multiple subagents that coordinate autonomously. In Claude Code, this means spinning up agents that each handle a different part of a codebase review. For customer-facing agents, this could mean a routing agent, a research agent, and a response agent working in parallel. The API for agent coordination isn't public yet — it's exposed through the Claude Agent SDK (renamed from Claude Code SDK).
Fast mode. Opus 4.6 only. Up to 2.5x faster output token generation at premium pricing. For latency-critical agent applications — real-time voice, live chat — this could make Opus viable where speed previously forced you to use Sonnet.
Data residency controls. The inference_geo parameter lets you specify where model inference runs ("us" or "global"). For agents handling PII, healthcare data, or financial information, this is the compliance primitive you've been asking for.
Learn more

- What is new in Claude 4.6 — Anthropic API Documentation
- Introducing Claude Opus 4.6 — Anthropic
- Introducing Claude Sonnet 4.6 — Anthropic
- Adaptive Thinking — Claude API Docs
- Compaction API — Claude API Docs
- Introducing Advanced Tool Use — Anthropic Engineering
- Building Agents with the Claude Agent SDK — Anthropic Engineering
- Web Search Tool — Claude API Docs
- Structured Outputs — Claude API Docs