Most prompt engineering guides give you recipes without explaining why they work. When a recipe fails, you're stuck. This guide builds from first principles — twelve techniques with runnable TypeScript, so you understand not just what to do but when each technique earns its keep. Once you've built your prompts, you'll want to evaluate them systematically rather than relying on gut feeling.
| Technique | What it does | When to use it |
|---|---|---|
| Zero-shot | Instruction with no examples | Straightforward tasks with clear output format |
| Few-shot | 2-5 input/output examples | Consistent formatting, domain-specific labels |
| Chain-of-thought | Step-by-step reasoning | Math, logic, multi-step problems |
| System/role prompting | Persona + domain knowledge | Every production agent |
| Structured output | JSON/XML format enforcement | Downstream parsing, API responses |
| Template variables | Dynamic values at runtime | Reusable prompts across contexts |
| Instruction hierarchy | Priority-ordered rules | Complex agents with many constraints |
| Negative prompting | Explicit "do not" boundaries | Patching observed failure patterns |
| Self-consistency | Majority vote across runs | High-stakes classification decisions |
| Prompt chaining | Pipeline of focused steps | Complex multi-step workflows |
| ReAct | Reasoning + tool calling loop | Agents that need external data |
| Meta-prompting | LLM improves its own prompts | Prompt optimization from failure cases |
1. Zero-shot prompting
The simplest technique: give the model an instruction with no examples. Classification, summarization, extraction — anything where the expected behavior is obvious from the instruction alone.
The catch is that the model interprets ambiguity generously. Ask it to classify a review and you'll get a paragraph instead of a label:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 256,
messages: [
{ role: "user", content: "Is this review positive or negative? 'The food was amazing but the service was painfully slow.'" }
],
});
// Output: A long paragraph discussing nuances of the review
Constrain the output format and the problem disappears:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 64,
messages: [
{
role: "user",
content: `Classify this customer review as exactly one of: POSITIVE, NEGATIVE, or MIXED.
Respond with only the classification label, nothing else.
Review: "The food was amazing but the service was painfully slow."
Classification:`
}
],
});
// Output: "MIXED"
Zero-shot works when you make the task unambiguous. Specify the output format, constrain the response space, don't leave room for interpretation.
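Even with a constrained prompt, production code shouldn't trust the label blindly. A minimal validation sketch (the `parseLabel` helper is illustrative, not part of any SDK):

```typescript
// Validate that the model's reply is one of the allowed labels.
// Returns the normalized label, or null if the output drifted.
function parseLabel(raw: string, allowed: string[]): string | null {
  const cleaned = raw.trim().toUpperCase();
  return allowed.includes(cleaned) ? cleaned : null;
}

console.log(parseLabel("MIXED", ["POSITIVE", "NEGATIVE", "MIXED"])); // "MIXED"
console.log(parseLabel("The review is mixed.", ["POSITIVE", "NEGATIVE", "MIXED"])); // null
```

A null result is your signal to retry the call or tighten the prompt, rather than letting a drifting response leak into downstream code.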
2. Few-shot prompting
When zero-shot produces inconsistent formatting or misinterprets your intent, show the model what you want instead of describing it. Two to five input/output examples are usually enough.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 128,
messages: [
{
role: "user",
content: `Extract the action items from meeting notes. Format each as a JSON object.
Meeting notes: "John will send the Q3 report by Friday. Sarah needs to review the API docs."
Action items:
[{"owner": "John", "task": "Send Q3 report", "deadline": "Friday"},
{"owner": "Sarah", "task": "Review API docs", "deadline": null}]
Meeting notes: "We agreed to postpone the launch. Mike will update the roadmap and notify stakeholders by EOD."
Action items:
[{"owner": "Mike", "task": "Update roadmap", "deadline": "EOD"},
{"owner": "Mike", "task": "Notify stakeholders", "deadline": "EOD"}]
Meeting notes: "Lisa will schedule a follow-up for next week. The team should review the new pricing tiers before then."
Action items:`
}
],
});
// Output: [{"owner": "Lisa", "task": "Schedule follow-up", "deadline": "next week"},
// {"owner": "Team", "task": "Review new pricing tiers", "deadline": "next week"}]
Notice what the examples teach implicitly: the JSON schema, field names, null handling for missing deadlines, how to normalize vague time references. You didn't write a specification — you demonstrated one. Choose examples that cover edge cases (like that null deadline) and your few-shot prompt becomes a spec by demonstration.
3. Chain-of-thought (CoT)
Asking the model to show its reasoning before giving a final answer is the difference between a student scribbling an answer and showing their work. It's essential for math, logic, and code debugging — anywhere jumping straight to an answer leads to errors.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// Without CoT — the model often gets this wrong
const naive = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 256,
messages: [
{
role: "user",
content: "A customer bought 3 items at $12.99 each, has a 15% discount coupon, and shipping is $5.99 for orders under $40 but free for orders $40+. What's the total?"
}
],
});
// With CoT — dramatically more accurate
const withCoT = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 512,
messages: [
{
role: "user",
content: `A customer bought 3 items at $12.99 each, has a 15% discount coupon, and shipping is $5.99 for orders under $40 but free for orders $40+. What's the total?
Think through this step-by-step:
1. Calculate the subtotal
2. Apply the discount
3. Determine shipping cost
4. Calculate the final total
Show your reasoning, then give the final answer.`
}
],
});
// Output:
// 1. Subtotal: 3 × $12.99 = $38.97
// 2. Discount: $38.97 × 0.15 = $5.85, so after discount: $38.97 - $5.85 = $33.12
// 3. Shipping: $33.12 < $40, so shipping is $5.99
// 4. Final total: $33.12 + $5.99 = $39.11
The real win isn't just accuracy — it's debuggability. When the model gets it wrong, you can see where the reasoning broke down. That's invaluable in production systems where you need to understand failures, not just detect them.
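Because CoT output mixes reasoning with the answer, downstream code needs a reliable way to pull the final value out. One common pattern is to have the prompt end with a marked line (as the prompt above does with "Final total") and parse it; the helper below is an illustrative sketch, not part of any SDK:

```typescript
// Extract the computed total from CoT output. Assumes the reasoning ends
// with a line like "Final total: $33.12 + $5.99 = $39.11" and that the
// last dollar amount on that line is the answer.
function extractTotal(reasoning: string): number | null {
  const line = reasoning.split("\n").find((l) => /final total/i.test(l));
  if (!line) return null;
  const amounts = line.match(/\$[\d,]+(?:\.\d+)?/g);
  if (!amounts) return null;
  return parseFloat(amounts[amounts.length - 1].replace(/[$,]/g, ""));
}

const output = `1. Subtotal: 3 × $12.99 = $38.97
2. Discount: $38.97 × 0.15 = $5.85, after discount: $33.12
3. Shipping: $33.12 < $40, so $5.99
4. Final total: $33.12 + $5.99 = $39.11`;

console.log(extractTotal(output)); // 39.11
```

If the marker line is missing, you get null instead of a silently wrong number — the same fail-loudly principle as the zero-shot label validation.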
4. System prompts and role prompting
System prompts define who the model is, what it knows, and how it should behave — before any user interaction happens. This is the backbone of every production AI agent.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 1024,
system: `You are a senior billing support specialist at a SaaS company.
Your knowledge:
- Subscription tiers: Free, Pro ($29/mo), Enterprise ($99/mo)
- Billing cycles are on the 1st of each month
- Prorated refunds are available within 14 days of a charge
- You can issue credits but cannot process direct refunds — those go to the billing team
Your behavior:
- Always verify the customer's email before discussing account details
- Be empathetic but concise — customers contacting billing are often frustrated
- If a request is outside your authority, explain what you CAN do and offer to escalate
- Never guess at account balances or charge amounts — say you'll look it up`,
messages: [
{
role: "user",
content: "I was charged twice this month and I want my money back."
}
],
});
// The model responds in character: asks for email verification,
// acknowledges the frustration, explains the refund escalation process
This prompt does four things well: defines the domain (billing), sets boundaries (what the agent can and can't do), establishes tone (empathetic but concise), and provides guardrails (never guess at numbers). That combination — knowledge, authority limits, tone, and safety rails — is what separates a useful agent from a liability.
If you're managing prompts across multiple agents, version control becomes critical. Platforms with dedicated prompt management let you iterate without redeploying code.
5. Structured output
Getting consistent, parseable output is one of the most practically important skills in production AI. You've got two good options: OpenAI's response_format for guaranteed JSON, or XML tags with Claude for nested structures.
import OpenAI from "openai";
const client = new OpenAI();
const response = await client.chat.completions.create({
model: "gpt-4o",
response_format: { type: "json_object" },
messages: [
{
role: "system",
content: "You extract structured data from customer support messages. Always respond in JSON."
},
{
role: "user",
content: `Extract the following from this message:
- intent (one of: billing_question, technical_issue, feature_request, cancellation, general_inquiry)
- urgency (low, medium, high)
- entities (any products, features, or account details mentioned)
Message: "My enterprise dashboard has been showing wrong analytics data since Tuesday.
This is blocking our quarterly review tomorrow — we need this fixed ASAP."
Respond as JSON with keys: intent, urgency, entities, summary`
}
],
});
const parsed = JSON.parse(response.choices[0].message.content!);
// {
// "intent": "technical_issue",
// "urgency": "high",
// "entities": ["enterprise dashboard", "analytics", "quarterly review"],
// "summary": "Analytics data showing incorrect information since Tuesday, blocking quarterly review"
// }
Claude handles XML tags particularly well for cases where you need nested structure:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 512,
messages: [
{
role: "user",
content: `Analyze this support ticket and provide your analysis in the following XML format:
<analysis>
<intent>the primary intent</intent>
<urgency>low|medium|high</urgency>
<sentiment>positive|neutral|negative</sentiment>
<recommended_action>what should happen next</recommended_action>
</analysis>
Ticket: "I've been a customer for 3 years and I love your product, but this new update
completely broke the reporting feature. I need it fixed or I'll have to look at alternatives."
Provide only the XML, no other text.`
}
],
});
// Output:
// <analysis>
// <intent>technical_issue</intent>
// <urgency>high</urgency>
// <sentiment>negative</sentiment>
// <recommended_action>Escalate to engineering for reporting feature regression fix;
// acknowledge loyalty and provide timeline</recommended_action>
// </analysis>
JSON is easier to parse programmatically. XML is easier for the model to produce correctly because closing tags act as natural delimiters. Use JSON with response_format when available, XML for Claude or when you need nested structures the model can reason about.
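Once the model returns tagged output, extraction per field is straightforward. A minimal helper for flat tags like the ones above (regex-based; a full XML parser is overkill here, and this sketch doesn't handle attributes or repeated tags):

```typescript
// Pull the text content of a single flat XML tag from model output.
function extractTag(xml: string, tag: string): string | null {
  const match = xml.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  return match ? match[1].trim() : null;
}

const output = `<analysis>
  <intent>technical_issue</intent>
  <urgency>high</urgency>
  <sentiment>negative</sentiment>
</analysis>`;

console.log(extractTag(output, "urgency"));   // "high"
console.log(extractTag(output, "sentiment")); // "negative"
console.log(extractTag(output, "missing"));   // null
```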
6. Template variables
Real-world prompts aren't static strings. Template variables separate prompt logic from prompt data — injecting customer names, plan tiers, knowledge base content at runtime so a single prompt works across different contexts.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// Define your template
const supportTemplate = (vars: {
agentName: string;
companyName: string;
customerName: string;
productTier: string;
knowledgeBase: string;
}) => `You are ${vars.agentName}, a support agent for ${vars.companyName}.
The customer you're speaking with:
- Name: ${vars.customerName}
- Plan: ${vars.productTier}
- ${vars.productTier === "Enterprise" ? "This is a high-priority account. Prioritize their request." : "Standard priority."}
Reference knowledge:
${vars.knowledgeBase}
Respond helpfully and reference specific documentation when possible.`;
// Use with different contexts
const response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 1024,
system: supportTemplate({
agentName: "Alex",
companyName: "Acme Cloud",
customerName: "Sarah Chen",
productTier: "Enterprise",
knowledgeBase: "- SSO is configured via Settings > Security > SAML\n- Enterprise accounts have dedicated support SLAs",
}),
messages: [
{ role: "user", content: "How do I set up SSO for my team?" }
],
});
This becomes especially powerful when you store templates externally — product managers can tweak the agent's personality, update knowledge references, or adjust per-tier behavior without touching code. That's exactly the kind of workflow prompt management tools are built for.
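A lightweight version of that workflow: store templates as plain strings with `{{variable}}` placeholders and substitute at runtime. The placeholder syntax and renderer below are an illustrative sketch, not tied to any particular platform:

```typescript
// Render a stored template by substituting {{name}} placeholders.
// Throws on missing variables so a typo fails loudly instead of
// shipping a prompt with a literal "{{customerName}}" in it.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, key) => {
    if (!(key in vars)) throw new Error(`Missing template variable: ${key}`);
    return vars[key];
  });
}

// Templates can live in a database or config file, editable without a deploy.
const stored = "You are {{agentName}}, a support agent for {{companyName}}.";

console.log(renderTemplate(stored, { agentName: "Alex", companyName: "Acme Cloud" }));
// "You are Alex, a support agent for Acme Cloud."
```

The fail-loudly choice matters: a silently unfilled placeholder is one of the most common ways externally managed prompts break in production.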
7. Instruction hierarchy
Models can lose track of constraints buried in the middle of a long prompt, and they have no way to know which of your instructions are non-negotiable. Structuring your prompt with explicit priority levels prevents the model from "forgetting" critical rules.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// Bad: Important constraint buried in the middle
const weakPrompt = `Help the customer with their billing question.
Never reveal internal pricing formulas or discount approval thresholds.
Be friendly and conversational.
If they ask about competitor pricing, redirect to our value proposition.
Always confirm the customer's identity before sharing account details.
Use their first name when possible.`;
// Good: Structured with clear priority levels
const strongPrompt = `## CRITICAL RULES (never violate)
1. Verify customer identity (email + last 4 of card) before sharing ANY account details
2. Never reveal internal pricing formulas or discount approval thresholds
3. Never share other customers' information
## RESPONSE GUIDELINES
- Be friendly and conversational; use the customer's first name
- If asked about competitor pricing, redirect to our value proposition
- Keep responses under 3 sentences unless the customer asks for detail
## KNOWLEDGE
- Current plans: Starter ($19/mo), Growth ($49/mo), Scale ($149/mo)
- Billing cycle: 1st of each month
- Refund window: 30 days`;
const response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 1024,
system: strongPrompt,
messages: [
{ role: "user", content: "What discount can you give me?" }
],
});
Three layers: critical rules (must never be broken), response guidelines (should be followed), and reference knowledge (context). This mirrors how Anthropic recommends structuring prompts. When your agent has dozens of instructions, this hierarchy is what keeps the important ones from getting buried.
8. Negative prompting
Sometimes telling the model what not to do is more effective than trying to describe everything it should do. This is your go-to fix when you've observed specific failure patterns in testing — rather than rewriting the entire positive instruction set, you add targeted guardrails.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 1024,
system: `You are a medical information assistant providing general health education.
DO:
- Provide general health information from established medical sources
- Suggest the user consult a healthcare provider for personal medical questions
- Use clear, accessible language
DO NOT:
- Diagnose conditions or interpret symptoms for a specific person
- Recommend specific medications, dosages, or treatment plans
- Say "you probably have..." or "it sounds like you might have..."
- Provide information about self-harm methods
- Contradict established medical consensus (e.g., vaccine safety)
- Use phrases like "I'm not a doctor, but..." — instead, directly state the general information and recommend professional consultation`,
messages: [
{
role: "user",
content: "I've been having chest pain for two days. What do I have?"
}
],
});
// Output explains that chest pain has many possible causes (general info),
// strongly recommends seeking immediate medical attention,
// does NOT attempt to diagnoseThat "DO NOT" list came from real failure observations — models tend to hedge with "I'm not a doctor, but..." and then effectively diagnose anyway. When you're running scenario-based testing against your agents, you'll discover these patterns quickly, and negative prompts are the fastest way to patch them.
9. Self-consistency
Run the same prompt multiple times, collect the answers, pick the one that shows up most often. It's a brute-force reliability technique — if three out of five runs agree, you can trust the result more than any single run. Worth the extra API cost for classification, data extraction, or content moderation where accuracy matters more than latency.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
async function classifyWithConsistency(
text: string,
labels: string[],
runs: number = 5
): Promise<{ label: string; confidence: number }> {
const prompt = `Classify the following text into exactly one category: ${labels.join(", ")}.
Respond with only the category label.
Text: "${text}"
Category:`;
// Run multiple classifications in parallel
const results = await Promise.all(
Array.from({ length: runs }, () =>
client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 32,
temperature: 0.7, // Some variation to get diverse reasoning paths
messages: [{ role: "user", content: prompt }],
})
)
);
// Count votes
const votes: Record<string, number> = {};
for (const result of results) {
const label = (result.content[0] as { text: string }).text.trim();
votes[label] = (votes[label] || 0) + 1;
}
// Find majority
const sorted = Object.entries(votes).sort((a, b) => b[1] - a[1]);
const [topLabel, topCount] = sorted[0];
return {
label: topLabel,
confidence: topCount / runs,
};
}
// Usage
const result = await classifyWithConsistency(
"Your product is okay I guess, but I expected more for the price",
["positive", "negative", "neutral", "mixed"]
);
console.log(result);
// { label: "mixed", confidence: 0.8 } — 4 out of 5 runs agreed
The temperature: 0.7 is deliberate. At temperature 0, you'd get the same answer every time — defeating the purpose. You want enough variation to surface alternative interpretations, then let majority vote resolve the ambiguity. Three to five runs hits the sweet spot between reliability and cost.
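One practical wrinkle: at nonzero temperature the model may return "Mixed", "mixed.", or "MIXED" across runs, splitting what should be one vote into three. Normalizing labels before counting keeps the tally honest; a sketch you could drop into the vote-counting step above:

```typescript
// Normalize a raw model reply to a canonical label before counting votes.
// Returns null for replies that match no known label.
function normalizeLabel(raw: string, labels: string[]): string | null {
  const cleaned = raw.trim().toLowerCase().replace(/[.!]+$/, "");
  return labels.find((l) => l.toLowerCase() === cleaned) ?? null;
}

const runs = ["Mixed", "mixed.", "MIXED", "neutral", "mixed"];
const votes: Record<string, number> = {};
for (const raw of runs) {
  const label = normalizeLabel(raw, ["positive", "negative", "neutral", "mixed"]);
  if (label) votes[label] = (votes[label] || 0) + 1;
}
console.log(votes); // { mixed: 4, neutral: 1 }
```

Without normalization, the same five runs would produce four "different" labels with one vote each, and the majority signal disappears.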
10. Prompt chaining
Instead of asking the model to do everything at once, build a pipeline where each step handles one task and passes its output to the next. A single monolithic prompt that handles research, analysis, formatting, and quality checks is fragile. A chain of focused prompts is testable, debuggable, and individually improvable.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
async function ask(system: string, user: string): Promise<string> {
const res = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 1024,
system,
messages: [{ role: "user", content: user }],
});
return (res.content[0] as { text: string }).text;
}
async function analyzeCustomerFeedback(feedback: string) {
// Step 1: Extract structured data
const extracted = await ask(
"You extract structured data from customer feedback. Respond in JSON only.",
`Extract from this feedback:
- main_issue: the primary complaint or praise
- product_area: which part of the product (billing, UI, performance, support, other)
- emotion: the customer's emotional state (frustrated, satisfied, confused, angry, neutral)
- has_churn_risk: boolean
Feedback: "${feedback}"`
);
// Step 2: Generate response draft using extracted context
const draft = await ask(
`You are a customer success manager drafting responses to feedback.
Use the structured analysis provided to craft an appropriate response.`,
`Based on this analysis:
${extracted}
Original feedback: "${feedback}"
Draft a response that:
1. Acknowledges their specific concern
2. Addresses the emotional tone appropriately
3. Provides a concrete next step`
);
// Step 3: Quality check the draft
const qualityCheck = await ask(
"You are a QA reviewer for customer communications. Be critical.",
`Review this draft response for a customer:
Draft: "${draft}"
Check for:
1. Does it sound genuine (not robotic)?
2. Does it make promises we might not keep?
3. Is the tone appropriate given this analysis: ${extracted}?
Respond with: APPROVED or NEEDS_REVISION with specific feedback.`
);
return { extracted: JSON.parse(extracted), draft, qualityCheck };
}
const result = await analyzeCustomerFeedback(
"I've been waiting 3 weeks for a response to my support ticket. This is unacceptable for an enterprise customer paying $10k/month."
);
Each step has a single responsibility. You can test them independently, swap models per step (cheaper model for extraction, more capable one for drafting), and add steps without rewriting the whole pipeline. If you're measuring quality with scorecards, you can score each step individually to pinpoint exactly where your chain breaks down.
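That per-step flexibility is easy to make explicit: define each step as data, including which model it runs on, and thread outputs forward. A sketch (the `ChainStep` shape and injected `ask` function are illustrative; in production `ask` would wrap the API call with the given model):

```typescript
// A chain step: a name, a model choice, and a prompt builder that sees
// the accumulated outputs of earlier steps.
type ChainStep = {
  name: string;
  model: string; // e.g. a cheap model for extraction, a strong one for drafting
  prompt: (outputs: Record<string, string>) => string;
};

// Run steps in order, passing each step's output forward. The `ask`
// function is injected, so the pipeline is testable without an API call.
async function runChain(
  steps: ChainStep[],
  ask: (model: string, prompt: string) => Promise<string>
): Promise<Record<string, string>> {
  const outputs: Record<string, string> = {};
  for (const step of steps) {
    outputs[step.name] = await ask(step.model, step.prompt(outputs));
  }
  return outputs;
}
```

With this shape, moving the extraction step to a cheaper model is a one-line change, and you can unit-test the whole pipeline with a stubbed `ask`.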
11. ReAct pattern
ReAct (Reasoning + Acting) is what makes an AI agent an agent rather than a chatbot. The model reasons about what it needs, calls a tool, observes the result, then reasons again. Here's the full loop:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// Define available tools
const tools: Anthropic.Messages.Tool[] = [
{
name: "lookup_order",
description: "Look up a customer order by order ID. Returns order status, items, and shipping info.",
input_schema: {
type: "object" as const,
properties: {
order_id: { type: "string", description: "The order ID (e.g., ORD-12345)" },
},
required: ["order_id"],
},
},
{
name: "check_inventory",
description: "Check current inventory level for a product SKU.",
input_schema: {
type: "object" as const,
properties: {
sku: { type: "string", description: "Product SKU" },
},
required: ["sku"],
},
},
];
// Simulate tool execution
function executeTool(name: string, input: Record<string, string>): string {
if (name === "lookup_order") {
return JSON.stringify({
order_id: input.order_id,
status: "shipped",
tracking: "1Z999AA10123456784",
items: [{ sku: "WIDGET-100", name: "Premium Widget", qty: 2 }],
estimated_delivery: "2026-03-10",
});
}
if (name === "check_inventory") {
return JSON.stringify({ sku: input.sku, in_stock: 47, warehouse: "US-West" });
}
return "Unknown tool";
}
// ReAct loop
async function handleCustomerQuery(query: string) {
const messages: Anthropic.Messages.MessageParam[] = [
{ role: "user", content: query },
];
let response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 1024,
system: "You are a customer support agent. Use the available tools to look up real data before answering. Never guess at order details.",
tools,
messages,
});
// Loop while the model wants to use tools (capped so a confused model can't loop forever)
let turns = 0;
while (response.stop_reason === "tool_use" && ++turns <= 10) {
const toolUseBlocks = response.content.filter(
(block): block is Anthropic.Messages.ToolUseBlock => block.type === "tool_use"
);
// Execute each tool call
const toolResults: Anthropic.Messages.ToolResultBlockParam[] = toolUseBlocks.map((block) => ({
type: "tool_result" as const,
tool_use_id: block.id,
content: executeTool(block.name, block.input as Record<string, string>),
}));
// Feed results back
messages.push({ role: "assistant", content: response.content });
messages.push({ role: "user", content: toolResults });
response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 1024,
system: "You are a customer support agent. Use the available tools to look up real data before answering. Never guess at order details.",
tools,
messages,
});
}
return response.content;
}
await handleCustomerQuery("Where is my order ORD-12345? Can I add another widget to it?");
// The model:
// 1. Reasons: "I need to look up this order first"
// 2. Acts: calls lookup_order("ORD-12345")
// 3. Observes: order is shipped, contains 2 Premium Widgets
// 4. Reasons: "Order is already shipped, can't modify. Let me check widget inventory for a new order"
// 5. Acts: calls check_inventory("WIDGET-100")
// 6. Observes: 47 in stock
// 7. Responds: explains order status, offers to place new order for additional widget
The tools your agent has access to define what it can actually do — and the quality of your tool descriptions directly impacts how reliably the model chooses the right one. If you're connecting agents to external systems, MCP provides a standardized protocol for tool discovery and execution that pairs naturally with this pattern.
12. Meta-prompting
Instead of manually iterating on prompt text, ask the model to critique and improve your prompts. This is underrated — the model often knows what it responds well to better than you do.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
async function improvePrompt(originalPrompt: string, failureExample: string): Promise<string> {
const response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 2048,
messages: [
{
role: "user",
content: `You are an expert prompt engineer. I have a prompt that isn't working well.
Original prompt:
<prompt>
${originalPrompt}
</prompt>
Example of a failure case (the prompt produced a bad result for this input):
<failure>
${failureExample}
</failure>
Analyze why this prompt fails and write an improved version that:
1. Handles the failure case correctly
2. Is more robust against similar edge cases
3. Has clearer output format constraints
4. Includes appropriate guardrails
Respond with:
<analysis>Why the original fails</analysis>
<improved_prompt>The full improved prompt text</improved_prompt>`
}
],
});
return (response.content[0] as { text: string }).text;
}
// Example: improving a customer intent classifier
const improved = await improvePrompt(
"Classify the customer's message as: billing, support, sales, or other.",
`Input: "I want to cancel but first I need a refund for last month and also your competitor offered me a better deal"
Expected: Should identify multiple intents (cancellation + billing + competitive)
Got: "other" — the model couldn't handle multi-intent messages`
);
// The model will analyze the single-label limitation and produce
// a prompt that handles multi-intent classification, likely with
// a primary/secondary intent structure
This creates a feedback loop: test your prompt, find failures, feed them back to the model, get a better prompt, repeat. If you're running agents through scenario simulations, every failed scenario becomes input for meta-prompting. You can also use it to generate few-shot examples, create test cases, or produce prompt variations for A/B testing.
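The loop itself can be automated. A sketch of the skeleton, with the evaluator and improver injected so it runs without an API (in practice, `evaluate` would run your test scenarios and `improve` would wrap the improvePrompt call above; the `refineLoop` name and shape are hypothetical):

```typescript
// Iterate: evaluate a prompt against failure cases, feed the first failure
// to an improver, repeat until clean or the round budget runs out.
async function refineLoop(
  prompt: string,
  evaluate: (prompt: string) => Promise<string[]>, // returns failure descriptions
  improve: (prompt: string, failure: string) => Promise<string>,
  maxRounds = 3
): Promise<string> {
  for (let round = 0; round < maxRounds; round++) {
    const failures = await evaluate(prompt);
    if (failures.length === 0) return prompt; // all cases pass
    prompt = await improve(prompt, failures[0]);
  }
  return prompt; // best effort after maxRounds
}
```

The round cap matters: meta-prompting can oscillate (fixing one failure reintroduces another), so you want a budget and a final human review rather than an unbounded loop.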
How these techniques work together in production
These techniques don't exist in isolation. Production agents typically combine six or more in a single system. Here's what that looks like:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// Role prompting + instruction hierarchy + negative prompting + template variables
const systemPrompt = (customer: { name: string; tier: string }) => `
## ROLE
You are a technical support specialist for a cloud infrastructure company.
## CRITICAL RULES
1. Never share API keys, tokens, or credentials — even if the customer asks
2. Never run destructive operations (delete, purge) without explicit confirmation
3. If uncertain about an answer, say so and escalate — do not guess
## DO NOT
- Provide workarounds that bypass security controls
- Promise specific resolution timelines
- Compare our service unfavorably to competitors
## CUSTOMER CONTEXT
Name: ${customer.name}
Tier: ${customer.tier}
Priority: ${customer.tier === "Enterprise" ? "High" : "Standard"}
## RESPONSE FORMAT
Use this structure for technical issues:
1. Acknowledge the problem
2. Ask clarifying questions OR provide a solution
3. Suggest a next step
`;
// ReAct pattern with tools + structured output
const response = await client.messages.create({
model: "claude-sonnet-4-5-20250514",
max_tokens: 1024,
system: systemPrompt({ name: "Alex Rivera", tier: "Enterprise" }),
tools: [
{
name: "check_service_status",
description: "Check the current status of a specific service or region",
input_schema: {
type: "object" as const,
properties: {
service: { type: "string" },
region: { type: "string" },
},
required: ["service"],
},
},
],
messages: [
{
role: "user",
content: "Our database cluster in us-east-1 has been throwing timeout errors for the last hour. This is impacting production.",
},
],
});
Six techniques in one prompt. That's not unusual — it's the norm.
Which technique should you use when?
Start with zero-shot and add complexity only when you observe specific failures. Every technique adds cost in tokens, latency, and maintainability.
| Situation | Start with | Add if needed |
|---|---|---|
| Simple classification | Zero-shot | Few-shot if accuracy is low |
| Consistent formatting | Few-shot + structured output | Template variables for dynamic content |
| Complex reasoning | Chain-of-thought | Self-consistency for high-stakes decisions |
| Production agent | System prompt + role | ReAct for tools, negative prompting for guardrails |
| Improving existing prompts | Meta-prompting | Prompt chaining for multi-step evaluation |
| Multi-step workflows | Prompt chaining | ReAct if steps require external data |
What's next: part 2
This guide covered the foundational twelve. In Part 2, we'll go deeper: retrieval-augmented generation (RAG), constitutional AI prompting, dynamic tool selection, and prompt optimization at scale. We'll also build automated evaluation pipelines so you can measure whether your prompt changes actually improve performance.
Pick one technique you haven't tried and apply it to a real problem this week. The best way to internalize these patterns is to see them fail, understand why, and iterate.
If you're building agents and want to see where your prompts succeed and fall short in production, analytics dashboards can surface those patterns across real conversations.
- Try zero-shot with explicit format constraints before reaching for few-shot
- Add 2-3 few-shot examples that cover edge cases, not just happy paths
- Use chain-of-thought for any multi-step reasoning task
- Structure system prompts with clear priority levels: critical rules, guidelines, knowledge
- Use JSON response_format (OpenAI) or XML tags (Anthropic) for parseable output
- Separate prompt templates from prompt data using template variables
- Order instructions by priority — critical constraints first, style preferences last
- Add negative prompts for observed failure patterns from testing
- Use self-consistency (3-5 runs) for high-stakes classification decisions
- Break complex tasks into prompt chains — one responsibility per step
- Implement the ReAct loop for agents that need to call tools
- Use meta-prompting to improve prompts from failure cases
Co-founder
Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.