Reference Guide

6 Agent Design Patterns

The patterns that separate production agents from toy demos. Know when to use each — and when not to.

01
Tool-Using Agent
The foundation pattern — an agent that can call external functions
TOOL-USING AGENT — EXECUTION FLOW
User Request → 🤖 Agent (decides & plans) ⇄ Tools, parallel or sequential: 🔍 Web Search, ⚙️ API Call, 💾 File System → results return to agent → Final Response

The tool-using agent is the foundation of all useful AI agents. The model receives a task from the user, reasons about what information or actions it needs, decides which tools to invoke, gets results back, and either calls more tools or returns a final answer. This single loop — reason, act, observe — is what makes agents genuinely powerful.

Tools can be anything you can express as a function: web search, database queries, calculators, file readers, email senders, REST API calls, code interpreters. The model never directly executes code — it requests tool calls by name with structured arguments, your code runs them, and returns results as text. The model then continues reasoning with those results in context.

This pattern works because language models are strong reasoners but weak executors. They cannot browse the internet, run code, or check today's date on their own — but they can plan and reason exceptionally well. Tools supply the execution muscle; the model supplies the intelligence to orchestrate them.

✓ When to Use

  • Tasks that need real-time or external data (prices, weather, news)
  • Any task requiring reading or writing to external systems (files, databases, APIs)
  • When the model's training data is insufficient or potentially stale

✗ When NOT to Use

  • Simple Q&A that a single prompt can handle — no tools needed
  • Latency-critical paths where even one tool call is too slow

Real Example

A research assistant agent. User asks: "What are the latest papers on transformer attention mechanisms?" The agent calls search_web("transformer attention mechanisms 2024 arxiv"), reads the top 3 result URLs, extracts abstracts, then writes a formatted summary with citations. Without tools, the model could only recall training data — with tools, it accesses today's research.

Python — Basic tool-calling loop with Anthropic SDK
import anthropic

client = anthropic.Anthropic()

# Define tools the agent can call
tools = [
    {
        "name": "search_web",
        "description": "Search the web for current information",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }
]

messages = [{"role": "user", "content": "What's the current price of Bitcoin?"}]

# Agent loop — runs until model stops requesting tools
while True:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )

    if response.stop_reason != "tool_use":
        # Model is done (end_turn) or hit another stop reason; print final text and exit
        for block in response.content:
            if block.type == "text":
                print(block.text)
        break

    if response.stop_reason == "tool_use":
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []

        for block in response.content:
            if block.type == "tool_use":
                # Execute the tool and capture result
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })

        # Return results to model and loop again
        messages.append({"role": "user", "content": tool_results})
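The loop above calls an `execute_tool` helper that the snippet leaves undefined. One minimal way to write it is a plain dispatch table; the stub `search_web` body below is an illustrative assumption, not a real search client:

```python
def search_web(query: str) -> str:
    """Illustrative stub; a real implementation would call a search API."""
    return f"Top results for: {query}"

# Map tool names (as declared in the tools schema) to plain Python callables
TOOL_REGISTRY = {
    "search_web": search_web,
}

def execute_tool(name: str, tool_input: dict) -> str:
    """Dispatch a tool call by name and always return a string for the model."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return f"Error: unknown tool '{name}'"
    try:
        return str(fn(**tool_input))
    except Exception as e:
        # Surface failures as text so the model can observe and recover
        return f"Error running {name}: {e}"
```

Returning errors as strings (rather than raising) matters: the model can only react to what comes back in the tool result.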
02
Router Agent
Classify intent and delegate to the right specialist
ROUTER AGENT — INTENT CLASSIFICATION & DELEGATION
Incoming Request → 🧭 Router (classifies intent) → billing → 💳 Billing Specialist / tech → 🔧 Tech Specialist / retain → 🤝 Retention Specialist → Specialist Response

A router agent reads an incoming request and delegates it to the most appropriate specialist agent. The router itself is deliberately lightweight — its only job is to classify intent accurately and quickly. It doesn't try to solve the problem itself. This separation is important: a small, fast model can be an excellent router even if it's terrible at domain-specific tasks.

Specialist agents are optimized for their narrow domains. A billing specialist has billing data access, billing-specific instructions, and billing-focused tool access. A tech support specialist has product documentation, debug tools, and a system prompt full of troubleshooting patterns. Neither bleeds into the other. This isolation makes specialists much more reliable than a single "do everything" agent.

The business case for routing is also about cost. You can route simple, well-defined queries to a smaller, cheaper model (like claude-haiku-4-5) and only escalate complex or ambiguous requests to a more powerful model. This can reduce inference costs by 80% or more on high-volume applications without any user-visible degradation in quality.

βœ“ When to Use

  • General-purpose assistants covering multiple domains or use cases
  • Products where some tasks are simple (use small model) and some complex (use large model)
  • Customer support systems with clearly separated issue categories

βœ— When NOT to Use

  • When all incoming requests need identical handling — routing adds latency for no gain
  • When the domains heavily overlap and classification would be unreliable
  • Single-purpose tools where specialization is already baked in

Real Example

A customer support system for a SaaS product. The router reads each message and routes: "my invoice is wrong" → Billing Agent (has Stripe access, refund tools); "the API keeps timing out" → Tech Agent (has docs, log access, escalation tools); "I want to cancel" → Retention Agent (has discount tools, cancellation flow). Each specialist runs with a system prompt and toolset tailored to exactly that problem domain.

Python — Router classification and delegation
import anthropic
import json

client = anthropic.Anthropic()

ROUTER_PROMPT = """Classify this customer message into exactly one category.
Return a JSON object with a single key "category" and one of these values:
- "billing" — payment, invoice, refund, subscription, pricing
- "technical" — bugs, errors, API, performance, features
- "retention" — cancel, too expensive, switching, leaving
- "general" — anything else

Respond with JSON only. No explanation."""

def route_request(message: str) -> str:
    """Use a fast model to classify intent"""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Cheap router
        max_tokens=64,
        system=ROUTER_PROMPT,
        messages=[{"role": "user", "content": message}]
    )
    result = json.loads(response.content[0].text)
    return result["category"]

def run_specialist(category: str, message: str) -> str:
    """Delegate to the right specialist with its system prompt"""
    specialists = {
        "billing": {
            "model": "claude-haiku-4-5-20251001",
            "system": "You are a billing specialist. You have access to Stripe...",
        },
        "technical": {
            "model": "claude-opus-4-6",  # Complex — use big model
            "system": "You are a technical support engineer...",
        },
        "retention": {
            "model": "claude-opus-4-6",
            "system": "You are a customer success specialist. Your goal...",
        },
    }

    spec = specialists.get(category, specialists["technical"])
    response = client.messages.create(
        model=spec["model"],
        max_tokens=1024,
        system=spec["system"],
        messages=[{"role": "user", "content": message}]
    )
    return response.content[0].text

# Usage
user_message = "I want to cancel my subscription, it's too expensive"
category = route_request(user_message)        # → "retention"
reply = run_specialist(category, user_message) # → retention specialist runs
03
Pipeline (Sequential) Agent
Chain specialized stages where each output becomes the next input
PIPELINE AGENT — SEQUENTIAL STAGE PROCESSING
Raw Input → Stage 1: 🕷️ Scraper (fetch content) → Stage 2: 📝 Summarizer (condense text) → Stage 3: ✅ Fact-Check (verify claims) → Stage 4: 🎨 Formatter (structure output) → Published Article
Each stage receives the previous stage's output as its input.

In a pipeline agent, each stage is a specialized agent that does exactly one job — and does it well. Stage 1's output becomes Stage 2's input. Stage 2's output becomes Stage 3's input, and so on. This linear dataflow creates a transformation chain: raw input enters one end, refined output emerges from the other. Each stage has its own system prompt, toolset, and model selection optimized for its specific task.

The key benefit of pipelines is fault isolation. If Stage 3 fails, you don't need to rerun Stages 1 and 2 — you just rerun Stage 3 with the checkpoint output from Stage 2. This makes pipelines practical for expensive, multi-step workflows where intermediate results are worth preserving. You can also run expensive stages (like using a big model to fact-check) only when needed, and use cheap stages everywhere else.
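Checkpointing can be as light as writing each stage's output to disk before the next stage runs, then skipping any stage whose file already exists. A sketch, assuming stages are plain functions from text to text (the `checkpoints` directory name is an arbitrary choice):

```python
import os

def run_stage_checkpointed(stage_name: str, run_stage, content: str,
                           checkpoint_dir: str = "checkpoints") -> str:
    """Run a stage, but reuse its saved output if it already succeeded."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"{stage_name}.txt")
    if os.path.exists(path):
        # Stage completed in a previous run; skip the expensive call
        with open(path, encoding="utf-8") as f:
            return f.read()
    output = run_stage(content)       # The actual (expensive) stage call
    with open(path, "w", encoding="utf-8") as f:
        f.write(output)               # Checkpoint before moving on
    return output
```

A failed run can then be restarted from the top: completed stages return instantly from disk, and only the failed stage re-executes.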

Pipelines are also highly debuggable. You can inspect the output of each stage independently to find where quality degraded. This is a major advantage over monolithic agents where failures are hard to localize. The tradeoff is latency — stages run sequentially, so total latency is the sum of all stage latencies. This makes pipelines unsuitable for real-time applications but excellent for batch jobs.

✓ When to Use

  • Document processing pipelines (ingest → extract → transform → store)
  • Content workflows where quality gates between stages matter
  • Multi-step data transformations with clear stage boundaries

✗ When NOT to Use

  • Real-time user-facing tasks where total latency is critical
  • When stages are so tightly coupled that separation creates more friction than value

Real Example

A content publishing pipeline for a newsletter. Stage 1 (Scraper Agent): fetches 20 article URLs from an RSS feed. Stage 2 (Summarizer Agent): condenses each article to 3 sentences. Stage 3 (Curator Agent): selects the 5 most relevant, scores them by topic fit. Stage 4 (Writer Agent): writes an editorial intro connecting the articles. Stage 5 (Formatter Agent): assembles HTML email. Each stage is independently retryable and debuggable.

Python — Pipeline where each stage's output feeds the next
import anthropic

client = anthropic.Anthropic()

def run_stage(stage_name: str, system: str, content: str, model: str = "claude-opus-4-6") -> str:
    """Run a single pipeline stage and return its output"""
    print(f"[Pipeline] Running stage: {stage_name}")
    response = client.messages.create(
        model=model,
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": content}]
    )
    output = response.content[0].text
    print(f"  → {len(output)} chars output")
    return output

def run_content_pipeline(raw_articles: str) -> str:
    """Run the full content pipeline sequentially"""

    # Stage 1: Summarize raw content
    summaries = run_stage(
        "summarizer",
        system="Extract the key points from each article as bullet points. Be concise.",
        content=raw_articles,
        model="claude-haiku-4-5-20251001"  # Cheap model for summarization
    )

    # Stage 2: Fact-check (only run on summaries, not full articles)
    verified = run_stage(
        "fact-checker",
        system="Review these summaries. Flag any claims that seem unverified or exaggerated.",
        content=summaries,
        model="claude-opus-4-6"  # Use powerful model for reasoning
    )

    # Stage 3: Format into final output
    formatted = run_stage(
        "formatter",
        system="Format the verified summaries into a professional markdown newsletter.",
        content=verified,
        model="claude-haiku-4-5-20251001"  # Cheap model for formatting
    )

    return formatted

# Run the pipeline
raw = fetch_articles()  # Your scraper returns raw text
newsletter = run_content_pipeline(raw)
publish(newsletter)
04
Reflection / Self-Correction Agent
Generate, critique, revise — loop until quality passes
REFLECTION AGENT — GENERATE → CRITIQUE → REVISE LOOP
Task Input → ✍️ Generator (produces draft) → 🔍 Critic (scores against criteria) → PASS? YES → Final Output; NO → feedback loops back to Generator

The reflection pattern treats output quality as an iterative problem rather than a single-shot challenge. A generator agent produces a draft. A critic agent — which can be the same model with a different system prompt, or a completely separate model — evaluates that draft against defined criteria. If the draft fails the critique, feedback goes back to the generator for revision. This continues until the output passes or max iterations are reached.

The power here is in separation of concerns. Generation and evaluation are fundamentally different cognitive tasks. A model writing code focuses on making it work. A model evaluating code focuses on whether it handles edge cases, follows best practices, and has no security holes. Giving each task its own focused system prompt dramatically improves the quality of both. In practice, models catch their own errors far more reliably when critique runs as a separate pass than when they are asked to write and verify simultaneously.

Reflection is particularly valuable for code generation, where you can actually run the output and feed runtime errors back as critic feedback. This creates an incredibly tight feedback loop: write code → run it → if tests fail, show errors to generator → revise → run again. Agents with this loop can solve coding problems that would stump single-shot attempts by iterating toward correctness.

✓ When to Use

  • Code generation — especially when you can actually run and test the output
  • Long-form writing where accuracy, tone, and completeness all need checking
  • Complex reasoning tasks where a second-pass review catches logical errors

✗ When NOT to Use

  • Time-sensitive tasks — each iteration adds seconds of latency and API cost
  • Cases where "good enough on first try" is acceptable and revision ROI is low
  • When critique criteria are subjective or poorly defined — the critic gives poor signal

Real Example

A code generation agent for writing data processing scripts. User asks: "Write a function to parse CSV files with mixed date formats." Generator writes the code. Critic agent checks: Does it handle timezone-naive vs timezone-aware datetimes? Does it handle empty cells? Does it handle quoted commas? Fails on empty cells → feedback sent to generator → revises → critic re-checks → passes → output returned. The final code handles 12 edge cases the first draft missed.

Python — Generator + critic + revision loop
import anthropic

client = anthropic.Anthropic()

GENERATOR_PROMPT = """You write high-quality Python code. When given a task:
- Write clean, well-commented code
- Handle edge cases and errors
- Include type hints and docstrings"""

CRITIC_PROMPT = """You are a senior code reviewer. Evaluate this code strictly.
Return a JSON object:
{
  "passed": true/false,
  "score": 0-10,
  "issues": ["list of specific problems found"],
  "verdict": "brief explanation"
}
Be strict. Score below 8 = fail. No issues = pass."""

def generate(task: str, feedback: str = "") -> str:
    prompt = task
    if feedback:
        prompt += f"\n\nPrevious attempt failed critique:\n{feedback}\n\nFix all issues."
    response = client.messages.create(
        model="claude-opus-4-6", max_tokens=2048,
        system=GENERATOR_PROMPT,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

def critique(code: str, task: str) -> dict:
    import json
    response = client.messages.create(
        model="claude-opus-4-6", max_tokens=512,
        system=CRITIC_PROMPT,
        messages=[{"role": "user", "content": f"Task: {task}\n\nCode:\n{code}"}]
    )
    return json.loads(response.content[0].text)

def reflection_agent(task: str, max_iterations: int = 4) -> str:
    feedback = ""
    for i in range(max_iterations):
        draft = generate(task, feedback)
        review = critique(draft, task)
        print(f"Iteration {i+1}: score={review['score']}, passed={review['passed']}")
        if review["passed"]:
            print("  ✓ Passed critique!")
            return draft
        feedback = "\n".join(review["issues"])
    return draft  # Out of iterations; return the last attempt

result = reflection_agent("Write a Python function to parse CSV with mixed date formats")
05
Human-in-the-Loop
Agent autonomy with human checkpoints for high-stakes decisions
HUMAN-IN-THE-LOOP — AUTONOMY WITH APPROVAL GATES
Task Start → 🤖 Agent (works autonomously) → High-Stakes Action? NO (confident) → Auto-Execute & Continue; YES (risky) → 👀 Human Review → Approve or Reject
Human-in-the-loop agents operate autonomously for the vast majority of decisions but pause and request human approval when they encounter actions that are irreversible, expensive, or high-stakes. The agent presents its proposed action along with its reasoning and confidence, the human reviews, and then approves or rejects. On approval, the agent continues. On rejection, it can ask for clarification or take an alternative path.

The key design decision is defining what triggers a human checkpoint. Common triggers: actions affecting money above a threshold, sending communications to external parties, permanently deleting data, modifying production configurations, or anything the model assigns low confidence to. The agent itself can be the judge — if the model estimates it's less than 85% confident, it escalates. You can also hardcode specific tool calls as always-require-approval regardless of model confidence.

This pattern is the practical answer to "what if the agent does something wrong?" in regulated industries and enterprise contexts. A fully autonomous agent that can send emails, make purchases, or modify databases is a liability without this pattern. Human-in-the-loop lets you deploy agents in sensitive environments by ensuring a human signs off on irreversible actions, while still capturing most of the efficiency gains from automation.

✓ When to Use

  • High-stakes actions: sending emails to real people, spending money, deleting data
  • Regulated industries where automated decisions require audit trails
  • Early deployment phases before you trust the agent enough for full autonomy

✗ When NOT to Use

  • Low-stakes bulk automation where human review would create a bottleneck
  • When humans are unavailable or response time makes the agent non-functional
  • Batch processing jobs that run overnight — nobody is there to approve

Real Example

A calendar management agent. It can autonomously: create new events, accept invites from whitelisted contacts, reschedule meetings by up to 1 hour. It always requires human approval to: cancel events with more than 2 attendees, decline external invites (irreversible impression), modify recurring events (affects multiple future dates), or send any email on your behalf. The approval request includes the proposed action, all affected parties, and a one-click approve/reject interface via Slack.

Python — Confidence threshold + human approval flow
import anthropic
import json

client = anthropic.Anthropic()

# Actions that ALWAYS require human approval
ALWAYS_APPROVE = {"send_email", "delete_event", "make_purchase"}
CONFIDENCE_THRESHOLD = 0.85

def assess_action(action: str, inputs: dict) -> dict:
    """Ask the model to assess risk and confidence"""
    response = client.messages.create(
        model="claude-opus-4-6", max_tokens=256,
        system="Assess this action. Return JSON: {confidence: 0-1, risk: low/medium/high, reason: str}",
        messages=[{"role": "user", "content": f"Action: {action}\nInputs: {json.dumps(inputs)}"}]
    )
    return json.loads(response.content[0].text)

def needs_approval(action: str, assessment: dict) -> bool:
    if action in ALWAYS_APPROVE:
        return True
    if assessment["risk"] == "high":
        return True
    if assessment["confidence"] < CONFIDENCE_THRESHOLD:
        return True
    return False

def request_human_approval(action: str, inputs: dict, assessment: dict) -> bool:
    """Present action to human and wait for input"""
    print("\n⚠️  APPROVAL REQUIRED")
    print(f"Action: {action}")
    print(f"Inputs: {json.dumps(inputs, indent=2)}")
    print(f"Risk: {assessment['risk']} | Confidence: {assessment['confidence']:.0%}")
    print(f"Reason: {assessment['reason']}")
    answer = input("Approve? (y/n): ").strip().lower()
    return answer == "y"

def execute_with_approval(action: str, inputs: dict) -> str:
    assessment = assess_action(action, inputs)
    if needs_approval(action, assessment):
        approved = request_human_approval(action, inputs, assessment)
        if not approved:
            return "Action rejected by human. Taking alternative approach."
    return run_action(action, inputs)  # Execute approved action
06
Multi-Agent Collaboration
Parallel specialist agents coordinated by an orchestrator
MULTI-AGENT — PARALLEL EXECUTION WITH ORCHESTRATION
Complex Task → 🎯 Orchestrator (decomposes task) → parallel: 🏢 Competitor Research Agent, 💰 Pricing Analysis Agent, ⭐ Review Sentiment Agent → 🔗 Aggregator/Synthesizer (combines results) → Final Report

Multi-agent collaboration breaks a complex task into independent subtasks, runs specialist agents on each in parallel, then aggregates the results. An orchestrator agent decomposes the work and assigns subtasks. Specialist agents run simultaneously, each focused on their narrow domain. A synthesis agent (or the orchestrator itself) combines all outputs into a coherent final result. The total time is roughly the time of the slowest agent, not the sum of all agents — this is the core latency advantage.

The coordination overhead is real and should not be underestimated. Each agent needs a clear task description, the right tools, and enough context to work independently. Agents cannot share information in real-time the way human teams can β€” they work on their isolated subtask and deliver a result. This means the orchestrator must define the boundaries between subtasks precisely enough that no two agents need the same piece of information mid-task.

This pattern scales to very large research and analysis tasks. A task that would take 20 minutes sequentially might complete in 5 minutes with 5 parallel agents. But debugging is harder — when the final synthesis is wrong, you need to trace back which specialist produced the bad input. Always log each agent's raw output before aggregation. Without this, diagnosing quality issues in production multi-agent systems is extremely difficult.
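That logging advice is a few lines of plumbing. A sketch that persists each specialist's raw output under a per-run directory before synthesis (the `agent_logs` directory name is an arbitrary choice):

```python
import json
import time
from pathlib import Path

def log_agent_output(run_id: str, agent_name: str, output: str,
                     log_dir: str = "agent_logs") -> Path:
    """Persist one agent's raw output so bad syntheses can be traced back."""
    run_dir = Path(log_dir) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {"agent": agent_name, "timestamp": time.time(), "output": output}
    out_file = run_dir / f"{agent_name}.json"
    out_file.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out_file
```

Calling this on each result before aggregation means that when a report looks wrong, you can diff the specialists' raw outputs against the synthesis instead of guessing which agent failed.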

✓ When to Use

  • Complex research or analysis where subtasks are genuinely independent
  • When latency matters and you can afford parallel API calls
  • Large tasks that exceed a single context window

✗ When NOT to Use

  • Simple tasks — orchestration overhead exceeds any parallel benefit
  • When subtasks depend on each other's outputs (use pipeline instead)
  • Cost-sensitive applications — parallel agents multiply API costs

Real Example

A market intelligence report for a startup. Orchestrator decomposes into 3 parallel workstreams: Competitor Agent (searches 5 competitor websites, extracts features and pricing); Pricing Agent (analyzes pricing pages, builds a comparison matrix, identifies pricing gaps); Review Agent (reads G2 and Trustpilot reviews, runs sentiment analysis, extracts recurring complaints). All 3 run simultaneously in 90 seconds total. Synthesizer agent combines into a 5-page markdown report with strategic recommendations. Sequential would take 4-5 minutes.

Python — asyncio parallel agent execution with aggregation
import asyncio
import anthropic

client = anthropic.Anthropic()

async def run_agent(name: str, task: str, system: str) -> dict:
    """Run a single specialist agent in a worker thread so API calls overlap"""
    loop = asyncio.get_running_loop()  # get_event_loop() is deprecated inside coroutines
    response = await loop.run_in_executor(None, lambda: client.messages.create(
        model="claude-opus-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": task}]
    ))
    print(f"  ✓ {name} completed")
    return {"agent": name, "output": response.content[0].text}

async def run_parallel_agents(company: str) -> str:
    print(f"Launching parallel agents for: {company}")

    # Define specialist agents
    agents = [
        ("competitor-researcher",
         f"Research competitors of {company}. List top 5 with features.",
         "You are a competitive intelligence analyst."),
        ("pricing-analyst",
         f"Analyze pricing strategies in the {company} market.",
         "You are a pricing strategy consultant."),
        ("sentiment-analyst",
         f"Summarize customer sentiment about {company}'s competitors.",
         "You are a customer experience researcher."),
    ]

    # Run all agents in parallel
    tasks = [run_agent(name, task, system) for name, task, system in agents]
    results = await asyncio.gather(*tasks)

    # Aggregate: combine all outputs and synthesize
    combined = "\n\n".join([f"## {r['agent']}\n{r['output']}" for r in results])
    synthesis = client.messages.create(
        model="claude-opus-4-6", max_tokens=2048,
        system="Synthesize these research findings into a strategic report with actionable recommendations.",
        messages=[{"role": "user", "content": combined}]
    )
    return synthesis.content[0].text

# Run the multi-agent system
report = asyncio.run(run_parallel_agents("Notion"))
print(report)

Pattern Comparison

Quick reference: choose the right pattern for your use case at a glance.

Pattern                       Complexity  Latency         Cost     Reliability  Best For
Tool-Using Agent              Low         Medium          Low      High         Data fetching, single-domain tasks
Router Agent                  Medium      Low             Low-Med  High         General-purpose assistants, cost optimization
Pipeline (Sequential)         Medium      High            Medium   High         Document processing, content workflows
Reflection / Self-Correction  Medium      High            High     Very High    Code generation, high-quality writing
Human-in-the-Loop             Medium      Variable        Medium   Very High    High-stakes actions, regulated industries
Multi-Agent Collaboration     High        Low (parallel)  High     Medium       Complex research, large analysis tasks

Note: These patterns are not mutually exclusive. Production systems commonly combine multiple patterns — for example, a router that delegates to pipeline agents, each with reflection loops for quality control.
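As a sketch of that composition, with trivial stand-ins for the router, pipeline, and reflection pieces (none of these stubs call a model; they only show the control flow, and the "newsletter" keyword check is an illustrative assumption):

```python
def route(message: str) -> str:
    """Stand-in for the router pattern: classify the request."""
    return "content" if "newsletter" in message.lower() else "general"

def content_pipeline(message: str) -> str:
    """Stand-in for a pipeline specialist handling content requests."""
    return f"draft newsletter for: {message}"

def reflect(draft: str) -> str:
    """Stand-in for a reflection pass over the pipeline's output."""
    return draft + " [reviewed]"

def handle(message: str) -> str:
    """Top layer: the router delegates to a pipeline whose output gets a reflection pass."""
    if route(message) == "content":
        return reflect(content_pipeline(message))
    return f"general reply to: {message}"
```

Each stub would be swapped for the corresponding implementation from the sections above; the composition itself stays this thin.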
