The patterns that separate production agents from toy demos. Know when to use each, and when not to.
The tool-using agent is the foundation of all useful AI agents. The model receives a task from the user, reasons about what information or actions it needs, decides which tools to invoke, gets results back, and either calls more tools or returns a final answer. This single loop (reason, act, observe) is what makes agents genuinely powerful.
Tools can be anything you can express as a function: web search, database queries, calculators, file readers, email senders, REST API calls, code interpreters. The model never directly executes code. Instead, it requests tool calls by name with structured arguments, your code runs them, and returns results as text. The model then continues reasoning with those results in context.
This pattern works because language models are strong reasoners but weak executors. They cannot browse the internet, run code, or check today's date on their own, but they can plan and reason exceptionally well. Tools supply the execution muscle; the model supplies the intelligence to orchestrate them.
A research assistant agent. User asks: "What are the latest papers on transformer attention mechanisms?" The agent calls search_web("transformer attention mechanisms 2024 arxiv"), reads the top 3 result URLs, extracts abstracts, then writes a formatted summary with citations. Without tools, the model could only recall training data; with tools, it accesses today's research.
```python
import anthropic

client = anthropic.Anthropic()

# Define tools the agent can call
tools = [
    {
        "name": "search_web",
        "description": "Search the web for current information",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }
]

messages = [{"role": "user", "content": "What's the current price of Bitcoin?"}]

# Agent loop: runs until model returns end_turn
while True:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )

    if response.stop_reason == "end_turn":
        # Model is done: extract and return final text
        for block in response.content:
            if hasattr(block, 'text'):
                print(block.text)
        break

    if response.stop_reason == "tool_use":
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # Execute the tool and capture result
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })
        # Return results to model and loop again
        messages.append({"role": "user", "content": tool_results})
```
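The loop above calls an `execute_tool` helper that your code must supply: a dispatch from tool names (as declared in `tools`) to real functions. A minimal sketch, where the `search_web` body is a hypothetical stand-in for your actual search API:

```python
def search_web(query: str) -> str:
    # Hypothetical stand-in: call your real search API here
    return f"Top results for: {query}"

# Map tool names (as declared in the `tools` schema) to Python callables
TOOL_REGISTRY = {
    "search_web": search_web,
}

def execute_tool(name: str, tool_input: dict) -> str:
    """Dispatch a model-requested tool call and always return a string."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return f"Error: unknown tool '{name}'"  # Return as text so the model can recover
    try:
        return str(fn(**tool_input))
    except Exception as exc:
        # Errors go back as tool results too, so the model can retry or adjust
        return f"Error running {name}: {exc}"
```

Returning errors as text rather than raising keeps the loop alive: the model sees the failure in context and can correct its own arguments on the next turn.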
A router agent reads an incoming request and delegates it to the most appropriate specialist agent. The router itself is deliberately lightweight β its only job is to classify intent accurately and quickly. It doesn't try to solve the problem itself. This separation is important: a small, fast model can be an excellent router even if it's terrible at domain-specific tasks.
Specialist agents are optimized for their narrow domains. A billing specialist has billing data access, billing-specific instructions, and billing-focused tool access. A tech support specialist has product documentation, debug tools, and a system prompt full of troubleshooting patterns. Neither bleeds into the other. This isolation makes specialists much more reliable than a single "do everything" agent.
The business case for routing is also about cost. You can route simple, well-defined queries to a smaller, cheaper model (like claude-haiku-4-5) and only escalate complex or ambiguous requests to a more powerful model. This can reduce inference costs by 80% or more on high-volume applications without any user-visible degradation in quality.
A customer support system for a SaaS product. The router reads each message and routes: "my invoice is wrong" → Billing Agent (has Stripe access, refund tools); "the API keeps timing out" → Tech Agent (has docs, log access, escalation tools); "I want to cancel" → Retention Agent (has discount tools, cancellation flow). Each specialist runs with a system prompt and toolset tailored to exactly that problem domain.
```python
import anthropic
import json

client = anthropic.Anthropic()

ROUTER_PROMPT = """Classify this customer message into exactly one category.
Return a JSON object with a single key "category" and one of these values:
- "billing": payment, invoice, refund, subscription, pricing
- "technical": bugs, errors, API, performance, features
- "retention": cancel, too expensive, switching, leaving
- "general": anything else
Respond with JSON only. No explanation."""

def route_request(message: str) -> str:
    """Use a fast model to classify intent"""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Cheap router
        max_tokens=64,
        system=ROUTER_PROMPT,
        messages=[{"role": "user", "content": message}]
    )
    result = json.loads(response.content[0].text)
    return result["category"]

def run_specialist(category: str, message: str) -> str:
    """Delegate to the right specialist with its system prompt"""
    specialists = {
        "billing": {
            "model": "claude-haiku-4-5-20251001",
            "system": "You are a billing specialist. You have access to Stripe...",
        },
        "technical": {
            "model": "claude-opus-4-6",  # Complex: use big model
            "system": "You are a technical support engineer...",
        },
        "retention": {
            "model": "claude-opus-4-6",
            "system": "You are a customer success specialist. Your goal...",
        },
    }
    spec = specialists.get(category, specialists["technical"])
    response = client.messages.create(
        model=spec["model"],
        max_tokens=1024,
        system=spec["system"],
        messages=[{"role": "user", "content": message}]
    )
    return response.content[0].text

# Usage
user_message = "I want to cancel my subscription, it's too expensive"
category = route_request(user_message)          # -> "retention"
reply = run_specialist(category, user_message)  # -> retention specialist runs
```
In a pipeline agent, each stage is a specialized agent that does exactly one job, and does it well. Stage 1's output becomes Stage 2's input. Stage 2's output becomes Stage 3's input, and so on. This linear dataflow creates a transformation chain: raw input enters one end, refined output emerges from the other. Each stage has its own system prompt, toolset, and model selection optimized for its specific task.
The key benefit of pipelines is fault isolation. If Stage 3 fails, you don't need to rerun Stages 1 and 2; you just rerun Stage 3 with the checkpoint output from Stage 2. This makes pipelines practical for expensive, multi-step workflows where intermediate results are worth preserving. You can also run expensive stages (like using a big model to fact-check) only when needed, and use cheap stages everywhere else.
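Checkpointing can be as simple as caching each stage's output to disk, keyed by stage name, so a rerun skips stages that already completed. A minimal sketch, where the directory and file naming are illustrative assumptions:

```python
from pathlib import Path

CHECKPOINT_DIR = Path("pipeline_checkpoints")

def run_with_checkpoint(stage_name: str, stage_fn, stage_input: str) -> str:
    """Run a stage, or reuse its saved output if a checkpoint exists."""
    CHECKPOINT_DIR.mkdir(exist_ok=True)
    checkpoint = CHECKPOINT_DIR / f"{stage_name}.txt"
    if checkpoint.exists():
        return checkpoint.read_text()  # Skip the rerun: reuse the saved output
    output = stage_fn(stage_input)
    checkpoint.write_text(output)      # Persist for future reruns
    return output
```

With this wrapper, recovering from a Stage 3 failure means deleting only Stage 3's checkpoint file and rerunning the pipeline; the earlier stages load from disk instead of re-invoking the model.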
Pipelines are also highly debuggable. You can inspect the output of each stage independently to find where quality degraded. This is a major advantage over monolithic agents where failures are hard to localize. The tradeoff is latency: stages run sequentially, so total latency is the sum of all stage latencies. This makes pipelines unsuitable for real-time applications but excellent for batch jobs.
A content publishing pipeline for a newsletter. Stage 1 (Scraper Agent): fetches 20 article URLs from an RSS feed. Stage 2 (Summarizer Agent): condenses each article to 3 sentences. Stage 3 (Curator Agent): selects the 5 most relevant, scores them by topic fit. Stage 4 (Writer Agent): writes an editorial intro connecting the articles. Stage 5 (Formatter Agent): assembles HTML email. Each stage is independently retryable and debuggable.
```python
import anthropic

client = anthropic.Anthropic()

def run_stage(stage_name: str, system: str, content: str,
              model: str = "claude-opus-4-6") -> str:
    """Run a single pipeline stage and return its output"""
    print(f"[Pipeline] Running stage: {stage_name}")
    response = client.messages.create(
        model=model,
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": content}]
    )
    output = response.content[0].text
    print(f"  -> {len(output)} chars output")
    return output

def run_content_pipeline(raw_articles: str) -> str:
    """Run the full content pipeline sequentially"""
    # Stage 1: Summarize raw content
    summaries = run_stage(
        "summarizer",
        system="Extract the key points from each article as bullet points. Be concise.",
        content=raw_articles,
        model="claude-haiku-4-5-20251001"  # Cheap model for summarization
    )
    # Stage 2: Fact-check (only run on summaries, not full articles)
    verified = run_stage(
        "fact-checker",
        system="Review these summaries. Flag any claims that seem unverified or exaggerated.",
        content=summaries,
        model="claude-opus-4-6"  # Use powerful model for reasoning
    )
    # Stage 3: Format into final output
    formatted = run_stage(
        "formatter",
        system="Format the verified summaries into a professional markdown newsletter.",
        content=verified,
        model="claude-haiku-4-5-20251001"  # Cheap model for formatting
    )
    return formatted

# Run the pipeline
raw = fetch_articles()  # Your scraper returns raw text
newsletter = run_content_pipeline(raw)
publish(newsletter)
```
The reflection pattern treats output quality as an iterative problem rather than a single-shot challenge. A generator agent produces a draft. A critic agent β which can be the same model with a different system prompt, or a completely separate model β evaluates that draft against defined criteria. If the draft fails the critique, feedback goes back to the generator for revision. This continues until the output passes or max iterations are reached.
The power here is in separation of concerns. Generation and evaluation are fundamentally different cognitive tasks. A model writing code focuses on making it work. A model evaluating code focuses on whether it handles edge cases, follows best practices, and has no security holes. Giving each task its own focused system prompt dramatically improves the quality of both. In practice, models catch their own errors much more reliably when asked to critique in a separate pass than when they're asked to write and verify simultaneously.
Reflection is particularly valuable for code generation, where you can actually run the output and feed runtime errors back as critic feedback. This creates an incredibly tight feedback loop: write code → run it → if tests fail, show errors to generator → revise → run again. Agents with this loop can solve coding problems that would stump single-shot attempts by iterating toward correctness.
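That execution step can be sketched as a small harness that runs the generated code in a subprocess and returns any traceback as feedback for the generator. A minimal sketch; real harnesses would add sandboxing and resource limits, which are elided here:

```python
import subprocess
import sys
from typing import Optional

def run_generated_code(code: str, timeout: int = 10) -> Optional[str]:
    """Execute generated Python; return the error text, or None on success."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        return proc.stderr  # Traceback text becomes the critic feedback
    return None
```

On failure, the returned traceback is passed straight back to the generator as the `feedback` argument, closing the write-run-revise loop.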
A code generation agent for writing data processing scripts. User asks: "Write a function to parse CSV files with mixed date formats." Generator writes the code. Critic agent checks: Does it handle timezone-naive vs timezone-aware datetimes? Does it handle empty cells? Does it handle quoted commas? Fails on empty cells → feedback sent to generator → revises → critic re-checks → passes → output returned. The final code handles 12 edge cases the first draft missed.
```python
import anthropic
import json

client = anthropic.Anthropic()

GENERATOR_PROMPT = """You write high-quality Python code. When given a task:
- Write clean, well-commented code
- Handle edge cases and errors
- Include type hints and docstrings"""

CRITIC_PROMPT = """You are a senior code reviewer. Evaluate this code strictly.
Return a JSON object:
{
  "passed": true/false,
  "score": 0-10,
  "issues": ["list of specific problems found"],
  "verdict": "brief explanation"
}
Be strict. Score below 8 = fail. No issues = pass."""

def generate(task: str, feedback: str = "") -> str:
    prompt = task
    if feedback:
        prompt += f"\n\nPrevious attempt failed critique:\n{feedback}\n\nFix all issues."
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=2048,
        system=GENERATOR_PROMPT,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

def critique(code: str, task: str) -> dict:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        system=CRITIC_PROMPT,
        messages=[{"role": "user", "content": f"Task: {task}\n\nCode:\n{code}"}]
    )
    return json.loads(response.content[0].text)

def reflection_agent(task: str, max_iterations: int = 4) -> str:
    feedback = ""
    draft = ""
    for i in range(max_iterations):
        draft = generate(task, feedback)
        review = critique(draft, task)
        print(f"Iteration {i+1}: score={review['score']}, passed={review['passed']}")
        if review["passed"]:
            print("  Passed critique!")
            return draft
        feedback = "\n".join(review["issues"])
    return draft  # Return best attempt

result = reflection_agent("Write a Python function to parse CSV with mixed date formats")
```
Human-in-the-loop agents operate autonomously for the vast majority of decisions but pause and request human approval when they encounter actions that are irreversible, expensive, or high-stakes. The agent presents its proposed action along with its reasoning and confidence, the human reviews, and then approves or rejects. On approval, the agent continues. On rejection, it can ask for clarification or take an alternative path.
The key design decision is defining what triggers a human checkpoint. Common triggers: actions affecting money above a threshold, sending communications to external parties, permanently deleting data, modifying production configurations, or anything the model assigns low confidence to. The agent itself can be the judge: if the model estimates it's less than 85% confident, it escalates. You can also hardcode specific tool calls as always-require-approval regardless of model confidence.
This pattern is the practical answer to "what if the agent does something wrong?" in regulated industries and enterprise contexts. A fully autonomous agent that can send emails, make purchases, or modify databases is a liability without this pattern. Human-in-the-loop lets you deploy agents in sensitive environments by ensuring a human signs off on irreversible actions, while still capturing most of the efficiency gains from automation.
A calendar management agent. It can autonomously: create new events, accept invites from whitelisted contacts, reschedule meetings by up to 1 hour. It always requires human approval to: cancel events with more than 2 attendees, decline external invites (irreversible impression), modify recurring events (affects multiple future dates), or send any email on your behalf. The approval request includes the proposed action, all affected parties, and a one-click approve/reject interface via Slack.
```python
import anthropic
import json

client = anthropic.Anthropic()

# Actions that ALWAYS require human approval
ALWAYS_APPROVE = {"send_email", "delete_event", "make_purchase"}
CONFIDENCE_THRESHOLD = 0.85

def assess_action(action: str, inputs: dict) -> dict:
    """Ask the model to assess risk and confidence"""
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=256,
        system="Assess this action. Return JSON: {confidence: 0-1, risk: low/medium/high, reason: str}",
        messages=[{"role": "user", "content": f"Action: {action}\nInputs: {json.dumps(inputs)}"}]
    )
    return json.loads(response.content[0].text)

def needs_approval(action: str, assessment: dict) -> bool:
    if action in ALWAYS_APPROVE:
        return True
    if assessment["risk"] == "high":
        return True
    if assessment["confidence"] < CONFIDENCE_THRESHOLD:
        return True
    return False

def request_human_approval(action: str, inputs: dict, assessment: dict) -> bool:
    """Present action to human and wait for input"""
    print("\n⚠️ APPROVAL REQUIRED")
    print(f"Action: {action}")
    print(f"Inputs: {json.dumps(inputs, indent=2)}")
    print(f"Risk: {assessment['risk']} | Confidence: {assessment['confidence']:.0%}")
    print(f"Reason: {assessment['reason']}")
    answer = input("Approve? (y/n): ").strip().lower()
    return answer == "y"

def execute_with_approval(action: str, inputs: dict) -> str:
    assessment = assess_action(action, inputs)
    if needs_approval(action, assessment):
        approved = request_human_approval(action, inputs, assessment)
        if not approved:
            return "Action rejected by human. Taking alternative approach."
    return run_action(action, inputs)  # Execute approved action
```
Multi-agent collaboration breaks a complex task into independent subtasks, runs specialist agents on each in parallel, then aggregates the results. An orchestrator agent decomposes the work and assigns subtasks. Specialist agents run simultaneously, each focused on their narrow domain. A synthesis agent (or the orchestrator itself) combines all outputs into a coherent final result. The total time is roughly the time of the slowest agent, not the sum of all agents; this is the core latency advantage.
The coordination overhead is real and should not be underestimated. Each agent needs a clear task description, the right tools, and enough context to work independently. Agents cannot share information in real-time the way human teams can β they work on their isolated subtask and deliver a result. This means the orchestrator must define the boundaries between subtasks precisely enough that no two agents need the same piece of information mid-task.
This pattern scales to very large research and analysis tasks. A task that would take 20 minutes sequentially might complete in 5 minutes with 5 parallel agents. But debugging is harder: when the final synthesis is wrong, you need to trace back which specialist produced the bad input. Always log each agent's raw output before aggregation. Without this, diagnosing quality issues in production multi-agent systems is extremely difficult.
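Logging raw outputs before aggregation can be as simple as writing one timestamped JSON record per agent; the directory and file naming below are illustrative assumptions:

```python
import json
import time
from pathlib import Path

LOG_DIR = Path("agent_logs")

def log_agent_output(agent_name: str, output: str) -> Path:
    """Persist one agent's raw output before it gets merged into the synthesis."""
    LOG_DIR.mkdir(exist_ok=True)
    path = LOG_DIR / f"{int(time.time() * 1000)}_{agent_name}.json"
    path.write_text(json.dumps({
        "agent": agent_name,
        "timestamp": time.time(),
        "output": output,
    }, indent=2))
    return path
```

When a synthesis looks wrong, you can diff these files against the final report to find which specialist fed it bad input.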
A market intelligence report for a startup. Orchestrator decomposes into 3 parallel workstreams: Competitor Agent (searches 5 competitor websites, extracts features and pricing); Pricing Agent (analyzes pricing pages, builds a comparison matrix, identifies pricing gaps); Review Agent (reads G2 and Trustpilot reviews, runs sentiment analysis, extracts recurring complaints). All 3 run simultaneously in 90 seconds total. Synthesizer agent combines into a 5-page markdown report with strategic recommendations. Sequential would take 4-5 minutes.
```python
import asyncio
import anthropic

client = anthropic.Anthropic()

async def run_agent(name: str, task: str, system: str) -> dict:
    """Run a single specialist agent asynchronously"""
    loop = asyncio.get_running_loop()
    response = await loop.run_in_executor(None, lambda: client.messages.create(
        model="claude-opus-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": task}]
    ))
    print(f"  {name} completed")
    return {"agent": name, "output": response.content[0].text}

async def run_parallel_agents(company: str) -> str:
    print(f"Launching parallel agents for: {company}")
    # Define specialist agents
    agents = [
        ("competitor-researcher",
         f"Research competitors of {company}. List top 5 with features.",
         "You are a competitive intelligence analyst."),
        ("pricing-analyst",
         f"Analyze pricing strategies in the {company} market.",
         "You are a pricing strategy consultant."),
        ("sentiment-analyst",
         f"Summarize customer sentiment about {company}'s competitors.",
         "You are a customer experience researcher."),
    ]
    # Run all agents in parallel
    tasks = [run_agent(name, task, system) for name, task, system in agents]
    results = await asyncio.gather(*tasks)
    # Aggregate: combine all outputs and synthesize
    combined = "\n\n".join([f"## {r['agent']}\n{r['output']}" for r in results])
    synthesis = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=2048,
        system="Synthesize these research findings into a strategic report with actionable recommendations.",
        messages=[{"role": "user", "content": combined}]
    )
    return synthesis.content[0].text

# Run the multi-agent system
report = asyncio.run(run_parallel_agents("Notion"))
print(report)
```
Quick reference: choose the right pattern for your use case at a glance.
| Pattern | Complexity | Latency | Cost | Reliability | Best For |
|---|---|---|---|---|---|
| Tool-Using Agent | Low | Medium | Low | High | Data fetching, single-domain tasks |
| Router Agent | Medium | Low | LowβMed | High | General-purpose assistants, cost optimization |
| Pipeline (Sequential) | Medium | High | Medium | High | Document processing, content workflows |
| Reflection / Self-Correction | Medium | High | High | Very High | Code generation, high-quality writing |
| Human-in-the-Loop | Medium | Variable | Medium | Very High | High-stakes actions, regulated industries |
| Multi-Agent Collaboration | High | Low (parallel) | High | Medium | Complex research, large analysis tasks |
Note: These patterns are not mutually exclusive. Production systems commonly combine multiple patterns, for example a router that delegates to pipeline agents, each with reflection loops for quality control.
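That composition can be sketched structurally: a router picks a pipeline, and each stage's call is wrapped in a reflection step. All functions here are illustrative stubs standing in for the API-backed versions shown in the sections above:

```python
def handle_request(message: str, classify, pipelines: dict, reflect) -> str:
    """Router -> pipeline -> per-stage reflection, composed into one flow."""
    category = classify(message)                 # Router pattern: classify intent
    stages = pipelines.get(category, pipelines["general"])
    content = message
    for stage_fn in stages:                      # Pipeline pattern: sequential stages
        content = reflect(stage_fn, content)     # Reflection pattern: per-stage check
    return content
```

Each injected callable corresponds to one pattern, so any of them can be swapped for the fuller implementations independently.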