Section 01

What Is an AI Agent? (Not the Hype Version)

Every week there's a new headline about "autonomous AI agents" doing something incredible — or something terrifying. The word "agent" has been stretched to mean everything from a glorified chatbot to a fully self-directed digital entity. Here's the actual definition, stripped of marketing:

An AI agent is a system that perceives its environment, decides what to do, takes action, and observes the results — and repeats this loop until the task is done.

That's it. The loop is what makes it an agent. Without the loop — without the ability to take action and then observe what happened — you just have a text generator.

⚠️ Common Misconception: Calling something an "AI agent" does not make it one. Wrapping GPT-4 in a web app and adding a system prompt is NOT an agent. It's a chatbot. The difference is in the architecture, not the marketing.

What an agent is NOT:

  • Just a chatbot with a long system prompt
  • A thin wrapper around an LLM API call
  • "Autonomous" in any science fiction sense — it operates within the tools and permissions you give it
  • Something that works without human-designed constraints and guardrails

The key difference — tools:

The most important thing that separates an agent from a chatbot is the ability to use tools to affect the world and retrieve information. A chatbot receives text and returns text. An agent can:

  • Search the web and read the results
  • Read and write files to disk
  • Call external APIs (Slack, GitHub, Stripe, your database)
  • Execute code and observe the output
  • Send emails, create calendar events, post messages
  • Query a database and incorporate results into its reasoning

And critically — after using a tool, it observes the result and decides what to do next. That feedback loop is the engine of agent behavior.

📥 Input (task / message) → 🧠 LLM (decides next step) → 🔧 Tool (action taken) → 👁️ Observation (tool result) → ↻ loop until task complete or stop_reason == "end_turn"
💡 Mental Model: Think of the LLM as the brain and the tools as the hands. The brain decides what to do. The hands do it. The eyes (observation) report back what happened. Loop until done.
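This mental model is small enough to sketch in provider-agnostic Python. Nothing below is a real SDK call: decide and execute are placeholder callables standing in for the LLM and the tool layer.

```python
def run_loop(task, decide, execute, max_steps=10):
    """Minimal agent loop: perceive -> decide -> act -> observe.

    `decide` stands in for an LLM call: it reads the history and returns
    either {"type": "final", "answer": ...} or a tool action.
    `execute` stands in for the tool layer: it runs an action and
    returns an observation.
    """
    history = [("user", task)]                   # perception: everything seen so far
    for _ in range(max_steps):                   # safety net against runaway loops
        action = decide(history)                 # reasoning: what to do next?
        if action["type"] == "final":            # the agent says it's done
            return action["answer"]
        result = execute(action)                 # action: run the chosen tool
        history.append(("observation", result))  # observation: feed the result back
    return "max steps reached"
```

The full implementation in Section 04 is this same skeleton, with the Anthropic SDK filling in decide and real tool functions filling in execute.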

Section 02

Chatbot vs Agent vs Simple API Call

One of the most expensive mistakes developers make is using the wrong tool for the job. Not everything needs an agent. Agents are slower and more expensive than simple LLM calls. Knowing when to reach for each approach will save you time, money, and debugging pain.

Dimension   | Simple API Call                          | Chatbot                        | Agent
Goal        | Single transformation                    | Conversational exchange        | Complete a multi-step task
LLM Calls   | 1                                        | 1 per turn                     | N (until done)
Tools       | None                                     | None (usually)                 | Web, files, APIs, DB, etc.
State       | Stateless                                | Per-conversation               | Persistent across steps
Cost        | Lowest                                   | Low                            | Higher (variable)
Speed       | Fast (<2s)                               | Fast (<3s)                     | Slower (seconds to minutes)
Reliability | Very high                                | High                           | Requires careful design
Use When    | Summarize, translate, extract, classify  | Q&A, customer support, copilot | Research, automation, complex workflows
Rule of Thumb — When to Use Each
  • If a single prompt + completion solves your problem, just use the API. No loops, no tools, no overhead.
  • If you need back-and-forth conversation but no external data access, a chatbot (with a good system prompt) is the right call.
  • If the task takes more than one LLM call to complete, you need an agent.
  • If you need to fetch or write data from an external source during the task, you need an agent with tools.
  • If the task is unpredictable in length — you don't know upfront how many steps it needs — you need an agent.
💡 Start Simple, Escalate: Always start with the simplest approach. Can one well-crafted prompt handle it? Use that. Can a fixed two-step pipeline handle it? Use that. Only reach for a full agent loop when the task genuinely requires dynamic decision-making about what to do next.

Real examples:

Simple API Call: "Summarize this support ticket in 2 sentences." — One call, done. No tools needed.
Chatbot: "Help me draft a response to this email." — Conversational, iterative, but no external data needed.
Agent: "Research the top 5 competitors to [startup], pull their pricing pages, and write a comparison report." — Multi-step, needs web access, unpredictable number of searches.

Section 03

Anatomy of an Agent

Every agent — regardless of framework, provider, or complexity — is built from four core components. Understand these and you can reason about any agent architecture you encounter or design.

01 👁️ Perception
What the agent can "see" at any given moment. This includes the entire conversation history, the results of previous tool calls, and any context you've injected.
messages[], tool_results, injected_context

02 🧠 Reasoning
The LLM call that decides what to do next. Given everything in the context window, the model decides: call a tool, ask for clarification, or produce the final answer.
client.messages.create(model, tools, messages)

03 🔧 Action
The tool call or final response. When the LLM decides to use a tool, your code routes that to the actual implementation — Python function, API call, database query.
execute_tool(name, inputs) → result

04 💾 Memory
How the agent remembers across steps and conversations. Three types: in-context (the messages array), external (a database), and episodic (compressed summaries of past sessions).
messages[] / vector_db / summary_store

Memory types — which one to use:

In-Context Memory (the messages array)

Everything the agent has seen and done in the current session, stored directly in the conversation. Fast to access, but limited by context window size. Best for single-session tasks up to ~50 steps. Once you hit the context limit, you need to summarize or externalize.

External Memory (vector DB / database)

Store facts, documents, and previous results outside the context window. Use semantic search to retrieve only what's relevant for the current step. Best for long-running agents, knowledge bases, and agents that need to remember across sessions.

Episodic Memory (summarized history)

Compress previous conversations into structured summaries and inject them at the start of new sessions. Best for personal assistants and support agents that need to "remember" users across multiple separate conversations without blowing up the context window.
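A sketch of the episodic pattern, assuming a simple "role: content" transcript format. The summarization step is injected as a callable; in practice it would be an LLM call with a summarization prompt.

```python
def compress_session(messages: list, summarize) -> str:
    """Episodic memory: boil a finished session down to a short summary.

    `summarize` is a text -> text callable (in practice, an LLM call).
    Non-string content blocks, e.g. tool results, are skipped here for
    simplicity; a real implementation might render them to text first.
    """
    transcript = "\n".join(
        f"{m['role']}: {m['content']}"
        for m in messages if isinstance(m["content"], str)
    )
    return summarize(transcript)

def start_new_session(task: str, past_summaries: list) -> list:
    """Inject episodic memory at the start of a fresh messages array."""
    memory = "\n".join(f"- {s}" for s in past_summaries)
    preamble = f"Notes from previous sessions:\n{memory}\n\nNew task: {task}"
    return [{"role": "user", "content": preamble}]
```

The exact preamble format is a design choice; what matters is that the summaries ride along in every new session at a fraction of the original token count.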

⚠️ Context Window Limit Is Real: Every token in your messages array costs money on every LLM call. An agent running 20 iterations with long tool results can easily hit 100K+ tokens. Design your memory strategy before you start building, not after you get your first $200 bill.
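One cheap mitigation is clipping oversized tool results before they enter the messages array. A sketch; the 4,000-character cap and the head-plus-tail strategy are arbitrary choices to tune per tool:

```python
def clip_tool_result(result: str, max_chars: int = 4000) -> str:
    """Truncate oversized tool results before appending them to messages.

    Keeps the head and the tail of the result, since those often carry
    the useful parts (opening summaries, trailing totals), and marks
    how much was dropped so the agent knows the result is partial.
    """
    if len(result) <= max_chars:
        return result
    half = max_chars // 2
    omitted = len(result) - max_chars
    return (result[:half]
            + f"\n... [{omitted} chars omitted] ...\n"
            + result[-half:])
```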

Section 04

The Agent Loop in Python

Enough theory. Here's the core agent loop implemented with the Anthropic Python SDK. This is working code you can copy, extend, and harden for production. Study every line — this pattern underpins every agent in this kit.

💡 Prerequisites: Install the SDK with pip install anthropic and set your API key as the environment variable ANTHROPIC_API_KEY.
Python — Core Agent Loop
import anthropic

client = anthropic.Anthropic()

# Define tools the agent can use
tools = [
    {
        "name": "search_web",
        "description": "Search the web for current information on a topic",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "write_file",
        "description": "Write content to a file",
        "input_schema": {
            "type": "object",
            "properties": {
                "filename": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["filename", "content"]
        }
    }
]

def execute_tool(name, inputs):
    """Route tool calls to actual implementations"""
    if name == "search_web":
        return search_the_web(inputs["query"])  # your impl
    elif name == "write_file":
        return write_to_file(inputs["filename"], inputs["content"])
    return "Tool not found"

def run_agent(task: str, max_iterations: int = 10) -> str:
    """Core agent loop"""
    messages = [{"role": "user", "content": task}]

    for iteration in range(max_iterations):
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=4096,
            system="You are a helpful research agent. Use tools to complete tasks thoroughly.",
            tools=tools,
            messages=messages
        )

        # Agent decided it's done
        if response.stop_reason == "end_turn":
            for block in response.content:
                if hasattr(block, 'text'):
                    return block.text
            return "Task complete"

        # Agent wants to use tools
        if response.stop_reason == "tool_use":
            # Add assistant response to history
            messages.append({"role": "assistant", "content": response.content})

            # Execute each tool call
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"  → Calling tool: {block.name}({block.input})")
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result)
                    })

            # Feed results back to agent
            messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached — check your task complexity"

# Run it
result = run_agent("Research the top 5 open-source LLMs in 2024 and write a summary to research.md")
print(result)

Line-by-line breakdown — what actually matters:

messages = [{"role": "user", "content": task}]

The messages array is the agent's entire working memory. Every tool call and every result gets appended here. The agent's context window IS the messages array.

response.stop_reason == "end_turn"

This is the agent saying "I'm done." It has decided not to use any more tools and is returning its final answer. Always check this first.

response.stop_reason == "tool_use"

The agent has decided to call a tool. You must execute the tool, capture the result, and feed it back as a user message. If you skip this step, your next API call will fail: the API expects a tool_result block for every tool_use.

max_iterations = 10

Your safety net against infinite loops. Always set this. 10 is a good default for most tasks. Complex research agents may need 20-30. If an agent needs more than 50 iterations, you have an architecture problem.

tool_use_id: block.id

Critical: When returning tool results, you must include the tool_use_id that matches the original tool call. The API uses this to correlate results with calls. Get this wrong and your agent loop will break.

🚨 Add Error Handling to execute_tool(): In production, wrap every tool call in a try/except. If a web search fails or an API is down, return a descriptive error string as the tool result. The agent can then decide to retry, use a fallback, or tell the user it couldn't complete the task. Never let an unhandled exception kill the agent loop.

Production error handling pattern:

Python — Robust Tool Execution
def execute_tool(name: str, inputs: dict) -> str:
    """Execute a tool call with error handling."""
    try:
        if name == "search_web":
            result = search_the_web(inputs["query"])
            return str(result) if result else "No results found for this query"

        elif name == "write_file":
            with open(inputs["filename"], "w") as f:
                f.write(inputs["content"])
            return f"File '{inputs['filename']}' written successfully ({len(inputs['content'])} chars)"

        else:
            return f"Unknown tool: {name}. Available tools: search_web, write_file"

    except FileNotFoundError as e:
        return f"File error: {str(e)}. Check that the directory exists."
    except ConnectionError as e:
        return f"Network error calling {name}: {str(e)}. Try again or use cached data."
    except Exception as e:
        return f"Tool {name} failed: {str(e)}"

Section 05

Common Agent Architectures

Once you understand the basic agent loop, you can compose multiple agents into more powerful systems. Here are the five patterns you'll use most often, when to use each, and what each looks like structurally.

1. Single Agent
One LLM with access to multiple tools. It decides which tools to call, in which order, until the task is done. This covers 80% of agent use cases.
Use when: The task is complex but doesn't require parallel work. Start here.
User Task → Agent → (Tool A, Tool B) → Final Answer
2. Pipeline (Sequential)
Agent A's output becomes Agent B's input, then Agent C. Each agent is a specialist at exactly one job. Predictable, easy to debug.
Use when: Document processing, report generation, multi-stage transformation.
Input → Researcher → Writer → Output
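A pipeline is just function composition over agents. A sketch where each stage is a text-to-text callable; in a real system, each callable would be a single-specialist agent run:

```python
def run_pipeline(stages, initial_input: str) -> str:
    """Sequential pipeline: each stage's output feeds the next stage.

    `stages` is a list of (name, fn) pairs where fn is text -> text.
    The name is only used for logging.
    """
    data = initial_input
    for name, stage in stages:
        print(f"[{name}] running...")
        data = stage(data)   # one specialist's output -> next specialist's input
    return data
```

Because every stage has exactly one input and one output, you can test stages in isolation and log each intermediate artifact, which is what makes pipelines easy to debug.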
3. Router + Specialists
A router agent reads the request and delegates to the right specialist. Each specialist handles one domain: research, code, writing, data.
Use when: General-purpose assistants that need to handle diverse request types efficiently.
Request → Router → [Research | Code | Writing]
4. Multi-Agent (Parallel)
Multiple agents work simultaneously on different subtasks. An orchestrator kicks them off, waits, and aggregates results. Faster for large research tasks.
Use when: Tasks that can be parallelized — researching multiple topics, analyzing multiple files.
Orchestrator → (Worker 1, Worker 2, Worker 3) → aggregate → Result
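The fan-out/aggregate shape can be sketched with a thread pool (threads suit I/O-bound LLM calls). Here `worker` stands in for a full agent run and `aggregate` for the final combining step; both are injected:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(subtasks, worker, aggregate):
    """Fan out subtasks to workers, wait, then aggregate the results.

    `worker` is subtask -> result; `aggregate` is list-of-results ->
    final answer. pool.map preserves subtask order, so results line up
    with the subtasks that produced them.
    """
    with ThreadPoolExecutor(max_workers=max(len(subtasks), 1)) as pool:
        results = list(pool.map(worker, subtasks))
    return aggregate(results)
```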
5. Human-in-the-Loop
Agent works autonomously until it hits a decision point — low confidence, destructive action, ambiguous requirement — then pauses and asks a human before continuing.
Use when: High-stakes actions, financial decisions, anything irreversible.
Agent works... → uncertain? → Human Review → Continue
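The checkpoint itself is a gate in front of tool execution that asks a human before risky actions. A sketch: the RISKY set is an example policy, and `confirm` is any question-to-bool callable (wire it to input() in a CLI, or to an approval queue in production).

```python
def guarded_execute(action: dict, execute, confirm) -> str:
    """Pause for human review before irreversible or high-stakes actions.

    `action` is {"name": ..., "input": ...} as produced by the agent;
    `execute` runs the real tool; `confirm` asks a human and returns
    True/False. Which tools count as risky is your policy decision.
    """
    RISKY = {"delete_file", "send_email", "charge_card"}   # example policy
    if action["name"] in RISKY:
        question = f"Agent wants to run {action['name']}({action['input']}). Allow?"
        if not confirm(question):
            return "Action rejected by human reviewer"
    return execute(action["name"], action["input"])
```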
💡 Start With Single Agent: Don't pre-optimize into a multi-agent system. Build a single agent first, see where it breaks or gets slow, then decide if splitting into multiple agents solves those specific problems. Multi-agent systems are significantly harder to debug.

Section 06

The 5 Most Common Agent Mistakes

These are the failure modes that trip up developers at every level — from first-time agent builders to teams deploying at scale. Knowing them in advance will save you hours of debugging.

Mistake 01
Over-Engineering with Multi-Agent When a Single Agent Would Work

Developers see multi-agent systems in blog posts and assume that's the professional approach. It's not — it's a complexity multiplier. A well-designed single agent with good tools can handle most tasks. Multi-agent adds orchestration overhead, harder debugging, more failure points, and higher latency.

Fix: Default to a single agent. Only split into multiple agents when you've identified a concrete bottleneck — parallel execution need, context window overflow, or specialist accuracy requirements — not because it "feels more scalable."
Mistake 02
Infinite Loops — Agent Calls Tools Without Converging

The agent keeps searching, re-reading, or re-processing without making progress toward the goal. This happens when: the task is too vague, the tools return low-quality results, or the system prompt doesn't guide toward a stopping condition.

Fix: Always set max_iterations. Write system prompts that say explicitly when to stop ("If you've searched 3 times without finding what you need, summarize what you found and state clearly what's missing"). Add loop detection — if the last 3 tool calls are identical, break out.
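The loop-detection part of that fix is only a few lines. A sketch, assuming you record each iteration's tool call as a (name, inputs) pair; the window of 3 matches the rule of thumb above:

```python
def is_looping(tool_calls, window: int = 3) -> bool:
    """Detect a stuck agent: the last `window` tool calls are identical.

    `tool_calls` is a list of (name, inputs) pairs appended once per
    iteration; pairs are compared by value.
    """
    if len(tool_calls) < window:
        return False
    recent = tool_calls[-window:]
    return all(call == recent[0] for call in recent)
```

Inside the agent loop: append (block.name, block.input) each iteration, and break out with a diagnostic message when is_looping(calls) returns True.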
Mistake 03
Stuffing Too Much into Context — The Whole Codebase Every Iteration

Injecting 50K tokens of context on every LLM call. This inflates cost — total spend grows roughly quadratically with iteration count, because the full, growing context is re-sent as input tokens on every iteration — increases latency, and often hurts quality because the model is overwhelmed with irrelevant information.

Fix: Be surgical with context. Use retrieval to fetch only the relevant 2-3 documents or code files for the current step. Summarize tool results before adding them to messages. Use Claude's prompt caching for context that doesn't change across iterations (system prompts, static docs).
Mistake 04
No Error Handling on Tool Calls — One Failed API Kills the Task

A web search returns a 503. A database query times out. A file path doesn't exist. Without error handling, the exception propagates up and the entire agent run fails — even if it was 90% done.

Fix: Wrap every tool execution in try/except. Return descriptive error strings as tool results — not exceptions. The agent can then reason about the error and decide to retry, use a fallback tool, or report the issue gracefully. Never let tool errors surface as Python exceptions.
Mistake 05
Trusting Agent Output Blindly Before Taking Real-World Actions

The agent writes a database query and you run it directly. The agent drafts an email and you send it immediately. The agent decides to delete files and your code deletes them. LLMs make mistakes, misread instructions, and can be manipulated through prompt injection in tool results.

Fix: Validate agent outputs before any consequential action. For write/delete/send operations, add a confirmation step or a human-in-the-loop checkpoint. For structured outputs (SQL, JSON, code), validate the schema and syntax before execution. Log all tool calls — if something goes wrong, you need the audit trail.
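For structured outputs, the validation step can be as simple as parsing and checking required keys before anything executes. A sketch for JSON outputs; `required_keys` plays the role of a minimal schema, and you'd extend it with type and value checks as needed:

```python
import json

def validate_json_output(raw: str, required_keys: set) -> dict:
    """Parse and check agent-produced JSON before acting on it.

    Raises ValueError with a descriptive message instead of silently
    executing whatever the model produced.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"Agent output is not valid JSON: {e}")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"Agent output missing keys: {sorted(missing)}")
    return data
```

The same pattern applies to SQL (parse before execute) and code (lint or sandbox before run): fail loudly at the validation boundary, not downstream.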
The Meta-Lesson: All five mistakes share a common root — optimism about what the agent will do. Build agents assuming they will fail, loop, get confused, and produce invalid output. Design the failure cases first, then the happy path.

Section 07

Where to Go Next

You now have the mental model. You understand what agents are, when to use them, how the loop works, and what the common failure modes look like. The rest of the kit builds directly on this foundation.

💡 Recommended Path: Read the Design Patterns guide next to understand the 6 core patterns. Then jump to First Agent to build something real. After that, the order doesn't matter — follow what's most relevant to what you're building.