8.2 — Core Components of an AI Agent

Learning Objectives

Describe the five core components that every AI agent relies on
Distinguish between in-context memory, RAG, and persistent storage — and when each is appropriate
Explain how tool use works at the API level and why it's central to agentic capability

The Five Building Blocks

Every AI agent — from a simple customer service bot to a sophisticated coding agent that can autonomously implement features — is assembled from the same five building blocks. Understanding each one gives you a mental model for evaluating any agent you encounter, and for thinking clearly about what will and won't work when building your own.

1. Perception Module

The perception module is how the agent receives information about its environment. Modern agents can perceive across multiple modalities:

Text: Natural language instructions, documents, code, web pages, emails
Images: Screenshots, photos, diagrams, charts
Structured data: JSON responses from APIs, CSV files, database query results
Sensor feeds: For robotics and IoT applications — temperature, position, camera feeds
File systems: Directory listings, file contents, code repositories

The context window is the practical limit of perception. Everything the model can reason about must fit within its context window — currently up to 1 million tokens for both Claude Opus 4.7 and Gemini 3 Flash. For a coding agent, this means: at any given moment, it can hold the task description, relevant file contents, error messages, test output, and conversation history — up to the window limit.

Think of the context window as working memory: what the agent can actively think about right now.

2. Memory Systems

Agents use three distinct types of memory, and choosing between them is one of the key design decisions when building an agent:

In-Context Memory

Everything currently in the context window. Fast, instantly accessible, expensive per-token, and bounded by the window size. As a session grows longer, earlier information must be summarized or dropped.

Best for: Information needed for the current task — recent tool results, the current codebase files, the active conversation.

Retrieval-Augmented Generation (RAG)

A vector database stores embeddings (numerical representations of meaning) for thousands or millions of documents. When the agent needs information, it queries the vector DB for semantically similar content and adds the retrieved chunks to its context.

Best for: Large knowledge bases that can't fit in context — documentation libraries, internal wikis, historical conversation archives, product catalogs.

Example: a customer service agent that retrieves the 5 most relevant knowledge base articles for each incoming question, adds them to context, and answers based on retrieved content.

Persistent External Storage

The agent writes information to a database, file, or structured store — and can retrieve it in future sessions. Unlike in-context or RAG memory, this persists indefinitely across sessions.

Best for: User preferences, completed work, structured data that needs querying, state that must survive agent restarts.

💡Key Concept

A concrete memory example: A research agent studying a company might: retrieve past research notes from a vector DB (RAG) to avoid duplicating work, hold current search results in its context window (in-context), and write its final synthesis to a structured database record (persistent storage). All three memory types working together.

3. Planning & Reasoning

The planning module is where the agent decides what to do next. Several strategies have proven effective:

Chain-of-Thought (CoT): The model writes out its reasoning before deciding on an action. Instead of immediately choosing "search for X," it first writes "I need to find recent pricing data for Competitor A. The best source would be their website's pricing page. Let me search for that." The explicit reasoning dramatically improves decision quality.

ReAct (Reason + Act): The dominant production pattern. The model alternates between reasoning ("I should check if this function has tests") and acting (search for test files). Each observation from an action feeds back into the next reasoning step. Transparent, debuggable, and effective.

Tree of Thought: Explores multiple reasoning branches simultaneously — like a chess player thinking several moves ahead across different possible game states. Computationally expensive but useful for complex planning problems.

Reflection: The agent critiques its own output before finalizing. "Does this code actually solve the problem as described? Let me re-read the requirements." Self-correction loops improve output quality, especially for high-stakes tasks.

4. Tool Use

Tool use is the bridge between reasoning and action — it's what makes an agent capable of affecting the world rather than just generating text about it.

At the API level, tool use works like this: when you set up an agent, you provide a list of tool definitions. Each definition includes the tool's name, a description of what it does, and the parameters it accepts. The model reads these definitions and decides when to invoke a tool. The tool runs in your code, and the result is sent back to the model as a new context element.

Common tool categories:

Tool Type	Examples	What It Enables
Web search	Brave Search, Tavily, SerpAPI	Real-time information retrieval
Code execution	Python interpreter, Node.js runner	Run and test code, analyze data
File system	Read/write files, list directories	Access and modify the codebase
Database	SQL queries, vector search	Read and update structured data
External APIs	REST calls, GraphQL	Interact with any external service
Browser control	Playwright, Puppeteer	Web automation and scraping
Communication	Email, Slack, GitHub	Act on the world outside the codebase

The power of tool use comes from composition: an agent with access to web search, a code interpreter, and file system tools can research a topic, write code that processes the data, test it, and save the results — all in a single agentic session.

Managed tool orchestration: For organizations that want tool-use agents without building infrastructure, platforms like Claude Managed Agents (Anthropic, public beta) provide composable APIs for defining agent behavior and tools — Anthropic handles the orchestration, scaling, and monitoring. This lowers the barrier from "build a full agent framework" to "define what the agent should do."

5. Action Output

The final component closes the loop: the agent takes an action in the world.

Actions span a wide range of impact:

Read-only (safest): Searching the web, reading files, querying databases. No side effects; always safe to allow autonomously.

Write-but-reversible: Creating new files, drafting emails, writing database records. Has side effects, but most can be undone.

Irreversible high-stakes: Sending emails, making purchases, deleting data, deploying to production. These warrant human approval before execution.

✅Tip

Human-in-the-loop design: The most reliable production agents don't try to be fully autonomous for every action. Design checkpoints where the agent presents its plan or output for human review before taking irreversible actions. The agent does the hard thinking; the human makes the final call on consequential decisions.

How the Components Work Together

Consider a real example: a coding agent tasked with "fix the authentication bug reported in issue #247."

Perception: The agent reads the GitHub issue, the relevant code files, and the failing test output
Memory: It retrieves similar past bug fixes from its RAG store to learn from prior patterns
Planning: It uses ReAct to reason: "The error says token validation fails after 24 hours. I should look at the token expiry logic."
Tool use: It searches the codebase for token validation code, reads the specific file, runs the tests to confirm the failure
Reasoning: It identifies the bug — an off-by-one error in the expiry calculation
Action: It edits the file, runs the tests again, confirms they pass, and opens a pull request

Each component enables the next. Remove any one of them and the agent cannot complete the task: no perception means no context; no memory means no access to relevant knowledge; no planning means random actions; no tools means no ability to act; no action output means no result.

Key Takeaways

Every AI agent is built from five components: perception, memory, planning/reasoning, tool use, and action output
Memory comes in three forms: in-context (fast but limited), RAG (large-scale retrieval), and persistent storage (cross-session state)
Tool use is what transforms a language model from a text generator into a system that can act on the world
The most effective agents integrate all five components, with human-in-the-loop checkpoints for high-stakes actions

Core Components of an AI Agent

Audio & video lessons are paid features