8.1 — What Is an AI Agent?

Learning Objectives

Define what distinguishes an AI agent from a standard LLM chat session
Identify the core perception-reasoning-action loop that every agent follows
Recognize the main categories of agents and their real-world applications

From Chatbot to Agent

When you type a message to ChatGPT or Claude, you're using a large language model in its most basic form: you send text, it sends text back. The model has no memory beyond your conversation, no ability to take actions in the world, and no way to look anything up unless you paste it in.

An AI agent is fundamentally different. An agent perceives its environment, reasons about what to do, takes an action, observes the result of that action, and repeats. The loop continues until the task is complete or the agent needs human input. The model isn't just responding; it's working.

💡Key Concept

The Perception → Reasoning → Action Loop: Every AI agent follows this cycle: (1) Perceive — receive input (text, images, API data, file contents); (2) Reason — decide what action to take; (3) Act — execute the action (search the web, write code, send a message); (4) Observe — receive the result; (5) Repeat — loop back to reasoning with new information. This continues until the task is done or the agent asks for human input.
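
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular agent framework is implemented: the names run_agent, Decision, and scripted_reason are hypothetical, and scripted_reason is a hard-coded stand-in for the call to an LLM that a real agent would make at the reasoning step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    """What the reasoning step chose to do next (hypothetical structure)."""
    action: str          # name of a tool, or "finish"
    argument: str = ""   # input for the tool, or the final answer


def run_agent(task: str,
              reason: Callable[[list[str]], Decision],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 10) -> str:
    # (1) Perceive: the task is the first thing the agent knows.
    history = [f"task: {task}"]
    for _ in range(max_steps):
        # (2) Reason: decide what to do given everything seen so far.
        decision = reason(history)
        if decision.action == "finish":
            return decision.argument
        # (3) Act: execute the chosen tool.
        result = tools[decision.action](decision.argument)
        # (4) Observe: record the result so the next step can use it.
        history.append(f"{decision.action}({decision.argument}) -> {result}")
        # (5) Repeat: the loop returns to the reasoning step.
    return "stopped: step limit reached"


def scripted_reason(history: list[str]) -> Decision:
    # Assumption: a real agent would call an LLM here instead.
    if len(history) == 1:
        return Decision("lookup", "capital of France")
    return Decision("finish", history[-1].split(" -> ")[-1])


tools = {"lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown")}
print(run_agent("What is the capital of France?", scripted_reason, tools))  # prints Paris
```

The max_steps limit is a design choice added for the sketch: it guarantees the loop stops even if the reasoning step never decides the task is done.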

Consider the difference in practice. With a standard LLM chat: you ask "what's wrong with this code?" and the model explains the bug. With a coding agent: you say "the tests are failing on CI, fix it" and the agent reads your repository, identifies the failing tests, traces the root cause, edits the relevant files, runs the tests locally to confirm the fix, and opens a pull request — all without further input from you.

📝Note

Same model, different scaffolding: The underlying LLM powering an agent (Claude, GPT-5.5, Gemini 3) is the same model you'd use for a chat session. The difference isn't in the model itself — it's in the software framework that wraps it, giving it access to tools and the ability to take multi-step actions.

Types of AI Agents

Not all agents are built the same. There are three foundational archetypes:

Reactive Agents

Reactive agents respond to inputs according to a fixed set of rules or a trained policy, without planning ahead. They follow an "input → immediate response" pattern.

Real-world example: a customer service bot that detects the phrase "cancel my subscription" and immediately routes to a retention workflow. It doesn't reason about whether cancellation makes sense — it reacts.

Reactive agents are fast, predictable, and easy to debug. They're appropriate when the decision space is well-defined and limited.
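
A reactive agent of the kind described above can be as simple as a lookup over fixed rules. The sketch below assumes a rule table and workflow names (RULES, retention_workflow, and so on) that are invented for illustration; a production bot would more likely use a trained intent classifier than substring matching.

```python
# Hypothetical rule table: trigger phrase -> workflow to route to.
RULES = [
    ("cancel my subscription", "retention_workflow"),
    ("reset my password", "password_reset_workflow"),
]


def route(message: str) -> str:
    """Return the workflow for the first matching rule, with no planning."""
    text = message.lower()
    for phrase, workflow in RULES:
        if phrase in text:
            return workflow
    # No rule matched: hand off rather than guess.
    return "human_handoff"


print(route("Please CANCEL my subscription today"))  # prints retention_workflow
```

Because every decision traces back to a single rule, it is easy to see why the agent did what it did, which is what makes this style predictable and easy to debug.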

Goal-Directed Agents

Goal-directed agents receive an objective and work toward it — selecting actions based on what will best achieve the goal, adjusting their approach based on results.

Real-world example: a research agent given "write a competitive analysis of the top five project management tools." It searches for each tool, reads the relevant pages, finds pricing information, checks review sites, and synthesizes everything into a structured document. It didn't know in advance exactly which sites it would visit — it made those decisions based on what it found.

Most production AI agents today are goal-directed.
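
To make the contrast with a reactive agent concrete, here is a minimal sketch of goal-directed behavior: the agent has a list of items it must find, and it chooses its next source based on what earlier attempts returned. The function research and the sources vendor_site and review_site are hypothetical stand-ins; a real research agent would use an LLM to choose actions and real web requests to carry them out.

```python
from typing import Callable, Optional

Source = tuple[str, Callable[[str], Optional[str]]]


def research(goal_items: list[str], sources: list[Source]):
    """Try sources in order for each item, moving on once one succeeds."""
    findings: dict[str, str] = {}
    log: list[tuple[str, str]] = []   # which source was tried for which item
    for item in goal_items:
        for name, fetch in sources:
            log.append((item, name))
            result = fetch(item)
            if result is not None:    # adjust: stop searching once the gap is filled
                findings[item] = result
                break
    return findings, log


# Stub data standing in for real websites (assumption for the example).
vendor_site = {"ToolA": "$10/user"}.get
review_site = {"ToolA": "$12/user", "ToolB": "$8/user"}.get
sources = [("vendor_site", vendor_site), ("review_site", review_site)]

findings, log = research(["ToolA", "ToolB"], sources)
print(findings)  # prints {'ToolA': '$10/user', 'ToolB': '$8/user'}
```

The sequence of visits recorded in log is not fixed in advance: ToolA needs one lookup, while ToolB needs two because the first source had nothing.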

Learning Agents

Learning agents update their behavior based on feedback, improving their approach over time. Examples include a customer service agent that learns which resolutions satisfied customers, and a coding agent that notices which testing strategies find more bugs and starts applying them more often.

Learning agents are more powerful long-term but more complex to design and validate safely.

Real-World Agent Examples

The shift from "AI as tool" to "AI as teammate" is already happening across multiple domains:

Coding agents: Given "implement user authentication with email and Google OAuth," a coding agent reads your project, picks the appropriate libraries, writes the code across multiple files, creates database migrations, writes tests, debugs failures, and opens a PR for your review.

Research agents: Given "research our top 10 competitors and summarize their pricing and positioning," the agent browses each company's website, reads product pages, finds pricing information, notes differentiating claims, and produces a structured summary with citations.

Customer service agents: These handle the full lifecycle of routine support tickets. They check order status, process refunds for eligible orders, answer questions from a knowledge base, and escalate only the genuinely complex cases to human agents.

Computer-use agents: Control a computer like a human would — launching applications, clicking buttons, filling out forms, extracting data from legacy software without APIs. Useful for automating tasks in systems that predate API access. As of April 2026, all three major providers offer computer use: Claude (Anthropic), GPT-5.5 (OpenAI), and Gemini 3 Pro/Flash (Google).

Desktop knowledge work agents: A newer category exemplified by Claude Cowork (Anthropic, GA February 2026) — desktop agents designed for non-technical professionals. Rather than writing code, these agents navigate local files, draft documents, process emails, and connect to enterprise tools (Google Drive, Gmail, DocuSign). They bring the autonomous perception-reasoning-action loop to everyday office work, not just software development.

Data pipeline agents: Monitor data sources continuously, detect anomalies or threshold violations, trigger downstream processes, and send alerts with context — without human monitoring.
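
As one illustration from the list above, the "detect a threshold violation and alert with context" step of a data pipeline agent might look like the following. This is a sketch under assumptions: the function check_readings, the alert fields, and the sample numbers are invented, and a real pipeline would read from a live data source and send the alert somewhere instead of returning it.

```python
from statistics import mean


def check_readings(readings: list[float], threshold: float, window: int = 3) -> list[dict]:
    """Return one alert per reading above the threshold, with recent context."""
    alerts = []
    for i, value in enumerate(readings):
        if value > threshold:
            recent = readings[max(0, i - window):i]   # the readings just before this one
            alerts.append({
                "index": i,
                "value": value,
                "threshold": threshold,
                "recent_mean": mean(recent) if recent else None,
            })
    return alerts


alerts = check_readings([10, 11, 12, 50, 11], threshold=40)
print(alerts)
# prints [{'index': 3, 'value': 50, 'threshold': 40, 'recent_mean': 11}]
```

Including recent_mean in the alert is the "with context" part: whoever receives it can see at a glance that 50 is far from the recent norm of 11.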

Why Agentic AI Matters Now

For most of the 2020s, "AI" meant autocomplete or a chatbot. Agentic AI represents the transition from AI as an assistant you consult to AI as a system that does work autonomously.

This shift matters for several reasons:

Scale: A human knowledge worker typically handles one task at a time. An agent can run dozens of sub-tasks in parallel, so for work that splits into independent pieces, a single well-designed agent can compress hours of work into minutes.

Compounding capability: As models improve, agent capability improves with them — the same agent framework gets dramatically better each time the underlying model does.

Enterprise adoption: Agentic AI is rapidly becoming the default pattern for production AI deployments. Understanding how agents work is essential for anyone evaluating, building, or deploying AI systems in a professional context.
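
The Scale point above rests on running independent sub-tasks concurrently. Here is a minimal sketch using Python's asyncio, where summarize_competitor is a hypothetical sub-task and the short sleep stands in for the network or model latency a real agent would wait on.

```python
import asyncio


async def summarize_competitor(name: str) -> str:
    # Assumption: a real sub-task would browse pages and call a model here.
    await asyncio.sleep(0.01)
    return f"summary of {name}"


async def run_subtasks(names: list[str]) -> list[str]:
    # gather runs the sub-tasks concurrently and returns results in input order.
    return await asyncio.gather(*(summarize_competitor(n) for n in names))


results = asyncio.run(run_subtasks(["Acme", "Globex", "Initech"]))
print(results)  # prints ['summary of Acme', 'summary of Globex', 'summary of Initech']
```

Because the sub-tasks spend most of their time waiting, the total time is close to that of the slowest one, not the sum of all of them. The speedup applies only when the sub-tasks do not depend on each other's results.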

📝Note

The frontier — agent-first devices. The agent pattern is starting to move beyond software and into hardware. At its Build 2026 conference, Microsoft introduced Project Solara, a platform for "agent-first" devices — hardware designed so that an AI agent, not a screen of apps, is the primary way you get things done. It is early and unproven, but it signals where some believe agentic computing is heading: from a feature inside your apps to the organizing principle of the device itself.

The modules that follow cover exactly how agents are built: the core components they rely on, the protocols that connect them to the world, the frameworks developers use to build them, and the safety challenges that must be addressed before deploying them at scale.

Key Takeaways

An AI agent follows a continuous loop: perceive → reason → act → observe → repeat — unlike a chatbot, which responds once and stops
The underlying model (Claude, GPT-5.5, Gemini 3) is the same; what makes an agent is the scaffolding that gives it tools and the ability to take multi-step actions
The three agent archetypes are reactive (stimulus-response), goal-directed (working toward an objective), and learning (improving from feedback)
Agentic AI is becoming the dominant production deployment pattern — not a niche application, but the default for serious AI use
