Learning Objectives
- Explain the ReAct pattern and why it dominates production agent deployments
- Compare single-agent and multi-agent orchestration approaches
- Identify the leading open-source frameworks and managed platforms for agent development
Single-Agent vs. Multi-Agent
The first architectural decision when building an agent: should one agent handle the entire task, or should multiple specialized agents collaborate?
Single-Agent Architecture
One LLM handles all reasoning, tool selection, execution, and synthesis. The simplest and most common approach.
Advantages: Easy to debug (one reasoning trace), low latency (no coordination overhead), predictable behavior, simpler monitoring.
Limitations: The agent must be competent across all aspects of the task. A single agent tasked with "research competitors, write a report, and create a slide deck" needs to be good at web research, analytical writing, and visual layout simultaneously.
Best for: Well-defined tasks with a clear start and end, tasks that don't benefit from parallelism, early-stage implementations where simplicity matters.
Multi-Agent Architecture
Multiple specialized agents coordinate to complete a complex task. The most common pattern is orchestrator-worker:
- An orchestrator agent receives the high-level task, breaks it into subtasks, delegates to specialized workers, collects results, and synthesizes a final output
- Worker agents each specialize in one domain: one for web research, one for code generation, one for writing, one for quality review
Example: a software engineering workflow with an orchestrator that receives a feature request and delegates to: a planning agent (designs the solution), a coding agent (implements it), a testing agent (writes and runs tests), and a documentation agent (updates the docs). The orchestrator assembles the outputs into a pull request.
Advantages: Each agent specializes in one domain; independent subtasks can run in parallel; complex workflows break down into manageable pieces.
Limitations: Coordination overhead; harder to debug (which agent made the mistake?); error propagation; higher cost (more LLM calls).
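The orchestrator-worker pattern described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any framework's API: the worker functions, the `plan` step, and the `WORKERS` registry are hypothetical stand-ins, and each worker returns a canned string where a real system would make its own LLM call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


# Hypothetical workers: in a real system each would wrap its own LLM call.
def research_worker(subtask: str) -> str:
    return f"[research] notes on {subtask}"


def writing_worker(subtask: str) -> str:
    return f"[writing] draft for {subtask}"


def review_worker(subtask: str) -> str:
    return f"[review] checked: {subtask}"


WORKERS: Dict[str, Callable[[str], str]] = {
    "research": research_worker,
    "writing": writing_worker,
    "review": review_worker,
}


def plan(task: str) -> List[Tuple[str, str]]:
    """Stand-in for the orchestrator's planning step (hard-coded here)."""
    return [
        ("research", f"background for '{task}'"),
        ("writing", f"summary of '{task}'"),
    ]


def orchestrate(task: str) -> str:
    subtasks = plan(task)
    # Independent subtasks are delegated in parallel; map() keeps their order.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda st: WORKERS[st[0]](st[1]), subtasks))
    # The review step depends on the other outputs, so it runs afterwards.
    review = WORKERS["review"]("; ".join(results))
    # Synthesis: assemble worker outputs into one final result.
    return "\n".join(results + [review])
```

Calling `orchestrate("competitor pricing")` returns three lines, one per worker. The sketch also makes the listed limitations visible: every worker is an extra call to pay for, and a bad result from one worker flows straight into the review step.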
📝Note
The landscape is evolving rapidly. Most production agents today are single-agent. Multi-agent systems are increasingly practical as models improve and frameworks mature, but they introduce complexity that should be justified by a clear need — don't add agents for their own sake.
The ReAct Pattern
ReAct (Reason + Act) is the dominant pattern for production LLM agents. It was introduced in a 2022 paper by researchers at Princeton University and Google Research and has become the de facto standard for agentic workflows.
The core loop:
Think: "I need to find the latest pricing for Stripe's Teams plan."
Act: search_web("Stripe Teams plan pricing 2026")
Observe: [search results returned]
Think: "The results show $12/user/month. Now I need Lemon Squeezy's pricing."
Act: search_web("Lemon Squeezy pricing plans")
Observe: [search results returned]
Think: "I have both prices. I can now write the comparison."
Act: write_file("pricing-comparison.md", ...)
Why ReAct works so well:
- Explicit reasoning before each action reduces errors compared with acting without a stated rationale
- Each observation is incorporated into subsequent reasoning — the agent adapts based on what it finds
- The trace is readable — developers can inspect the full Think/Act/Observe sequence to understand why the agent did what it did
- Failures are localizable — when something goes wrong, you can pinpoint which step produced the wrong conclusion
ReAct is implemented in LangChain, LlamaIndex, AG2, CrewAI, and most other frameworks. When you see "agent reasoning traces" in a product's UI, you're usually seeing ReAct in action.
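To make the Think/Act/Observe loop concrete, here is a minimal sketch under stated assumptions. The "model" is a hard-coded script rather than an LLM, and `search_web`, `write_file`, `fake_model`, and `react_loop` are hypothetical names for illustration. Real frameworks differ in how they parse model output and dispatch tools.

```python
from typing import Callable, Dict, List, Optional, Tuple

FILES: Dict[str, str] = {}  # in-memory stand-in for a file system


# Hypothetical tools; real ones would call a search API or write to disk.
def search_web(query: str) -> str:
    return f"results for: {query}"


def write_file(name: str, content: str) -> str:
    FILES[name] = content
    return f"wrote {name}"


TOOLS: Dict[str, Callable[..., str]] = {
    "search_web": search_web,
    "write_file": write_file,
}

# Scripted stand-in for the LLM: (thought, tool name or None, tool arguments).
SCRIPT: List[Tuple[str, Optional[str], tuple]] = [
    ("I need Stripe's pricing.", "search_web", ("Stripe pricing",)),
    ("Now I need Lemon Squeezy's pricing.", "search_web",
     ("Lemon Squeezy pricing plans",)),
    ("I have both prices. I can write the comparison.", "write_file",
     ("pricing-comparison.md", "comparison goes here")),
    ("The task is complete.", None, ()),
]


def fake_model(trace: List[str]) -> Tuple[str, Optional[str], tuple]:
    """Choose the next step from how many observations the trace holds."""
    step = sum(1 for line in trace if line.startswith("Observe:"))
    return SCRIPT[step]


def react_loop(max_steps: int = 10) -> List[str]:
    trace: List[str] = []
    for _ in range(max_steps):  # hard cap so a confused agent cannot loop forever
        thought, tool, args = fake_model(trace)
        trace.append(f"Think: {thought}")
        if tool is None:  # the model decided it is done
            break
        trace.append(f"Act: {tool}({', '.join(repr(a) for a in args)})")
        trace.append(f"Observe: {TOOLS[tool](*args)}")
    return trace
```

Printing the result of `react_loop()` line by line gives a trace in the same shape as the one above. Because the whole trace is passed back to the model on every step, each observation can influence the next thought, and the `max_steps` cap is one simple guard against runaway loops.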
Open-Source Frameworks
LangChain + LangGraph
The most widely used framework ecosystem for building LLM applications. LangChain and LangGraph both reached their v1.0 milestones in late 2025, with adoption by companies like Uber, LinkedIn, and Klarna.
LangChain provides the high-level building blocks:
- Chains: Sequences of LLM calls and tool uses linked together
- Agents: Pre-built ReAct, OpenAI tools, and custom agent implementations
- Tool ecosystem: 100+ pre-built tool integrations
- Memory modules: Conversation history, vector store integration, entity memory
LangGraph is now the recommended approach for production agent workflows. It models agent logic as stateful directed graphs — enabling complex orchestration, durable execution (agents persist through failures), human-in-the-loop checkpoints, and both short-term working memory and long-term persistent memory. LangChain agents are built on LangGraph underneath, so you can start with LangChain's high-level APIs and drop down to LangGraph when you need finer control.
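The idea of modeling agent logic as a stateful directed graph can be illustrated without any framework. The sketch below is plain Python and deliberately does not use LangGraph's API. The node functions, the routing function, and the checkpoint format are all assumptions made for illustration. It shows three ingredients: nodes that update shared state, edges that pick the next node from that state, and a checkpoint after every step so a run can resume after a failure.

```python
import json
from typing import Callable, Dict, List, Optional

State = Dict[str, object]


def draft(state: State) -> State:
    """Hypothetical node: produce a draft and count the attempt."""
    attempts = int(state.get("attempts", 0)) + 1
    return {"draft": f"answer to {state['question']}", "attempts": attempts}


def review(state: State) -> State:
    """Hypothetical node: approve only from the second attempt onward."""
    return {"approved": int(state["attempts"]) >= 2}


def route_after_review(state: State) -> str:
    # Conditional edge: loop back to drafting until the review approves.
    return "END" if state["approved"] else "draft"


NODES: Dict[str, Callable[[State], State]] = {"draft": draft, "review": review}
EDGES: Dict[str, Callable[[State], str]] = {
    "draft": lambda state: "review",
    "review": route_after_review,
}


def run(state: State, start: str = "draft",
        checkpoints: Optional[List[str]] = None, max_steps: int = 20) -> State:
    node = start
    for _ in range(max_steps):
        if node == "END":
            break
        state = {**state, **NODES[node](state)}  # merge the node's updates
        node = EDGES[node](state)                # follow the outgoing edge
        if checkpoints is not None:
            # Persist enough to resume: the next node plus the full state.
            checkpoints.append(json.dumps({"next": node, "state": state}))
    return state
```

To resume after a crash, load any checkpoint with `json.loads` and call `run(saved["state"], start=saved["next"])`. A human-in-the-loop checkpoint fits the same shape: pause at a node, store the checkpoint, and resume once someone has approved.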
LangChain also launched Deep Agents — a harness for agents that can plan, spawn subagents, and use file systems for complex multi-step tasks.
The honest picture: LangChain has faced valid criticism for excessive abstraction. Teams that have gone deep into production use often simplify away from it, using raw API calls plus their own orchestration. LangGraph addresses much of this criticism by providing a lower-level, more controllable runtime. For getting started, LangChain is excellent. For production at scale, LangGraph is the recommended path.
LlamaIndex
Specialized in data ingestion and RAG. Where LangChain is a general-purpose framework, LlamaIndex focuses on: indexing any data source (PDFs, databases, APIs, web pages), building sophisticated retrieval pipelines, and powering agents that work with large document collections.
Best for: building agents that need to reason over large internal knowledge bases — documentation Q&A systems, internal search, research synthesis over document archives.
AG2 (formerly AutoGen)
Originally created by Microsoft Research in 2023, AutoGen was a pioneering multi-agent conversation framework. In late 2024, the original creators forked the project as AG2 — an independent, community-governed open-source project ("The Open-Source AgentOS") at ag2.ai.
AG2 retains AutoGen's core design: agents are defined as participants in a conversation and exchange messages to coordinate. Particularly strong for coding tasks where a "Coder" agent and a "Reviewer" agent interact to iteratively improve code.
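The conversation-as-coordination idea can be sketched without the library. The following is not AG2's API. It is a framework-free illustration in which `coder`, `reviewer`, and `converse` are hypothetical functions, and each "agent" is a simple rule where a real one would call an LLM with the message history.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (sender, content)


def coder(history: List[Message]) -> str:
    """Hypothetical Coder agent: revises its code when the reviewer asks."""
    feedback = [content for sender, content in history if sender == "reviewer"]
    if feedback and "type hints" in feedback[-1]:
        return "def add(a: int, b: int) -> int:\n    return a + b"
    return "def add(a, b):\n    return a + b"


def reviewer(history: List[Message]) -> str:
    """Hypothetical Reviewer agent: inspects the most recent message."""
    code = history[-1][1]
    if "->" not in code:
        return "Please add type hints."
    return "APPROVED"


def converse(task: str, max_rounds: int = 5) -> List[Message]:
    history: List[Message] = [("user", task)]
    for _ in range(max_rounds):  # cap the rounds so the chat always terminates
        history.append(("coder", coder(history)))
        verdict = reviewer(history)
        history.append(("reviewer", verdict))
        if verdict == "APPROVED":
            break
    return history
```

`converse("Write an add function")` produces a five-message history: the request, a first attempt, a review comment, a revised attempt, and the approval. The shared message list is the only coordination mechanism, which is the core of the conversational design.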
📝Note
Microsoft's pivot: In early 2026, Microsoft retired AutoGen and debuted the Microsoft Agent Framework to unify and govern enterprise AI agents. AutoGen and Semantic Kernel were placed in maintenance mode (bug fixes and security patches only, no new features). For new multi-agent projects, Microsoft now directs developers to its Agent Framework platform, while the open-source community continues development under the AG2 banner.
CrewAI
Role-based multi-agent framework. You define agents as "crew members" with specific roles, goals, and backstories. The framework handles the orchestration of how they collaborate.
CrewAI's opinionated structure makes it easy to get started — you define the crew and the task, and the framework routes work between agents. The platform has scaled significantly: over 12 million executions per day, roughly 2 billion agentic executions over the past year, and adoption by 60%+ of the Fortune 500.
CrewAI Flows is the production architecture for building event-driven agent systems — supporting conditional logic, loops, real-time state management, and integration with external systems. Flows let you start with rules-based steps, add LLM enrichment, layer in agent delegation, and scale to full crew orchestration within the same framework. The Enterprise AMP Suite adds tracing, observability, a unified control plane, and seamless enterprise integrations.
Paperclip
Open-source orchestration platform that takes a different approach: instead of defining agent pipelines or chains, you build a company org chart where AI agents have defined roles, reporting lines, budgets, and goals. Agents are "hired" into positions, given responsibilities, and managed through scheduled heartbeats (timed activations), per-agent monthly budgets with automatic enforcement, and a governance layer with board-level approval for high-stakes decisions. The project has over 37,000 GitHub stars. Paperclip supports any agent type (Claude, OpenClaw, Cursor) and can run multiple isolated agent-powered businesses from a single self-hosted deployment.
Managed Platforms and Cloud-Hosted Agents
For teams that don't want to build agent infrastructure from scratch, two distinct approaches have emerged: desktop agents (run locally on your machine) and cloud-hosted agents (run on the provider's infrastructure).
Cloud-Hosted Agent Platforms
Claude Managed Agents (Anthropic, public beta): Composable APIs for building and deploying cloud-hosted AI agents at scale. Unlike Claude Code and Cowork (which run locally), Managed Agents runs on Anthropic's infrastructure — you define the agent's behavior, tools, and constraints; Anthropic handles orchestration, scaling, monitoring, and persistence. Supports composing multi-agent workflows where agents hand off to each other, with built-in audit logging, role-based access, and cost controls. Ideal for organizations that want persistent, always-on agents without managing compute infrastructure.
Claude Cowork (Anthropic, GA February 2026): Anthropic's desktop counterpart to Managed Agents, aimed at non-technical knowledge work and described as "Claude Code for the rest of your work." Runs on your computer and can autonomously read/create/edit local files, navigate between applications, and connect to enterprise tools via plugins (Google Drive, Gmail, DocuSign, FactSet). Enterprise features include SCIM, SSO, and audit logging. Represents the extension of agentic AI beyond developers to business professionals, analysts, and knowledge workers.
Desktop Coding Agents
Claude Code (Anthropic): Terminal-based coding agent with 1 million token context. Deep codebase understanding via MCP. Reads repos, writes files, runs tests, creates PRs. Supports sub-agents (ephemeral workers for parallel subtasks) and Agent Teams (multiple coordinated Claude Code instances sharing a task list). The Claude Agent SDK (Python and TypeScript) lets developers build custom agents with built-in file operations, shell commands, web search, and MCP integration out of the box. Apple's Xcode 26.3 integrated the SDK natively.
OpenAI Codex (OpenAI): Web-based coding agent, now also available on Windows (March 2026). GPT-5.5 under the hood for complex tasks (default since the April 23, 2026 launch), with GPT-5.4 mini available for lighter tasks and subagents at roughly 30% of the compute cost. Earlier models include GPT-5.1-Codex-Max (frontier reasoning) and GPT-5.2-Codex (long-horizon work with context compaction). The separate OpenAI Agents SDK provides the official framework for building custom agents with tool use, handoffs between agents, and guardrails.
Gemini CLI (Google): Open-source CLI. Automatically routes between Gemini 3.1 Pro (complex tasks) and Flash (speed). 1 million token context. Supports plan mode, subagent delegation with proxy routing, and a model-driven parallel tool scheduler that runs safe tools concurrently. Free up to a daily limit.
GitHub Copilot Coding Agent: Rebuilt from the earlier Copilot Workspace, the Coding Agent (GA since September 2025) takes a GitHub issue and autonomously implements a solution — reads the issue, plans, writes code, and opens a PR. Copilot CLI (GA February 2026) brings full agentic capabilities to the terminal — planning, building, reviewing, and remembering across sessions. Agent mode in VS Code and JetBrains handles multi-file changes with terminal command execution and iterative error remediation.
Open-Source Personal Agents
A distinct category is emerging alongside coding agents and workflow automation: personal AI agents that operate as always-available assistants through messaging platforms.
OpenClaw is the most prominent example — a free, open-source autonomous agent with over 247,000 GitHub stars (the fastest-growing open-source project on GitHub as of early 2026). OpenClaw runs locally on your machine and uses existing messaging platforms (WhatsApp, Telegram, Discord, Slack, Signal, iMessage) as its interface — no proprietary app required.
Key architectural differences from coding agents and workflow tools:
- LLM-agnostic: Works with Claude, GPT, DeepSeek, or local models — the user chooses the underlying LLM
- Messaging-native UI: Instead of a terminal (coding agents) or web dashboard (workflow tools), OpenClaw meets users in their existing communication platforms
- Proactive scheduling: Can execute tasks on a schedule (cron-style) without being prompted — a step toward truly autonomous assistants
- Skill marketplace: Extensible via ClawHub.ai, a community marketplace of third-party skills — which also introduces significant security considerations (see Section 8.5)
OpenClaw was created by Peter Steinberger, who joined OpenAI in February 2026. The project moved to an independent open-source foundation.
Memory System Design
Returning to memory with the framework context in mind: how do real agent systems actually implement the three memory types?
In-context memory is usually managed by the framework. As the context fills, most frameworks apply a summarization step that compresses older parts of the conversation into a shorter summary to make room for new tool outputs.
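A summarization step of this kind might look like the sketch below. It is a simplified illustration with loud assumptions: `count_tokens` counts whitespace-separated words instead of using a real tokenizer, and `summarize` returns a placeholder where a real system would ask an LLM for a summary. The function names are hypothetical.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Crude stand-in: whitespace word count, not a real tokenizer."""
    return len(text.split())


def summarize(messages: List[str]) -> str:
    """Hypothetical summarizer; a real one would call an LLM."""
    return f"[summary of {len(messages)} earlier messages]"


def compact(messages: List[str], budget: int, keep_recent: int = 2) -> List[str]:
    """Replace older messages with one summary once the budget is exceeded."""
    total = sum(count_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages  # still fits, or nothing old enough to compress
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    # Recent messages stay verbatim; everything older collapses to a summary.
    return [summarize(older)] + recent
```

Keeping the most recent messages verbatim is a common design choice, since the latest tool outputs are usually the ones the next reasoning step depends on. The cost is that detail in older turns is lost once it has been summarized.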
RAG implementation typically involves:
- An ingestion pipeline that processes documents, chunks them, embeds each chunk, and stores the embeddings in a vector DB
- A retrieval step at the start of each agent turn: query the vector DB for chunks relevant to the current question, add top results to context
- Tools like LlamaIndex or LangChain's retrieval modules handle this pipeline
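The ingestion and retrieval steps above can be sketched with the standard library alone. This is a toy under stated assumptions. `embed` uses word counts instead of a real embedding model, and the "vector DB" is a plain Python list. The function names are hypothetical and do not correspond to LlamaIndex or LangChain APIs.

```python
import math
from collections import Counter
from typing import List, Tuple

Store = List[Tuple[str, Counter]]  # (chunk text, toy embedding)


def chunk(text: str, size: int = 8) -> List[str]:
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts standing in for a real model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def ingest(documents: List[str]) -> Store:
    """Ingestion pipeline: chunk each document, embed each chunk, store both."""
    return [(piece, embed(piece)) for doc in documents for piece in chunk(doc)]


def retrieve(store: Store, query: str, k: int = 2) -> List[str]:
    """Retrieval step: rank chunks by similarity to the query, keep the top k."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

At the start of an agent turn, the retrieved chunks would be prepended to the prompt, for example `"\n".join(retrieve(store, question)) + "\n\nQuestion: " + question`. Swapping in a real embedding model and a vector database changes `embed` and the store, but the overall shape stays the same.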
Persistent storage is implemented as tool calls: the agent calls a write_to_db or save_to_file tool when it needs to persist information. A future session can call read_from_db or load_from_file to retrieve it.
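As one possible shape for such tools, the sketch below backs `write_to_db` and `read_from_db` with SQLite from the Python standard library. The key-value table layout, the `path` parameter, and the return values are assumptions made for illustration. A production agent would also need to register these functions with its framework's tool interface.

```python
import sqlite3
from typing import Optional


def _connect(path: str) -> sqlite3.Connection:
    """Open the database file and make sure the memory table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def write_to_db(path: str, key: str, value: str) -> str:
    """Tool the agent calls to persist a fact; overwrites an existing key."""
    conn = _connect(path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO memory (key, value) VALUES (?, ?)",
            (key, value),
        )
    conn.close()
    return f"saved {key}"


def read_from_db(path: str, key: str) -> Optional[str]:
    """Tool a later session calls to recall a fact; None if it was never saved."""
    conn = _connect(path)
    row = conn.execute(
        "SELECT value FROM memory WHERE key = ?", (key,)
    ).fetchone()
    conn.close()
    return row[0] if row else None
```

Because the data lives in a file, not in the context window, a session started days later can call `read_from_db` with the same path and key and get the stored value back.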
✅Tip
Start with in-context memory. For most agent tasks, you don't need RAG or persistent storage. Get the agent working first, and add more sophisticated memory only when you hit actual limitations: context window exhaustion, cross-session state requirements, or performance constraints.
Key Takeaways
- Single-agent is simpler and should be the default; add multi-agent orchestration only when task complexity genuinely requires specialization and parallelism
- The ReAct pattern (a Think/Act/Observe loop) is the dominant production pattern because its reasoning traces are readable, its failures are localizable, and the agent adapts to each observation
- LangChain + LangGraph (v1.0) is the most widely used framework ecosystem; LlamaIndex specializes in RAG; AG2 (formerly AutoGen) and CrewAI target multi-agent coordination
- Memory design choices — in-context, RAG, or persistent — should be driven by actual requirements, not anticipated ones; start simple and add complexity when needed