Why multi-agent?
When developers say they need “multi-agent,” they’re usually looking for one or more of these capabilities:
- Context management: Provide specialized knowledge without overwhelming the model’s context window. If context were infinite and latency zero, you could put all knowledge into a single prompt. Since neither is true, you need patterns that selectively surface relevant information.
- Distributed development: Allow different teams to develop and maintain capabilities independently, composing them into a larger system with clear boundaries.
- Parallelization: Spawn specialized workers for subtasks and execute them concurrently for faster results.
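As a rough illustration of the parallelization point, the sketch below runs several workers concurrently with `asyncio.gather`. The `run_worker` and `run_parallel` functions and the worker names are hypothetical placeholders, not part of any library; the `asyncio.sleep` call stands in for a real model or tool call.

```python
import asyncio


async def run_worker(name: str, subtask: str) -> str:
    """Hypothetical worker: stands in for a specialized agent handling one subtask."""
    await asyncio.sleep(0.01)  # placeholder for a model or tool call
    return f"{name} finished: {subtask}"


async def run_parallel(subtasks: dict[str, str]) -> list[str]:
    """Start one worker per subtask and wait for all of them concurrently."""
    jobs = [run_worker(name, task) for name, task in subtasks.items()]
    # gather returns results in the same order as the awaitables passed in
    return await asyncio.gather(*jobs)


results = asyncio.run(
    run_parallel({"researcher": "collect sources", "writer": "draft summary"})
)
print(results)
```

Because the workers wait at the same time, total latency is close to that of the slowest worker rather than the sum of all of them.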
Patterns
Here are the main patterns for building multi-agent systems, each suited to different use cases:

| Pattern | How it works |
|---|---|
| Subagents | A main agent coordinates subagents as tools. All routing passes through the main agent, which decides when and how to invoke each subagent. |
| Handoffs | Behavior changes dynamically based on state. Tool calls update a state variable that triggers routing or configuration changes, switching agents or adjusting the current agent’s tools and prompt. |
| Skills | Specialized prompts and knowledge loaded on-demand. A single agent stays in control while loading context from skills as needed. |
| Router | A routing step classifies input and directs it to one or more specialized agents. Results are synthesized into a combined response. |
| Custom workflow | Build bespoke execution flows with LangGraph, mixing deterministic logic and agentic behavior. Embed other patterns as nodes in your workflow. |
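To make one row of the table concrete, here is a minimal sketch of the Router pattern in plain Python. Everything in it is an assumption for illustration: the agent functions, the `KEYWORDS` map, and the keyword-based `classify` step are hypothetical stand-ins, and a real router would more likely ask a model to classify the input.

```python
from typing import Callable


# Hypothetical specialized agents; each stands in for a model-backed agent.
def billing_agent(query: str) -> str:
    return f"[billing] handled: {query}"


def shipping_agent(query: str) -> str:
    return f"[shipping] handled: {query}"


AGENTS: dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "shipping": shipping_agent,
}

# Assumed keyword map used only to keep the sketch deterministic.
KEYWORDS: dict[str, tuple[str, ...]] = {
    "billing": ("refund", "invoice"),
    "shipping": ("delivery", "package"),
}


def classify(query: str) -> list[str]:
    """Routing step: pick every agent whose keywords appear in the query."""
    text = query.lower()
    return [name for name, words in KEYWORDS.items() if any(w in text for w in words)]


def route(query: str) -> str:
    """Dispatch to the selected agents and combine their answers."""
    targets = classify(query)
    if not targets:
        return "No specialized agent matched this request."
    answers = [AGENTS[name](query) for name in targets]
    return "\n".join(answers)  # naive synthesis: concatenate the answers


print(route("Where is my package, and when will I get my refund?"))
```

The routing step can select more than one agent, which is why `route` collects a list of answers before combining them.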
Choosing a pattern
Use this table to match your requirements to the right pattern:
- Distributed development: Can different teams maintain components independently?
- Parallelization: Can multiple agents execute concurrently?
- Multi-hop: Does the pattern support calling multiple subagents in series?
- Direct user interaction: Can subagents converse directly with the user?
Visual overview
Subagents: A main agent coordinates subagents as tools. All routing passes through the main agent.
Performance comparison
Different patterns have different performance characteristics. Understanding these tradeoffs helps you choose the right pattern for your latency and cost requirements. Key metrics:
- Model calls: Number of LLM invocations. More calls mean higher latency (especially if they run sequentially) and higher per-request API costs.
- Tokens processed: Total context window usage across all calls. More tokens mean higher processing costs and a greater risk of hitting context limits.
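One simple way to measure both metrics in your own experiments is to record the token count of every model call. The `UsageTracker` class below is a hypothetical helper, and the token counts in the usage example are made-up numbers chosen only to show the arithmetic.

```python
from dataclasses import dataclass, field


@dataclass
class UsageTracker:
    """Accumulates model calls and tokens processed for one request."""

    tokens_per_call: list[int] = field(default_factory=list)

    def record(self, tokens: int) -> None:
        """Register one model call and the tokens it processed."""
        self.tokens_per_call.append(tokens)

    @property
    def model_calls(self) -> int:
        return len(self.tokens_per_call)

    @property
    def tokens_processed(self) -> int:
        return sum(self.tokens_per_call)


tracker = UsageTracker()
for tokens in (300, 500, 500, 400):  # illustrative values, not measurements
    tracker.record(tokens)

print(tracker.model_calls, tracker.tokens_processed)
```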
One-shot request
User: “Buy coffee”
A specialized coffee agent/skill can call a `buy_coffee` tool.
Subagents: 4 model calls.
Repeat request
Turn 1: “Buy coffee”
Turn 2: “Buy coffee again”
The user repeats the same request in the same conversation.
Subagents: 4 calls again, for 8 total.
- Subagents are stateless by design: each invocation follows the same flow.
- The main agent maintains conversation context, but subagents start fresh each time.
- This provides strong context isolation but repeats the full flow.
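The call counts for the Subagents case can be reproduced with a small simulation. The breakdown of the four calls per turn (two by the main agent, two by the subagent) is an assumption made for this sketch; the text above only gives the totals. `CallCounter`, `coffee_subagent`, and `MainAgent` are hypothetical names, and no real model is invoked.

```python
class CallCounter:
    """Counts simulated model calls."""

    def __init__(self) -> None:
        self.calls = 0

    def model_call(self) -> None:
        self.calls += 1


def coffee_subagent(request: str, counter: CallCounter) -> str:
    """Stateless subagent: builds a fresh message list on every invocation."""
    messages = [request]  # no memory of earlier invocations
    counter.model_call()  # assumed call 1: subagent decides to use buy_coffee
    messages.append("buy_coffee: ok")  # tool result; a tool call is not a model call
    counter.model_call()  # assumed call 2: subagent writes its answer
    return "Coffee ordered."


class MainAgent:
    """Keeps conversation history across turns and delegates to the subagent."""

    def __init__(self, counter: CallCounter) -> None:
        self.counter = counter
        self.history: list[str] = []

    def handle(self, user_message: str) -> str:
        self.history.append(user_message)
        self.counter.model_call()  # main agent decides to invoke the subagent
        result = coffee_subagent(user_message, self.counter)
        self.counter.model_call()  # main agent writes the final reply
        self.history.append(result)
        return result


counter = CallCounter()
agent = MainAgent(counter)

agent.handle("Buy coffee")
calls_after_turn_1 = counter.calls

agent.handle("Buy coffee again")
calls_after_turn_2 = counter.calls

print(calls_after_turn_1, calls_after_turn_2)
```

The main agent's `history` grows across turns while the subagent's `messages` list is rebuilt each time, which is the context isolation described above.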
Multi-domain
User: “Compare Python, JavaScript, and Rust for web development”
Each language agent/skill contains ~2000 tokens of documentation. All patterns can make parallel tool calls.
Subagents: 5 calls, ~9K tokens in total. Each subagent works in isolation with only its relevant context.
Summary
Here’s how patterns compare across all three scenarios:
Choosing a pattern: