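Building multi-agent systems: When and how to use them
Anthropic advises that, because a single agent is often sufficient, multi-agent systems should be adopted only in three specific situations: context pollution, parallelization, and specialization.
Key points
Start with a single agent
In many cases a single agent with appropriate prompts and tools is sufficient. Multi-agent systems add overhead (3-10x more token consumption) and more points of failure, so teams should start with a single agent.
Three conditions where multi-agent systems pay off
Multiple agents outperform a single agent only when context pollution degrades performance, when tasks need to run in parallel, or when specialization is needed to improve tool selection or focus.
The orchestrator-subagent pattern
The article focuses on a hierarchical model (orchestrator-subagent) in which a lead agent spawns and manages specialized subagents for specific subtasks.
Implementation costs and risks
Teams can invest months in an elaborate multi-agent architecture only to find that improved prompting on a single agent achieves equivalent results. Context lost at handoffs and coordination costs are the main challenges.
Preventing context pollution
To prevent "context pollution", in which a single agent accumulates unneeded information and its reasoning degrades, each subagent works in an independent, clean context focused on a specific task.
Summarizing and filtering information
A specialized subagent processes large volumes of data (for example, order history) and passes only a summary of the necessary information to the main agent, preserving the quality of its context.
Criteria for judging cost-effectiveness
Because multi-agent architectures incur additional cost, they should be adopted only where they offer a clear benefit against constraints that a single agent cannot overcome.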
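Impact analysis
This article cautions against inflated expectations of multi-agent systems and offers an important guideline for objectively evaluating cost-effectiveness (ROI) in practice. By urging developers to reassess what a single agent can do before adding architectural complexity, and to adopt a distributed approach only on the basis of clear requirements, it could help mature design thinking across the industry.
Editorial comment
A highly instructive article that dispels the intuitive misconception that "multi-agent = high performance" and lays out the balance between implementation cost and benefit in a logical way. Developers should avoid splitting work across agents lightly and adopt this architecture only to solve clearly identified problems.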
Building multi-agent systems: When and how to use them
While single-agent systems handle most enterprise workflows effectively, multi-agent architectures can unlock additional value for your organization. Learn when and how to use them.
Product: Claude Developer Platform, Claude Code
Date: January 23, 2026
Reading time: 5 min
Link: https://claude.com/blog/building-multi-agent-systems-when-and-how-to-use-them
A multi-agent system is an architecture where multiple LLM instances run with separate conversation contexts, coordinated through code. Multiple coordination patterns exist (agent swarms, capability-based systems, and message bus architectures), but this article focuses on the orchestrator-subagent pattern: a hierarchical model where a lead agent spawns and manages specialized subagents for specific subtasks. This pattern offers a straightforward coordination model and is a good starting point for teams new to multi-agent systems. We'll explore other patterns in detail in our next article.
Today, multi-agent systems are often applied in situations where a single agent would perform better, though this calculus continues to evolve as models improve. At Anthropic, we’ve seen teams invest months building elaborate multi-agent architectures only to discover that improved prompting on a single agent achieved equivalent results.
After building multi-agent systems and working with teams deploying them in production, we've identified three situations where multiple agents consistently outperform a single agent: when context pollution degrades performance, when tasks can run in parallel, and when specialization improves tool selection or task focus. Outside these situations, the coordination costs typically exceed the benefits. In this article, we share how to recognize single-agent limits, identify the three scenarios where multi-agent systems excel, and avoid common implementation mistakes.
The case for starting with a single agent
A well-designed single agent with appropriate tools can accomplish far more than many developers expect.
Multi-agent systems introduce overhead. Every additional agent represents another potential point of failure, another set of prompts to maintain, and another source of unexpected behavior.
We've observed teams build elaborate multi-agent systems with separate agents for planning, execution, review, and iteration, only to discover that they suffered from lost context at each handoff and spent more tokens coordinating than executing. In our testing, multi-agent implementations typically use 3-10x more tokens than single-agent approaches for equivalent tasks. This overhead stems from duplicating context across agents, coordination messages between agents, and summarizing results for handoffs.
A decision framework for multi-agent systems
Multi-agent architectures provide value when they address specific constraints that a single agent cannot overcome. This means multi-agent architectures should be reserved for cases where they provide clear benefits that justify the additional cost.
The patterns below represent cases where we consistently observe positive returns on this investment.
Context protection
Large language models have finite context windows, and response quality can degrade as context grows. When an agent's context accumulates information from one subtask that is irrelevant to subsequent subtasks, context pollution occurs. Subagents provide isolation, with each operating in its own clean context focused on its specific task.
Consider a customer support agent that needs to retrieve order history while diagnosing technical issues. If every order lookup adds thousands of tokens to the context, the agent's ability to reason about the technical problem degrades.
The single-agent approach:
    # Single agent accumulates everything in context
    conversation_history = [
        {"role": "user", "content": "My order #12345 isn't working"},
        {"role": "assistant", "content": "Let me check your order..."},
        # Tool result adds 2000+ tokens of order history
        {"role": "user", "content": "... (order details, past purchases, shipping info) ..."},
        {"role": "assistant", "content": "Now let me diagnose the technical issue..."},
        # Context is now polluted with order details the agent doesn't need
    ]
The agent must reason about the technical issue while maintaining 2000+ tokens of irrelevant order history in context, diluting attention and reducing response quality.
The multi-agent approach:
    from anthropic import Anthropic

    client = Anthropic()

    # get_order_details_tool, extract_summary, needs_order_info, and
    # extract_order_id are assumed to be defined elsewhere.

    class OrderLookupAgent:
        def lookup_order(self, order_id: str) -> dict:
            # Separate agent with its own context
            messages = [
                {"role": "user", "content": f"Get essential details for order {order_id}"}
            ]
            response = client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                messages=messages,
                tools=[get_order_details_tool]
            )
            # Returns only essential information
            return extract_summary(response)

    class SupportAgent:
        def handle_issue(self, user_message: str):
            # Stays empty when the message needs no order lookup
            context = ""
            if needs_order_info(user_message):
                order_id = extract_order_id(user_message)
                # Get only what's needed, not full history
                order_summary = OrderLookupAgent().lookup_order(order_id)
                # Inject compact summary, not full context
                context = f"Order {order_id}: {order_summary['status']}, purchased {order_summary['date']}"

            # Main agent context stays clean
            messages = [
                {"role": "user", "content": f"{context}\n\nUser issue: {user_message}"}
            ]
            response = client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=2048,
                messages=messages
            )
            return response
The order lookup agent processes the full order history and extracts a summary. The main agent receives only the 50-100 tokens it actually needs, keeping context focused.
Context isolation is most effective when subtasks generate high context volume (more than 1000 tokens) but most of that information is irrelevant to the main task, when the subtask is well-defined with clear criteria for what information to extract, and for lookup or retrieval operations that require filtering before use.
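The criteria above can be turned into a rough screening check. The following is a minimal sketch, not a rule from the article's authors: `SubtaskProfile`, `should_isolate`, and the `max_relevant_fraction` cutoff are hypothetical names and values introduced for illustration. Only the 1000-token guideline comes from the text.

```python
from dataclasses import dataclass


@dataclass
class SubtaskProfile:
    """Rough description of a subtask, filled in from your own measurements."""
    name: str
    tokens_generated: int        # tokens the subtask adds to context (tool results, etc.)
    tokens_needed_by_main: int   # tokens the main agent actually needs afterwards
    has_clear_extraction_criteria: bool


def should_isolate(task: SubtaskProfile,
                   min_volume: int = 1000,
                   max_relevant_fraction: float = 0.2) -> bool:
    """Return True when the subtask looks like a good candidate for a subagent.

    min_volume mirrors the "more than 1000 tokens" guideline in the text;
    max_relevant_fraction is a hypothetical cutoff for "mostly irrelevant".
    """
    if task.tokens_generated <= min_volume:
        return False
    if not task.has_clear_extraction_criteria:
        return False
    relevant_fraction = task.tokens_needed_by_main / task.tokens_generated
    return relevant_fraction <= max_relevant_fraction


order_lookup = SubtaskProfile("order_lookup", 2000, 100, True)
small_lookup = SubtaskProfile("faq_lookup", 300, 150, True)
print(should_isolate(order_lookup))  # True
print(should_isolate(small_lookup))  # False
```

A check like this is only a starting point. The inputs would have to come from measuring real conversations, and the cutoff should be tuned against observed response quality.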
Parallelization
Running multiple agents in parallel allows you to explore a larger search space than a single agent can cover. This pattern has proven particularly valuable for search and research tasks.
Our Research feature uses this approach. A lead agent analyzes a query and spawns multiple subagents to investigate different facets in parallel. Each subagent searches independently, then returns distilled findings. Multi-agent search has shown substantial accuracy improvements over single-agent approaches by allowing exploration across larger information spaces.
The core implementation decomposes a question into independent facets, runs subagents concurrently, then synthesizes the results.
    import asyncio
    from anthropic import AsyncAnthropic

    client = AsyncAnthropic()

    async def research_topic(query: str) -> dict:
        # Lead agent breaks query into research facets
        facets = await lead_agent.decompose_query(query)

        # Spawn subagents to research each facet in parallel
        tasks = [
            research_subagent(facet)
            for facet in facets
        ]
        results = await asyncio.gather(*tasks)

        # Lead agent synthesizes findings
        return await lead_agent.synthesize(results)

    async def research_subagent(facet: str) -> dict:
        """Each subagent has its own context window"""
        messages = [
            {"role": "user", "content": f"Research: {facet}"}
        ]
        response = await client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            messages=messages,
            tools=[web_search, read_document]
        )
        return extract_findings(response)
This improved coverage comes at a cost. Multi-agent systems typically consume 3 to 10 times more tokens than single-agent approaches for equivalent tasks. This happens because each agent needs its own context, agents must exchange messages to coordinate, and results must be summarized when passed between agents. While parallelism helps reduce total execution time compared to running all that work sequentially, multi-agent systems often take longer overall than single-agent systems because of the sheer increase in total computation.
The primary benefit of parallelization is thoroughness, not speed. When you need to search across a large information space or investigate many angles of a complex question, parallel agents can cover more ground than a single agent working within its context limits. The tradeoff is higher token usage and often longer total execution time in exchange for more comprehensive results.
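To make the distinction between elapsed time and total computation concrete, here is a small simulation. It is a sketch under stated assumptions: `fake_subagent` is a hypothetical stand-in that sleeps instead of calling a model, and the per-facet durations are invented. It illustrates only that concurrent subagents finish in roughly the time of the slowest one, while the work you pay for is the sum across all of them.

```python
import asyncio
import time


async def fake_subagent(facet: str, seconds: float) -> dict:
    """Stand-in for a research subagent; sleeps instead of calling a model."""
    await asyncio.sleep(seconds)
    return {"facet": facet, "seconds": seconds}


async def run_parallel(workloads: dict[str, float]) -> tuple[float, float]:
    """Return (wall_clock_seconds, total_agent_seconds) for concurrent subagents."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_subagent(facet, secs) for facet, secs in workloads.items())
    )
    wall_clock = time.perf_counter() - start
    # Total work is the sum over all subagents, regardless of overlap in time
    total_work = sum(r["seconds"] for r in results)
    return wall_clock, total_work


# Hypothetical facets and durations, chosen only for the demonstration
workloads = {"pricing": 0.05, "competitors": 0.10, "regulation": 0.15}
wall_clock, total_work = asyncio.run(run_parallel(workloads))
print(f"wall clock ~{wall_clock:.2f}s, total agent time {total_work:.2f}s")
```

The simulation leaves out the lead agent's decomposition and synthesis steps, which add further time and tokens on top of the subagent work.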
Specialization
Different tasks sometimes benefit from different tool sets, system prompts, or domains of expertise. Rather than providing a single agent with access to dozens of tools, specialized agents with focused toolsets matched to their responsibilities can improve reliability.
Tool set specialization
When an agent has access to too many tools, performance suffers. Three signals indicate tool specialization would help:
Quantity. An agent with too many tools (often 20+) struggles to select the appropriate one.
Domain confusion. When tools span multiple unrelated domains (database operations, API calls, file system operations), the agent confuses which domain applies to a given task.
Degraded performance. Adding new tools degrades performance on existing tasks, suggesting the agent has reached its capacity for tool management.
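The first two signals can be checked mechanically from a tool inventory. The sketch below assumes a hypothetical `<domain>_<action>` naming convention (for example `crm_get_contacts`), and its thresholds are illustrative defaults rather than values from the article. The third signal, degraded performance on existing tasks, cannot be read from tool names and needs an evaluation suite.

```python
from collections import Counter


def tool_specialization_signals(tool_names: list[str],
                                max_tools: int = 20,
                                max_domains: int = 2) -> dict:
    """Check the quantity and domain-spread signals from a flat list of tool names.

    Assumes a hypothetical "<domain>_<action>" naming convention; the
    thresholds are illustrative defaults, not fixed limits.
    """
    # The text before the first underscore is treated as the tool's domain
    domains = Counter(name.split("_", 1)[0] for name in tool_names)
    return {
        "too_many_tools": len(tool_names) >= max_tools,
        "domain_confusion_risk": len(domains) > max_domains,
        "tools_per_domain": dict(domains),
    }


tools = [
    "crm_get_contacts", "crm_create_opportunity",
    "marketing_get_campaigns", "marketing_create_lead",
    "messaging_send_alert",
]
signals = tool_specialization_signals(tools)
print(signals)
```

Here the inventory is small but spans three domains, so only the domain-spread flag is raised.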
System prompt specialization
Different tasks sometimes require different personas, constraints, or instructions that conflict when combined. A customer support agent needs to be empathetic and patient; a code review agent needs to be precise and critical. A compliance-checking agent needs rigid rule-following; a brainstorming agent needs creative flexibility. When a single agent must switch between conflicting behavioral modes, separating into specialized agents with tailored system prompts produces more consistent results.
Domain expertise specialization
Some tasks benefit from deep domain context that would overwhelm a generalist agent. A legal analysis agent might need extensive context about case law and regulatory frameworks. A medical research agent might need specialized knowledge about clinical trial methodology. Rather than loading all domain context into a single agent, specialized agents can carry focused expertise relevant to their specific responsibilities.
Example: Multi-platform integration. Consider an integration system where agents need to work across CRM, marketing automation, and messaging platforms. Each platform has 10-15 relevant API endpoints. A single agent with 40+ tools often struggles to select correctly, confusing similar operations across platforms. Splitting into specialized agents with focused toolsets and tailored prompts resolves selection errors.
    from anthropic import Anthropic

    client = Anthropic()

    # Specialized agents with focused toolsets and tailored prompts
    class CRMAgent:
        """Handles customer relationship management operations"""
        system_prompt = """You are a CRM specialist. You manage contacts,
        opportunities, and account records. Always verify record ownership
        before updates and maintain data integrity across related records."""
        tools = [
            crm_get_contacts,
            crm_create_opportunity,
            # 8-10 CRM-specific tools
        ]

    class MarketingAgent:
        """Handles marketing automation operations"""
        system_prompt = """You are a marketing automation specialist. You
        manage campaigns, lead scoring, and email sequences. Prioritize
        data hygiene and respect contact preferences."""
        tools = [
            marketing_get_campaigns,
            marketing_create_lead,
            # 8-10 marketing-specific tools
        ]

    class OrchestratorAgent:
        """Routes requests to specialized agents"""
        def execute(self, user_request: str):
            response = client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                system="""You coordinate platform integrations. Route requests to the appropriate specialist:
    - CRM: Contact records, opportunities, accounts, sales pipeline
    - Marketing: Campaigns, lead nurturing, email sequences, scoring
    - Messaging: Notifications, alerts, team communication""",
                messages=[
                    {"role": "user", "content": user_request}
                ],
                tools=[delegate_to_crm, delegate_to_marketing, delegate_to_messaging]
            )
            return response
This pattern mirrors effective professional collaboration, where specialists with tools matched to their roles collaborate more effectively than generalists attempting to maintain expertise across all domains. However, specialization introduces routing complexity. The orchestrator must correctly classify requests and delegate to the right agent, and misrouting leads to poor results. Maintaining multiple specialized agents also increases prompt maintenance overhead. Specialization works best when domains are clearly separable and routing decisions are unambiguous.
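One way to keep misrouting visible is to dispatch the orchestrator's delegation calls through an explicit table and fail loudly on anything unrecognized. The following is a minimal sketch, not the article's implementation. The handlers are hypothetical placeholders for real specialist agents, the content blocks are plain dicts shaped like `tool_use` blocks rather than SDK objects, and the `request` input field is an assumed schema.

```python
from typing import Callable


# Hypothetical specialist handlers; real ones would each run their own agent
def handle_crm(request: str) -> str:
    return f"[CRM agent] {request}"


def handle_marketing(request: str) -> str:
    return f"[Marketing agent] {request}"


def handle_messaging(request: str) -> str:
    return f"[Messaging agent] {request}"


SPECIALISTS: dict[str, Callable[[str], str]] = {
    "delegate_to_crm": handle_crm,
    "delegate_to_marketing": handle_marketing,
    "delegate_to_messaging": handle_messaging,
}


def dispatch(content_blocks: list[dict]) -> list[str]:
    """Run the specialist named by each tool_use block; fail loudly on unknown routes."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # skip text blocks
        handler = SPECIALISTS.get(block["name"])
        if handler is None:
            raise ValueError(f"No specialist registered for {block['name']!r}")
        # "request" is an assumed input field for the delegation tools
        results.append(handler(block["input"]["request"]))
    return results


# Simulated orchestrator output (dicts standing in for response content blocks)
blocks = [
    {"type": "text", "text": "Routing your request."},
    {"type": "tool_use", "name": "delegate_to_crm",
     "input": {"request": "Update the contact record for Acme"}},
    {"type": "tool_use", "name": "delegate_to_messaging",
     "input": {"request": "Notify the sales channel"}},
]
routed = dispatch(blocks)
print(routed)
```

Logging which route was chosen for each request also gives you the data needed to judge whether routing decisions are as unambiguous as the design assumes.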
Outgrowing single-agent architectures
Beyond the general framework, certain concrete signals suggest that single-agent patterns have been outgrown:
Approaching context limits. If an agent routinely uses large amounts of context and performance is degrading, context pressure may be the bottleneck. Note that recent advances in context management (such as compaction) are reducing this limitation, allowing single agents to maintain effective memory across much longer horizons.
Managing many tools. When an agent has 15-20+ tools, the model spends significant context and attention understanding its options. Before adopting a multi-agent architecture, consider using the Tool Search Tool, which lets Claude dynamically discover tools on-demand rather than loading all definitions upfront. This can reduce token usage by up to 85% while improving tool selection accuracy.
Parallelizable subtasks. When tasks naturally decompose into independent pieces (research across multiple sources, tests for multiple components), parallel subagents can provide substantial speedups.
These thresholds will shift as models improve. Current limits represent practical guidelines, not fundamental constraints.
Context-centric decomposition
When adopting a multi-agent architecture, the most important design decision is how to divide work between agents. We've observed that teams frequently make this choice incorrectly, leading to coordination overhead that negates the benefits of multi-agent design.
The key insight is to adopt a context-centric view rather than a problem-centric view when decomposing work.
Problem-centric decomposition (often counterproductive). Dividing by type of work (one agent writes features, another writes tests, a third reviews code) creates constant coordination overhead. Each handoff loses context. The test-writing agent lacks knowledge of why certain implementation decisions were made and the code reviewer lacks the context of exploration and iteration.
Context-centric decomposition (usually effective). Dividing by context boundaries means an agent handling a feature should also handle its tests, because it already possesses the necessary context. Work should only be split when context can be truly isolated.
This principle emerges from observing failure modes in multi-agent systems. When agents are split by problem type, they engage in a "telephone game," passing information back and forth with each handoff degrading fidelity. In one experiment with agents specialized by software development role (planner, implementer, tester, reviewer), the subagents spent more tokens on coordination than on actual work.
Effective decomposition boundaries include:
Independent research paths. Investigating "market trends in Asia" versus "market trends in Europe" can proceed in parallel with no shared context.
Separate components with clean interfaces. With a well-defined API contract, frontend and backend work can proceed in parallel.
Blackbox verification. A verifier that only needs to run tests and report results does not require implementation context.
Problematic decomposition boundaries include:
Sequential phases of the same work. Planning, implementation, and testing of the same feature share too much context.
Tightly coupled components. Components requiring constant back-and-forth belong in the same agent.
Work requiring shared state. Agents that would need to frequently synchronize understanding should remain together.
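The boundaries above can be approximated by comparing the context each task needs. The sketch below is one possible heuristic, not a method described by the authors: it scores overlap with Jaccard similarity and merges tasks that share enough context into one agent's workload. The task names, context labels, and 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def context_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of the context two tasks need (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_by_context(tasks: dict[str, set[str]], threshold: float = 0.5) -> list[set[str]]:
    """Merge tasks whose context overlap meets the threshold into one agent's workload."""
    groups = [{name} for name in tasks]
    for a, b in combinations(tasks, 2):
        if context_overlap(tasks[a], tasks[b]) >= threshold:
            group_a = next(g for g in groups if a in g)
            group_b = next(g for g in groups if b in g)
            if group_a is not group_b:
                # Fold b's group into a's group and drop the now-redundant group
                group_a |= group_b
                groups = [g for g in groups if g is not group_b]
    return groups


# Hypothetical tasks mapped to the context items each one needs
tasks = {
    "implement_feature": {"feature_spec", "auth_module", "design_notes"},
    "write_feature_tests": {"feature_spec", "auth_module", "test_helpers"},
    "research_asia": {"asia_market_reports"},
    "research_europe": {"europe_market_reports"},
}
groups = group_by_context(tasks)
print(groups)
```

In this example the feature and its tests land in the same group because they share most of their context, while the two research paths stay separate and can run in parallel.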
The verification subagent pattern
One multi-agent pattern that consistently works well across domains is the verification subagent. This is a dedicated agent whose sole responsibility is testing or validating the main agent's work.
It's worth noting that more capable orchestrator models (like Claude Opus 4.5) are increasingly able to evaluate subagent work directly without a separate verification step. However, verification subagents remain valuable when using less capable orchestrators, when verification requires specialized tools, or when you want to enforce explicit verification checkpoints in your workflow.
Verification subagents succeed because they sidestep the telephone game problem. Verification requires minimal context transfer by nature, so a verifier can blackbox-test a system without needing the full history of how it was built.
The main agent completes a unit of work. Before proceeding, it spawns a verification subagent with the artifact to verify, clear success criteria, and tools to perform verification.
The verifier does not need to understand why the artifact was built as it was. It only needs to determine whether the artifact meets the specified criteria.
    from anthropic import Anthropic

    client = Anthropic()

    class CodingAgent:
        def implement_feature(self, requirements: str) -> dict:
            """Main agent implements the feature"""
            messages = [
                {"role": "user", "content": f"Implement: {requirements}"}
            ]
            response = client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=4096,
                messages=messages,
                tools=[read_file, write_file, list_directory]
            )
            return {
                "code": response.content,
                "files_changed": extract_files(response)
            }

    class VerificationAgent:
        def verify_implementation(self, requirements: str, files_changed: list) -> dict:
            """Separate agent verifies the work"""
            messages = [
                {"role": "user", "content": f"""
    Requirements: {requirements}
    Files changed: {files_changed}

    Run the test suite and verify:
    1. All existing tests pass
    2. New functionality works as specified
    3. No obvious errors or security issues

    You MUST run the complete test suite before marking as passed.
    Do not mark as passing after only running a few tests.
    Run: pytest --verbose
    Only mark as PASSED if ALL tests pass with no failures.
    """}
            ]
            response = client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=4096,
                messages=messages,
                tools=[run_tests, execute_code, read_file]
            )
            return {
                "passed": extract_pass_fail(response),
                "issues": extract_issues(response)
            }

    def implement_with_verification(requirements: str, max_attempts: int = 3):
        for attempt in range(max_attempts):
            result = CodingAgent().implement_feature(requirements)
            verification = VerificationAgent().verify_implementation(
                requirements,
                result['files_changed']
            )
            if verification['passed']:
                return result
            requirements += f"\n\nPrevious attempt failed: {verification['issues']}"
        raise Exception(f"Failed verification after {max_attempts} attempts")
Verification subagents are effective for:
Quality assurance. Running test suites, linting code, validating outputs against schemas.
Compliance checking. Verifying documents meet policy requirements, checking outputs against rules.
Output validation. Confirming generated content meets specifications before delivery.
Factual verification. Having a separate agent verify claims or citations in generated content.
The early victory problem
The most significant failure mode for verification subagents is marking outputs as passing without thorough testing. The verifier runs one or two tests, observes them pass, and declares success.
Mitigation strategies include:
Concrete criteria. Specify "Run the full test suite and report all failures" rather than "make sure it works."
Comprehensive checks. Require the verifier to test multiple scenarios and edge cases.
Negative tests. Direct the verifier to attempt inputs that should fail and confirm they do.
Explicit instructions. The instruction "You MUST run the complete test suite before marking as passed" is essential. Without explicit requirements for comprehensive validation, verification agents take shortcuts.
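Prompt-level instructions can be backed by a programmatic guard in the orchestrating code. The sketch below is one possible check, not part of the article's implementation. It assumes the verifier's test tool returns raw pytest terminal output, and that the orchestrator knows roughly how many tests the suite contains (for example from `pytest --collect-only`). `accept_verification` and `expected_min_tests` are hypothetical names.

```python
import re


def accept_verification(pytest_output: str, expected_min_tests: int) -> bool:
    """Accept a PASSED verdict only if the final summary shows a full, clean run."""
    lines = [line for line in pytest_output.strip().splitlines() if line.strip()]
    if not lines:
        return False  # no output at all: treat as unverified
    counts: dict[str, int] = {}
    # pytest's last line looks like "== 1 failed, 41 passed in 3.00s =="
    for number, kind in re.findall(r"(\d+) (passed|failed|errors?)", lines[-1]):
        key = "error" if kind.startswith("error") else kind
        counts[key] = int(number)
    passed = counts.get("passed", 0)
    failures = counts.get("failed", 0) + counts.get("error", 0)
    # Reject runs that failed, and runs that covered only part of the suite
    return failures == 0 and passed >= expected_min_tests


full_run = "collected 42 items\n...\n===== 42 passed in 3.10s ====="
early_victory = "collected 2 items\n...\n===== 2 passed in 0.05s ====="
print(accept_verification(full_run, expected_min_tests=42))
print(accept_verification(early_victory, expected_min_tests=42))
```

A guard like this catches the case where the verifier ran only a handful of tests and declared success, because the passed count falls short of the expected suite size.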
Multi-agent systems are powerful, but not universally appropriate. Before adding the complexity of multiple coordinated agents, confirm that:
Genuine constraints exist that a multi-agent design solves, such as context limits, parallelization opportunities, or a need for specialization.
Decomposition follows context, not problem type. Group work by what context it requires, not by what kind of work it is.
Clear verification points exist where subagents can validate work without requiring full context.
Our advice? Start with the simplest approach that works, and add complexity only when evidence supports it.
This is the first in a series of posts on multi-agent systems. For more on single-agent patterns, see Building effective agents. For context management strategies, see Effective context engineering for AI agents. For a deep dive into how we built our multi-agent research system, see How we built our multi-agent research system.
Acknowledgements
Written by Cara Phillips, with contributions from Paul Chen, Andy Schumeister, Brad Abrams, and Theo Chu.