A main coordinator invokes specialized sub-agents, synthesizes their outputs, and manages workflow transitions. Hub-and-spoke architecture for multi-agent systems.
Core Structure
Orchestrator (Main Coordinator)
├── Phase 1: Scout Agent (read-only exploration)
├── Phase 2: Planning Council (parallel domain experts)
│ ├── Architecture Expert
│ ├── Testing Expert
│ ├── Security Expert
│ └── ... (domain-specific)
├── Phase 3: Build Agents (parallel batches, dependency-aware)
├── Phase 4: Review Panel (parallel experts + meta-checks)
└── Phase 5: Validation (execution)
Key Mechanisms
Single-Message Parallelism
All parallel agents must be invoked in one message for true concurrency. Sequential messages serialize execution.
In a single message, make multiple Task tool calls:
- Task(subagent_type="build-agent", prompt="[spec for file1]")
- Task(subagent_type="build-agent", prompt="[spec for file2]")
- Task(subagent_type="build-agent", prompt="[spec for file3]")
This is the critical insight: parallelism is achieved at the message level, not the agent level.
[2025-12-09]: This is the make-or-break implementation detail that most practitioners miss. If you invoke three Task tools across three separate messages, they execute sequentially—not in parallel. The orchestrator must emit all parallel Task calls in a single response. This explains why many "parallel" multi-agent systems actually run sequentially: developers assume agents will run concurrently by default, but the framework requires explicit single-message invocation. When debugging performance issues in multi-agent systems, check single-message parallelism first.
How to scale multi-agent work: spawn all independent agents in a single message rather than sequential messages. This isn't just about orchestrators—it's the fundamental pattern for parallelizing any multi-agent work. Three Task calls in one message execute concurrently. Three Task calls across three messages execute sequentially. The difference compounds: 10 agents in parallel complete in roughly the same wall-clock time as 1 agent; 10 agents serialized take 10× longer.
Sources: Anthropic: How we built our multi-agent research system, Subagents - Claude Code Docs
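To make the wall-clock arithmetic concrete, here is a minimal Python sketch using asyncio; `run_agent` is a hypothetical stand-in for a Task tool call, not a real API. Issuing all calls together (one "message") parallelizes; awaiting them one at a time serializes.

```python
import asyncio

async def run_agent(name: str, prompt: str) -> str:
    """Hypothetical stand-in for a Task call to a subagent."""
    await asyncio.sleep(1.0)  # simulate the agent's wall-clock time
    return f"{name}: done"

async def serialized(specs: list[str]) -> list[str]:
    # One call per "message": each await blocks before the next starts (~N seconds).
    return [await run_agent(f"build-{i}", s) for i, s in enumerate(specs)]

async def parallel(specs: list[str]) -> list[str]:
    # All calls issued together, like Task calls batched in one message (~1 second).
    return await asyncio.gather(
        *(run_agent(f"build-{i}", s) for i, s in enumerate(specs))
    )

print(asyncio.run(parallel(["spec for file1", "spec for file2", "spec for file3"])))
```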
SDK Orchestration vs. Model-Native Swarm
[2026-01-30]: The orchestration patterns described here assume SDK-level coordination—external code or tools (Task, LangGraph, AutoGen) manage agent spawning and result synthesis. Kimi K2.5 introduced an alternative: model-native swarm orchestration.
Key distinction: SDK orchestration uses explicit tool calls to spawn subagents. The orchestrator invokes Task tools, waits for responses, and synthesizes results through framework code. Model-native swarm embeds orchestration within the model's reasoning—the model decides when to parallelize, spawns subagents internally, and coordinates execution through trained behavior rather than prompted instructions.
Trade-off: SDK orchestration provides explicit control and traceable coordination logic. Model-native swarm offers autonomous parallelization and potentially lower coordination overhead, but reduces visibility into orchestration decisions and couples workflows to specific model families.
See: Multi-Model Architectures: Model-Native Swarm Orchestration for detailed comparison, including PARL training approach, Critical Steps metric, and performance characteristics (100+ subagents, 3-4.5× speedup).
Dependency-Aware Batching
For files with dependencies:
- Batch 1: Files with no dependencies (parallel)
- Batch 2: Files depending on Batch 1 (wait, then parallel)
- Batch 3: Files depending on Batch 2
- etc.
Analyze the dependency graph, group into batches, parallelize within each batch.
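This is a topological sort grouped by level. A minimal sketch using Kahn's algorithm (file names illustrative):

```python
from collections import defaultdict, deque

def batch_by_dependencies(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group files into batches; batch N depends only on earlier batches."""
    indegree = {f: len(d) for f, d in deps.items()}
    dependents = defaultdict(list)
    for f, d in deps.items():
        for dep in d:
            dependents[dep].append(f)
    ready = deque(f for f, n in indegree.items() if n == 0)
    batches = []
    while ready:
        batch = list(ready)      # everything currently unblocked runs in parallel
        ready.clear()
        batches.append(batch)
        for f in batch:
            for dependent in dependents[f]:
                indegree[dependent] -= 1
                if indegree[dependent] == 0:
                    ready.append(dependent)
    if sum(len(b) for b in batches) != len(deps):
        raise ValueError("dependency cycle detected")
    return batches

print(batch_by_dependencies({
    "utils.py": set(),
    "models.py": {"utils.py"},
    "api.py": {"utils.py", "models.py"},
}))
# [['utils.py'], ['models.py'], ['api.py']]
```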
Spec File as Shared Context
A single artifact (spec file) flows through all phases:
- Scout outputs exploration findings
- Plan phase creates `docs/specs/<name>.md` with full context
- Build agents read the spec file for implementation details
- Review agents reference spec file for compliance checking
This avoids passing massive context between agents—instead, they read from a shared artifact.
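A minimal sketch of the handoff, assuming a hypothetical plan-phase helper: full context is written once, and downstream prompts carry only the path.

```python
from pathlib import Path

def write_spec(name: str, findings: str, plan: str) -> Path:
    """Plan phase: persist the full context once as a shared artifact."""
    spec = Path("docs/specs") / f"{name}.md"
    spec.parent.mkdir(parents=True, exist_ok=True)
    spec.write_text(f"# {name}\n\n## Findings\n{findings}\n\n## Plan\n{plan}\n")
    return spec

# Build and review prompts reference the path, not the content.
spec_path = write_spec("auth-refactor", "OAuth2 flow spans 3 layers...", "1. ...")
build_prompt = f"Implement step 1. Read the full spec at {spec_path} before writing code."
```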
Phase Gating
Mandatory prerequisites before transitions:
- scout → plan: Pass exploration findings
- plan → build: Spec file must exist (verify with `test -f`)
- build → review: Build must complete successfully
- review → validate: Always run validation after review
If a prerequisite fails, halt and provide remediation instructions.
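A sketch of two of these gates in Python; `make build` is a placeholder for whatever the project's actual build step is.

```python
import subprocess
import sys
from pathlib import Path

def gate_plan_to_build(spec_path: str) -> None:
    """plan → build: spec file must exist (the in-process `test -f`)."""
    if not Path(spec_path).is_file():
        sys.exit(
            f"GATE FAILED (plan → build): spec '{spec_path}' not found.\n"
            "Remediation: re-run the plan phase to generate the spec."
        )

def gate_build_to_review() -> None:
    """build → review: the build command must complete successfully."""
    result = subprocess.run(["make", "build"], capture_output=True, text=True)
    if result.returncode != 0:
        sys.exit(
            f"GATE FAILED (build → review):\n{result.stderr}\n"
            "Remediation: fix the build errors, then retry the build phase."
        )
```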
Context Passing
| Transition | What Flows |
|---|---|
| Scout → Plan | File locations, patterns, dependencies |
| Plan → Build | Spec file path, architecture patterns |
| Build → Review | Commit hashes, changed file paths |
| Review → Validate | Validation scope (based on review findings) |
Each agent receives complete context—they're stateless and assume nothing from prior calls.
Context Isolation via Sub-Agents
[2025-12-10]: The primary rationale for delegation isn't just parallelism—it's context hygiene. Each sub-agent gets a fresh context window for its specialized task, preventing pollution of the orchestrator's decision-making context.
Why Context Isolation Matters
When an orchestrator needs to search files, grep patterns, or summarize code, doing this work directly fills its context window with raw data:
- File listings with hundreds of paths
- Grep results with dozens of matching lines
- Full file contents for analysis
This raw data crowds out the orchestrator's primary job: workflow coordination and decision synthesis.
The Pattern
Deploy fresh context windows for search operations:
Orchestrator (clean context):
├─ Task → Scout: "Find all Python files in src/"
│ Scout context: file paths, directory structure
│ Scout returns: "Found 47 Python files, organized in 3 modules..."
│
├─ Task → Analyzer: "Summarize authentication flow"
│ Analyzer context: auth files, dependencies
│ Analyzer returns: "Authentication uses OAuth2 with..."
│
└─ Orchestrator synthesizes summaries into decisions
(never saw raw file listings or grep output)
Sub-agents return synthesized summaries, not raw data:
- Scout returns: "Found 47 files in 3 modules" (not 47 file paths)
- Grep agent returns: "Pattern appears in 12 locations, primarily in validation layer" (not 12 raw matches)
- Code analyzer returns: "Authentication flow uses OAuth2 with custom middleware" (not full code dump)
The Trade-off
| Aspect | Direct Execution | Sub-Agent Delegation |
|---|---|---|
| Token cost | Lower (single context) | Higher (multiple contexts) |
| Context clarity | Polluted with raw data | Clean, summary-only |
| Decision quality | Degraded by noise | Focused on synthesis |
| Parallelism | Sequential | Concurrent |
When the trade-off favors delegation:
- Orchestrator needs to make complex decisions based on synthesized information
- Search/analysis operations produce large intermediate results
- Multiple independent investigations can run in parallel
- Workflow spans multiple phases requiring clean transitions
When direct execution works:
- Simple, single-phase tasks
- Orchestrator needs raw data for decision-making
- Token budget is constrained
- No subsequent decision synthesis needed
Context Hygiene Best Practices
- Prompt sub-agents for synthesis: "Summarize findings in 3-5 bullet points" rather than "return all matches"
- Filter before reporting: Sub-agents should grep/analyze/filter, then report insights—not raw output
- Keep orchestrator context minimal: Only workflow state, phase transitions, and synthesized findings
- Avoid echo chambers: Don't pass large sub-agent outputs as-is to other sub-agents—summarize first
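The first two practices can be baked directly into the sub-agent prompt. A minimal template (wording illustrative):

```python
SYNTHESIS_PROMPT = """\
Investigate: {question}

Reply rules (context hygiene):
- Summarize findings in 3-5 bullet points.
- Include file:line references for anything actionable.
- Do NOT paste raw grep output, file listings, or full file contents.
"""

prompt = SYNTHESIS_PROMPT.format(question="How is authentication implemented?")
```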
This Is Why Orchestrators Delegate
The orchestrator pattern isn't just about parallelism—it's about maintaining clean separation between:
- Discovery (file finding, pattern searching, code reading) — sub-agent contexts
- Synthesis (decision-making, workflow coordination) — orchestrator context
Each sub-agent pollutes its own context with raw data, then returns clean summaries. The orchestrator never sees the mess.
Mental model: Sub-agents are expensive, disposable context buffers. They absorb the noise so the orchestrator can think clearly.
Expert Synthesis
When multiple experts analyze in parallel:
- Collect structured outputs from each expert
- Identify cross-cutting concerns (mentioned by 2+ experts)
- Synthesize into unified recommendations
- Create priority actions and risk assessment
The orchestrator is responsible for synthesis—individual experts stay focused on their domain.
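A sketch of steps 1-3, assuming each expert returns a flat list of concern strings; anything raised by two or more experts is promoted to cross-cutting.

```python
def synthesize(expert_findings: dict[str, list[str]]) -> dict:
    """Merge structured expert outputs; flag concerns raised by 2+ experts."""
    raised_by: dict[str, list[str]] = {}
    for expert, concerns in expert_findings.items():
        for concern in concerns:
            raised_by.setdefault(concern, []).append(expert)
    return {
        "cross_cutting": {c: e for c, e in raised_by.items() if len(e) >= 2},
        "domain_specific": {c: e for c, e in raised_by.items() if len(e) == 1},
    }

report = synthesize({
    "security": ["input validation missing", "no rate limiting"],
    "testing": ["input validation missing", "no integration tests"],
    "architecture": ["no rate limiting"],
})
# cross_cutting: "input validation missing" (security + testing),
#                "no rate limiting" (security + architecture)
```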
Error Handling
Graceful Degradation
- If an expert fails, note the failure and continue with available analyses
- Recommend manual review for failed expert domains
- Include recovery instructions in output
Partial Success
- Commit successful changes before reporting failures
- Allow selective retry via phases parameter
- Never leave the workflow in an inconsistent state
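With parallel experts, graceful degradation amounts to collecting exceptions instead of propagating them. A sketch using asyncio, where `run_expert` stands in for a Task call:

```python
import asyncio

async def run_expert(domain: str) -> str:
    """Stand-in for a Task call to a domain expert."""
    if domain == "security":
        raise RuntimeError("expert timed out")  # simulate one failure
    return f"{domain}: analysis complete"

async def review_panel(domains: list[str]) -> dict:
    results = await asyncio.gather(
        *(run_expert(d) for d in domains), return_exceptions=True
    )
    analyses, failed = {}, []
    for domain, result in zip(domains, results):
        if isinstance(result, Exception):
            failed.append(domain)  # note the failure, continue with the rest
        else:
            analyses[domain] = result
    return {"analyses": analyses, "manual_review_needed": failed}

print(asyncio.run(review_panel(["security", "testing", "architecture"])))
```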
When to Use This Pattern
Good Fit
Multi-concern workflows requiring parallel analysis:
- Complex tasks spanning multiple domains (architecture, security, testing, performance)
- Tasks that benefit from parallel execution (multiple independent analyses or builds)
- Workflows requiring explicit phase gates and artifacts for review
- Need for context isolation between exploration and decision-making
Indicators you need this pattern:
- Single agent's context window fills with search/analysis data
- Multiple independent tasks can run concurrently
- Workflow has clear phase transitions with checkpoints
- Need to synthesize insights from multiple domain experts
Poor Fit
Simple or tightly-coupled tasks:
- Single-file changes with straightforward requirements
- Tasks where coordination overhead exceeds parallelism benefit
- Workflows requiring tight real-time interaction between agents
- When context pollution isn't a concern (simple, single-phase tasks)
Anti-Patterns Discovered
- Missing spec path validation: Build phase must verify spec file exists before proceeding
- Implicit environment assumptions: Always detect and report environment in validation
- Incomplete meta-reviews: Reviews must include hygiene checks (commits, labels, linked issues)
- Vague risk assessment: Must be concrete with mitigation strategies, not generic warnings
Questions This Raises
- How do you decide the right number of parallel agents? (Resource constraints vs. diminishing returns)
- When does the spec file become a bottleneck vs. a coordination aid?
- How do you handle agents that produce conflicting recommendations?
- What's the minimum viable orchestrator? (Probably: scout → build → validate)
Capability Minimization
[2025-12-09]: Orchestrators work better when their subagents have intentionally restricted capabilities. This isn't just security—it's an architectural forcing function.
Why Restrict Tools
- Reduces context overhead: An agent with 3 tools maintains smaller context than one with 20
- Forces delegation: A read-only scout cannot implement—it must report findings for others to act on
- Enables parallelization: Agents with minimal scope can run more instances simultaneously
- Clarifies responsibility: Tool restrictions make the agent's role unambiguous
Tool Restriction Patterns
| Agent Role | Tools | Rationale |
|---|---|---|
| Scout | Read, Glob, Grep | Cannot modify—forces reporting back |
| Builder | Write, Edit, Read, Bash | Focused on implementation |
| Reviewer | Read, Grep, Bash (tests only) | Cannot fix what they find |
| Validator | Bash (run only), Read | Executes, doesn't implement |
Scope Restriction Beyond Tools
Tool restriction is only half the pattern. Scope restriction achieves the same goal through workflow design:
- One file per builder: Each build-agent handles exactly one file, even though it could write many
- One domain per expert: Security expert doesn't comment on testing, even though it could
- Orchestrator as spec writer: Primary job becomes packaging comprehensive context, not doing work
A common pattern in multi-agent systems:
"Never guess or assume context: Each build-agent needs comprehensive instructions as if they are new engineers"
This forces the orchestrator to be explicit about what each subagent needs, keeping each agent's context minimal and focused.
The Meta-Principle
Default Claude Code behavior is to inherit all parent tools. The deliberate choice to restrict tools signals architectural intent:
```yaml
# Inherits everything (default)
tools:  # omit the field entirely

# Read-only analysis specialist
tools: Read, Glob, Grep

# Implementation specialist
tools: Write, Edit, Read, Bash
```

When reviewing an agent definition, ask: "What can this agent NOT do, and is that intentional?"
SDK-Level vs CLI-Level Enforcement
[2025-12-09]: True HEAD vs subagent tool differentiation requires SDK-level enforcement, not CLI configuration.
CLI tools like Claude Code apply tool restrictions uniformly—the HEAD agent and all subagents share the same allowed tools set. This is a known limitation. If you configure allowedTools in settings.json, those restrictions apply everywhere.
SDK-level orchestration solves this by passing different allowed_tools arrays when spawning each agent:
```python
# Orchestrator: management tools only, no implementation
orchestrator_tools = [
    "mcp__mgmt__create_agent",
    "mcp__mgmt__command_agent",
    "Read", "Bash",  # info-gathering only
]
# Excluded: Write, Edit, WebFetch, Task

# Build subagent: implementation tools
builder_tools = ["Write", "Read", "Edit", "Bash", "Glob", "Grep"]
```

The pattern emerges across three mechanisms:
- Technical: `allowed_tools` allowlist passed to the SDK when spawning each agent
- Behavioral: System prompts reinforce "let subagents do the heavy lifting"
- Architectural: Orchestrator gets management/coordination tools, not implementation tools
This explains why sophisticated multi-agent systems often build custom orchestration layers on top of the Claude Agent SDK rather than relying solely on CLI tools. The SDK gives you the primitives; CLI tools give you convenience with less granularity.
Source: agenticengineer.com — Production orchestrator implementation demonstrating SDK-level tool restriction
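Reduced to its shape, the idea is a per-role allowlist attached at spawn time. The `spawn_agent` wrapper below is hypothetical, not a real SDK call:

```python
# Hypothetical per-role allowlists; the orchestrator never hands out more.
ROLE_TOOLS = {
    "scout":    ["Read", "Glob", "Grep"],
    "builder":  ["Write", "Edit", "Read", "Bash", "Glob", "Grep"],
    "reviewer": ["Read", "Grep", "Bash"],
}

def spawn_agent(role: str, prompt: str) -> dict:
    """Build the spawn request; the SDK enforces allowed_tools, not the prompt."""
    return {
        "system_prompt": f"You are a {role}. Work only within your tools.",
        "allowed_tools": ROLE_TOOLS[role],
        "prompt": prompt,
    }

request = spawn_agent("scout", "Map the module layout under src/")
```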
Tool Restriction as Coordination Forcing Function
[2025-12-09]: Tool restriction isn't just about limiting what agents can do—it's about enabling coordination patterns through deliberate capability differentiation.
When an orchestrator uses different tools than its subagents, it creates natural separation of concerns:
- Encourages parallel execution: Orchestrator focuses on spawning multiple specialized agents
- Maintains separation of concerns: Orchestrator manages workflow, builders implement
- Enables different tool sets per role: Orchestrators get management tools, builders get implementation tools
- Reduces shortcuts: Clear role definitions guide proper delegation
Note: Tool restriction is enforced through system prompt guidance and agent design, not through technical enforcement mechanisms. The effectiveness comes from clear role definitions and workflow design.
Tool Assignment
Orchestrators should assign different tool sets to different subagent roles. For native tools, this follows least-privilege principles. For MCP tools, the same pattern applies—validators get browser tools, scouts get search tools, etc.
See Tool Use: MCP Tool Declarations for the frontmatter syntax and role-based assignment patterns.
Workflow Primitives
[2025-12-09]: When orchestration patterns become routine, extract them as reusable primitives. Google ADK codifies this with SequentialAgent, ParallelAgent, and LoopAgent—workflow controllers that compose specialized agents without LLM overhead per coordination decision.
The Three Primitives
| Primitive | Pattern | When to Use |
|---|---|---|
| Sequential | Pipeline—A's output feeds B | Dependent phases, ordered transformations |
| Parallel | Fan-out/gather—concurrent execution, collected results | Independent analysis, embarrassingly parallel tasks |
| Loop | Iterate until condition met | Refinement cycles, retry-with-feedback |
Why This Matters
Traditional orchestration requires an LLM call to decide "what next?" after each step. For deterministic workflows—where the pattern is always "run A, then B, then C"—that's wasted inference.
Workflow primitives eliminate this overhead:
- SequentialAgent knows it runs steps in order
- ParallelAgent knows it runs steps concurrently
- LoopAgent knows it runs until a condition
The orchestrator LLM focuses on decisions that require reasoning: which primitive to invoke, how to handle failures, when to escalate.
Composition
Primitives compose naturally:
```python
SequentialAgent([
    ParallelAgent([scout_a, scout_b, scout_c]),  # Fan-out exploration
    planning_agent,                              # Synthesize findings
    LoopAgent(build_agent, until=tests_pass),    # Iterate to completion
    ParallelAgent([reviewer_a, reviewer_b]),     # Parallel review
])
```
The meta-principle: when orchestrators themselves become boilerplate, extract them as parameterizable primitives.
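A minimal sketch of what such primitives could look like as plain Python combinators. The real ADK classes differ; the state-dict convention and `until` signature here are assumptions:

```python
import asyncio
from typing import Awaitable, Callable

Agent = Callable[[dict], Awaitable[dict]]

def sequential(*steps: Agent) -> Agent:
    """Pipeline: each step's output state feeds the next."""
    async def run(state: dict) -> dict:
        for step in steps:
            state = await step(state)
        return state
    return run

def parallel(*branches: Agent) -> Agent:
    """Fan-out/gather: run branches concurrently, merge their results."""
    async def run(state: dict) -> dict:
        results = await asyncio.gather(*(b(dict(state)) for b in branches))
        merged = dict(state)
        for r in results:
            merged.update(r)
        return merged
    return run

def loop(step: Agent, until: Callable[[dict], bool], max_iters: int = 10) -> Agent:
    """Iterate a step until the condition holds or the retry budget runs out."""
    async def run(state: dict) -> dict:
        for _ in range(max_iters):
            state = await step(state)
            if until(state):
                break
        return state
    return run

# Mirrors the composition above:
# workflow = sequential(parallel(scout_a, scout_b, scout_c),
#                       planning_agent,
#                       loop(build_agent, until=lambda s: s.get("tests_pass")),
#                       parallel(reviewer_a, reviewer_b))
```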
Framework Implementations
- Google ADK: Native SequentialAgent, ParallelAgent, LoopAgent classes
- LangGraph: StateGraph with conditional edges
- Claude Code: Single-message parallelism + explicit sequencing (no native primitives)
Claude Code achieves similar patterns through discipline (single-message parallel Task calls) rather than framework primitives. ADK makes the patterns explicit and removes the possibility of accidental serialization.
See Also: Google ADK — Concrete implementation of workflow primitives
Communication Excellence
[2026-01-30]: Orchestration skill research from cc-mirror reveals sophisticated communication patterns that transform user experience from "watching a machine work" to "collaborating with intelligence."
The Conductor Philosophy
Core Identity: "Absorb complexity, radiate simplicity"
Users describe outcomes; orchestrators decompose and coordinate; workers execute with tools. The orchestrator shields users from internal machinery—task graphs, parallel execution, dependency resolution—presenting only progress and results.
Forbidden Vocabulary:
Never expose orchestration mechanics in user-facing communication:
| Forbidden | Natural Alternative |
|---|---|
| "Spawning subagents" | "Breaking this into parallel tracks" |
| "Task graph analysis" | "Checking a few angles" |
| "Fan-out pattern" | "Got several threads running" |
| "Map-reduce phase" | "Pulling it together now" |
| "Background agent count" | "Early returns looking promising" |
Vibe-Reading Adaptation
Orchestrators detect and adapt to user energy:
| User State | Signals | Orchestrator Response |
|---|---|---|
| Excited | Exclamation marks, rapid messages | Match energy, celebrate wins visibly |
| Overwhelmed | Long pauses, vague requests | Simplify, break into smaller steps |
| Frustrated | Repeated questions, short responses | Direct solutions, skip process exposition |
| Curious | Detail questions, exploratory tone | Share insights, explain discoveries |
| Rushed | "quick", "fast", time pressure | Cut ceremony, prioritize completion |
Progress Communication Patterns
Maintain engagement without revealing machinery:
Starting:
"Breaking this into parallel tracks"
"Checking a few angles on this"
Working:
"Got several threads running"
"Early returns look promising"
Synthesis:
"Pulling it together now"
"Building on what I found"
Completion: Meaningful celebration with results, not process description. Show findings with file:line references, unexpected discoveries highlighted, connection to user intent explicit.
Generic synthesis (forbidden):
- ❌ "I analyzed the code"
- ❌ "Task completed successfully"
- ✅ "Found SQL injection vulnerability in auth.py line 147"
- ✅ "Unexpected: Authentication bypasses rate limiting entirely"
Sources: cc-mirror orchestration skill, Communication Guide
Read vs. Delegate Guidelines
[2026-01-30]: Research from production orchestrators reveals clear thresholds for when orchestrators should read directly versus delegating to agents.
Mandatory Direct Reads
Always read directly, never delegate:
- Skill references - Core orchestration knowledge
- Domain guides - Project-specific standards
- Agent output files - Results from completed subagents (for synthesis)
These are coordination context, not search operations. Delegating them breaks orchestrator reasoning.
The 1-2 File Threshold
Orchestrator reads directly (1-2 files):
- Quick index lookups
- Small configuration files
- Single-file verification
- Specification documents
Delegate to agents (3+ files):
- Codebase exploration
- Multiple source file analysis
- Deep documentation review
- Pattern searching across repository
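The thresholds reduce to a small decision rule. A sketch (category names illustrative):

```python
# Coordination context the orchestrator must always read itself.
MANDATORY_DIRECT = {"skill_reference", "domain_guide", "agent_output"}

def read_or_delegate(kind: str, files_needed: int) -> str:
    """Encode the thresholds above; a heuristic, not a hard rule."""
    if kind in MANDATORY_DIRECT:
        return "direct"        # never delegate coordination context
    if files_needed <= 2:
        return "direct"        # quick lookup: minimal context cost
    return "delegate"          # exploration: spawn an agent, receive a summary
```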
Why This Threshold Matters
| Aspect | Direct Read (1-2 files) | Delegate (3+ files) |
|---|---|---|
| Context cost | Minimal overhead | High if done directly |
| Parallelism | Serial bottleneck | Concurrent execution |
| Orchestrator focus | Maintains synthesis capacity | Preserves decision-making clarity |
| Total latency | Faster for small scope | Faster for large scope |
Delegation Rationale
Orchestrators coordinate; workers execute. When file reading becomes exploration rather than reference lookup, delegation:
- Frees orchestrator context for synthesis
- Enables parallel investigation
- Returns summaries instead of raw data
- Maintains clean separation of concerns
Anti-pattern: Orchestrator reads 15 files to understand a module, fills its context with code, then struggles to synthesize findings.

Pattern: Orchestrator spawns an analyze-module agent, receives a "Module uses OAuth2 with custom middleware in 3 layers" summary, and decides the next action with clean context.
Sources: cc-mirror orchestration patterns, Tool ownership guidelines
Background Execution Mechanics
[2026-01-30]: TeammateTool and orchestration skill research document background execution as fundamental to parallelism, not a feature toggle.
Default to Background
Always use run_in_background=True when spawning agents. This is the default in production orchestrators and should be the default mental model.
Why background-first:
- Enables true parallelism (multiple agents executing concurrently)
- Orchestrator continues coordination work while agents execute
- Automatic completion notifications prevent polling overhead
- Foreground execution serializes work (one agent at a time)
Non-Blocking vs. Blocking Checks
Non-blocking check (block=False):
TaskOutput(task_id="task-123", block=False)
→ Returns current progress or "still running"
Use for status updates while orchestrator continues other work.
Blocking wait (block=True):
TaskOutput(task_id="task-123", block=True)
→ Waits for completion, returns full results
Use only when results immediately needed for next decision.
Notification Handling
Background agents automatically notify orchestrator on completion. No polling required.
Workflow:
- Orchestrator spawns agents with `run_in_background=True`
- Orchestrator continues coordination work (spawn more agents, synthesize partial results)
- Agents complete and send notifications
- Orchestrator processes notifications sequentially
- Orchestrator fetches results via TaskOutput when synthesis begins
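A sketch of the spawn-all-then-block-late shape. The `spawn` and `task_output` stubs stand in for the real Task and TaskOutput tools:

```python
import itertools

_ids = itertools.count(1)
_done: dict[str, dict] = {}

def spawn(prompt: str) -> str:
    """Stub for a background Task call (run_in_background=True)."""
    task_id = f"task-{next(_ids)}"
    _done[task_id] = {"status": "complete", "summary": f"finished: {prompt[:40]}"}
    return task_id

def task_output(task_id: str, block: bool) -> dict:
    """Stub for TaskOutput; block=True waits for completion."""
    return _done[task_id]

def orchestrate(specs: list[str]) -> list[dict]:
    task_ids = [spawn(s) for s in specs]  # spawn everything first, no interleaved waits
    # ...continue coordination work here while agents run in the background...
    return [task_output(tid, block=True) for tid in task_ids]  # block only at synthesis

print(orchestrate(["review auth module", "audit query performance"]))
```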
Trade-offs
| Pattern | Parallelism | Orchestrator Throughput | Use Case |
|---|---|---|---|
| Background (default) | High | High | Multi-agent workflows |
| Foreground | None | Blocked | Simple single-agent delegation |
The mental model: Background execution is not about "running things later"—it's about enabling the orchestrator to do multiple things concurrently. The orchestrator becomes a coordination hub, not a sequential task runner.
Sources: cc-mirror orchestration skill, TeammateTool documentation
Pattern Composition
[2026-01-30]: Real-world orchestration combines fundamental patterns into complex workflows. Research from production orchestrators documents common compositions.
PR Review (Fan-Out + Map-Reduce)
Structure:
1. Read PR metadata (orchestrator, 1 file)
2. Fan-Out: Spawn 3 parallel reviewers (single message)
- security-reviewer (Opus): Check vulnerabilities
- performance-auditor (Sonnet): Analyze efficiency
- code-quality-checker (Sonnet): Review style/patterns
3. Continue: Load domain guide for PR standards
4. Receive completion notifications (3 agents)
5. Map-Reduce: Synthesize findings into unified review
6. Deliver: Prioritized issues with severity levels
User experience:
"Got several threads running on this review..."
[30 seconds later]
"Interesting findings across security, performance, and code quality:
🔴 Critical: SQL injection vulnerability in auth.py line 147
🟡 Performance: N+1 query pattern in posts.controller (3 locations)
🟢 Quality: Consider extracting validation logic to service layer
The SQL issue needs immediate attention before merge."
Feature Implementation (Pipeline + Fan-Out + Background)
Structure:
1. Clarify via AskUserQuestion (4×4 rich questions)
2. Pipeline Phase 1: Research (Haiku)
- Find existing theme code
- Check component library support
3. Pipeline Phase 2: Plan (Opus)
- Design architecture
- Identify component changes
4. Fan-Out: Implement (Sonnet, 4 agents parallel)
- Agent A: Theme provider setup
- Agent B: Component updates
- Agent C: Storage logic
- Agent D: Toggle UI component
5. Pipeline Phase 3: Integration (Sonnet)
6. Background: Run tests while continuing
Key composition: Pipeline for dependent phases, Fan-Out for independent work, Background for long-running validation.
Bug Diagnosis (Fan-Out + Pipeline)
Structure:
1. Fan-Out: Parallel investigation (Haiku, 3 agents)
- Agent A: Analyze error logs
- Agent B: Trace code flow
- Agent C: Review system monitoring
2. Synthesis: Identify pattern (orchestrator)
3. Pipeline: Sequential fix (Sonnet)
- Implement connection pooling
- Add retry logic
- Update error handling
4. Background: Run integration tests
5. Verification (Haiku)
Pattern insight: Start broad (fan-out exploration), narrow to root cause (synthesis), apply fix sequentially (pipeline dependencies), validate in background.
Sources: cc-mirror orchestration examples, Orchestration patterns documentation
See Also
- Plan-Build-Review - simpler version without parallel experts
- Self-Improving Experts - how the domain experts evolve
- Google ADK - framework with native workflow primitives
- Context: Multi-Agent Context Isolation - the foundational context management strategy that makes orchestration viable
- Context Loading Demo - minimal example showing context payload construction and verification layer
- Claude Code: TeammateTool - Native implementation of coordination primitives (spawn, join, broadcast, plan approval) discussed abstractly in this pattern. Hidden feature providing file-based messaging and five coordination patterns (Leader-Worker, Swarm, Pipeline, Council, Plan Approval).
- Tool Design: Rich User Questioning - AskUserQuestion 4×4 pattern for scope clarification before orchestration
- Model Selection: Multi-Agent Strategy - Spawn count economics and pipeline escalation for coordinated agents