A main coordinator invokes specialized sub-agents, synthesizes their outputs, and manages workflow transitions. Hub-and-spoke architecture for multi-agent systems.
Core Structure
Orchestrator (Main Coordinator)
├── Phase 1: Scout Agent (read-only exploration)
├── Phase 2: Planning Council (parallel domain experts)
│ ├── Architecture Expert
│ ├── Testing Expert
│ ├── Security Expert
│ └── ... (domain-specific)
├── Phase 3: Build Agents (parallel batches, dependency-aware)
├── Phase 4: Review Panel (parallel experts + meta-checks)
└── Phase 5: Validation (execution)
Key Mechanisms
Single-Message Parallelism
All parallel agents must be invoked in one message for true concurrency. Sequential messages serialize execution.
In a single message, make multiple Task tool calls:
- Task(subagent_type="build-agent", prompt="[spec for file1]")
- Task(subagent_type="build-agent", prompt="[spec for file2]")
- Task(subagent_type="build-agent", prompt="[spec for file3]")
This is the critical insight: parallelism is achieved at the message level, not the agent level.
[2025-12-09]: This is the make-or-break implementation detail that most practitioners miss. If you invoke three Task tools across three separate messages, they execute sequentially—not in parallel. The orchestrator must emit all parallel Task calls in a single response. This explains why many "parallel" multi-agent systems actually run sequentially: developers assume agents will run concurrently by default, but the framework requires explicit single-message invocation. When debugging performance issues in multi-agent systems, check single-message parallelism first.
How to scale multi-agent work: spawn all independent agents in a single message rather than sequential messages. This isn't just about orchestrators—it's the fundamental pattern for parallelizing any multi-agent work. Three Task calls in one message execute concurrently. Three Task calls across three messages execute sequentially. The difference compounds: 10 agents in parallel complete in roughly the same wall-clock time as 1 agent; 10 agents serialized take 10× longer.
Sources: Anthropic: How we built our multi-agent research system, Subagents - Claude Code Docs
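To make the wall-clock arithmetic concrete, here is a minimal Python sketch using asyncio; `run_agent` is a hypothetical stand-in for a Task tool call, not a real API. Issuing all calls together (one "message") parallelizes; awaiting them one at a time serializes.

```python
import asyncio

async def run_agent(name: str, prompt: str) -> str:
    """Hypothetical stand-in for a Task call to a subagent."""
    await asyncio.sleep(1.0)  # simulate the agent's wall-clock time
    return f"{name}: done"

async def serialized(specs: list[str]) -> list[str]:
    # One call per "message": each await blocks before the next starts (~N seconds).
    return [await run_agent(f"build-{i}", s) for i, s in enumerate(specs)]

async def parallel(specs: list[str]) -> list[str]:
    # All calls issued together, like Task calls batched in one message (~1 second).
    return await asyncio.gather(
        *(run_agent(f"build-{i}", s) for i, s in enumerate(specs))
    )

print(asyncio.run(parallel(["spec for file1", "spec for file2", "spec for file3"])))
```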
SDK Orchestration vs. Model-Native Swarm
[2026-01-30]: The orchestration patterns described here assume SDK-level coordination—external code or tools (Task, LangGraph, AutoGen) manage agent spawning and result synthesis. Kimi K2.5 introduced an alternative: model-native swarm orchestration.
Key distinction: SDK orchestration uses explicit tool calls to spawn subagents. The orchestrator invokes Task tools, waits for responses, and synthesizes results through framework code. Model-native swarm embeds orchestration within the model's reasoning—the model decides when to parallelize, spawns subagents internally, and coordinates execution through trained behavior rather than prompted instructions.
Trade-off: SDK orchestration provides explicit control and traceable coordination logic. Model-native swarm offers autonomous parallelization and potentially lower coordination overhead, but reduces visibility into orchestration decisions and couples workflows to specific model families.
See: Multi-Model Architectures: Model-Native Swarm Orchestration for detailed comparison, including PARL training approach, Critical Steps metric, and performance characteristics (100+ subagents, 3-4.5× speedup).
Dependency-Aware Batching
For files with dependencies:
- Batch 1: Files with no dependencies (parallel)
- Batch 2: Files depending on Batch 1 (wait, then parallel)
- Batch 3: Files depending on Batch 2
- etc.
Analyze the dependency graph, group into batches, parallelize within each batch.
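This is a topological sort grouped by level. A minimal sketch using Kahn's algorithm (file names illustrative):

```python
from collections import defaultdict, deque

def batch_by_dependencies(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group files into batches; batch N depends only on earlier batches."""
    indegree = {f: len(d) for f, d in deps.items()}
    dependents = defaultdict(list)
    for f, d in deps.items():
        for dep in d:
            dependents[dep].append(f)
    ready = deque(f for f, n in indegree.items() if n == 0)
    batches = []
    while ready:
        batch = list(ready)      # everything currently unblocked runs in parallel
        ready.clear()
        batches.append(batch)
        for f in batch:
            for dependent in dependents[f]:
                indegree[dependent] -= 1
                if indegree[dependent] == 0:
                    ready.append(dependent)
    if sum(len(b) for b in batches) != len(deps):
        raise ValueError("dependency cycle detected")
    return batches

print(batch_by_dependencies({
    "utils.py": set(),
    "models.py": {"utils.py"},
    "api.py": {"utils.py", "models.py"},
}))
# [['utils.py'], ['models.py'], ['api.py']]
```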
Spec File as Shared Context
A single artifact (spec file) flows through all phases:
- Scout outputs exploration findings
- Plan phase creates `docs/specs/<name>.md` with full context
- Build agents read the spec file for implementation details
- Review agents reference spec file for compliance checking
This avoids passing massive context between agents—instead, they read from a shared artifact.
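A minimal sketch of the handoff, assuming a hypothetical plan-phase helper: full context is written once, and downstream prompts carry only the path.

```python
from pathlib import Path

def write_spec(name: str, findings: str, plan: str) -> Path:
    """Plan phase: persist the full context once as a shared artifact."""
    spec = Path("docs/specs") / f"{name}.md"
    spec.parent.mkdir(parents=True, exist_ok=True)
    spec.write_text(f"# {name}\n\n## Findings\n{findings}\n\n## Plan\n{plan}\n")
    return spec

# Build and review prompts reference the path, not the content.
spec_path = write_spec("auth-refactor", "OAuth2 flow spans 3 layers...", "1. ...")
build_prompt = f"Implement step 1. Read the full spec at {spec_path} before writing code."
```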
Phase Gating
Mandatory prerequisites before transitions:
- scout → plan: Pass exploration findings
- plan → build: Spec file must exist (verify with `test -f`)
- build → review: Build must complete successfully
- review → validate: Always run validation after review
If a prerequisite fails, halt and provide remediation instructions.
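A sketch of two of these gates in Python; `make build` is a placeholder for whatever the project's actual build step is.

```python
import subprocess
import sys
from pathlib import Path

def gate_plan_to_build(spec_path: str) -> None:
    """plan → build: spec file must exist (the in-process `test -f`)."""
    if not Path(spec_path).is_file():
        sys.exit(
            f"GATE FAILED (plan → build): spec '{spec_path}' not found.\n"
            "Remediation: re-run the plan phase to generate the spec."
        )

def gate_build_to_review() -> None:
    """build → review: the build command must complete successfully."""
    result = subprocess.run(["make", "build"], capture_output=True, text=True)
    if result.returncode != 0:
        sys.exit(
            f"GATE FAILED (build → review):\n{result.stderr}\n"
            "Remediation: fix the build errors, then retry the build phase."
        )
```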
Context Passing
| Transition | What Flows |
|---|---|
| Scout → Plan | File locations, patterns, dependencies |
| Plan → Build | Spec file path, architecture patterns |
| Build → Review | Commit hashes, changed file paths |
| Review → Validate | Validation scope (based on review findings) |
Each agent receives complete context—they're stateless and assume nothing from prior calls.
Context Isolation via Sub-Agents
[2025-12-10]: The primary rationale for delegation isn't just parallelism—it's context hygiene. Each sub-agent gets a fresh context window for its specialized task, preventing pollution of the orchestrator's decision-making context.
Why Context Isolation Matters
When an orchestrator needs to search files, grep patterns, or summarize code, doing this work directly fills its context window with raw data:
- File listings with hundreds of paths
- Grep results with dozens of matching lines
- Full file contents for analysis
This raw data crowds out the orchestrator's primary job: workflow coordination and decision synthesis.
The Pattern
Deploy fresh context windows for search operations:
Orchestrator (clean context):
├─ Task → Scout: "Find all Python files in src/"
│ Scout context: file paths, directory structure
│ Scout returns: "Found 47 Python files, organized in 3 modules..."
│
├─ Task → Analyzer: "Summarize authentication flow"
│ Analyzer context: auth files, dependencies
│ Analyzer returns: "Authentication uses OAuth2 with..."
│
└─ Orchestrator synthesizes summaries into decisions
(never saw raw file listings or grep output)
Sub-agents return synthesized summaries, not raw data:
- Scout returns: "Found 47 files in 3 modules" (not 47 file paths)
- Grep agent returns: "Pattern appears in 12 locations, primarily in validation layer" (not 12 raw matches)
- Code analyzer returns: "Authentication flow uses OAuth2 with custom middleware" (not full code dump)
The Trade-off
| Aspect | Direct Execution | Sub-Agent Delegation |
|---|---|---|
| Token cost | Lower (single context) | Higher (multiple contexts) |
| Context clarity | Polluted with raw data | Clean, summary-only |
| Decision quality | Degraded by noise | Focused on synthesis |
| Parallelism | Sequential | Concurrent |
When the trade-off favors delegation:
- Orchestrator needs to make complex decisions based on synthesized information
- Search/analysis operations produce large intermediate results
- Multiple independent investigations can run in parallel
- Workflow spans multiple phases requiring clean transitions
When direct execution works:
- Simple, single-phase tasks
- Orchestrator needs raw data for decision-making
- Token budget is constrained
- No subsequent decision synthesis needed
Context Hygiene Best Practices
- Prompt sub-agents for synthesis: "Summarize findings in 3-5 bullet points" rather than "return all matches"
- Filter before reporting: Sub-agents should grep/analyze/filter, then report insights—not raw output
- Keep orchestrator context minimal: Only workflow state, phase transitions, and synthesized findings
- Avoid echo chambers: Don't pass large sub-agent outputs as-is to other sub-agents—summarize first
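The first two practices can be baked directly into the sub-agent prompt. A minimal template (wording illustrative):

```python
SYNTHESIS_PROMPT = """\
Investigate: {question}

Reply rules (context hygiene):
- Summarize findings in 3-5 bullet points.
- Include file:line references for anything actionable.
- Do NOT paste raw grep output, file listings, or full file contents.
"""

prompt = SYNTHESIS_PROMPT.format(question="How is authentication implemented?")
```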
This Is Why Orchestrators Delegate
The orchestrator pattern isn't just about parallelism—it's about maintaining clean separation between:
- Discovery (file finding, pattern searching, code reading) — sub-agent contexts
- Synthesis (decision-making, workflow coordination) — orchestrator context
Each sub-agent pollutes its own context with raw data, then returns clean summaries. The orchestrator never sees the mess.
Mental model: Sub-agents are expensive, disposable context buffers. They absorb the noise so the orchestrator can think clearly.
Expert Synthesis
When multiple experts analyze in parallel:
- Collect structured outputs from each expert
- Identify cross-cutting concerns (mentioned by 2+ experts)
- Synthesize into unified recommendations
- Create priority actions and risk assessment
The orchestrator is responsible for synthesis—individual experts stay focused on their domain.
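A sketch of steps 1-3, assuming each expert returns a flat list of concern strings; anything raised by two or more experts is promoted to cross-cutting.

```python
def synthesize(expert_findings: dict[str, list[str]]) -> dict:
    """Merge structured expert outputs; flag concerns raised by 2+ experts."""
    raised_by: dict[str, list[str]] = {}
    for expert, concerns in expert_findings.items():
        for concern in concerns:
            raised_by.setdefault(concern, []).append(expert)
    return {
        "cross_cutting": {c: e for c, e in raised_by.items() if len(e) >= 2},
        "domain_specific": {c: e for c, e in raised_by.items() if len(e) == 1},
    }

report = synthesize({
    "security": ["input validation missing", "no rate limiting"],
    "testing": ["input validation missing", "no integration tests"],
    "architecture": ["no rate limiting"],
})
# cross_cutting: "input validation missing" (security + testing),
#                "no rate limiting" (security + architecture)
```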
Error Handling
Graceful Degradation
- If an expert fails, note the failure and continue with available analyses
- Recommend manual review for failed expert domains
- Include recovery instructions in output
Partial Success
- Commit successful changes before reporting failures
- Allow selective retry via phases parameter
- Never leave the workflow in an inconsistent state
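With parallel experts, graceful degradation amounts to collecting exceptions instead of propagating them. A sketch using asyncio, where `run_expert` stands in for a Task call:

```python
import asyncio

async def run_expert(domain: str) -> str:
    """Stand-in for a Task call to a domain expert."""
    if domain == "security":
        raise RuntimeError("expert timed out")  # simulate one failure
    return f"{domain}: analysis complete"

async def review_panel(domains: list[str]) -> dict:
    results = await asyncio.gather(
        *(run_expert(d) for d in domains), return_exceptions=True
    )
    analyses, failed = {}, []
    for domain, result in zip(domains, results):
        if isinstance(result, Exception):
            failed.append(domain)  # note the failure, continue with the rest
        else:
            analyses[domain] = result
    return {"analyses": analyses, "manual_review_needed": failed}

print(asyncio.run(review_panel(["security", "testing", "architecture"])))
```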
When to Use This Pattern
Good Fit
Multi-concern workflows requiring parallel analysis:
- Complex tasks spanning multiple domains (architecture, security, testing, performance)
- Tasks that benefit from parallel execution (multiple independent analyses or builds)
- Workflows requiring explicit phase gates and artifacts for review
- Need for context isolation between exploration and decision-making
Indicators you need this pattern:
- Single agent's context window fills with search/analysis data
- Multiple independent tasks can run concurrently
- Workflow has clear phase transitions with checkpoints
- Need to synthesize insights from multiple domain experts
Poor Fit
Simple or tightly-coupled tasks:
- Single-file changes with straightforward requirements
- Tasks where coordination overhead exceeds parallelism benefit
- Workflows requiring tight real-time interaction between agents
- When context pollution isn't a concern (simple, single-phase tasks)
Anti-Patterns Discovered
- Missing spec path validation: Build phase must verify spec file exists before proceeding
- Implicit environment assumptions: Always detect and report environment in validation
- Incomplete meta-reviews: Reviews must include hygiene checks (commits, labels, linked issues)
- Vague risk assessment: Must be concrete with mitigation strategies, not generic warnings
Questions This Raises
- How do you decide the right number of parallel agents? (Resource constraints vs. diminishing returns)
- When does the spec file become a bottleneck vs. a coordination aid?
- How do you handle agents that produce conflicting recommendations?
- What's the minimum viable orchestrator? (Probably: scout → build → validate)
Capability Minimization
[2025-12-09]: Orchestrators work better when their subagents have intentionally restricted capabilities. This isn't just security—it's an architectural forcing function.
Why Restrict Tools
- Reduces context overhead: An agent with 3 tools maintains smaller context than one with 20
- Forces delegation: A read-only scout cannot implement—it must report findings for others to act on
- Enables parallelization: Agents with minimal scope can run more instances simultaneously
- Clarifies responsibility: Tool restrictions make the agent's role unambiguous
Tool Restriction Patterns
| Agent Role | Tools | Rationale |
|---|---|---|
| Scout | Read, Glob, Grep | Cannot modify—forces reporting back |
| Builder | Write, Edit, Read, Bash | Focused on implementation |
| Reviewer | Read, Grep, Bash (tests only) | Cannot fix what they find |
| Validator | Bash (run only), Read | Executes, doesn't implement |
Scope Restriction Beyond Tools
Tool restriction is only half the pattern. Scope restriction achieves the same goal through workflow design:
- One file per builder: Each build-agent handles exactly one file, even though it could write many
- One domain per expert: Security expert doesn't comment on testing, even though it could
- Orchestrator as spec writer: Primary job becomes packaging comprehensive context, not doing work
A common pattern in multi-agent systems:
"Never guess or assume context: Each build-agent needs comprehensive instructions as if they are new engineers"
This forces the orchestrator to be explicit about what each subagent needs, keeping each agent's context minimal and focused.
The Meta-Principle
Default Claude Code behavior is to inherit all parent tools. The deliberate choice to restrict tools signals architectural intent:
```yaml
# Inherits everything (default)
tools:  # omit the field entirely

# Read-only analysis specialist
tools: Read, Glob, Grep

# Implementation specialist
tools: Write, Edit, Read, Bash
```

When reviewing an agent definition, ask: "What can this agent NOT do, and is that intentional?"
SDK-Level vs CLI-Level Enforcement
[2025-12-09]: True HEAD vs subagent tool differentiation requires SDK-level enforcement, not CLI configuration.
CLI tools like Claude Code apply tool restrictions uniformly—the HEAD agent and all subagents share the same allowed tools set. This is a known limitation. If you configure allowedTools in settings.json, those restrictions apply everywhere.
SDK-level orchestration solves this by passing different allowed_tools arrays when spawning each agent:
```python
# Orchestrator: management tools only, no implementation
orchestrator_tools = [
    "mcp__mgmt__create_agent",
    "mcp__mgmt__command_agent",
    "Read", "Bash",  # info-gathering only
]
# Excluded: Write, Edit, WebFetch, Task

# Build subagent: implementation tools
builder_tools = ["Write", "Read", "Edit", "Bash", "Glob", "Grep"]
```

The pattern emerges across three mechanisms:
- Technical: `allowed_tools` allowlist passed to the SDK when spawning each agent
- Behavioral: System prompts reinforce "let subagents do the heavy lifting"
- Architectural: Orchestrator gets management/coordination tools, not implementation tools
This explains why sophisticated multi-agent systems often build custom orchestration layers on top of the Claude Agent SDK rather than relying solely on CLI tools. The SDK gives you the primitives; CLI tools give you convenience with less granularity.
Source: agenticengineer.com — Production orchestrator implementation demonstrating SDK-level tool restriction
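Reduced to its shape, the idea is a per-role allowlist attached at spawn time. The `spawn_agent` wrapper below is hypothetical, not a real SDK call:

```python
# Hypothetical per-role allowlists; the orchestrator never hands out more.
ROLE_TOOLS = {
    "scout":    ["Read", "Glob", "Grep"],
    "builder":  ["Write", "Edit", "Read", "Bash", "Glob", "Grep"],
    "reviewer": ["Read", "Grep", "Bash"],
}

def spawn_agent(role: str, prompt: str) -> dict:
    """Build the spawn request; the SDK enforces allowed_tools, not the prompt."""
    return {
        "system_prompt": f"You are a {role}. Work only within your tools.",
        "allowed_tools": ROLE_TOOLS[role],
        "prompt": prompt,
    }

request = spawn_agent("scout", "Map the module layout under src/")
```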
Tool Restriction as Coordination Forcing Function
[2025-12-09]: Tool restriction isn't just about limiting what agents can do—it's about enabling coordination patterns through deliberate capability differentiation.
When an orchestrator uses different tools than its subagents, it creates natural separation of concerns:
- Encourages parallel execution: Orchestrator focuses on spawning multiple specialized agents
- Maintains separation of concerns: Orchestrator manages workflow, builders implement
- Enables different tool sets per role: Orchestrators get management tools, builders get implementation tools
- Reduces shortcuts: Clear role definitions guide proper delegation
Note: Tool restriction is enforced through system prompt guidance and agent design, not through technical enforcement mechanisms. The effectiveness comes from clear role definitions and workflow design.
Tool Assignment
Orchestrators should assign different tool sets to different subagent roles. For native tools, this follows least-privilege principles. For MCP tools, the same pattern applies—validators get browser tools, scouts get search tools, etc.
See Tool Use: MCP Tool Declarations for the frontmatter syntax and role-based assignment patterns.
Workflow Primitives
[2025-12-09]: When orchestration patterns become routine, extract them as reusable primitives. Google ADK codifies this with SequentialAgent, ParallelAgent, and LoopAgent—workflow controllers that compose specialized agents without LLM overhead per coordination decision.
The Three Primitives
| Primitive | Pattern | When to Use |
|---|---|---|
| Sequential | Pipeline—A's output feeds B | Dependent phases, ordered transformations |
| Parallel | Fan-out/gather—concurrent execution, collected results | Independent analysis, embarrassingly parallel tasks |
| Loop | Iterate until condition met | Refinement cycles, retry-with-feedback |
Why This Matters
Traditional orchestration requires an LLM call to decide "what next?" after each step. For deterministic workflows—where the pattern is always "run A, then B, then C"—that's wasted inference.
Workflow primitives eliminate this overhead:
- SequentialAgent knows it runs steps in order
- ParallelAgent knows it runs steps concurrently
- LoopAgent knows it runs until a condition
The orchestrator LLM focuses on decisions that require reasoning: which primitive to invoke, how to handle failures, when to escalate.
Composition
Primitives compose naturally:
```python
SequentialAgent([
    ParallelAgent([scout_a, scout_b, scout_c]),  # Fan-out exploration
    planning_agent,                              # Synthesize findings
    LoopAgent(build_agent, until=tests_pass),    # Iterate to completion
    ParallelAgent([reviewer_a, reviewer_b]),     # Parallel review
])
```
The meta-principle: when orchestrators themselves become boilerplate, extract them as parameterizable primitives.
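A minimal sketch of what such primitives could look like as plain Python combinators. The real ADK classes differ; the state-dict convention and `until` signature here are assumptions:

```python
import asyncio
from typing import Awaitable, Callable

Agent = Callable[[dict], Awaitable[dict]]

def sequential(*steps: Agent) -> Agent:
    """Pipeline: each step's output state feeds the next."""
    async def run(state: dict) -> dict:
        for step in steps:
            state = await step(state)
        return state
    return run

def parallel(*branches: Agent) -> Agent:
    """Fan-out/gather: run branches concurrently, merge their results."""
    async def run(state: dict) -> dict:
        results = await asyncio.gather(*(b(dict(state)) for b in branches))
        merged = dict(state)
        for r in results:
            merged.update(r)
        return merged
    return run

def loop(step: Agent, until: Callable[[dict], bool], max_iters: int = 10) -> Agent:
    """Iterate a step until the condition holds or the retry budget runs out."""
    async def run(state: dict) -> dict:
        for _ in range(max_iters):
            state = await step(state)
            if until(state):
                break
        return state
    return run

# Mirrors the composition above:
# workflow = sequential(parallel(scout_a, scout_b, scout_c),
#                       planning_agent,
#                       loop(build_agent, until=lambda s: s.get("tests_pass")),
#                       parallel(reviewer_a, reviewer_b))
```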
Framework Implementations
- Google ADK: Native SequentialAgent, ParallelAgent, LoopAgent classes
- LangGraph: StateGraph with conditional edges
- Claude Code: Single-message parallelism + explicit sequencing (no native primitives)
Claude Code achieves similar patterns through discipline (single-message parallel Task calls) rather than framework primitives. ADK makes the patterns explicit and removes the possibility of accidental serialization.
See Also: Google ADK — Concrete implementation of workflow primitives
Communication Excellence
[2026-01-30]: Orchestration skill research from cc-mirror reveals sophisticated communication patterns that transform user experience from "watching a machine work" to "collaborating with intelligence."
The Conductor Philosophy
Core Identity: "Absorb complexity, radiate simplicity"
Users describe outcomes; orchestrators decompose and coordinate; workers execute with tools. The orchestrator shields users from internal machinery—task graphs, parallel execution, dependency resolution—presenting only progress and results.
Forbidden Vocabulary:
Never expose orchestration mechanics in user-facing communication:
| Forbidden | Natural Alternative |
|---|---|
| "Spawning subagents" | "Breaking this into parallel tracks" |
| "Task graph analysis" | "Checking a few angles" |
| "Fan-out pattern" | "Got several threads running" |
| "Map-reduce phase" | "Pulling it together now" |
| "Background agent count" | "Early returns looking promising" |
Vibe-Reading Adaptation
Orchestrators detect and adapt to user energy:
| User State | Signals | Orchestrator Response |
|---|---|---|
| Excited | Exclamation marks, rapid messages | Match energy, celebrate wins visibly |
| Overwhelmed | Long pauses, vague requests | Simplify, break into smaller steps |
| Frustrated | Repeated questions, short responses | Direct solutions, skip process exposition |
| Curious | Detail questions, exploratory tone | Share insights, explain discoveries |
| Rushed | "quick", "fast", time pressure | Cut ceremony, prioritize completion |
Progress Communication Patterns
Maintain engagement without revealing machinery:
Starting:
"Breaking this into parallel tracks"
"Checking a few angles on this"
Working:
"Got several threads running"
"Early returns look promising"
Synthesis:
"Pulling it together now"
"Building on what I found"
Completion: Meaningful celebration with results, not process description. Show findings with file:line references, unexpected discoveries highlighted, connection to user intent explicit.
Generic synthesis (forbidden):
- ❌ "I analyzed the code"
- ❌ "Task completed successfully"
- ✅ "Found SQL injection vulnerability in auth.py line 147"
- ✅ "Unexpected: Authentication bypasses rate limiting entirely"
Sources: cc-mirror orchestration skill, Communication Guide
Read vs. Delegate Guidelines
[2026-01-30]: Research from production orchestrators reveals clear thresholds for when orchestrators should read directly versus delegating to agents.
Mandatory Direct Reads
Always read directly, never delegate:
- Skill references - Core orchestration knowledge
- Domain guides - Project-specific standards
- Agent output files - Results from completed subagents (for synthesis)
These are coordination context, not search operations. Delegating them breaks orchestrator reasoning.
The 1-2 File Threshold
Orchestrator reads directly (1-2 files):
- Quick index lookups
- Small configuration files
- Single-file verification
- Specification documents
Delegate to agents (3+ files):
- Codebase exploration
- Multiple source file analysis
- Deep documentation review
- Pattern searching across repository
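The thresholds reduce to a small decision rule. A sketch (category names illustrative):

```python
# Coordination context the orchestrator must always read itself.
MANDATORY_DIRECT = {"skill_reference", "domain_guide", "agent_output"}

def read_or_delegate(kind: str, files_needed: int) -> str:
    """Encode the thresholds above; a heuristic, not a hard rule."""
    if kind in MANDATORY_DIRECT:
        return "direct"        # never delegate coordination context
    if files_needed <= 2:
        return "direct"        # quick lookup: minimal context cost
    return "delegate"          # exploration: spawn an agent, receive a summary
```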
Why This Threshold Matters
| Aspect | Direct Read (1-2 files) | Delegate (3+ files) |
|---|---|---|
| Context cost | Minimal overhead | High if done directly |
| Parallelism | Serial bottleneck | Concurrent execution |
| Orchestrator focus | Maintains synthesis capacity | Preserves decision-making clarity |
| Total latency | Faster for small scope | Faster for large scope |
Delegation Rationale
Orchestrators coordinate; workers execute. When file reading becomes exploration rather than reference lookup, delegation:
- Frees orchestrator context for synthesis
- Enables parallel investigation
- Returns summaries instead of raw data
- Maintains clean separation of concerns
Anti-pattern: Orchestrator reads 15 files to understand a module, fills its context with code, then struggles to synthesize findings.

Pattern: Orchestrator spawns an analyze-module agent, receives a "Module uses OAuth2 with custom middleware in 3 layers" summary, and decides the next action with clean context.
Sources: cc-mirror orchestration patterns, Tool ownership guidelines
Background Execution Mechanics
[2026-01-30]: TeammateTool and orchestration skill research document background execution as fundamental to parallelism, not a feature toggle.
Default to Background
Always use run_in_background=True when spawning agents. This is the default in production orchestrators and should be the default mental model.
Why background-first:
- Enables true parallelism (multiple agents executing concurrently)
- Orchestrator continues coordination work while agents execute
- Automatic completion notifications prevent polling overhead
- Foreground execution serializes work (one agent at a time)
Non-Blocking vs. Blocking Checks
Non-blocking check (block=False):
TaskOutput(task_id="task-123", block=False)
→ Returns current progress or "still running"
Use for status updates while orchestrator continues other work.
Blocking wait (block=True):
TaskOutput(task_id="task-123", block=True)
→ Waits for completion, returns full results
Use only when results immediately needed for next decision.
Notification Handling
Background agents automatically notify orchestrator on completion. No polling required.
Workflow:
- Orchestrator spawns agents with `run_in_background=True`
- Orchestrator continues coordination work (spawn more agents, synthesize partial results)
- Agents complete and send notifications
- Orchestrator processes notifications sequentially
- Orchestrator fetches results via TaskOutput when synthesis begins
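A sketch of the spawn-all-then-block-late shape. The `spawn` and `task_output` stubs stand in for the real Task and TaskOutput tools:

```python
import itertools

_ids = itertools.count(1)
_done: dict[str, dict] = {}

def spawn(prompt: str) -> str:
    """Stub for a background Task call (run_in_background=True)."""
    task_id = f"task-{next(_ids)}"
    _done[task_id] = {"status": "complete", "summary": f"finished: {prompt[:40]}"}
    return task_id

def task_output(task_id: str, block: bool) -> dict:
    """Stub for TaskOutput; block=True waits for completion."""
    return _done[task_id]

def orchestrate(specs: list[str]) -> list[dict]:
    task_ids = [spawn(s) for s in specs]  # spawn everything first, no interleaved waits
    # ...continue coordination work here while agents run in the background...
    return [task_output(tid, block=True) for tid in task_ids]  # block only at synthesis

print(orchestrate(["review auth module", "audit query performance"]))
```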
Trade-offs
| Pattern | Parallelism | Orchestrator Throughput | Use Case |
|---|---|---|---|
| Background (default) | High | High | Multi-agent workflows |
| Foreground | None | Blocked | Simple single-agent delegation |
The mental model: Background execution is not about "running things later"—it's about enabling the orchestrator to do multiple things concurrently. The orchestrator becomes a coordination hub, not a sequential task runner.
Sources: cc-mirror orchestration skill, TeammateTool documentation
Pattern Composition
[2026-01-30]: Real-world orchestration combines fundamental patterns into complex workflows. Research from production orchestrators documents common compositions.
PR Review (Fan-Out + Map-Reduce)
Structure:
1. Read PR metadata (orchestrator, 1 file)
2. Fan-Out: Spawn 3 parallel reviewers (single message)
- security-reviewer (Opus): Check vulnerabilities
- performance-auditor (Sonnet): Analyze efficiency
- code-quality-checker (Sonnet): Review style/patterns
3. Continue: Load domain guide for PR standards
4. Receive completion notifications (3 agents)
5. Map-Reduce: Synthesize findings into unified review
6. Deliver: Prioritized issues with severity levels
User experience:
"Got several threads running on this review..."
[30 seconds later]
"Interesting findings across security, performance, and code quality:
🔴 Critical: SQL injection vulnerability in auth.py line 147
🟡 Performance: N+1 query pattern in posts.controller (3 locations)
🟢 Quality: Consider extracting validation logic to service layer
The SQL issue needs immediate attention before merge."
Feature Implementation (Pipeline + Fan-Out + Background)
Structure:
1. Clarify via AskUserQuestion (4×4 rich questions)
2. Pipeline Phase 1: Research (Haiku)
- Find existing theme code
- Check component library support
3. Pipeline Phase 2: Plan (Opus)
- Design architecture
- Identify component changes
4. Fan-Out: Implement (Sonnet, 4 agents parallel)
- Agent A: Theme provider setup
- Agent B: Component updates
- Agent C: Storage logic
- Agent D: Toggle UI component
5. Pipeline Phase 3: Integration (Sonnet)
6. Background: Run tests while continuing
Key composition: Pipeline for dependent phases, Fan-Out for independent work, Background for long-running validation.
Bug Diagnosis (Fan-Out + Pipeline)
Structure:
1. Fan-Out: Parallel investigation (Haiku, 3 agents)
- Agent A: Analyze error logs
- Agent B: Trace code flow
- Agent C: Review system monitoring
2. Synthesis: Identify pattern (orchestrator)
3. Pipeline: Sequential fix (Sonnet)
- Implement connection pooling
- Add retry logic
- Update error handling
4. Background: Run integration tests
5. Verification (Haiku)
Pattern insight: Start broad (fan-out exploration), narrow to root cause (synthesis), apply fix sequentially (pipeline dependencies), validate in background.
Sources: cc-mirror orchestration examples, Orchestration patterns documentation
See Also
- Plan-Build-Review - simpler version without parallel experts
- Self-Improving Experts - how the domain experts evolve
- Google ADK - framework with native workflow primitives
- Context: Multi-Agent Context Isolation - the foundational context management strategy that makes orchestration viable
- Context Loading Demo - minimal example showing context payload construction and verification layer
- Claude Code: TeammateTool - Native implementation of coordination primitives (spawn, join, broadcast, plan approval) discussed abstractly in this pattern. Hidden feature providing file-based messaging and five coordination patterns (Leader-Worker, Swarm, Pipeline, Council, Plan Approval).
- Tool Design: Rich User Questioning - AskUserQuestion 4×4 pattern for scope clarification before orchestration
- Model Selection: Multi-Agent Strategy - Spawn count economics and pipeline escalation for coordinated agents