Sophisticated patterns for managing context in complex scenarios: progressive disclosure for effectively unlimited expertise, context loading for precise payloads, and the ACE framework for knowledge-intensive domains.
Progressive Disclosure Pattern
[2025-12-09]: Progressive disclosure addresses context window limits by loading information in tiers based on relevance, enabling effectively unlimited expertise within fixed context budgets.
The Pattern: Information loads in three tiers:
- Metadata first — Names, descriptions, summaries (~50-200 characters per item)
- Full content on selection — Complete documentation when explicitly chosen (~500-5,000 words)
- Detailed resources on-demand — Supporting files, source code, references (unbounded)
This creates a semantic index in the initial context, allowing the agent to navigate a vast information space without loading everything upfront.
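A minimal sketch of the three tiers, assuming a hypothetical SkillIndex that keeps only metadata resident in context and pulls full content from disk when a skill is selected:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class SkillMeta:
    """Tier 1: always resident in context (~50-200 chars per skill)."""
    name: str
    description: str      # one-line "what this is / when to use it" summary
    content_path: Path    # where the full expertise lives on disk

class SkillIndex:
    def __init__(self, metas: list[SkillMeta]):
        self.metas = metas

    def semantic_index(self) -> str:
        # This is all the agent sees up front: names and descriptions.
        return "\n".join(f"- {m.name}: {m.description}" for m in self.metas)

    def activate(self, name: str) -> str:
        """Tier 2: load the selected skill's full documentation.

        Tier 3 (supporting files, source, references) stays on disk and is
        reached through ordinary Read/Grep tool calls once the skill is active.
        """
        meta = next(m for m in self.metas if m.name == name)
        return meta.content_path.read_text()
```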
Concrete Example: Claude Skills
Claude Skills demonstrate this pattern in production:
- Initial load: ~50-200 chars per skill (description and when to use it)
- Activation: 500-5,000 words of expertise per selected skill
- References: Unlimited supporting files via Read tool when skill is active
With 10 skills at 100 chars each, the metadata costs ~1,000 characters. This buys semantic awareness of all available expertise. When a specific skill activates, its full context loads—but only that one, not all ten simultaneously.
Cognitive Parallel
Humans don't memorize encyclopedias. We build indexing systems—file systems, bookmarks, tables of contents—for on-demand retrieval. Progressive disclosure mirrors this: maintain an index in working memory, fetch details when needed.
Contrast with Alternatives
| Approach | Upfront Cost | Discoverability | Capacity |
|---|---|---|---|
| Eager Loading | Massive (tens of thousands of tokens) | Perfect | Limited by context window |
| Lazy Loading | Zero | Poor (agent doesn't know what exists) | Theoretically unlimited |
| Progressive Disclosure | Small (metadata only) | Good (semantic index) | Effectively unlimited |
Anthropic's GitHub MCP integration illustrates the eager loading trap: "tens of thousands of tokens" consumed just to make repositories and issues accessible. Progressive disclosure would load repo names/descriptions first, then fetch specific repos on-demand.
The Trade-off
Slight latency on selection (additional tool call to fetch full content) for dramatic capacity gains. A system with 100 items × 1,000 tokens each costs 100k tokens with eager loading, but only ~5k tokens with progressive disclosure (metadata + one activated item).
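A back-of-the-envelope version of that comparison; the per-item figures are illustrative assumptions, not measurements:

```python
ITEMS = 100
TOKENS_PER_ITEM = 1_000       # full content per item (assumed)
TOKENS_PER_METADATA = 40      # name + one-line description (assumed)

eager = ITEMS * TOKENS_PER_ITEM                               # load everything upfront
progressive = ITEMS * TOKENS_PER_METADATA + TOKENS_PER_ITEM   # index + one activated item

print(f"eager: {eager:,} tokens, progressive: {progressive:,} tokens")
# eager: 100,000 tokens, progressive: 5,000 tokens
```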
When to Use Progressive Disclosure
- Large knowledge bases where most content won't be needed for any single task
- Multi-domain expertise where the agent needs awareness but not full activation
- Tight context budgets where capability breadth is essential but space is limited
- Dynamic capability selection where the agent should choose expertise based on task requirements
When NOT to Use
- Small, static knowledge sets where eager loading costs less than infrastructure
- Guaranteed access patterns where you know exactly which content will be needed
- Latency-critical paths where additional tool calls are unacceptable
- Simple retrieval where a single Read or Grep suffices
Implementation Patterns
- Tool descriptions as metadata layer (Read/Grep as on-demand fetchers)
- Structured indices with description fields (see MCP Tool Declarations)
- Skills systems (see Claude Code: Skills)
Context Loading vs. Context Accumulation
[2025-12-09]: Most LLM interaction patterns treat context as accumulated—chat history grows, tool results append, context fills passively until you hit limits. Context loading flips this: context is curated, deliberately constructed for each call.
The Default Mental Model (Accumulation)
User message → append to context
Tool result → append to context
Agent response → append to context
... context fills until you hit limits
Context Loading Mental Model
For this specific call:
├── Load: base config (always)
├── Load: project context (if relevant)
├── Load: tool definitions (only what this agent needs)
├── Load: query (the specific task)
├── Load: retrieved facts (verified, not raw)
└── Nothing else
The precision is the point. You're not asking "what has accumulated?" You're asking "what does this agent need for this exact call?"
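A sketch of what that construction might look like for one scout call; the loader functions are hypothetical stand-ins for whatever config and retrieval machinery a real system uses:

```python
# Hypothetical stand-ins for real config/retrieval machinery.
def load_base_config() -> str: return "You are a focused research scout."
def load_project_context(project: str) -> str: return f"Project notes for {project}."
def select_tools_for(task: str) -> list[dict]: return []
def retrieve_verified_facts(task: str) -> str: return ""

def build_payload(task: str, project: str | None = None) -> dict:
    """Assemble exactly what this one call needs; nothing is carried over."""
    system = load_base_config()                           # always
    if project is not None:
        system += "\n\n" + load_project_context(project)  # only if relevant
    facts = retrieve_verified_facts(task)                 # verified, not raw tool dumps
    content = f"{facts}\n\n{task}" if facts else task
    return {
        "system": system,
        "tools": select_tools_for(task),                  # only what this agent needs
        "messages": [{"role": "user", "content": content}],
        # ...and nothing else: no chat history, no prior tool results
    }
```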
Why This Is Counterintuitive
Standard patterns assume context is a log—append-only, grows over time, summarize when full. Context loading treats context as a payload—constructed fresh, minimal, purpose-built.
This flips the default question:
- Log model: "What can I remove to fit?"
- Payload model: "What must I include to succeed?"
Connection to Small Models
Context loading explains why small models work in orchestrator patterns. Haiku doesn't accumulate—it receives a curated payload (base config, project prompt, tool info, query) and returns a focused result. The orchestrator handles accumulation; scouts receive loads.
See Model Selection: Small Models Are RAG for the context staging breakdown.
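Assuming the build_payload sketch above and the Anthropic Python SDK, handing a curated load to a small scout model might look like this (the model name is illustrative, not a recommendation):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

payload = build_payload("Summarize how the auth middleware handles token refresh.",
                        project="billing-service")
extra = {"tools": payload["tools"]} if payload["tools"] else {}

reply = client.messages.create(
    model="claude-3-5-haiku-latest",   # illustrative small-model choice
    max_tokens=512,
    system=payload["system"],
    messages=payload["messages"],
    **extra,
)
print(reply.content[0].text)  # focused result handed back to the orchestrator
```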
When to Use Context Loading
- Multi-agent systems where orchestrators coordinate specialized scouts
- High-precision tasks requiring exact context composition
- Small model deployments where context budgets are tight
- Quality-critical paths where context noise degrades outputs
- Stateless operations where each call should be independent
When Accumulation Works Better
- Conversational interfaces where continuity matters more than precision
- Learning workflows where context should grow with discoveries
- Long-running sessions where recomputing context is expensive
- Debugging scenarios where full interaction history provides diagnostic value
See Also
Context Loading Demo — Working implementation showing orchestrator → scout context staging with optional verification layer. Demonstrates payload construction, verification contract, and token economics.
Open Questions
- Could a verification layer (like KotaDB) fact-check context before loading? Scout A says X, Scout B says Y—verify before the orchestrator loads either.
- What's the contract for verified context? Confidence scores? Source citations? Contradiction flags?
- Does this change agent architecture? Instead of scouts → orchestrator, maybe scouts → verification layer → orchestrator?
Agentic Context Engineering (ACE)
[2025-12-10]: The ACE framework from Stanford/SambaNova challenges a core assumption in agent design: that context should shrink over time. Instead, ACE argues contexts should grow—comprehensive evolving playbooks outperform compressed prompts in complex domains.
The Core Insight
Traditional optimization creates "brevity bias"—the assumption that shorter contexts are better. This leads to "context collapse" where critical learned information gets summarized away. ACE flips this: contexts should expand with learned knowledge, not compress it.
The Tension with Frequent Intentional Compaction
This creates an interesting contrast with frequent intentional compaction. Both approaches reject reactive emergency compaction (waiting until 95% capacity). But they differ in philosophy:
| Approach | Philosophy | When to Use |
|---|---|---|
| Frequent Intentional Compaction | Compress proactively at 40-60% of capacity | General-purpose coding, bounded tasks |
| ACE (Growing Contexts) | Expand deliberately with learned patterns | Knowledge-intensive domains, tool-heavy tasks |
The key: both are proactive strategies that beat reactive summarization. Choose based on task type, not as universal defaults.
Three-Role Architecture
ACE organizes agents into three complementary roles:
- Generator — Executes tasks using current playbook
- Reflector — Analyzes outcomes and extracts learnings
- Curator — Evolves the playbook based on reflections
This mirrors software development: execute code (generator), learn from errors (reflector), update documentation (curator). The context is the playbook—a living document that grows more comprehensive over time.
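One way to pin down that division of labor is as three narrow interfaces over a shared playbook. This is a hedged sketch of the role boundaries, not the paper's reference implementation:

```python
from typing import Protocol

class Generator(Protocol):
    def run(self, task: str, playbook: str) -> dict:
        """Execute the task with the playbook in context; return a trajectory
        (decisions, tool calls, outcome) for the reflector to analyze."""
        ...

class Reflector(Protocol):
    def reflect(self, trajectory: dict) -> list[str]:
        """Extract candidate learnings: what worked, what failed, and why."""
        ...

class Curator(Protocol):
    def curate(self, playbook: str, learnings: list[str]) -> str:
        """Fold learnings into the playbook: add items, update counters, dedupe."""
        ...
```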
Structured Playbook Format
Instead of prose instructions, ACE uses itemized bullets with metadata:
## Authentication Patterns
- [AUTH-001] Use JWT tokens for stateless sessions
Helpful: 12 | Harmful: 1
- [AUTH-002] Validate tokens on every API call
Helpful: 15 | Harmful: 0
- [AUTH-003] Store refresh tokens in httpOnly cookies
Helpful: 8 | Harmful: 2
Reason harmful: Doesn't work with mobile clients
Each item has an ID for tracking, helpful/harmful counters from feedback, and explanations for anti-patterns. The structure makes it easy to add, update, or remove specific guidance without rewriting entire sections.
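One way to represent those items as a data structure; the field names are assumptions that mirror the example above:

```python
from dataclasses import dataclass, field

@dataclass
class PlaybookItem:
    item_id: str                # e.g. "AUTH-002", stable across sessions
    guidance: str               # the instruction or pattern itself
    helpful: int = 0            # incremented when following it led to success
    harmful: int = 0            # incremented when it contributed to a failure
    reason_harmful: str = ""    # explanation kept for anti-patterns
    tags: list[str] = field(default_factory=list)  # tasks/tools it applies to

    def render(self) -> str:
        """Serialize back into the bullet form the agent sees in context."""
        text = (f"- [{self.item_id}] {self.guidance}\n"
                f"  Helpful: {self.helpful} | Harmful: {self.harmful}")
        if self.reason_harmful:
            text += f"\n  Reason harmful: {self.reason_harmful}"
        return text
```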
Grow-and-Refine Principle
The playbook evolution follows a two-phase cycle:
1. Growth Phase: Add new learnings from reflections
   - Don't prune yet—accumulate insights
   - Capture both successful patterns and failures
   - Tag items with context (which tasks, which tools)
2. Refinement Phase: Semantic deduplication
   - Merge redundant items (AUTH-001 + AUTH-012 → AUTH-001-v2)
   - Remove contradicted patterns (harmful count exceeds helpful)
   - Consolidate related guidance into categories
The key insight: growth then refinement, not growth versus refinement. You need accumulation to see patterns before intelligent compression becomes possible.
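A simplified refinement pass over such items, reusing the PlaybookItem sketch above; the similar(a, b) test is a placeholder for whatever semantic comparison a real system uses (embeddings, an LLM judge, etc.):

```python
def refine(items: list[PlaybookItem], similar) -> list[PlaybookItem]:
    """Refinement phase: merge near-duplicates, drop contradicted guidance.

    Growth (appending new items from reflections) has already happened
    by the time this runs; refinement only consolidates.
    """
    kept: list[PlaybookItem] = []
    for item in items:
        if item.harmful > item.helpful:              # contradicted in practice
            continue
        match = next((k for k in kept if similar(k, item)), None)
        if match:                                    # merge counters into the survivor
            match.helpful += item.helpful
            match.harmful += item.harmful
            match.tags = sorted(set(match.tags) | set(item.tags))
        else:
            kept.append(item)
    return kept
```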
When to Use ACE
ACE shines in specific scenarios:
- Knowledge-intensive domains: Medical diagnosis, legal reasoning, scientific analysis where comprehensive playbooks matter
- Complex tool use: Multi-tool workflows (AppWorld benchmark) where learned tool patterns accumulate
- Natural feedback loops: Tasks with clear success/failure signals for helpful/harmful tracking
- Long-running projects: Where context grows across many sessions, not just one
When NOT to Use ACE
- Simple QA: Factual lookup doesn't benefit from playbook evolution
- Fixed-strategy problems: If the approach is deterministic, no learning needed
- Short-lived tasks: Single-session work lacks the horizon for playbook growth
- Unbounded domains: Without natural categories, playbooks become unwieldy
Performance Results
The Stanford/SambaNova paper demonstrates concrete gains:
- +12.5% improvement on AppWorld benchmark (complex multi-tool agent tasks)
- 82.3% latency reduction compared to GEPA, a reflective prompt-optimization baseline
- Better sample efficiency: Fewer attempts needed to learn effective patterns
The latency reduction is particularly striking—growing contexts performed faster than compressed ones. The hypothesis: well-structured comprehensive playbooks reduce trial-and-error during execution. The generator doesn't need to rediscover patterns; they're already documented.
Practical Implementation Pattern
A simplified ACE cycle for coding:
## Session Start
Load: Base playbook (accumulated patterns from previous sessions)
## During Task Execution (Generator)
Agent executes using playbook guidance
Logs decisions and outcomes
## After Each Subtask (Reflector)
Analyze: What worked? What didn't?
Extract: New patterns worth capturing
Tag: Which tools, which contexts, which outcomes
## End of Session (Curator)
Review: All extracted patterns
Add: New items to playbook with IDs
Update: Helpful/harmful counts based on outcomes
Merge: Semantically duplicate items
Prune: Contradicted or obsolete guidance
The playbook grows session-over-session. Early sessions add rapidly; later sessions mostly increment counters and merge duplicates. Over time, you build a comprehensive knowledge base in context, not external to it.
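A sketch of one full session tying this together, assuming objects that satisfy the generator/reflector/curator interfaces sketched earlier:

```python
from pathlib import Path

def run_session(tasks, playbook_path, generator, reflector, curator):
    """One ACE session: load playbook, execute, reflect per subtask, curate at the end."""
    playbook_file = Path(playbook_path)
    playbook = playbook_file.read_text()             # session start: accumulated patterns
    learnings: list[str] = []
    for task in tasks:
        trajectory = generator.run(task, playbook)   # generator: execute with guidance
        learnings += reflector.reflect(trajectory)   # reflector: extract patterns per subtask
    playbook = curator.curate(playbook, learnings)   # curator: add, update counts, merge, prune
    playbook_file.write_text(playbook)               # grows session-over-session
```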
Connection to Other Patterns
ACE complements several existing patterns:
- Persistent State vs. Ephemeral Context: ACE playbooks are persistent state loaded into context. The playbook survives sessions; the working context does not.
- Progressive Disclosure: Playbook categories could use progressive disclosure—load category summaries first, expand specific sections on-demand.
- Multi-Agent Context Isolation: Each agent role (generator/reflector/curator) maintains separate context. Reflector accumulates learnings; curator synthesizes; generator receives refined playbook.
- Context Loading vs. Accumulation: ACE is deliberate accumulation—curated growth, not passive appending.
The Mental Shift
Traditional context management asks: "How do I fit within limits?"
ACE asks: "How do I grow knowledge within structure?"
It's a shift from context as constraint to context as knowledge base. The context window isn't just working memory—it's the accumulated expertise of previous runs. This only works with structure (itemized bullets, IDs, counters) and discipline (grow-then-refine, not append-forever).
Open Questions
- How large can playbooks grow before structure breaks down? Is there a practical limit to itemized guidance?
- Can helpful/harmful counters be tracked automatically via tool success/failure, or do they require human feedback?
- Does ACE work for domains without clear success signals? What replaces helpful/harmful in ambiguous tasks?
- Could reflector/curator roles be automated, or do they need human-in-the-loop validation?
Connections
- To Context Strategies: Frequent Intentional Compaction as complementary compression strategy
- To Multi-Agent Context: How generator/reflector/curator roles maintain separate contexts
- To Tool Use: MCP tool declarations and progressive disclosure via tool metadata
- To Claude Code: Skills implement progressive disclosure in production
Sources
- Effective Context Engineering for AI Agents — Anthropic on progressive disclosure
- Claude Code Skills Documentation — Production implementation
- Simon Willison: Claude Skills — Practitioner perspective
- Agentic Context Engineering: Enhancing AI Agents with Self-Evolving, Structured Contexts — Stanford/SambaNova ACE framework paper