Context Management Strategies | Agentic Engineering

Practical approaches for managing context windows, balancing injection vs. retrieval, and structuring context for optimal performance.

Managing Context Window Limits

Context availability serves as a rough proxy for remaining model capability. A model with 25% context utilization retains approximately 75% capability capacity. At 90% utilization, effective capability drops to roughly 10%.

Current Practice: Boot Fresh Agents

[2025-12-10]: Context compaction via secondary agents is not a viable strategy in current implementations (December 2025). When agents near context limits or output quality degrades, boot a fresh instance rather than attempting salvage through compression.

Tooling support improves this workflow. Claude Code's hook system can automatically log agent actions—reads, writes, tool calls—enabling rapid reconstruction of relevant context for successor agents. This makes "boot fresh" practical even for complex workflows.

When Compaction Makes Sense

See Frequent Intentional Compaction below for the exception: proactive compression at 40-60% utilization as a deliberate quality maintenance strategy, not emergency salvage.

Context Compression and Summarization

Compression works in conversational exchanges where information density matters less than continuity. Technical work like coding demands precision—context compression typically introduces gaps that manifest as "context rot" or "context bloat."

The "one agent, one task" principle applies: if an agent can't complete a task within its context budget, adjust the prompt or scope. Chaining multiple agents to compact and summarize context invites hallucination without guaranteeing quality preservation.

Injection vs. Retrieval Balance

Injection for Priming

Context injection typically occurs at session start to prime agent behavior. Common injection targets include:

Base configuration (CLAUDE.md, project structure)
Recent activity context (git history, previous session summaries)
Domain knowledge (relevant documentation, coding standards)

Retrieval for Discovery

Agent-driven retrieval works well for codebase exploration, particularly in large repositories. The agent discovers relevant files based on task requirements rather than receiving a predetermined payload.

The Balance Point

Codebase scale determines the injection-retrieval balance. Small codebases tolerate more retrieval (agents find relevant files efficiently). Large codebases risk context bloat from excessive exploration—agents read tangential files searching for relevant content.

Open Questions

Areas requiring deeper exploration:

What constitutes optimal "priming" payload for typical coding sessions? CLAUDE.md alone? Recent git history? Documentation excerpts?
When retrieval produces context bloat (irrelevant file reads), can mid-session intervention recover quality, or does it require agent restart?
Does a codebase size threshold exist where injection-heavy approaches become mandatory due to retrieval unreliability?

Structured Context Patterns

Structured context (markdown, JSON, XML) consistently outperforms unstructured prose. Structure provides:

Focus mechanisms — Headers, sections, and delimiters guide attention
Parsing support — Both models and humans parse structured content more reliably
Prompt engineering aid — Templates and schemas assist both human developers and meta-prompting agents

Markdown offers the best balance: human-readable, model-friendly, and widely supported. JSON and XML work for machine-readable payloads where parsing guarantees matter.

Understanding Context Rot

Observable Symptoms

Areas requiring characterization:

Context rot manifests as quality degradation distinct from simple model errors. Symptoms likely include:

Inconsistent outputs despite identical prompts
Drift from original task specifications
Increasing reliance on context window periphery
Degraded reasoning chains compared to early-session outputs

Reversibility Question

Whether context rot can be reversed within a session or requires agent restart remains uncharacterized. Current practice defaults to "boot fresh" when symptoms appear, suggesting low confidence in within-session recovery.

Distinction from Model Confusion

Context rot differs from standard model errors or confusion. Model errors stem from capability limits or ambiguous prompts. Context rot results from degraded working memory—the model's capability remains intact, but its operational context has compromised output quality.

Context Window Percentage Monitoring

[2026-01-17]: Claude Code 2.1.9 introduced real-time context utilization percentage display, transforming context management from guesswork to data-driven decision-making.

Display Format

The session header shows context usage as:

[45K/200K tokens] 22%

This updates in real-time as conversation progresses, accounting for both input and output tokens consumed.

Operational Thresholds

Range	Signal	Recommended Action
0-30%	Healthy	Continue normally
30-60%	Monitor	Good checkpoint for intentional compaction
60-80%	Caution	Consider fresh session if major phase ends
80-95%	Warning	Begin graceful task wrap-up
95%+	Critical	Boot new agent immediately

These thresholds complement the frequent intentional compaction strategy described below. The 30-60% "Monitor" range aligns directly with the 40-60% compaction trigger.

Percentage-Driven Decision Framework

Instead of guessing when context nears capacity:

Set mental alert at 60%: "Should I compact or continue?"
At natural break points: Compact if above 50%
At 80%: Stop accepting new work, finish current task
At 95%: Force new session (no negotiation)

This framework removes ambiguity from the "when to compact" decision. The percentage provides objective data; the thresholds provide clear action triggers.

Multi-Agent Context Tracking

In orchestrated workflows, track per-agent percentages to enable proactive scheduling:

# Example subagent completion output
{
    "agent": "builder-agent",
    "context_used": 156000,
    "context_limit": 200000,
    "percentage": 78,
    "recommendation": "boot-fresh-next-task"
}

This enables capacity-aware task routing:

Route new work to agents with headroom
Reboot agents approaching limits before task assignment
Predict whether an agent can complete a task within remaining capacity

Integration with Hooks

PostToolUse hooks can monitor percentage progression automatically:

def post_tool_use(event):
    percentage = event.get("context_percentage", 0)
    if percentage > 75:
        log_alert(f"Context at {percentage}%—consider compaction")

This provides operational awareness without manual monitoring, surfacing warnings at configurable thresholds.

Practical Application

The percentage display transforms context management from reactive ("the agent seems confused") to proactive ("we're at 65%, compact before the next task").

Combined with the compaction strategies below, percentage monitoring enables:

Evidence-based compaction timing — Compact at 50% rather than guessing
Predictable session planning — Know when you'll need a fresh agent
Quality maintenance — Intervene before degradation, not after

Frequent Intentional Compaction

[2025-12-10]: Most teams compact context reactively—when the agent hits 95% capacity and auto-summarization kicks in as an emergency measure. By then, quality has already degraded. Frequent intentional compaction flips this: compact proactively at 40-60% utilization to maintain quality, not salvage it.

The Pattern: Instead of waiting for context limits to force compression, compact deliberately and frequently throughout a session. Target 40-60% context utilization as your compaction trigger, not 90%+.

Optimization Priority Order:

Correctness — Preserve factual accuracy above all else
Completeness — Ensure all critical information survives compaction
Signal-to-noise — Remove redundancy, keep high-value context
Trajectory — Maintain the narrative thread of what's been done and why

This ordering matters. Emergency compaction at 95% often sacrifices correctness for brevity. Intentional compaction at 50% can optimize all four dimensions without forced trade-offs.

Concrete Technique: Status-to-Plan Compaction

One practical pattern is compacting status updates back into plan documents:

# Before (context bloat)
## Plan
- Implement user authentication
- Add database schema
- Create API endpoints
 
## Status Updates
- [14:23] Started auth implementation
- [14:45] Auth working, moving to database
- [15:12] Database schema complete
- [15:30] Schema had bug, fixed
- [15:45] API endpoints half done
...

# After (intentional compaction)
## Plan
- ✓ Implement user authentication — Complete, no issues
- ✓ Add database schema — Complete after fixing validation bug
- ⧗ Create API endpoints — In progress (3/5 complete)
 
## Current Work
Working on remaining API endpoints (POST /users, DELETE /users)

The compacted version preserves correctness (what's done), completeness (including the bug fix), signal (current state), and trajectory (what's next)—all in a fraction of the context.

Why This Works

Proactive = Quality: Compacting at 50% gives you room to optimize. At 95%, you're in triage mode.
Frequent = Fresh: Small, regular compactions are easier to verify than massive emergency summarizations.
Intentional = Controlled: You decide what to preserve based on priority order, not panic.

The Trade-off

Requires active monitoring of context utilization and deliberate intervention. You can't set-and-forget. The investment is ongoing attention; the return is sustained quality.

Real-World Results: HumanLayer's ACE-FCA framework demonstrated this approach shipping 35,000 lines of code in 7 hours. The key wasn't speed—it was maintaining quality through aggressive proactive compaction. When you compact intentionally at 40-60%, you avoid the quality degradation that comes from emergency summarization at 95%.

Contrast with Emergency Compaction

Approach	Trigger Point	Quality Impact	Control
Emergency Auto-Compact	95%+ capacity	Degrades (forced summarization)	Low (automatic)
Frequent Intentional	40-60% capacity	Maintains (controlled compression)	High (deliberate)
No Compaction	Never (boot fresh agents)	High (fresh context)	Highest (manual restart)

Our default advice remains "boot a new agent" for most cases. Frequent intentional compaction is for scenarios where agent continuity matters—long-running sessions, accumulated state, or workflows where restarting is expensive.

When to Use

Multi-hour coding sessions where restarting loses momentum
Workflows that accumulate valuable learned context
Situations where handoff overhead exceeds compaction cost
Teams optimizing for sustained agent performance over time

When to Boot Fresh Instead

Short, focused tasks (< 30 minutes)
Clear task boundaries where restart is natural
Quality concerns outweigh continuity needs
Agent output shows signs of degradation

Connections

To Context Fundamentals: The "One Agent, One Task" principle this technique extends
To Advanced Context Patterns: How frequent intentional compaction relates to ACE's growing contexts
To Patterns: Emergency Context Rewriting anti-pattern demonstrates why reactive compaction fails

Sources

ACE-FCA framework by HumanLayer — Demonstrated in production shipping 35K LOC in 7 hours using proactive compaction at 40-60% utilization