Treat agent knowledge like software, not documents.
This extends the "Specs as Source Code" mental model beyond specifications to all context artifacts: knowledge bases, expertise files, tool descriptions, and system prompts. If it shapes agent behavior, it's source code.
Your Mental Model
Knowledge artifacts are source code. Version control them, test them, refactor them, and document them with the same rigor you apply to Python or JavaScript. When you edit an agent's context without tracking what changed and why, you're cowboy-coding in production.
Knowledge bases aren't documentation—they're the runtime instructions that determine agent behavior. Treating them as "just text files" is like treating your application code as "just text files" and editing it in Notepad without version control.
What This Looks Like in Practice
The ACE playbook format exemplifies context as code:
[str-00001] helpful=5 harmful=0 :: Use structured output for complex tasks
[cal-00003] helpful=8 harmful=0 :: Token cost = (input + output) * rate
[mis-00004] helpful=6 harmful=0 :: Don't retry on rate limits without backoff
[con-00002] helpful=7 harmful=0 :: Context window = working memory
[too-00001] helpful=9 harmful=0 :: Grep before Edit to avoid blind writes
Each line is:
- Uniquely identified (`[prefix-ID]`) - enables precise references and easy refactoring
- Performance tested (`helpful=X harmful=Y`) - like unit tests for knowledge
- Category-organized (`str-`, `cal-`, `mis-`, `con-`, `too-`) - modular design
- Self-describing - the content itself explains what it does
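Because the format is line-oriented and regular, entries parse with a short script. A minimal sketch, assuming the grammar shown above (the exact regex and the `PlaybookEntry` type are my own, not part of any published ACE spec):

```python
import re
from typing import NamedTuple, Optional

# One playbook line: [cat-#####] helpful=N harmful=N :: content
LINE = re.compile(
    r"\[(?P<category>[a-z]{3})-(?P<num>\d{5})\]"
    r"\s+helpful=(?P<helpful>\d+)\s+harmful=(?P<harmful>\d+)"
    r"\s+::\s+(?P<content>.+)"
)

class PlaybookEntry(NamedTuple):
    category: str   # str, cal, mis, con, too
    num: str        # zero-padded ID within the category
    helpful: int
    harmful: int
    content: str

def parse_line(line: str) -> Optional[PlaybookEntry]:
    """Parse one playbook line; return None for prose or blank lines."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return PlaybookEntry(m["category"], m["num"],
                         int(m["helpful"]), int(m["harmful"]), m["content"])
```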
Software Engineering Patterns Applied to Context
Version Control: Track Changes, Enable Rollback
# Traditional documentation
docs/agent-knowledge.md # edited directly, no history
# Context as code
git log --oneline knowledge/strategies.md
a3f2b1c Add retry strategy for transient failures
e4d5c6f Remove deprecated authentication approach
f7a8b9c Refactor error handling strategies
git diff e4d5c6f..a3f2b1c knowledge/strategies.md
- [str-00015] helpful=2 harmful=3 :: Use basic auth for API calls
+ [str-00015] helpful=8 harmful=0 :: Use OAuth2 with refresh tokens

When agent behavior regresses, you can run `git bisect` to find which knowledge change caused it.
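A sketch of what that bisect looks like in practice: `git bisect run` checks out successive commits and uses the exit code of a test command to mark each one good or bad. Everything here beyond the git invocation is a hypothetical harness:

```python
# bisect_knowledge.py
# Usage: git bisect start <bad-commit> <good-commit>
#        git bisect run python bisect_knowledge.py
import sys
from pathlib import Path

def knowledge_has(entry_id: str, needle: str) -> bool:
    """Check whether the checked-out playbook still carries the entry."""
    text = Path("knowledge/strategies.md").read_text()
    return any(line.startswith(f"[{entry_id}]") and needle in line
               for line in text.splitlines())

if __name__ == "__main__":
    # Hypothetical regression: agent behavior broke when the OAuth2
    # guidance in str-00015 changed. Exit 0 = good commit, 1 = bad.
    sys.exit(0 if knowledge_has("str-00015", "OAuth2") else 1)
```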
Testing: Helpful/Harmful Counters as Unit Tests
# Before testing
[str-00023] :: Always validate user input
# After testing (knowledge has test results)
[str-00023] helpful=12 harmful=0 :: Always validate user input
The counters are like test pass/fail metrics:
- `helpful > 0, harmful = 0` → Proven valuable, keep it
- `helpful = 0, harmful > 0` → Causes problems, remove or refactor
- `helpful > 0, harmful > 0` → Context-dependent, needs conditions
You can track these over time like code coverage metrics.
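Reusing the `parse_line` sketch from earlier, the pass/fail logic is a small classifier (the verdict names are mine, mapped to the three cases above):

```python
def verdict(entry: PlaybookEntry) -> str:
    """Map helpful/harmful counters to a test-style verdict."""
    if entry.helpful > 0 and entry.harmful == 0:
        return "proven"       # keep it
    if entry.helpful == 0 and entry.harmful > 0:
        return "problematic"  # remove or refactor
    if entry.helpful > 0 and entry.harmful > 0:
        return "conditional"  # needs explicit conditions
    return "untested"         # no signal yet
```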
Modular Organization: Category Prefixes, Unique IDs
strategies/
str-00001.md # High-level approaches
str-00002.md
calculations/
cal-00001.md # Formulas and computations
cal-00003.md
mistakes/
mis-00001.md # Anti-patterns to avoid
mis-00004.md
concepts/
con-00001.md # Domain knowledge
con-00002.md
tools/
too-00001.md # Tool usage patterns
too-00005.md
Just like code modules, categories enable:
- Focused loading: Only load relevant categories for specific tasks (see the sketch after this list)
- Dependency tracking: `[str-00012]` references `[con-00003]` and `[too-00007]`
- Easier refactoring: Move entries between categories without breaking references
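Here is what focused loading can look like, assuming the directory layout above; the task-to-category mapping is a made-up example:

```python
from pathlib import Path

# Hypothetical mapping from task type to the categories worth loading.
TASK_CATEGORIES = {
    "cost_estimate": ["calculations", "concepts"],
    "code_edit": ["strategies", "mistakes", "tools"],
}

def load_context(task_type: str, root: str = "knowledge") -> str:
    """Concatenate only the knowledge entries relevant to this task."""
    parts = []
    for category in TASK_CATEGORIES.get(task_type, []):
        for path in sorted(Path(root, category).glob("*.md")):
            parts.append(path.read_text().strip())
    return "\n\n".join(parts)

# load_context("code_edit") pulls in str-, mis-, and too- entries only.
```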
Refactoring: Semantic Deduplication
# Before refactoring (duplication)
[str-00008] :: For database queries, use connection pooling
[str-00015] :: When connecting to databases, use a connection pool
[too-00023] :: Database access should use connection pooling
# After refactoring (DRY principle)
[str-00008] helpful=15 harmful=0 :: Use connection pooling for database access
# References: too-00023, db-architecture.md
Like code refactoring, you extract common patterns, eliminate redundancy, and maintain a single source of truth.
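Finding those near-duplicates scales past eyeballing with a similarity search over entry contents. A sketch using cosine similarity over embeddings; `embed` is a placeholder for whichever embedding model you use, and the 0.9 threshold is a guess to tune:

```python
import math
from itertools import combinations

def embed(text: str) -> list[float]:
    """Placeholder: call your embedding model of choice here."""
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def duplicate_candidates(entries: list[PlaybookEntry], threshold: float = 0.9):
    """Yield entry pairs whose contents look like the same advice."""
    vecs = [embed(e.content) for e in entries]
    for (i, a), (j, b) in combinations(enumerate(entries), 2):
        if cosine(vecs[i], vecs[j]) >= threshold:
            yield a, b

# Candidate pairs go to a human (or a reviewing agent) to merge,
# keeping the ID with the stronger helpful/harmful record.
```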
Documentation: Each Entry is Self-Describing
# Weak (requires external context)
[str-00042] :: Use the pattern
# Strong (self-contained)
[str-00042] helpful=6 harmful=0 :: For multi-step workflows, use plan-build-review pattern to separate planning from execution
Like good function names and docstrings, each knowledge entry should be understandable in isolation.
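This is also lintable, the way docstring checks are. A rough heuristic sketch; the word-count floor and the vague-opener list are arbitrary assumptions, not ACE rules:

```python
VAGUE_OPENERS = ("use the", "apply the", "follow the", "do the")

def lint_entry(entry: PlaybookEntry) -> list[str]:
    """Flag entries that likely lean on unstated external context."""
    problems = []
    if len(entry.content.split()) < 6:
        problems.append("too short to be self-contained")
    if entry.content.lower().startswith(VAGUE_OPENERS):
        problems.append("names 'the pattern' without saying which one")
    return problems
```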
When to Apply This Model
Good Fit
Production agent systems: When agents run in production, their knowledge determines user-facing behavior. Treat it with production code rigor.
Multi-agent systems: When knowledge is shared across multiple agents, version control and testing prevent one agent's changes from breaking another.
Evolving domains: When knowledge needs frequent updates (new APIs, changing policies), treating context as code makes evolution traceable and reversible.
Team collaboration: When multiple people contribute to agent knowledge, version control and structure prevent conflicts and enable review.
Poor Fit
Prototype exploration: When you're still figuring out what knowledge the agent needs, heavyweight structure slows discovery. Start informal, formalize later.
Static, finished systems: If the knowledge is complete and won't change, the overhead of treating it as source code isn't justified.
Single-person, short-lived projects: For quick experiments, simple text files work fine. Add structure when the project grows.
The Continuum: From Documents to Code
Documents                                                Code
│                                                           │
├─ Plain text notes (no structure)                          │
├─ Markdown with sections (light structure)                 │
├─ Structured markdown with metadata (ACE playbook)         │
├─ Machine-parseable format with schema (JSON/YAML)         │
└─ Formal specifications with validation (contracts) ──────┘
You don't need to jump straight to the "code" end. The ACE playbook hits a sweet spot: human-readable markdown with just enough structure (IDs, counters, categories) to enable software engineering practices.
Implications
Knowledge Reviews Like Code Reviews
# PR: Update authentication strategies
Changes to knowledge/strategies/:
- [str-00042] helpful=2 harmful=5 :: Use basic auth
+ [str-00042] helpful=8 harmful=0 :: Use OAuth2 with PKCE flow
+ [str-00058] helpful=0 harmful=0 :: For mobile apps, use refresh token rotation
Reviewer: "str-00042 improvement looks good. For str-00058, have we actually
tested it? helpful=0 harmful=0 just means no signal yet, and refresh token
rotation can cause issues if not handled correctly."

Knowledge Regression Testing
def test_agent_follows_knowledge():
    """Verify the agent actually applies a specific knowledge entry."""
    agent = load_agent_with_knowledge("knowledge/strategies.md")

    # Exercise the behavior governed by str-00042 (OAuth2 with PKCE)
    response = agent.handle_auth_request(mock_request)
    assert response.auth_method == "OAuth2"
    assert response.uses_pkce is True

    # Feed the result back into the playbook's counters
    increment_helpful("str-00042")

Knowledge Metrics
# Knowledge health dashboard
Total entries: 247
Proven (helpful > 5, harmful = 0): 89 (36%)
Untested (helpful = 0, harmful = 0): 143 (58%)
Problematic (harmful > 0): 15 (6%)
Recent changes (last 7 days):
+ 12 new entries
~ 8 modified entries
- 3 removed entries
Coverage by category:
str- (strategies): 67 entries
cal- (calculations): 23 entries
mis- (mistakes): 34 entries
con- (concepts): 89 entries
too- (tools): 34 entries
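Producing that dashboard is a fold over the parsed entries. A sketch reusing `parse_line` and `verdict` from above (note the dashboard's "proven" bar uses a stricter helpful > 5 cutoff; the threshold is yours to pick):

```python
from collections import Counter
from pathlib import Path

def knowledge_report(playbook_path: str) -> str:
    lines = Path(playbook_path).read_text().splitlines()
    entries = [e for e in map(parse_line, lines) if e is not None]
    total = len(entries) or 1  # avoid division by zero on empty playbooks
    verdicts = Counter(verdict(e) for e in entries)
    by_category = Counter(e.category for e in entries)

    report = [f"Total entries: {len(entries)}"]
    for label in ("proven", "untested", "problematic", "conditional"):
        n = verdicts[label]
        report.append(f"  {label}: {n} ({n / total:.0%})")
    report.append("Coverage by category:")
    for cat, n in sorted(by_category.items()):
        report.append(f"  {cat}-: {n} entries")
    return "\n".join(report)
```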
Common Pitfalls
Over-Engineering Early
Problem: Creating elaborate versioning and testing infrastructure before you know what knowledge the agent needs.
Solution: Start with simple markdown. Add structure (IDs, categories, counters) when you have enough entries that organization becomes painful. Add testing when you have enough history to know what "helpful" looks like.
Treating All Context Equally
Problem: Applying heavy structure to ephemeral context that doesn't need it (one-off prompts, temporary instructions).
Solution: Distinguish between:
- Core knowledge (long-lived, reused, tested) → Treat as code
- Task-specific context (one-off, temporary) → Keep lightweight
- Generated content (can be regenerated) → Don't version control
Losing the Human-Readable Aspect
Problem: Making context so structured and formal that humans can't easily read and edit it.
Solution: The ACE playbook maintains readability. Avoid formats that require parsing tools to understand. Markdown with light structure is the sweet spot.
Connections
- To Specs as Source Code: Context as code extends this mental model beyond specs to all knowledge artifacts
- To Knowledge Evolution: Practical patterns for evolving knowledge bases over time
- To Self-Improving Experts: Expertise files are context that agents execute—they're source code for behavior
- To Context: The mechanics of how context enters the agent's working memory