Treat agent knowledge like software, not documents.
This extends the "Specs as Source Code" mental model beyond specifications to all context artifacts: knowledge bases, expertise files, tool descriptions, and system prompts. If it shapes agent behavior, it's source code.
Your Mental Model
Knowledge artifacts are source code. Version control them, test them, refactor them, and document them with the same rigor you apply to Python or JavaScript. When you edit an agent's context without tracking what changed and why, you're cowboy-coding in production.
Knowledge bases aren't documentation—they're the runtime instructions that determine agent behavior. Treating them as "just text files" is like treating your application code as "just text files" and editing it in Notepad without version control.
What This Looks Like in Practice
The ACE playbook format exemplifies context as code:
[str-00001] helpful=5 harmful=0 :: Use structured output for complex tasks
[cal-00003] helpful=8 harmful=0 :: Token cost = (input + output) * rate
[mis-00004] helpful=6 harmful=0 :: Don't retry on rate limits without backoff
[con-00002] helpful=7 harmful=0 :: Context window = working memory
[too-00001] helpful=9 harmful=0 :: Grep before Edit to avoid blind writes
Each line is:
- Uniquely identified (`[prefix-ID]`) - enables precise references and easy refactoring
- Performance tested (`helpful=X harmful=Y`) - like unit tests for knowledge
- Category-organized (`str-`, `cal-`, `mis-`, `con-`, `too-`) - modular design
- Self-describing - the content itself explains what it does
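Because the format is line-oriented and regular, entries parse with a short script. A minimal sketch, assuming the grammar shown above (the exact regex and the `PlaybookEntry` type are my own, not part of any published ACE spec):

```python
import re
from typing import NamedTuple, Optional

# One playbook line: [cat-#####] helpful=N harmful=N :: content
LINE = re.compile(
    r"\[(?P<category>[a-z]{3})-(?P<num>\d{5})\]"
    r"\s+helpful=(?P<helpful>\d+)\s+harmful=(?P<harmful>\d+)"
    r"\s+::\s+(?P<content>.+)"
)

class PlaybookEntry(NamedTuple):
    category: str   # str, cal, mis, con, too
    num: str        # zero-padded ID within the category
    helpful: int
    harmful: int
    content: str

def parse_line(line: str) -> Optional[PlaybookEntry]:
    """Parse one playbook line; return None for prose or blank lines."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return PlaybookEntry(m["category"], m["num"],
                         int(m["helpful"]), int(m["harmful"]), m["content"])
```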
Software Engineering Patterns Applied to Context
Version Control: Track Changes, Enable Rollback
# Traditional documentation
docs/agent-knowledge.md # edited directly, no history
# Context as code
git log --oneline knowledge/strategies.md
a3f2b1c Add retry strategy for transient failures
e4d5c6f Remove deprecated authentication approach
f7a8b9c Refactor error handling strategies
git diff e4d5c6f..a3f2b1c knowledge/strategies.md
- [str-00015] helpful=2 harmful=3 :: Use basic auth for API calls
+ [str-00015] helpful=8 harmful=0 :: Use OAuth2 with refresh tokens

When agent behavior regresses, you can run `git bisect` to find which knowledge change caused it.
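A sketch of what that bisect looks like in practice: `git bisect run` checks out successive commits and uses the exit code of a test command to mark each one good or bad. Everything here beyond the git invocation is a hypothetical harness:

```python
# bisect_knowledge.py
# Usage: git bisect start <bad-commit> <good-commit>
#        git bisect run python bisect_knowledge.py
import sys
from pathlib import Path

def knowledge_has(entry_id: str, needle: str) -> bool:
    """Check whether the checked-out playbook still carries the entry."""
    text = Path("knowledge/strategies.md").read_text()
    return any(line.startswith(f"[{entry_id}]") and needle in line
               for line in text.splitlines())

if __name__ == "__main__":
    # Hypothetical regression: agent behavior broke when the OAuth2
    # guidance in str-00015 changed. Exit 0 = good commit, 1 = bad.
    sys.exit(0 if knowledge_has("str-00015", "OAuth2") else 1)
```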
Testing: Helpful/Harmful Counters as Unit Tests
# Before testing
[str-00023] :: Always validate user input
# After testing (knowledge has test results)
[str-00023] helpful=12 harmful=0 :: Always validate user input
The counters are like test pass/fail metrics:
- `helpful > 0, harmful = 0` → Proven valuable, keep it
- `helpful = 0, harmful > 0` → Causes problems, remove or refactor
- `helpful > 0, harmful > 0` → Context-dependent, needs conditions
You can track these over time like code coverage metrics.
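Reusing the `parse_line` sketch from earlier, the pass/fail logic is a small classifier (the verdict names are mine, mapped to the three cases above):

```python
def verdict(entry: PlaybookEntry) -> str:
    """Map helpful/harmful counters to a test-style verdict."""
    if entry.helpful > 0 and entry.harmful == 0:
        return "proven"       # keep it
    if entry.helpful == 0 and entry.harmful > 0:
        return "problematic"  # remove or refactor
    if entry.helpful > 0 and entry.harmful > 0:
        return "conditional"  # needs explicit conditions
    return "untested"         # no signal yet
```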
Modular Organization: Category Prefixes, Unique IDs
strategies/
str-00001.md # High-level approaches
str-00002.md
calculations/
cal-00001.md # Formulas and computations
cal-00003.md
mistakes/
mis-00001.md # Anti-patterns to avoid
mis-00004.md
concepts/
con-00001.md # Domain knowledge
con-00002.md
tools/
too-00001.md # Tool usage patterns
too-00005.md
Just like code modules, categories enable:
- Focused loading: Only load relevant categories for specific tasks (see the sketch after this list)
- Dependency tracking: `[str-00012]` references `[con-00003]` and `[too-00007]`
- Easier refactoring: Move entries between categories without breaking references
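Here is what focused loading can look like, assuming the directory layout above; the task-to-category mapping is a made-up example:

```python
from pathlib import Path

# Hypothetical mapping from task type to the categories worth loading.
TASK_CATEGORIES = {
    "cost_estimate": ["calculations", "concepts"],
    "code_edit": ["strategies", "mistakes", "tools"],
}

def load_context(task_type: str, root: str = "knowledge") -> str:
    """Concatenate only the knowledge entries relevant to this task."""
    parts = []
    for category in TASK_CATEGORIES.get(task_type, []):
        for path in sorted(Path(root, category).glob("*.md")):
            parts.append(path.read_text().strip())
    return "\n\n".join(parts)

# load_context("code_edit") pulls in str-, mis-, and too- entries only.
```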
Refactoring: Semantic Deduplication
# Before refactoring (duplication)
[str-00008] :: For database queries, use connection pooling
[str-00015] :: When connecting to databases, use a connection pool
[too-00023] :: Database access should use connection pooling
# After refactoring (DRY principle)
[str-00008] helpful=15 harmful=0 :: Use connection pooling for database access
# References: too-00023, db-architecture.md
Like code refactoring, you extract common patterns, eliminate redundancy, and maintain a single source of truth.
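Finding those near-duplicates scales past eyeballing with a similarity search over entry contents. A sketch using cosine similarity over embeddings; `embed` is a placeholder for whichever embedding model you use, and the 0.9 threshold is a guess to tune:

```python
import math
from itertools import combinations

def embed(text: str) -> list[float]:
    """Placeholder: call your embedding model of choice here."""
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def duplicate_candidates(entries: list[PlaybookEntry], threshold: float = 0.9):
    """Yield entry pairs whose contents look like the same advice."""
    vecs = [embed(e.content) for e in entries]
    for (i, a), (j, b) in combinations(enumerate(entries), 2):
        if cosine(vecs[i], vecs[j]) >= threshold:
            yield a, b

# Candidate pairs go to a human (or a reviewing agent) to merge,
# keeping the ID with the stronger helpful/harmful record.
```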
Documentation: Each Entry is Self-Describing
# Weak (requires external context)
[str-00042] :: Use the pattern
# Strong (self-contained)
[str-00042] helpful=6 harmful=0 :: For multi-step workflows, use plan-build-review pattern to separate planning from execution
Like good function names and docstrings, each knowledge entry should be understandable in isolation.
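This is also lintable, the way docstring checks are. A rough heuristic sketch; the word-count floor and the vague-opener list are arbitrary assumptions, not ACE rules:

```python
VAGUE_OPENERS = ("use the", "apply the", "follow the", "do the")

def lint_entry(entry: PlaybookEntry) -> list[str]:
    """Flag entries that likely lean on unstated external context."""
    problems = []
    if len(entry.content.split()) < 6:
        problems.append("too short to be self-contained")
    if entry.content.lower().startswith(VAGUE_OPENERS):
        problems.append("names 'the pattern' without saying which one")
    return problems
```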
When to Apply This Model
Good Fit
Production agent systems: When agents run in production, their knowledge determines user-facing behavior. Treat it with production code rigor.
Multi-agent systems: When knowledge is shared across multiple agents, version control and testing prevent one agent's changes from breaking another.
Evolving domains: When knowledge needs frequent updates (new APIs, changing policies), treating context as code makes evolution traceable and reversible.
Team collaboration: When multiple people contribute to agent knowledge, version control and structure prevent conflicts and enable review.
Poor Fit
Prototype exploration: When you're still figuring out what knowledge the agent needs, heavyweight structure slows discovery. Start informal, formalize later.
Static, finished systems: If the knowledge is complete and won't change, the overhead of treating it as source code isn't justified.
Single-person, short-lived projects: For quick experiments, simple text files work fine. Add structure when the project grows.
The Continuum: From Documents to Code
Documents                                                Code
│                                                           │
├─ Plain text notes (no structure)                          │
├─ Markdown with sections (light structure)                 │
├─ Structured markdown with metadata (ACE playbook)         │
├─ Machine-parseable format with schema (JSON/YAML)         │
└─ Formal specifications with validation (contracts) ──────┘
You don't need to jump straight to the "code" end. The ACE playbook hits a sweet spot: human-readable markdown with just enough structure (IDs, counters, categories) to enable software engineering practices.
Implications
Knowledge Reviews Like Code Reviews
# PR: Update authentication strategies
Changes to knowledge/strategies/:
- [str-00042] helpful=2 harmful=5 :: Use basic auth
+ [str-00042] helpful=8 harmful=0 :: Use OAuth2 with PKCE flow
+ [str-00058] helpful=0 harmful=0 :: For mobile apps, use refresh token rotation
Reviewer: "str-00042 improvement looks good. For str-00058, have we actually
tested it? helpful=0 harmful=0 just means no signal yet, and refresh token
rotation can cause issues if not handled correctly."

Knowledge Regression Testing
def test_agent_follows_knowledge():
    """Verify the agent actually applies a specific knowledge entry."""
    agent = load_agent_with_knowledge("knowledge/strategies.md")

    # Exercise the behavior governed by str-00042 (OAuth2 with PKCE)
    response = agent.handle_auth_request(mock_request)
    assert response.auth_method == "OAuth2"
    assert response.uses_pkce is True

    # Feed the result back into the playbook's counters
    increment_helpful("str-00042")

Knowledge Metrics
# Knowledge health dashboard
Total entries: 247
Proven (helpful > 5, harmful = 0): 89 (36%)
Untested (helpful = 0, harmful = 0): 143 (58%)
Problematic (harmful > 0): 15 (6%)
Recent changes (last 7 days):
+ 12 new entries
~ 8 modified entries
- 3 removed entries
Coverage by category:
str- (strategies): 67 entries
cal- (calculations): 23 entries
mis- (mistakes): 34 entries
con- (concepts): 89 entries
too- (tools): 34 entries
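Producing that dashboard is a fold over the parsed entries. A sketch reusing `parse_line` and `verdict` from above (note the dashboard's "proven" bar uses a stricter helpful > 5 cutoff; the threshold is yours to pick):

```python
from collections import Counter
from pathlib import Path

def knowledge_report(playbook_path: str) -> str:
    lines = Path(playbook_path).read_text().splitlines()
    entries = [e for e in map(parse_line, lines) if e is not None]
    total = len(entries) or 1  # avoid division by zero on empty playbooks
    verdicts = Counter(verdict(e) for e in entries)
    by_category = Counter(e.category for e in entries)

    report = [f"Total entries: {len(entries)}"]
    for label in ("proven", "untested", "problematic", "conditional"):
        n = verdicts[label]
        report.append(f"  {label}: {n} ({n / total:.0%})")
    report.append("Coverage by category:")
    for cat, n in sorted(by_category.items()):
        report.append(f"  {cat}-: {n} entries")
    return "\n".join(report)
```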
Common Pitfalls
Over-Engineering Early
Problem: Creating elaborate versioning and testing infrastructure before you know what knowledge the agent needs.
Solution: Start with simple markdown. Add structure (IDs, categories, counters) when you have enough entries that organization becomes painful. Add testing when you have enough history to know what "helpful" looks like.
Treating All Context Equally
Problem: Applying heavy structure to ephemeral context that doesn't need it (one-off prompts, temporary instructions).
Solution: Distinguish between:
- Core knowledge (long-lived, reused, tested) → Treat as code
- Task-specific context (one-off, temporary) → Keep lightweight
- Generated content (can be regenerated) → Don't version control
Losing the Human-Readable Aspect
Problem: Making context so structured and formal that humans can't easily read and edit it.
Solution: The ACE playbook maintains readability. Avoid formats that require parsing tools to understand. Markdown with light structure is the sweet spot.
Connections
- To Specs as Source Code: Context as code extends this mental model beyond specs to all knowledge artifacts
- To Knowledge Evolution: Practical patterns for evolving knowledge bases over time
- To Self-Improving Experts: Expertise files are context that agents execute—they're source code for behavior
- To Context: The mechanics of how context enters the agent's working memory