Twelve Leverage Points of Agentic Coding

    A framework for understanding where to intervene in agentic systems. The hierarchy follows Donella Meadows' "Places to Intervene in a System" pattern: lower numbers indicate higher-leverage interventions. Changes at the high-leverage end (#1-#4) cascade throughout the entire system; changes at the low-leverage end (#9-#12) produce local fixes.

    Guiding philosophy: "One agent, one purpose, one prompt."


    The Hierarchy

    AI Developer Workflows (ADWs) define how work flows between agents in multi-agent systems; they are the highest-leverage intervention point in the framework.

    #    Leverage Point    Core Question
    12   Context           What does the agent actually know?
    11   Model             What tradeoffs exist: cost, speed, intelligence?
    10   Prompt            Are instructions concrete and followable?
    9    Tools             What actions can agents take, and in what form?
    8    Standard Out      Can agents and operators see what's happening?
    7    Types             Is typing consistent and enforced?
    6    Documentation     Can agents navigate and trust the documentation?
    5    Tests             Are tests helping agents or just theatre?
    4    Architecture      Is the codebase agentically intuitive?
    3    Plans             Can agents complete tasks without further input?
    2    Templates         Do agents know what good output looks like?
    1    ADWs              How does work flow between agents?

    Low Leverage (Local Fixes)

    12. Context

    What does the agent actually know? What's in the context window? Is it all necessary? Will the current context increase the likelihood of outputting the next token correctly?

    Key considerations:

    • How to audit what's actually in an agent's context at decision time
    • The cost of irrelevant context and methods to measure it
    • How to distinguish "necessary" context from "nice to have" information
    • Processes for pruning context that isn't contributing to outcomes

    Example: Good vs. Poor Context Management

    Poor: Loading entire documentation suite into context for every task (500k+ tokens, most irrelevant).

    Good: Loading only the relevant module documentation based on task requirements (15k tokens, high signal-to-noise ratio).
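
    A minimal sketch of the "good" approach above, assuming a hypothetical docs/modules/ directory with one markdown file per module and a simple keyword match for relevance:

        # Sketch: load only the module docs relevant to the task (hypothetical layout).
        from pathlib import Path

        DOCS_DIR = Path("docs/modules")  # assumed layout: one markdown file per module

        def build_context(task_description: str) -> str:
            """Return only the docs whose module names appear in the task description."""
            selected = []
            for doc in sorted(DOCS_DIR.glob("*.md")):
                if doc.stem in task_description.lower():  # e.g. "auth", "billing"
                    selected.append(doc.read_text())
            return "\n\n".join(selected)  # small, high-signal context instead of the full suite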


    11. Model

    What model is the system using? How capable is it? What tradeoffs exist? Cost vs speed vs intelligence.

    Key considerations:

    • How to map tasks to the right model tier
    • Identifying cases where over-indexing on capability wastes resources
    • Staying current on model capabilities without constant churn

    Example: Good vs. Poor Model Selection

    Poor: Using GPT-4o for simple text extraction tasks that GPT-3.5-turbo handles perfectly (10x cost overhead).

    Good: Using Claude 3.5 Sonnet for complex code generation, GPT-3.5-turbo for text classification (matched capability to task requirements).
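
    One way to make that mapping explicit is a small routing table from task type to model tier; the task categories and model names below are illustrative, not a recommendation:

        # Sketch: route each task type to a model tier (illustrative names only).
        MODEL_BY_TASK = {
            "classification": "gpt-3.5-turbo",       # cheap and fast is good enough
            "extraction": "gpt-3.5-turbo",
            "code_generation": "claude-3-5-sonnet",  # pay for capability where it matters
            "architecture_review": "claude-3-5-sonnet",
        }

        def pick_model(task_type: str) -> str:
            # Default to the cheaper tier; escalate only for tasks that need it.
            return MODEL_BY_TASK.get(task_type, "gpt-3.5-turbo")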


    10. Prompt

    What instructions does the agent have? Are they concrete? Can they be followed properly?

    Key considerations:

    • What makes a prompt "concrete" vs. "vague"
    • How to test whether a prompt can actually be followed
    • Setting quality bars for prompts before looking elsewhere for the problem

    Example: Good vs. Poor Prompting

    Poor: "Make this code better" (vague, no success criteria).

    Good: "Refactor the authentication module to use dependency injection. Extract the token validation logic into a separate class. Maintain existing test coverage." (concrete, testable, clear success criteria).


    9. Tools

    What actions can agents take? What form are these tools available in? Internal tooling vs MCP vs CLI vs something else.

    Key considerations:

    • How to decide between internal tools, MCP servers, and CLI wrappers
    • The tradeoff between tool flexibility and tool reliability
    • When tool limitations become the bottleneck
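
    Whatever the delivery mechanism (internal tooling, an MCP server, or a CLI wrapper), the agent ultimately sees a name, a description, and a typed parameter schema. A minimal sketch of one such definition, in the JSON-schema style many agent frameworks accept; the tool itself is hypothetical:

        # Sketch: a hypothetical tool definition in a JSON-schema style.
        run_tests_tool = {
            "name": "run_tests",
            "description": "Run the project's test suite and return failures as text.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Test file or directory to run."},
                },
                "required": ["path"],
            },
        }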

    Medium Leverage (System Properties)

    8. Standard Out

    Can agents (and operators) actually SEE what the code is doing? Is it being output centrally? Is it self-documenting? Do logs have clear sources and descriptions?

    Key considerations:

    • How observable agent systems are
    • What information is missing from current visibility
    • How to balance verbose logging with signal-to-noise ratio
    • What "self-documenting" output looks like in practice

    7. Types

    Is typing consistent across the codebase? Are agents aware of it? To what extent is it enforced, and are agents informed if and when they violate it?

    Key considerations:

    • How strong typing helps agents write correct code
    • How to surface type errors to agents in actionable ways
    • The relationship between type coverage and agent success rate
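
    A sketch of surfacing type errors to an agent in an actionable form, assuming mypy is the project's checker; the feedback loop matters more than the specific tool:

        # Sketch: run the type checker and return its errors for the agent's context.
        import subprocess

        def type_check(paths: list[str]) -> str:
            """Return mypy's file:line error output so it can be fed back to the agent."""
            result = subprocess.run(["mypy", *paths], capture_output=True, text=True)
            return result.stdout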

    6. Documentation

    Where is the documentation? What's in there? Can agents easily navigate? Is it out of date? Is it being updated constantly? Is it "self-improving"?

    Key considerations:

    • What makes documentation "agent-navigable"
    • How to keep docs in sync with code when agents are making changes
    • What "self-improving" documentation looks like in practice
    • Where agents look for documentation, and whether that's where it lives
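
    One lightweight way to make documentation agent-navigable is a single index the agent reads first, mapping topics to file paths. The index below is hypothetical:

        # Sketch: a topic-to-path index an agent consults before loading any docs (hypothetical paths).
        DOCS_INDEX = {
            "authentication": "docs/modules/auth.md",
            "billing": "docs/modules/billing.md",
            "deployment": "docs/runbooks/deploy.md",
        }

        def docs_for(topic: str) -> str | None:
            return DOCS_INDEX.get(topic)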

    5. Tests

    What are the tests doing, and how do they help agents? Are agents conducting "testing theatre"? Do tests run against mock implementations or against the actual code?

    Key considerations:

    • The difference between tests that help agents and tests that don't
    • How to detect "testing theatre" (tests that pass but don't verify anything real)
    • When mocks help and when they hide problems
    • How test failures guide agent behavior
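
    A sketch of the difference, with a hypothetical module under test: the first test is theatre because it passes regardless of what the code does; the second exercises real code and fails when behavior regresses.

        # Testing theatre: passes no matter what the implementation does.
        def test_token_validation_theatre():
            assert True  # "covers" validation without ever calling it

        # A real test: imports the actual code (hypothetical module) and checks behavior.
        from auth.tokens import validate_token

        def test_rejects_expired_token():
            assert validate_token("expired-token") is False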

    High Leverage (Structural Changes)

    4. Architecture

    What patterns does the codebase follow? How is it structured? Is it "agentically intuitive", following historically popular structures that are more likely to appear in training data?

    Key considerations:

    • What makes an architecture "agentically intuitive"
    • How to balance "what's in training data" with "what's right for the problem"
    • What architectural patterns agents handle well vs. poorly
    • How much codebase structure affects agent success
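
    As a rough illustration, an "agentically intuitive" layout leans on conventions the model has seen many times; the generic tree below is an example, not a recommendation for any particular stack:

        src/<package>/      application code, one module per concern
        tests/              mirrors the src/ layout one-to-one
        docs/               one file per module
        pyproject.toml      single, standard entry point for tooling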

    3. Plans

    Plans are MASSIVE prompts passed to an agent with the expectation that no further user interaction is needed in that session for the agent to finish its task.

    Key considerations:

    • What makes a plan "complete enough" to run without human intervention
    • How to scope a plan—what's too big, what's too small
    • How to handle plans that fail partway through
    • The relationship between plan quality and task success
    • How to write plans that are robust to unexpected situations
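
    A sketch of one way to enforce "complete enough to run unattended": a fixed list of plan sections and a renderer that refuses to produce a plan with any section missing. The section names are illustrative:

        # Sketch: a plan is rejected unless every required section is filled in (illustrative names).
        PLAN_SECTIONS = [
            "Objective",
            "Constraints",
            "Files to read or modify",
            "Step-by-step tasks",
            "Acceptance criteria",
            "Out of scope",
        ]

        def render_plan(content: dict[str, str]) -> str:
            missing = [s for s in PLAN_SECTIONS if not content.get(s)]
            if missing:
                raise ValueError(f"Plan is not complete enough to run unattended; missing: {missing}")
            return "\n\n".join(f"## {s}\n{content[s]}" for s in PLAN_SECTIONS)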

    2. Templates

    Do agents know what docs, code, prompts, etc. should look like? Are prompts/plans reusable? Are templates structured consistently? Are all elements necessary?

    Key considerations:

    • What templates get used most frequently
    • How to decide what goes in a template vs. what's generated fresh
    • How to prevent templates from becoming bloated over time
    • The relationship between template quality and output consistency
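
    A sketch of a reusable template with an explicit check that every element is present and necessary, one guard against templates silently accumulating bloat. It assumes Python 3.11+ for Template.get_identifiers(); the placeholder names are illustrative:

        # Sketch: reject fills that leave placeholders empty or pass fields the template never uses.
        from string import Template

        BUG_REPORT = Template("Title: $title\nSteps:\n$steps\nExpected: $expected\nActual: $actual\n")

        def fill(template: Template, fields: dict[str, str]) -> str:
            required = set(template.get_identifiers())  # Python 3.11+
            provided = set(fields)
            if required != provided:
                raise ValueError(
                    f"Missing: {sorted(required - provided)}; unused: {sorted(provided - required)}"
                )
            return template.substitute(fields)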

    1. ADWs (AI Developer Workflows)

    How does work flow between agents? How do multiple agents work together to accomplish a shared goal? To what extent are ADWs deterministic (based in code) vs. stochastic/agentic (using an orchestrator agent, agents invoking other agents, etc.)?

    Key considerations:

    • What ADWs have been built or used, and what worked
    • How to decide between deterministic workflows and agentic orchestration
    • The handoff problem between agents and how to solve it
    • When multi-agent coordination helps vs. adds unnecessary complexity
    • How to debug workflows that span multiple agents
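
    At the deterministic end of that spectrum, an ADW is just code that sequences single-purpose agents and makes every handoff explicit. A minimal sketch; run_agent and the agent names are hypothetical:

        # Sketch: a deterministic plan -> build -> review workflow (hypothetical agent runner).
        def run_adw(task: str, run_agent) -> str:
            """run_agent(name, prompt) -> str is assumed to invoke one single-purpose agent."""
            plan = run_agent("planner", f"Write a complete plan for: {task}")
            diff = run_agent("builder", f"Implement this plan exactly:\n\n{plan}")
            review = run_agent("reviewer", f"Review this change against the plan:\n\n{plan}\n\n{diff}")
            return review  # every handoff is explicit text, so the workflow is inspectable and replayable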

    Connections

    • To Core Four: The twelve leverage points expand on the four pillars. Context (#12), Model (#11), Prompt (#10), and Tools (#9) map directly to the core four. The higher leverage points (Plans, Templates, ADWs) represent system-level patterns built on top of the pillars.

    • To Evaluation: Each leverage point requires different evaluation approaches. Low leverage points (context, model, prompt) can be evaluated per-task. High leverage points (architecture, templates, ADWs) require system-level metrics across multiple tasks.

    • To Patterns: ADWs (#1) and Plans (#3) directly correspond to the orchestrator and plan-build-review patterns. Templates (#2) enable self-improving expert patterns.


    Notes on the Source

    This framework comes from agenticengineer.com. The hierarchy draws from Donella Meadows' "Leverage Points: Places to Intervene in a System" (1999), applying systems thinking to agentic engineering.