Well-designed tools make the difference between an agent that can accomplish tasks and one that constantly struggles with its own interface.
Tool Examples as Design Pattern
[2025-12-09]: JSON schemas define structural validity but can't teach usage—that's what examples are for.
The Gap: Schemas tell the model what parameters exist and their types, but not:
- When to include optional parameters
- Which parameter combinations make sense together
- API conventions not expressible in JSON Schema
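A parameter schema for a hypothetical create_ticket tool makes the gap concrete (field names are illustrative, not a real API): the schema can declare priority as an optional enum, but it cannot say that critical bugs should carry one, or that escalate_to only makes sense alongside a critical priority.
# Hypothetical create_ticket parameter schema, expressed as a Python dict.
# It captures types and required fields, but not when optional fields belong together.
CREATE_TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "type": {"type": "string", "enum": ["bug", "feature_request", "question"]},
        "priority": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
        "escalate_to": {"type": "string"},
        "reporter": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "email": {"type": "string"}}
        }
    },
    "required": ["title"]
}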
The Pattern: Provide 1-5 concrete tool call examples demonstrating correct usage at varying complexity levels.
Example Structure (for a support ticket API):
- Minimal: Title-only task—shows the floor
- Partial: Feature request with some reporter info—shows selective use
- Full: Critical bug with escalation, full metadata—shows the ceiling
This progression teaches when to use optional fields, not just that they exist.
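A sketch of that progression for the same hypothetical create_ticket tool (data is illustrative, not a real API):
# Example tool calls at three complexity levels, using realistic values rather than placeholders.
TOOL_EXAMPLES = [
    # Minimal: title only -- the floor
    {"title": "Login page returns 500 after password reset"},
    # Partial: feature request with some reporter info -- selective use of optional fields
    {
        "title": "Add CSV export to the reports page",
        "type": "feature_request",
        "reporter": {"name": "Priya Shah", "email": "priya.shah@example.com"}
    },
    # Full: critical bug with escalation and full metadata -- the ceiling
    {
        "title": "Checkout fails for all EU customers",
        "type": "bug",
        "priority": "critical",
        "escalate_to": "on-call-payments",
        "reporter": {"name": "Marco Ruiz", "email": "marco.ruiz@example.com"}
    }
]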
Results: In internal testing, accuracy on complex parameter handling improved from 72% to 90%.
Best Practices:
- Use realistic data, not placeholder values ("John Doe", not "{name}")
- Focus examples on ambiguity areas not obvious from schema alone
- Show the minimal case first—don't always demonstrate full complexity
- Keep examples concise (1-5 per tool, not 20)
See Also:
- Prompt: Structuring — How tool examples relate to broader prompt design principles
Source: Advanced Tool Use - Anthropic
Rich User Questioning Patterns
[2026-01-30]: Orchestration research from cc-mirror reveals sophisticated user clarification patterns that replace text-based menus with rich, decision-guiding question structures.
Philosophy: Users Need to See Options
Core insight: "Users don't know what they want until they see the options"
Asking "What should I prioritize?" yields vague answers. Showing 4 options with descriptions, trade-offs, and recommendations enables informed decisions.
The 4×4 Maximal Pattern
Structure:
- 4 questions addressing different decision dimensions
- 4 options per question with rich descriptions
- Trade-offs and implications made explicit
- A recommended option that guides toward the optimal choice
- Multi-select support when appropriate
Example: Feature Implementation Scope Clarification
AskUserQuestion(
questions=[
{
"text": "What's the primary goal?",
"options": [
{
"label": "Performance optimization",
"description": "Focus on speed and efficiency. May increase code complexity and require more thorough testing.",
"recommended": True
},
{
"label": "Code simplicity",
"description": "Prioritize readability and maintainability over raw speed. Easier onboarding, longer execution time."
},
{
"label": "Feature completeness",
"description": "Cover all edge cases and scenarios. Comprehensive but longer timeline. Best for user-facing features."
},
{
"label": "Quick MVP",
"description": "Fast delivery with core features only. Technical debt acceptable. Good for validation before full build."
}
]
},
{
"text": "How should we handle errors?",
"options": [
{
"label": "Fail fast with clear messages",
"description": "Immediate error visibility. Easier debugging but more interruptions. Good for development."
},
{
"label": "Graceful degradation",
"description": "Continue operation with reduced functionality. Better UX but errors may go unnoticed."
},
{
"label": "Retry with exponential backoff",
"description": "Automatic recovery from transient failures. Adds complexity and potential latency."
},
{
"label": "Log and continue",
"description": "Silent failure recovery. Best for non-critical paths. Risk of hidden issues accumulating."
}
]
},
{
"text": "What testing level?",
"options": [
{
"label": "Unit tests only",
"description": "Fast feedback, isolated verification. Misses integration issues. Good for pure logic."
},
{
"label": "Integration tests",
"description": "Verify component interactions. Slower but catches real-world failures. Recommended for APIs."
},
{
"label": "Full E2E suite",
"description": "Complete user flow validation. Highest confidence but longest execution. Expensive to maintain."
},
{
"label": "Manual testing",
"description": "Skip automated tests initially. Fast development but manual verification burden. Technical debt."
}
]
},
{
"text": "Deployment strategy?",
"options": [
{
"label": "Feature flag rollout",
"description": "Deploy to all, enable gradually. Instant rollback. Requires flag infrastructure."
},
{
"label": "Canary deployment",
"description": "Small user percentage first. Catch issues early. Needs monitoring and rollback automation."
},
{
"label": "Blue-green deployment",
"description": "Full environment switch. Zero downtime. Doubles infrastructure cost temporarily."
},
{
"label": "Direct deployment",
"description": "Immediate production rollout. Fastest but highest risk. Acceptable for low-traffic features."
}
]
}
]
)
Anti-Pattern: Text-Based Menus
Avoid:
Please choose:
1. Fast
2. Cheap
3. Good quality
Which do you prefer?
Problems:
- No context about trade-offs
- Binary thinking (can't combine attributes)
- Vague options without implications
- No guidance toward optimal choice
When to Use Maximal Questions
Good fit:
- Request admits multiple valid interpretations
- Choices meaningfully affect implementation approach
- Actions carry risk or are difficult to reverse
- User preferences influence trade-offs
- Scope clarification prevents rework
Poor fit:
- Decisions are obvious from context
- Only one reasonable approach exists
- User already provided detailed requirements
- Questions would annoy rather than clarify
Implementation Guidelines
Option descriptions should include:
- What this choice means concretely
- Primary trade-off or cost
- When this choice makes sense
- What happens if this choice is wrong
The recommended flag signals:
- Optimal choice given typical constraints
- Not forcing—user can override
- Guides users unfamiliar with domain
Multi-select enables:
- "Performance AND simplicity" combinations
- "All of these except X" selections
- Prioritization without forced ranking
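A minimal sketch combining these guidelines into a single question; the multi_select flag name is an assumption for illustration, not a confirmed parameter of AskUserQuestion:
AskUserQuestion(
    questions=[
        {
            "text": "Which qualities should this refactor prioritize? (select all that apply)",
            "multi_select": True,  # assumed flag name; lets the user combine choices
            "options": [
                {
                    "label": "Performance",
                    "description": "Tighten hot paths. Costs some complexity; pick when latency is user-visible.",
                    "recommended": True  # optimal under typical constraints; the user can override
                },
                {
                    "label": "Simplicity",
                    "description": "Flatter, smaller code. May leave speed on the table; good for long-lived modules."
                },
                {
                    "label": "Test coverage",
                    "description": "Add tests before changing behavior. Slower start, safer changes."
                },
                {
                    "label": "API stability",
                    "description": "Keep public signatures frozen. Internal churn only; wrong choice if the API itself is the problem."
                }
            ]
        }
    ]
)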
Sources: cc-mirror orchestration tools, AskUserQuestion pattern documentation
Leading Questions
- How do you write tool descriptions that work for both humans and LLMs?
- When should you split one complex tool into multiple simple ones?
- What parameters should be required vs. optional? How do you decide?
- How do naming conventions affect tool selection accuracy?
- What makes a tool description actionable vs. confusing?
Connections
- To Tool Selection: Design choices directly impact selection accuracy
- To Prompt: Tool descriptions are prompts themselves—see Model-Invoked vs. User-Invoked Prompts
- To Scaling Tools: Good design becomes critical when managing dozens of tools
- To Orchestrator Pattern: The AskUserQuestion maximal pattern demonstrates rich clarification as a coordination tool. Orchestrators use structured questioning before delegation to absorb complexity and radiate simplicity.