Scaling Tool Use

    Patterns for managing large numbers of tools without overwhelming context or degrading selection accuracy.


    Dynamic Tool Discovery

    [2025-12-09]: When tool definitions consume significant tokens (>10K), dynamic discovery trades search overhead for context efficiency.

    The Problem: Tool definitions create substantial token overhead before conversation even begins. A typical MCP setup with GitHub, Slack, Sentry, Grafana, and Splunk consumes ~55K tokens in tool schemas alone—and can exceed 100K+ with additional services.

    The Solution: Mark rarely-used tools with defer_loading: true. The agent searches for relevant tools dynamically, receiving references that expand into full definitions only when discovered.

    Results (Anthropic internal testing):

    • 85% reduction in token usage while maintaining full tool library access
    • Improved selection accuracy: Opus 4 from 49% → 74%, Opus 4.5 from 79.5% → 88.1%

    Best Practices:

    • Use when tool definitions consume >10K tokens total
    • Employ clear, descriptive tool names and descriptions (discovery depends on search)
    • Keep 3-5 most-used tools always loaded (no discovery overhead for common operations)
    • Compatible with prompt caching—deferred tools remain excluded from cache

    Decision Point: If you're managing <20 simple tools, eager loading is fine. If you're connecting MCP servers with dozens of endpoints each, dynamic discovery becomes essential.

    See Also:

    Source: Advanced Tool Use - Anthropic


    Programmatic Tool Orchestration

    [2025-12-09]: For complex workflows, let the agent write orchestration code instead of calling tools directly.

    The Problem: Traditional tool calling creates context pollution. Processing 2,000+ expense line items requires results to re-enter the agent's context, consuming 50KB+ and requiring a full inference pass per tool invocation.

    The Inversion: Instead of agent → tool → agent → tool sequences, the agent writes Python code that orchestrates multiple tool calls. Results process in-place within a sandboxed environment, never re-entering the agent's context until final output.

    Implementation Pattern:

    • Mark tools with allowed_callers: ["code_execution_20250825"]
    • Agent generates async Python using asyncio.gather() for parallel execution
    • Code pauses at tool calls, processes results, continues without model intervention
    • Only final output appears in agent's view

    Results (Anthropic internal testing):

    • 37% token reduction (43,588 → 27,297 average)
    • Eliminates 19+ inference passes in 20-tool workflows
    • Improved accuracy on knowledge retrieval: 25.6% → 28.5%

    When to Use:

    • Multi-step workflows with 3+ dependent tool calls
    • Parallel operations where results need filtering/transforming
    • Large result sets that would bloat context if returned directly

    When NOT to Use:

    • Simple single-tool invocations (overhead isn't worth it)
    • Tasks requiring agent reasoning between each tool call

    See Also:

    Source: Advanced Tool Use - Anthropic


    MCP Deployment Architecture

    [2025-12-09]: MCP deployment isn't one-size-fits-all. The transport mechanism determines scaling characteristics and operational constraints.

    Three Deployment Patterns

    Pattern Mechanism Scaling Best For
    stdio Local server proxying remote services Doesn't scale horizontally—single process Local development, simple integrations
    Streamable HTTP SSE-based, independent process Stateless replication possible Cloud services, production APIs
    Sidecar Kubernetes co-located container Lifecycle-coupled with main app Orchestrated container environments

    The Statefulness Challenge

    MCP connections are stateful. This creates real constraints:

    • Load balancing requires session affinity
    • Horizontal scaling needs sticky sessions or connection migration
    • Failure handling must account for lost connection state

    For simple deployments, this doesn't matter. At scale, it determines your architecture.

    Pattern Selection Guide

    Local dev only?
      └── Yes → stdio (simplest, no scaling needed)
      └── No → Cloud deployment?
               └── Container orchestration (K8s)?
                    └── Yes → Sidecar (lifecycle coupling)
                    └── No → Streamable HTTP (stateless scaling)
    

    Google ADK's MCP Integration

    ADK supports both directions:

    • Client mode: ADK agents consume tools from external MCP servers via McpToolset
    • Server mode: Expose ADK tools as MCP servers for other clients (Claude Code, Cursor)

    The bidirectional nature enables framework interoperability—ADK tools accessible from Claude Code, MCP servers accessible from ADK.

    Tool Filtering at Runtime

    tool_filter in McpToolset enables dynamic capability restriction:

    mcp_toolset = McpToolset(
        server_config=StdioServerConfig(command="npx", args=["mcp-server-github"]),
        tool_filter=lambda tool: tool.name in ["list_issues", "create_issue"]
    )

    This serves two purposes:

    1. Security: Expose only needed capabilities per agent
    2. Model focus: Fewer tools means better tool selection (less overwhelm)

    See Also: Google ADK: MCP Integration — Concrete implementation patterns


    MCP Auto-Selection Mode

    [2026-01-17]: Claude Code 2.1.7 introduced automatic MCP tool selection, enabled by default. This changes how tools surface in agent workflows.

    Default Behavior

    Starting with version 2.1.7:

    • MCP tools auto-discover based on task context relevance
    • Tool descriptions load automatically without explicit configuration
    • System uses keyword matching against tool names and descriptions

    Threshold Configuration (2.1.9+)

    Control when tool descriptions defer vs. load upfront:

    // In .claude/settings.json or project config
    {
      "mcpConfig": {
        "toolSearch": "auto:15"  // Defer when >15% of context window
      }
    }

    Threshold values:

    • auto:5 — Conservative: defer early, preserve context
    • auto:10 — Default: balanced approach
    • auto:15-20 — Aggressive: load more tools upfront
    • auto:0 — Disable auto mode (manual tool management)

    When to Adjust Threshold

    Scenario Recommended Threshold
    Few MCP servers, small descriptions Default (auto:10)
    Many servers, large descriptions Lower (auto:5)
    Context-constrained workflows Lower (auto:5)
    Discovery-heavy exploration Higher (auto:20)

    Trade-offs

    Benefits of auto mode:

    • Fewer "tool not available" errors
    • Reduced manual configuration burden
    • Dynamic tool discovery adapts to task

    Costs:

    • Slower startup if many tools load unnecessarily
    • Context overhead for tool descriptions
    • Less predictable tool availability

    Disabling Auto Mode

    For deterministic workflows where tool availability must be explicit:

    {
      "mcpConfig": {
        "toolSearch": "manual"
      }
    }

    Manual mode requires explicit tool declarations in prompts or configuration. Use this when reproducibility matters more than convenience.

    See Also:


    Leading Questions

    • How do you monitor tool usage patterns to optimize discovery configurations?
    • When should you cache tool definitions vs. re-fetch them?
    • How do you handle versioning when tools are dynamically discovered?
    • What's the right balance between eager loading and discovery overhead?
    • How do you test programmatic orchestration patterns?

    Connections