Registering many tools significantly increases token consumption and degrades tool selection accuracy; mitigation strategies include tool filtering middleware, dynamic loading, and cost-aware model cascading.
Each tool added to an agent increases the prompt size because all tool schemas (names, descriptions, parameter schemas) are sent to the LLM on every call. With 20+ tools, token waste becomes significant, tool selection accuracy degrades as the model struggles to choose from too many options, and latency increases [citation:5]. To mitigate this, use a LLMToolSelectorMiddleware that pre-filters tools before each model call, invoking a cheap model (like Haiku) to select only the top-k relevant tools based on the user's query [citation:5]. For production optimization, the CascadeFlow library implements intelligent routing where simple queries are handled by cheap models (gpt-5-mini) and only complex queries escalate to expensive models (claude-opus-4-6), achieving 40-60% cost savings [citation:10].
Tool pre-selection middleware: Use cheap LLM to filter tools before each call [citation:5]
Always-include list: Critical tools bypass pre-filtering [citation:5]
Tool count caps: Limit selected tools (e.g., top 10) [citation:5]
CascadeFlow routing: Simple queries handled by cheap models; complex queries escalate [citation:10]
Dynamic loading: Only load tools needed for current user context [citation:3]