AI | May 21, 2026 | Updated: May 21, 2026 | 6 min read

Writing Effective Tools for AI Agents

Agents are only as effective as the tools we give them. Anthropic shares the key techniques behind building high-quality MCP tools that actually improve agent performance in production.

Lugon

Vibe Engineer

In a new engineering post, Anthropic breaks down what makes a tool useful to an AI agent in production. The answer is less obvious than it looks.

Tools Are a New Kind of Software Contract

Traditional software is deterministic: getWeather("NYC") fetches NYC weather the same way every time. Tools for agents are different. When a user asks "Should I bring an umbrella?", an agent might call the weather tool, answer from memory, or ask a clarifying question first.

This means writing tools for agents requires rethinking the contract entirely. You're not writing for other developers — you're writing for a non-deterministic system that can misinterpret, hallucinate, or fail to grasp how to use your tool at all.

Build a Prototype, Then Evaluate

Anthropic's recommended workflow: prototype fast, evaluate rigorously, iterate with agents.

Prototype locally by wrapping tools in a local MCP server or Desktop extension (DXT). Connect to Claude Code with claude mcp add <name> <command> or to Claude Desktop via Settings. Test the tools yourself before handing them off.

Run structured evaluations. The key is generating evaluation tasks grounded in real-world use cases — not toy examples. Strong tasks require multiple tool calls and should be paired with verifiable outcomes. Use simple agentic loops wrapping LLM API calls: one loop per evaluation task.
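
To make "one loop per evaluation task" concrete, here is a minimal sketch of such a loop. It is an illustration under stated assumptions, not Anthropic's harness: ModelTurn, EvalTask, run_task, the scripted model, and the two toy tools are all hypothetical names, and a real setup would replace scripted_model with a call to an LLM API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Transcript = List[Tuple[str, str]]  # (role, text) pairs


@dataclass
class ModelTurn:
    """Hypothetical model output: either a tool request or a final answer."""
    tool_name: Optional[str] = None
    tool_args: Optional[dict] = None
    final_answer: Optional[str] = None


@dataclass
class EvalTask:
    prompt: str
    expected: str  # verifiable outcome: a string the final answer must contain


def run_task(task: EvalTask,
             model: Callable[[Transcript], ModelTurn],
             tools: Dict[str, Callable],
             max_steps: int = 10) -> dict:
    """Run one agentic loop for one evaluation task."""
    transcript: Transcript = [("user", task.prompt)]
    tool_calls = 0
    for _ in range(max_steps):
        turn = model(transcript)
        if turn.final_answer is not None:
            passed = task.expected in turn.final_answer
            return {"passed": passed, "tool_calls": tool_calls,
                    "answer": turn.final_answer}
        tool = tools.get(turn.tool_name)
        if tool is None:
            # Feed the error back so the model can recover on the next step.
            transcript.append(("tool", f"error: unknown tool {turn.tool_name!r}"))
            continue
        result = tool(**(turn.tool_args or {}))
        tool_calls += 1
        transcript.append(("tool", f"{turn.tool_name} -> {result}"))
    return {"passed": False, "tool_calls": tool_calls, "answer": None}


def scripted_model(transcript: Transcript) -> ModelTurn:
    """Stand-in for an LLM API call: two tool calls, then an answer."""
    seen = [text for role, text in transcript if role == "tool"]
    if len(seen) == 0:
        return ModelTurn(tool_name="find_customer", tool_args={"name": "Ada"})
    if len(seen) == 1:
        return ModelTurn(tool_name="count_orders", tool_args={"customer_id": 7})
    return ModelTurn(final_answer="Ada has 3 open orders.")


TOOLS = {
    "find_customer": lambda name: {"customer_id": 7, "name": name},
    "count_orders": lambda customer_id: 3,
}

task = EvalTask(prompt="How many open orders does Ada have?",
                expected="3 open orders")
result = run_task(task, scripted_model, TOOLS)
```

Besides pass or fail, the loop records how many tool calls each task took, which is one of the signals worth tracking across a run.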

Key Principles for High-Quality MCP Tools

Anthropic distills its tool-building experience into five principles:

  • Choose the right tools to implement — and not to implement. Not every capability deserves a tool. Tools add surface area for the agent to navigate; unnecessary tools add noise.
  • Namespace tools to define clear boundaries. Group related tools under shared prefixes, for example by service or by resource, and give each tool a single, well-defined purpose. Ambiguity in tool scope confuses agents.
  • Return meaningful context from tools. Responses should carry the high-signal information the agent needs to decide its next step, such as readable names in place of opaque identifiers. Sparse, minimal outputs leave the agent guessing.
  • Optimize tool responses for token efficiency. Every token spent on tool output is a token not spent on the final answer. Balance richness with concision.
  • Prompt-engineer tool descriptions and specs. Just as you tune user-facing prompts, you should tune tool descriptions. Descriptions are the interface between you and the agent's decision-making.
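
Several of these principles can be seen together in one small tool. The sketch below is illustrative only: contacts_search, its sample data, and the response_format and limit parameters are hypothetical, and the definition is merely shaped like the name / description / input_schema layout used for tool definitions in the Anthropic Messages API.

```python
# Hypothetical in-memory data standing in for a real contacts service.
CONTACTS = [
    {"id": "c_01", "name": "Ada Lovelace", "team": "Research", "email": "ada@example.com"},
    {"id": "c_02", "name": "Alan Turing", "team": "Research", "email": "alan@example.com"},
    {"id": "c_03", "name": "Grace Hopper", "team": "Platform", "email": "grace@example.com"},
]

# Namespaced name ("contacts_" prefix) plus a description written for the agent.
CONTACTS_SEARCH_SPEC = {
    "name": "contacts_search",
    "description": (
        "Search the company contact list by person name or team name. "
        "Use this to find who someone is or who belongs to a team. "
        "Returns one line per match. Use response_format='detailed' only "
        "when you need email addresses or IDs for a follow-up call."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string",
                      "description": "Part of a name or team, e.g. 'Hopper' or 'Research'."},
            "response_format": {"type": "string", "enum": ["concise", "detailed"]},
            "limit": {"type": "integer", "minimum": 1},
        },
        "required": ["query"],
    },
}


def contacts_search(query: str, response_format: str = "concise", limit: int = 2) -> str:
    """Return readable lines, truncated to `limit`, with a hint when results are cut."""
    q = query.lower()
    matches = [c for c in CONTACTS
               if q in c["name"].lower() or q in c["team"].lower()]
    if not matches:
        # A helpful error steers the agent toward a better next call.
        return f"No contacts matched {query!r}. Try a name or team, for example 'Research'."
    shown = matches[:limit]
    if response_format == "detailed":
        lines = [f'{c["name"]} ({c["team"]}) <{c["email"]}> id={c["id"]}' for c in shown]
    else:
        lines = [f'{c["name"]} ({c["team"]})' for c in shown]
    if len(matches) > limit:
        lines.append(f"{len(matches) - limit} more match(es) not shown; "
                     "narrow the query or raise limit.")
    return "\n".join(lines)
```

Calling contacts_search("research", limit=1) returns one readable line plus a note that one more match exists, so the agent spends few tokens yet still knows how to get the rest.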

Use Claude to Optimize Its Own Tools

One of the most powerful patterns Anthropic describes: use Claude Code to automatically optimize the tools it has access to. Give Claude the evaluation results and ask it to rewrite tool descriptions, adjust response formats, and refine parameters. The agent becomes part of its own improvement loop.
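
One way to picture the hand-off is a helper that bundles a tool definition with the transcripts of failed evaluation tasks into a single prompt. This is a rough sketch, not a documented workflow: build_optimization_prompt and the prompt wording are assumptions, and sending the prompt to Claude (for example by pasting it into Claude Code) is left out.

```python
import json
from typing import List


def build_optimization_prompt(tool_spec: dict, failed_transcripts: List[str]) -> str:
    """Assemble evaluation evidence into one prompt asking for tool revisions."""
    sections = [
        "You are reviewing a tool that an AI agent uses.",
        "Current tool definition:",
        json.dumps(tool_spec, indent=2),
        "Transcripts from evaluation tasks the agent failed:",
    ]
    for i, transcript in enumerate(failed_transcripts, start=1):
        sections.append(f"Transcript {i}:\n{transcript}")
    sections.append(
        "Suggest a revised description, parameter names, and response format "
        "that would have prevented these failures. Explain each change briefly."
    )
    return "\n\n".join(sections)


prompt = build_optimization_prompt(
    {"name": "contacts_search", "description": "Search contacts."},
    ["user: Who leads Platform?\ntool: contacts_search(query='lead') -> no results"],
)
```

Re-running the evaluation after each suggested revision is what closes the loop; without that step there is no way to tell whether a rewrite helped.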

What Weakens Agent Performance

Anthropic's held-out test set on internal Slack tools revealed predictable failure modes:

  • Tool descriptions too vague for the agent to disambiguate
  • Response formats that omit critical context the agent needs
  • Tools with overlapping scope, causing the agent to call the wrong one
  • Evaluations too simple to catch real-world failure patterns
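
The first and third failure modes can be screened for cheaply before running any evaluation. The check below is a crude heuristic of our own, not something from Anthropic's post: lint_tools, its thresholds, and the sample tool names are all hypothetical, and word overlap is only a rough proxy for overlapping scope.

```python
import re
from itertools import combinations
from typing import Dict, List


def _words(text: str) -> set:
    """Lowercased set of alphabetic words in a description."""
    return set(re.findall(r"[a-z]+", text.lower()))


def lint_tools(tools: Dict[str, str], min_words: int = 8,
               max_overlap: float = 0.6) -> List[str]:
    """Flag very short descriptions and pairs of near-identical descriptions."""
    warnings = []
    for name, description in tools.items():
        count = len(description.split())
        if count < min_words:
            warnings.append(f"{name}: description may be too vague ({count} words)")
    for (a, desc_a), (b, desc_b) in combinations(tools.items(), 2):
        words_a, words_b = _words(desc_a), _words(desc_b)
        union = words_a | words_b
        # Jaccard similarity: shared words divided by all distinct words.
        overlap = len(words_a & words_b) / len(union) if union else 0.0
        if overlap >= max_overlap:
            warnings.append(
                f"{a} / {b}: descriptions overlap ({overlap:.2f}); scopes may be ambiguous"
            )
    return warnings


warnings = lint_tools({
    "search_messages": "Search messages in a channel by keyword and return matching messages",
    "find_messages": "Find messages in a channel by keyword and return matching messages",
    "list_users": "List users",
})
```

A check like this only catches surface problems. Whether the agent actually picks the right tool still has to be measured with realistic evaluation tasks.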

The takeaway: the bottleneck on agent quality is rarely the model. More often, it is the tools.
