Agents are only as effective as the tools we give them. In a new engineering post, Anthropic breaks down what makes an AI agent tool useful in production, from prototyping and evaluation to the design principles behind the tools themselves.
Tools Are a New Kind of Software Contract
Traditional software is deterministic: getWeather("NYC") fetches NYC weather the same way every time. Tools for agents are different. When a user asks "Should I bring an umbrella?", an agent might call the weather tool, answer from memory, or ask a clarifying question first.
This means writing tools for agents requires rethinking the contract entirely. You're not writing for other developers — you're writing for a non-deterministic system that can misinterpret, hallucinate, or fail to grasp how to use your tool at all.
Build a Prototype, Then Evaluate
Anthropic's recommended workflow: prototype fast, evaluate rigorously, iterate with agents.
Prototype locally by wrapping tools in a local MCP server or Desktop extension (DXT). Connect to Claude Code with claude mcp add <name> <command> or to Claude Desktop via Settings. Test the tools yourself before handing them off.
Run structured evaluations. The key is generating evaluation tasks grounded in real-world use cases — not toy examples. Strong tasks require multiple tool calls and should be paired with verifiable outcomes. Use simple agentic loops wrapping LLM API calls: one loop per evaluation task.
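The evaluation loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's harness: EvalTask, EvalResult, run_task, the tool names, and the scripted_model stand-in are all hypothetical. In a real setup, the model step would wrap an LLM API call instead of following a script.

```python
from dataclasses import dataclass, field

# Hypothetical types for illustration; none of these names come from the article.
@dataclass
class EvalTask:
    prompt: str
    expected: str  # verifiable outcome the final answer must contain

@dataclass
class EvalResult:
    task: EvalTask
    answer: str = ""
    tool_calls: list = field(default_factory=list)
    passed: bool = False

def run_task(task, model_step, tools, max_turns=10):
    """Run one agentic loop for one evaluation task.

    model_step takes the message history and returns either
    ("tool", name, args) or ("final", answer).
    """
    messages = [("user", task.prompt)]
    result = EvalResult(task=task)
    for _ in range(max_turns):
        action = model_step(messages)
        if action[0] == "final":
            result.answer = action[1]
            break
        _, name, args = action
        output = tools[name](**args)          # execute the requested tool
        result.tool_calls.append((name, args))
        messages.append(("tool", name, output))
    # Simple verifier: the expected string must appear in the final answer.
    result.passed = task.expected in result.answer
    return result

def scripted_model(messages):
    # Stand-in for an LLM (assumption): search, then fetch details, then answer.
    outputs = [m[2] for m in messages if m[0] == "tool"]
    if len(outputs) == 0:
        return ("tool", "search_customers", {"query": "Acme"})
    if len(outputs) == 1:
        return ("tool", "get_customer", {"customer_id": outputs[0]["id"]})
    return ("final", f"The account owner is {outputs[1]['owner']}.")

# Hypothetical tools with canned responses.
tools = {
    "search_customers": lambda query: {"id": 42, "name": query},
    "get_customer": lambda customer_id: {"id": customer_id, "owner": "Jane"},
}

tasks = [EvalTask("Who owns the Acme account?", expected="Jane")]
results = [run_task(t, scripted_model, tools) for t in tasks]  # one loop per task
```

Recording the tool calls alongside the pass/fail verdict is a deliberate choice here: the transcript is often what shows why a task failed, for example when the agent picked the wrong tool or passed a malformed argument.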
Key Principles for High-Quality MCP Tools
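Anthropic distilled years of tool-building experience into five principles:
- Choose the right tools to implement (and which to leave out)
- Namespace tools to define clear boundaries in functionality
- Return meaningful context from tools back to the agent
- Optimize tool responses for token efficiency
- Prompt-engineer tool descriptions and specs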
Use Claude to Optimize Its Own Tools
One of the most powerful patterns Anthropic describes: use Claude Code to automatically optimize the tools it has access to. Give Claude the evaluation results and ask it to rewrite tool descriptions, adjust response formats, and refine parameters. The agent becomes part of its own improvement loop.
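One way to feed evaluation results back to the agent is to bundle the current tool definitions and the failing transcripts into a single prompt. The sketch below shows only that assembly step. The function name, the result fields, and the prompt wording are assumptions made for illustration; the article does not prescribe a format.

```python
import json

def build_optimization_prompt(tool_specs, eval_results):
    """Assemble a prompt asking the agent to revise its own tool definitions.

    tool_specs and eval_results use a hypothetical shape chosen for this sketch.
    """
    failures = [r for r in eval_results if not r["passed"]]
    lines = [
        "You are improving tool definitions for an AI agent.",
        "Current tool definitions (JSON):",
        json.dumps(tool_specs, indent=2),
        f"{len(failures)} of {len(eval_results)} evaluation tasks failed:",
    ]
    for r in failures:
        lines.append(f"- Task: {r['task']}")
        lines.append(f"  Tool calls: {json.dumps(r['tool_calls'])}")
        lines.append(f"  Final answer: {r['answer']}")
    lines.append(
        "Rewrite the tool descriptions, adjust the response formats, and refine "
        "the parameters so these tasks would succeed. "
        "Return the revised definitions as JSON."
    )
    return "\n".join(lines)

# Hypothetical sample data.
tool_specs = [{"name": "search", "description": "Search messages."}]
eval_results = [
    {"task": "Find the launch date thread", "passed": False,
     "tool_calls": [["search", {"q": "launch"}]], "answer": "Not found."},
    {"task": "Count unread messages", "passed": True,
     "tool_calls": [], "answer": "3"},
]

prompt = build_optimization_prompt(tool_specs, eval_results)
```

The resulting string can be pasted into Claude Code or sent through an API call. Re-running the evaluation on the revised definitions closes the loop.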
What Weakens Agent Performance
Anthropic's held-out test set on internal Slack tools revealed predictable failure modes:
- Tool descriptions too vague for the agent to disambiguate
- Response formats that omit critical context the agent needs
- Tools with overlapping scope, causing the agent to call the wrong one
- Evaluations too simple to catch real-world failure patterns
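The first and third failure modes above can be caught early with a crude lint over the tool descriptions. This is only a rough heuristic under stated assumptions: the word-count threshold, the word-overlap (Jaccard) cutoff, and the tool names are all invented for the example, and a check like this does not replace running real evaluations.

```python
import re
from itertools import combinations

def words(text):
    # Lowercased set of alphabetic words in a description.
    return set(re.findall(r"[a-z]+", text.lower()))

def lint_tools(tool_specs, min_words=8, max_overlap=0.6):
    """Flag very short descriptions and pairs of tools with similar wording.

    tool_specs maps tool name -> description. Thresholds are arbitrary defaults.
    """
    warnings = []
    for name, desc in tool_specs.items():
        count = len(desc.split())
        if count < min_words:
            warnings.append(f"{name}: description may be too vague ({count} words)")
    for (a, desc_a), (b, desc_b) in combinations(tool_specs.items(), 2):
        wa, wb = words(desc_a), words(desc_b)
        union = wa | wb
        overlap = len(wa & wb) / len(union) if union else 0.0  # Jaccard similarity
        if overlap > max_overlap:
            warnings.append(
                f"{a} / {b}: descriptions overlap ({overlap:.0%}), scope may be ambiguous"
            )
    return warnings

# Hypothetical tool set: two vague, overlapping tools and one specific tool.
specs = {
    "search": "Search messages.",
    "find": "Search messages quickly.",
    "slack_get_thread": (
        "Fetch every reply in one Slack thread, given a channel ID "
        "and the thread's parent timestamp."
    ),
}

warnings = lint_tools(specs)
for w in warnings:
    print(w)
```

Here the two short tools are flagged both for vagueness and for overlapping with each other, while the more specific description passes.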
Credit
- Original article: Writing effective tools for AI agents — with agents
- Source: Anthropic Engineering Blog
- Published: September 11, 2025