AI | June 7, 2026 | Updated: June 7, 2026 | 5 min read

Tokenomics of LLM Multi-Agent Systems: Code Review Is the Hidden Cost

A new empirical study analyzing 30 software development tasks in ChatDev reveals that Code Review consumes nearly 60% of all tokens — and it's not even the most obvious stage. Here's what technical founders and builders need to know.

Lugon

Vibe Engineer

The Cost Nobody Talks About

When teams talk about the cost of AI-powered code generation, they usually focus on generation itself. But a new empirical study on token consumption in LLM Multi-Agent (LLM-MA) systems flips that assumption entirely.

Analyzing 30 software development tasks performed by the ChatDev framework with a GPT-5 reasoning model, researchers mapped internal phases to standard SDLC stages: Design, Coding, Code Completion, Code Review, Testing, and Documentation. The results are striking.

Key Finding: Code Review Is the Hungriest Stage

The iterative Code Review stage accounts for 59.4% of total token consumption on average. That's not a rounding error — it's the dominant cost driver in agentic software engineering.

Worse, input tokens are consistently the largest share of consumption, averaging 53.9% of the total. This means agents are spending more tokens reading code and context than producing new output. The bottleneck isn't generation; it's the repeated back-and-forth of automated refinement.
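To see why an iterative review loop can lean so heavily on input tokens, here is a toy model in Python. It is not taken from the study: it simply assumes that every review round re-sends the full code plus all earlier review comments as input, then writes one new comment as output. The function name and all numbers are hypothetical, and the resulting ratio is not calibrated to the paper's 53.9% figure. The point is the growth pattern, not the exact share.

```python
def review_loop_tokens(code_tokens: int, comment_tokens: int, rounds: int) -> tuple[int, int]:
    """Toy model of an iterative review loop (an assumption, not the study's method).

    Each round reads the full code plus every comment written so far,
    then writes one new review comment.
    Returns (total_input_tokens, total_output_tokens).
    """
    input_total = 0
    output_total = 0
    for round_index in range(rounds):
        # Input for this round: the code plus all comments from earlier rounds.
        input_total += code_tokens + round_index * comment_tokens
        # Output for this round: a single new review comment.
        output_total += comment_tokens
    return input_total, output_total


if __name__ == "__main__":
    # Hypothetical sizes: a 2,000-token code base, 300-token comments, 5 rounds.
    inp, out = review_loop_tokens(code_tokens=2000, comment_tokens=300, rounds=5)
    print(f"input={inp} output={out} input_share={inp / (inp + out):.1%}")
```

Under these assumptions, output grows linearly with the number of rounds while input grows faster, because each round pays again for everything that came before it.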

What This Means for Builders

For technical founders and product-minded developers, these numbers carry real implications:

Cost prediction is broken. Teams that estimate AI costs from output tokens alone miss the input side. With input tokens averaging 53.9% of the total, input-heavy stages like Code Review can leave the real token count at more than 2× the estimate.
Agent collaboration has hidden overhead. Multi-agent pipelines feel efficient because tasks parallelize. But each agent's iteration cycle compounds token usage in ways a single-agent setup doesn't.
The real cost is verification, not creation. Automated refinement and quality checks are where agentic engineering burns budget. This suggests the biggest gains come from optimizing review loops, not generation prompts.
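The gap between an output-only estimate and the real total can be worked out from the 53.9% input share. The Python sketch below makes one simplifying assumption: everything that is not an input token is treated as "output" (which, for a reasoning model, would include reasoning tokens). It counts tokens, not dollars, since input and output tokens are usually priced differently. The function names are illustrative.

```python
INPUT_SHARE = 0.539  # average input-token share reported in the article


def underestimate_factor(input_share: float = INPUT_SHARE) -> float:
    """How many times larger the total token count is than the output-only count."""
    if not 0 <= input_share < 1:
        raise ValueError("input_share must be in [0, 1)")
    return 1 / (1 - input_share)


def total_tokens_from_output(output_tokens: float, input_share: float = INPUT_SHARE) -> float:
    """Scale an output-only token count up to an estimated total."""
    return output_tokens * underestimate_factor(input_share)


if __name__ == "__main__":
    print(f"{underestimate_factor():.2f}")  # prints 2.17
```

So a budget built on output tokens alone would capture less than half of the tokens actually consumed, under the assumption above.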

Token Distribution by Stage

Stage	Token Share
Code Review	59.4%
Coding + Completion	~25%
Testing + Documentation	~15%
Design	<1%
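As a rough illustration of what these shares mean for a concrete budget, the Python sketch below splits a total token count across the stages in the table. The 25% and 15% values are the table's approximations, and Design (listed only as under 1%) is assumed to be whatever remains after the other three. The names are hypothetical.

```python
# Shares from the table above; two of them are approximate ("~").
STAGE_SHARES = {
    "Code Review": 0.594,
    "Coding + Completion": 0.25,
    "Testing + Documentation": 0.15,
}
# Assumption: Design takes the remainder (about 0.6%, consistent with "<1%").
STAGE_SHARES["Design"] = 1 - sum(STAGE_SHARES.values())


def split_tokens(total_tokens: int) -> dict[str, int]:
    """Allocate a total token count to stages, rounded to whole tokens."""
    return {stage: round(total_tokens * share) for stage, share in STAGE_SHARES.items()}


if __name__ == "__main__":
    for stage, tokens in split_tokens(1_000_000).items():
        print(f"{stage}: {tokens:,}")
```

For a run that uses one million tokens, this puts roughly 594,000 of them in Code Review and only about 6,000 in Design.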

Practical Takeaways

Audit your agent loops. If you're running multi-agent pipelines, instrument token tracking per stage. Code Review is likely your biggest line item.

Reduce iteration cycles. Each review pass costs tokens. Tighter agent prompts and better initial code quality cut review rounds.

Don't assume generation is the cost. When budgeting for AI-assisted development, include input token costs from context windows, tool calls, and state passing between agents.
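A minimal sketch of per-stage token tracking might look like the following Python. It assumes your pipeline can report input and output token counts for each model call, which most LLM APIs expose in some form. The class and method names (StageTokenTracker, record, shares, input_share) are hypothetical and not part of ChatDev or any particular framework.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StageUsage:
    """Running token totals for one pipeline stage."""
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


class StageTokenTracker:
    """Accumulates token usage per stage and reports each stage's share."""

    def __init__(self) -> None:
        self._usage: dict[str, StageUsage] = defaultdict(StageUsage)

    def record(self, stage: str, input_tokens: int, output_tokens: int) -> None:
        """Add the token counts of one model call to the given stage."""
        usage = self._usage[stage]
        usage.input_tokens += input_tokens
        usage.output_tokens += output_tokens

    def shares(self) -> dict[str, float]:
        """Fraction of all recorded tokens consumed by each stage."""
        grand_total = sum(u.total for u in self._usage.values())
        if grand_total == 0:
            return {}
        return {stage: u.total / grand_total for stage, u in self._usage.items()}

    def input_share(self) -> float:
        """Fraction of all recorded tokens that were input tokens."""
        grand_total = sum(u.total for u in self._usage.values())
        if grand_total == 0:
            return 0.0
        return sum(u.input_tokens for u in self._usage.values()) / grand_total


if __name__ == "__main__":
    tracker = StageTokenTracker()
    # Hypothetical numbers for two calls in two stages.
    tracker.record("Code Review", input_tokens=500, output_tokens=100)
    tracker.record("Coding", input_tokens=200, output_tokens=200)
    print(tracker.shares())       # share of total tokens per stage
    print(tracker.input_share())  # share of total tokens that were input
```

Calling record after every model response, with the stage name your orchestrator already knows, is enough to check whether your own pipeline shows the same review-heavy, input-heavy pattern.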

The Bigger Picture

This study is preliminary — 30 tasks in one framework (ChatDev) with one model (GPT-5). But the methodology is sound and the direction is clear: agentic software engineering's cost structure is fundamentally different from single-shot AI coding.

The next frontier isn't faster models. It's smarter collaboration protocols that reduce unnecessary review cycles. Teams that understand this will build more cost-efficient AI systems — and more importantly, they'll know where to actually optimize.

Source: Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering — arXiv:2601.14470
