What Claude Opus 4.8 Actually Ships
Anthropic's latest flagship model lands today with one promise: same price, meaningfully better results. Opus 4.8 builds on 4.7 with across-the-board benchmark improvements — and a few features that actually matter for builders shipping AI-powered products.
The Numbers That Stand Out
| Benchmark | Opus 4.8 Result | Context |
|---|---|---|
| Online-Mind2Web (browser agent) | 84% | Beats Opus 4.7 and GPT-5.5 |
| CursorBench | Exceeds prior Opus across every effort level | Fewer steps for same intelligence |
| Legal Agent Benchmark | Highest recorded score | First model above 10% on all-pass standard |
| Super-Agent (end-to-end) | Only model to complete every case | Beats GPT-5.5 at parity cost |
Fast Mode is 3x Cheaper
The fast mode — where Opus 4.8 works at 2.5x speed — is now three times cheaper than it was for previous models. This is a meaningful cost reduction for teams running high-volume agentic workloads.
Claude Code Gets Dynamic Workflows
Claude Code now has a "dynamic workflows" feature designed for very large-scale problems. Early testers describe it as better at asking the right questions, catching its own mistakes, pushing back when a plan isn't sound, and building confidence around complex multi-service explorations before making big changes.
The Honesty Upgrade
Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked. The model proactively flags uncertainties about its work — something earlier Opus versions sometimes glossed over. Early testers confirm: less confident-bullshit, more actual collaboration.
What This Means for Builders
- Coding agents get fewer tool-call steps for the same intelligence output
- Browser automation hits 84% on Online-Mind2Web — a jump that makes reliable computer-use actually viable
- Legal/professional workflows break the 10% all-pass threshold for the first time
- Cost per fast-mode token dropped 3x — relevant for high-volume production deployments