Anthropic released Claude Sonnet 5 on June 30, calling it the most agentic mid-tier model it has built and making it the default on Claude.ai for both Free and Pro users from launch day. The company said the model can plan, use tools such as browsers and terminals, and run multi-step tasks autonomously — work that until recently was the preserve of larger Opus-class systems.

Opus-class work at Sonnet prices

Benchmark scores largely back the pitch. Sonnet 5 posts 63.2% on SWE-bench Pro against 69.2% for the flagship Opus 4.8, and 81.2% on the OSWorld-Verified computer-use test versus 83.4% for Opus 4.8. On Terminal-Bench 2.1 it scores 80.4%, ahead of Opus 4.8's 74.6%, and it reaches 84.7% on the BrowseComp agentic-search benchmark. On the three tests where both scores are reported, Sonnet 5 trails the far costlier model by six points at most and leads it on one.

Pricing reflects that gap. Anthropic set introductory rates of $2 per million input tokens and $10 per million output tokens through August 31, after which they rise to $3 and $15. That keeps Sonnet 5 well below Opus-tier pricing even as it approaches Opus-tier capability on several tests.

A bumpy first week

The rollout was not friction-free. Developers reported that Sonnet 5's tokenizer produces between 1.0 and 1.35 times as many tokens as its predecessor for the same text, meaning the real cost per task can run higher than the headline per-token price suggests. Anthropic also removed temperature and other sampling parameters, which broke some existing integrations that had hard-coded them.
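To see how the tokenizer change can offset the headline price, here is a rough sketch of the arithmetic. It uses the per-million-token rates quoted above and the reported 1.0 to 1.35 multiplier. The workload size, the function name `task_cost`, and the assumption that the multiplier applies equally to input and output tokens are illustrative, not figures from Anthropic.

```python
# Rates in USD per million tokens, as quoted in the article.
INTRO_RATES = {"input": 2.00, "output": 10.00}     # through August 31
STANDARD_RATES = {"input": 3.00, "output": 15.00}  # after August 31


def task_cost(input_tokens, output_tokens, rates, token_multiplier=1.0):
    """Estimate the USD cost of one task.

    token_multiplier scales counts measured with the previous tokenizer
    to approximate Sonnet 5 counts (reported range: 1.0 to 1.35).
    Assumption: the same multiplier applies to input and output.
    """
    scaled_in = input_tokens * token_multiplier
    scaled_out = output_tokens * token_multiplier
    return (scaled_in * rates["input"] + scaled_out * rates["output"]) / 1_000_000


# Hypothetical workload: 200,000 input and 20,000 output tokens,
# counted with the previous tokenizer.
for label, rates in (("intro", INTRO_RATES), ("standard", STANDARD_RATES)):
    best = task_cost(200_000, 20_000, rates, token_multiplier=1.0)
    worst = task_cost(200_000, 20_000, rates, token_multiplier=1.35)
    print(f"{label}: ${best:.3f} to ${worst:.3f} per task")
```

Under these assumptions, the worst case at the introductory rate comes out slightly cheaper than the best case at the standard rate, but the margin is far narrower than the per-token prices alone imply.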

The company acknowledged an error in how it had measured the BrowseComp benchmark and republished the chart using a 10-million-token budget methodology. It also deployed a new safety classifier aimed at a specific jailbreak technique, which it said blocks the method more than 99% of the time — at the cost of more false positives on legitimate security-coding queries, some of which are now routed to Opus 4.8.

Why Anthropic is pushing the middle tier

Anthropic's strategy leans on making capable agents cheap enough to run at scale. By pushing Opus-level autonomy into the Sonnet price band, the company is targeting developers who want agents in production without flagship costs. The early tokenizer and API complaints show the trade-offs of moving fast on a model that thousands of teams wire into live systems on day one.