Anthropic Launches Claude Opus 4.8 with Enhanced Honesty, Effort Controls, and Dynamic Workflows

Anthropic has released Claude Opus 4.8, an upgrade to Claude Opus 4.7 that delivers improvements across benchmarks and serves as a more effective collaborator. It is available at the same price as its predecessor.

Opus 4.8 launches alongside several new features. Users on claude.ai now have control over how much effort Claude dedicates to a task. Claude Code introduces a "dynamic workflows" feature that enables it to handle very large-scale problems. Additionally, fast mode for Opus 4.8-where the model operates at 2.5× speed-is now three times cheaper than it was for previous models.

Opus 4.8 Capabilities

Opus 4.8 shows gains over its predecessor and competing models on tests of coding, agentic skills, reasoning, and practical knowledge work tasks. Full details and a broader range of capability evaluations are provided in the Claude Opus 4.8 System Card.

Collaborating with Opus 4.8

Early testers have found Claude Opus 4.8 to be more reliable and sharper in judgment when performing agentic tasks. Highlights from tester feedback include:

Cursor (Tom Pritchard, Staff Engineer): Opus 4.8 demonstrates noticeably better judgment in Claude Code-asking the right questions, catching its own mistakes, and pushing back when a plan isn't sound.
Wordware (Kay Zhu, Co-Founder and CTO): On their Super-Agent benchmark, Opus 4.8 is the only model to complete every case end-to-end, outperforming prior Opus models and GPT-5.5 at cost parity.
Cursor (Michael Truell, Co-Founder and CEO): On CursorBench, Opus 4.8 exceeds prior Opus models at every effort level with meaningfully more efficient tool calling.
Casetext (Niko Grupen, Head of Applied Research): Opus 4.8 achieved the highest score recorded on their Legal Agent Benchmark and is the first model to break 10% overall on the all-pass standard.
Jasper (Katie Parrott, Staff Writer): Opus 4.8 is faster, easier to collaborate with, and better at maintaining context and style direction across long sessions.
Browserbase (Miguel Gonzalez, Tech Lead): Opus 4.8 is the strongest computer-use and browser-agent model they've tested, scoring 84% on Online-Mind2Web-a meaningful jump over both Opus 4.7 and GPT-5.5.
Cognition/Devin (Scott Wu, CEO): Opus 4.8 uses tools cleanly and follows instructions with the consistency needed for autonomous engineering workloads, fixing comment-verbosity and tool-calling issues from Opus 4.7.
Bridgewater (Michael Ran, Sr. Investment Associate): Opus 4.8 delivered consistently higher-quality analysis than prior Opus models with richer, more information-dense outputs and a better signal-to-noise ratio.
Thomson Reuters (Joel Hron, CTO): Across CoCounsel Legal, Opus 4.8 delivered meaningful improvements in consistency and reasoning quality for high-stakes professional workflows.
Databricks (Hanlin Tang, CTO, Neural Networks): Opus 4.8 sets a new bar for enterprise AI, unlocking a step change in agentic reasoning and multimodal strength at 61% cheaper token cost than Opus 4.7.
Hebbia (Aabhas Sharma, CTO): Opus 4.8 delivers the same strong quality as Opus 4.7 with noticeably better citation precision and more token efficiency on retrieval.

Improved Honesty

One of the most notable improvements in Opus 4.8 is its honesty. While all of Anthropic's models are trained to avoid making unsupported claims, AI models sometimes jump to conclusions. Early testers report that Opus 4.8 is more likely to flag uncertainties and less likely to make unsupported claims. Anthropic's evaluations show that Opus 4.8 is approximately four times less likely than its predecessor to let flaws in code it has written pass unremarked.

Alignment Assessment

Anthropic's Alignment team conducted a detailed assessment before release and concluded that Opus 4.8 "reaches new highs on measures of prosocial traits like supporting user autonomy and acting in the user's best interest." Rates of misaligned behavior-such as deception or cooperation with misuse-are substantially lower than Opus 4.7 and comparable to Anthropic's best-aligned model, Claude Mythos Preview. The full alignment assessment and pre-deployment safety tests are reported in the Claude Opus 4.8 System Card.

Additional Launches

Alongside Opus 4.8, Anthropic is introducing:

Dynamic workflows: Available in research preview, this Claude Code feature allows Claude to plan work and run hundreds of parallel subagents in a single session, then verify outputs before reporting back. For example, Claude Code with Opus 4.8 can carry out codebase-scale migrations across hundreds of thousands of lines of code from kickoff to merge. Dynamic workflows are available in Claude Code for Enterprise, Team, and Max plans.
Effort control in claude.ai and Cowork: A new control alongside the model selector lets users choose how much effort Claude puts into a response. Higher effort settings prompt deeper thinking for better responses; lower settings yield faster responses and slower rate-limit consumption. Available on all plans.
System entries in Messages API: Developers can now include system entries inside the messages array, allowing mid-task instruction updates without breaking the prompt cache or routing through a user turn. This supports updating permissions, token budgets, or environment context as an agent runs.

A Note on Effort

Opus 4.8 defaults to high effort, which Anthropic considers the best overall balance of quality and user experience. On coding tasks, this effort level uses a similar number of tokens as Opus 4.7's default but with better performance. Users can select "extra" (xhigh in Claude Code) or "max" for spending more tokens on better results; "extra" is recommended for difficult tasks and long-running asynchronous workflows. Rate limits in Claude Code have been increased to accommodate higher token usage.

What's Next

Opus 4.8 represents a modest but tangible improvement over its predecessor. Anthropic is also working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost.

Beyond that, Anthropic plans to release a new class of model with even higher intelligence than Opus. As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work. Models at this capability level require stronger cyber safeguards before general release, and Anthropic expects to bring Mythos-class models to all customers in the coming weeks.

Availability

Claude Opus 4.8 is available everywhere today. Pricing for regular usage is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. Fast mode pricing is $10 per million input tokens and $50 per million output tokens. Developers can use claude-opus-4-8 via the Claude API.

Footnotes

Terminal-Bench 2.1: Scores for all models were reported using the Terminus-2 public harness. GPT-5.5's reported score with the Codex CLI harness is 83.4%.
OSWorld-Verified: Changes were made to the OSWorld-Verified evaluation methodology to more accurately reflect real-world model performance; the Opus 4.7 score has been updated to 82.3%. More details are in the System Card.
Finance Agent v2: Gemini 3.5 Flash scores 57.9% on Finance Agent v2, a significant improvement over Gemini 3.1 Pro.

View source Back to news