Claude Opus 4.6: Enhanced Coding and Agentic Capabilities

Anthropic has launched Claude Opus 4.6, a significant upgrade to their flagship AI model with dramatically improved coding, planning, and autonomous task capabilities. The enhanced model demonstrates superior performance in sustained agentic tasks, operates more reliably within large codebases, and features advanced code review and debugging abilities for self-correction. Notably, this marks the first Opus-class model to include a 1M token context window in beta.

Opus 4.6 achieves state-of-the-art results across multiple benchmarks. It leads all frontier models on Terminal-Bench 2.0 for agentic coding evaluation and Humanity's Last Exam for complex multidisciplinary reasoning. On GDPval-AA, which measures performance on economically valuable knowledge work across finance, legal, and other professional domains, Opus 4.6 surpasses the next-best industry model by approximately 144 Elo points and its predecessor by 190 points. The model also excels on BrowseComp for locating difficult online information.

Key Capabilities and Performance

The model brings enhanced focus to challenging task components while moving efficiently through straightforward sections, demonstrating improved judgment on ambiguous problems and maintaining productivity during extended sessions. It exhibits deeper, more careful reasoning patterns before settling on answers, producing superior results on complex problems.

On long-context performance, Opus 4.6 shows dramatic improvements in information retention and retrieval across hundreds of thousands of tokens. In the 8-needle 1M variant of MRCR v2, a needle-in-a-haystack benchmark, Opus 4.6 achieves 76% compared to just 18.5% for Sonnet 4.5, representing a qualitative shift in usable context capacity.

Early Access Partner Feedback

Organizations testing Opus 4.6 report transformative improvements. NBIM observed the model producing superior results in 38 of 40 cybersecurity investigations compared to Claude 4.5 models. Rakuten successfully deployed it to autonomously manage 13 issues and assign 12 tasks across a 50-person organization spanning 6 repositories in a single day. Harvey reported a 90.2% BigLaw Bench score with 40% perfect scores.

Developer Platform Updates

The Claude Developer Platform introduces several new features:

Adaptive thinking: The model can autonomously determine when deeper reasoning would be beneficial
Effort levels: Four configurable levels (low, medium, high, max) for controlling intelligence, speed, and cost balance
Context compaction (beta): Automatic summarization and replacement of older context for longer-running tasks
128K output tokens: Support for larger outputs without breaking tasks into multiple requests
US-only inference: Available for workloads requiring US-based processing at 1.1× token pricing

Product Enhancements

Claude Code now features agent teams in research preview, allowing multiple agents to work in parallel and coordinate autonomously. This capability proves particularly effective for tasks that split into independent, read-heavy work like codebase reviews.

Integration improvements include enhanced Claude in Excel functionality for long-running and complex tasks, with improved planning capabilities and unstructured data handling. Claude in PowerPoint launches in research preview, enabling users to process data in Excel and create visual presentations while maintaining brand consistency.

Safety Advancements

Despite intelligence gains, Opus 4.6 maintains robust safety standards. Automated behavioral audits show low rates of misaligned behaviors including deception, sycophancy, and cooperation with misuse. The model demonstrates the lowest rate of over-refusals among recent Claude models while maintaining alignment comparable to Opus 4.5.

Anthropic conducted comprehensive safety evaluations including new tests for user wellbeing, complex dangerous request refusal capabilities, and the model's ability to detect its own potentially harmful actions. New cybersecurity probes help track different forms of potential misuse, while the model is actively deployed for defensive cybersecurity applications.

Availability and Pricing

Claude Opus 4.6 is immediately available on claude.ai, Anthropic's API, and major cloud platforms. Developers can access it via the Claude API using claude-opus-4-6. Standard pricing remains at $5/$25 per million tokens, with premium pricing of $10/$37.50 per million tokens for prompts exceeding 200k tokens.

View source Back to news