Claude Opus 4.5: State-of-the-Art Coding, Agents, and Computer Use

Anthropic's Latest Model Sets New Standards

Anthropic's newest model, Claude Opus 4.5, launched today as an intelligent and efficient system that leads the industry in coding capabilities, agent functionality, and computer use. The model demonstrates significant improvements in everyday applications including deep research, slide presentations, and spreadsheet manipulation. Opus 4.5 represents a significant advancement in AI system capabilities and offers a glimpse into future workplace transformation.

Performance and Availability

Claude Opus 4.5 achieves state-of-the-art results on real-world software engineering tests, scoring highest on SWE-bench Verified among all frontier models. The model is accessible through Anthropic's applications, API, and major cloud platforms. Developers can implement it using claude-opus-4-5-20251101 through the Claude API. The pricing structure of $5 input and $25 output per million tokens makes Opus-level capabilities more widely accessible to users, teams, and enterprises.

Anthropic has also introduced updates to the Claude Developer Platform, Claude Code, and consumer applications. New features include tools for extended agent operations and enhanced integration with Excel, Chrome, and desktop environments. The Claude applications now support lengthy conversations without interruption.

User Feedback and Real-World Application

Internal testers at Anthropic consistently report that Claude Opus 4.5 manages ambiguity effectively and reasons through tradeoffs independently. The model successfully identifies and resolves complex multi-system bugs and accomplishes tasks that were challenging for Sonnet 4.5 mere weeks ago. Users describe the experience as the model simply "understanding" their needs.

Early access customers across various industries have shared positive experiences:

Grit: CEO Jeff Wang notes that Opus models represent the true state-of-the-art, now accessible at a practical price point for most tasks
GitHub: Chief Product Officer Mario Rodriguez reports the model exceeds internal coding benchmarks while reducing token usage by half
Glean: President Michele Catasta highlights superior performance compared to Sonnet 4.5 with improved token efficiency
Lovable: CTO Fabian Hedin emphasizes frontier reasoning capabilities that transform project planning
Warp: Founder Zach Lloyd reports 15% improvement over Sonnet 4.5 on Terminal Bench for long-horizon autonomous tasks

Technical Evaluation and Benchmarks

In a notable achievement, Claude Opus 4.5 scored higher than any human candidate on Anthropic's challenging performance engineering take-home exam within the prescribed 2-hour limit. While this test focuses on technical abilities and judgment under pressure rather than collaboration or communication skills, it demonstrates the model's exceptional technical proficiency.

The model shows improvements across multiple domains beyond software engineering:

Enhanced vision, reasoning, and mathematics capabilities
State-of-the-art performance in numerous benchmark categories
Superior results across 7 of 8 programming languages on SWE-bench Multilingual
10.6% improvement over Sonnet 4.5 on Aider Polyglot for complex coding problems
Significant advancement in agentic search capabilities on BrowseComp-Plus
29% higher earnings than Sonnet 4.5 on Vending-Bench for long-duration tasks

The model demonstrates creative problem-solving abilities that sometimes exceed benchmark expectations. In τ2-bench testing, Opus 4.5 found innovative solutions to customer service scenarios by identifying legitimate workarounds within policy constraints that evaluators hadn't anticipated.

Safety and Alignment Progress

According to Anthropic's system card, Claude Opus 4.5 represents their most robustly aligned model release and potentially the best-aligned frontier model from any developer. The model shows continued improvement in safety and security metrics, with reduced concerning behavior scores measuring various forms of misaligned conduct.

Opus 4.5 demonstrates enhanced resistance to prompt injection attacks, which attempt to deceive models into harmful behavior through smuggled instructions. Testing shows it's more difficult to compromise than other frontier models in the industry, providing critical security for customers using Claude for sensitive tasks.

Developer Platform Enhancements

As models become more sophisticated, they solve problems more efficiently with reduced backtracking, exploration, and verbose reasoning. Claude Opus 4.5 requires substantially fewer tokens than predecessors while achieving similar or superior outcomes.

Anthropic introduced an effort parameter in the Claude API, allowing developers to balance between minimizing time and cost versus maximizing capability. At medium effort, Opus 4.5 matches Sonnet 4.5's best SWE-bench Verified score using 76% fewer output tokens. At maximum effort, it exceeds Sonnet 4.5 performance by 4.3 percentage points while using 48% fewer tokens.

New features include:

Effort control for task optimization
Context compaction for efficiency
Advanced tool use capabilities
Context management and memory features
Support for complex multi-agent system coordination

Internal testing shows these combined techniques improved Opus 4.5's deep research evaluation performance by nearly 15 percentage points.

Product Updates and Integrations

Claude Code benefits from two major upgrades with Opus 4.5. Plan Mode now creates more accurate plans and executes more comprehensively, asking clarifying questions before building user-editable plan.md files. The desktop app version enables multiple parallel local and remote sessions for concurrent task management.

Additional product improvements include:

Automatic context summarization for extended conversations in the Claude app
Claude for Chrome availability for all Max users
Expanded Claude for Excel beta access to Max, Team, and Enterprise users
Removal of Opus-specific usage caps for qualified users
Increased overall usage limits for Max and Team Premium subscribers

These updates leverage Opus 4.5's industry-leading performance in computer use, spreadsheet handling, and long-running task management, with usage limits adjusted to support daily work requirements.

View source Back to news