Claude Sonnet 4.5 is Anthropic's most capable coding model to date, setting a new bar for software development, complex agent building, and computer use. It also performs strongly on reasoning and mathematical tasks.
Technical Excellence and Performance
The model achieves state-of-the-art results on SWE-bench Verified, an evaluation of real-world software coding ability. In practical use, it has been observed maintaining focus for more than 30 hours on intricate, multi-step tasks. On OSWorld, a benchmark for real-world computer tasks, Sonnet 4.5 leads with a score of 61.4%, up from the 42.2% achieved by Claude Sonnet 4.
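To put the OSWorld numbers in perspective, the gain can be stated both in percentage points and as a relative improvement. The short sketch below uses only the two scores quoted above; the function name `improvement` is chosen here for illustration and comes from no benchmark tooling.

```python
def improvement(previous: float, current: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), rounded to 1 decimal."""
    points = current - previous            # absolute gain in percentage points
    relative = points / previous * 100     # gain relative to the earlier score
    return round(points, 1), round(relative, 1)


# Scores quoted in the text: 42.2% previously, 61.4% for Sonnet 4.5.
points, relative = improvement(42.2, 61.4)
print(f"+{points} percentage points, {relative}% relative improvement")
```

By this arithmetic, the jump is 19.2 percentage points, or roughly a 45.5% relative improvement over the earlier score.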
Product Enhancements and Features
Anthropic has released major upgrades alongside the model:
- Claude Code improvements: New checkpoint feature enabling progress saving and instant rollback to previous states, refreshed terminal interface, and a native VS Code extension
- API enhancements: A context editing feature and a memory tool that let agents run longer and handle greater complexity
- Application updates: Direct code execution and file creation (spreadsheets, slides, documents) within conversations
- Claude for Chrome extension: Now available to Max users who joined the waitlist
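The checkpoint idea mentioned for Claude Code (save progress, then roll back instantly) can be pictured as a stack of snapshots. The sketch below is purely illustrative: the `CheckpointStore` class and its methods are hypothetical names invented here, and nothing in it reflects how Claude Code actually implements checkpoints.

```python
import copy


class CheckpointStore:
    """Hypothetical snapshot stack; not Claude Code's actual implementation."""

    def __init__(self, state: dict):
        self.state = state
        self._snapshots: list[dict] = []

    def checkpoint(self) -> int:
        """Save a deep copy of the current state and return its index."""
        self._snapshots.append(copy.deepcopy(self.state))
        return len(self._snapshots) - 1

    def rollback(self, index: int) -> None:
        """Restore the state saved at `index` and discard later snapshots."""
        self.state = copy.deepcopy(self._snapshots[index])
        del self._snapshots[index + 1:]


# Usage: snapshot a toy "workspace", make a bad edit, then roll back.
store = CheckpointStore({"main.py": "print('v1')"})
saved = store.checkpoint()
store.state["main.py"] = "print('broken edit')"
store.rollback(saved)
print(store.state["main.py"])
```

Deep copies keep each snapshot isolated from later edits, which is the property that makes a rollback safe in this toy model.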
Claude Agent SDK Introduction
Developers now have access to the same infrastructure that powers Claude Code through the Claude Agent SDK. This toolkit provides the foundation Anthropic uses to build its own frontier products, enabling developers to build sophisticated agents for tasks well beyond coding.
Safety and Alignment Improvements
Claude Sonnet 4.5 is Anthropic's most aligned frontier model, showing substantial reductions in problematic behaviors including sycophancy, deception, power-seeking, and encouragement of delusional thinking. The model is also more resistant to prompt injection attacks, which is particularly important for agentic and computer use capabilities.
The model is released under Anthropic's AI Safety Level 3 (ASL-3) protections, which match safeguards to model capabilities. These include classifiers designed to detect potentially dangerous inputs and outputs, particularly those related to CBRN (chemical, biological, radiological, and nuclear) weapons.
Industry Reception and Applications
Early adopters report significant improvements across various domains:
- Software Development: Tools like Cursor and GitHub Copilot report enhanced performance on complex, codebase-spanning tasks
- Security: HackerOne's Hai security agents cut average vulnerability intake time by 44% while improving accuracy by 25%
- Legal: State-of-the-art performance on complex litigation tasks, including full briefing cycle analysis
- Finance: Delivery of investment-grade insights requiring less human review for risk analysis and portfolio screening
- Design: Tools like Figma Make produce more functional prototypes
Benchmark Performance
The model shows improved capabilities across various evaluations:
- Leading performance on coding benchmarks
- Enhanced reasoning and mathematical capabilities
- Superior domain-specific knowledge in finance, law, medicine, and STEM fields compared to previous models including Opus 4.1
Research Preview: Imagine with Claude
Anthropic has launched a temporary research preview called "Imagine with Claude," demonstrating the model's ability to generate software dynamically. This experiment showcases real-time creation and adaptation without predetermined functionality or prewritten code, available to Max subscribers for a limited period.
Availability and Pricing
Claude Sonnet 4.5 is available immediately. Developers can access it through the Claude API using the model name claude-sonnet-4-5. Pricing is unchanged from Claude Sonnet 4 at $3/$15 per million tokens (input/output). Anthropic recommends upgrading to Claude Sonnet 4.5 for all applications, since it is a drop-in replacement that provides superior performance at the same price.
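As a rough illustration of what this pricing means in practice, the sketch below estimates the cost of a single request. It assumes the $3/$15 figures are the per-million-token rates for input and output respectively; the constant names, the `estimate_cost` helper, and the token counts are illustrative choices and not part of any official SDK.

```python
MODEL = "claude-sonnet-4-5"      # model name quoted in the text
INPUT_USD_PER_MTOK = 3.00        # assumed: USD per million input tokens
OUTPUT_USD_PER_MTOK = 15.00      # assumed: USD per million output tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 200k input tokens and 50k output tokens.
cost = estimate_cost(200_000, 50_000)
print(f"{MODEL}: ${cost:.2f}")
```

With these example counts, input costs $0.60 and output costs $0.75, for a total of $1.35. Real bills depend on actual token usage and any features priced separately.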