Claude Sonnet 4.6: Major Upgrade with Enhanced Coding and Computer Use

Claude Sonnet 4.6 represents Anthropic's most advanced Sonnet model to date, delivering comprehensive improvements across coding, computer use, long-context reasoning, agent planning, knowledge work, and design capabilities. The model includes a beta feature of a 1 million token context window.

Availability and Access

For users on Free and Pro plans, Claude Sonnet 4.6 has become the default model in claude.ai and Claude Cowork. The pricing structure remains unchanged from Sonnet 4.5, beginning at $3/$15 per million tokens.

Coding Excellence and Developer Preference

Sonnet 4.6 demonstrates substantially enhanced coding capabilities available to a broader user base. Early-access developers show strong preference for Sonnet 4.6 compared to its predecessor, with notable improvements in consistency and instruction following. Remarkably, developers frequently favor it over Anthropic's November 2025 flagship model, Claude Opus 4.5.

Enterprise-Level Performance

Capabilities that previously necessitated an Opus-class model-particularly for real-world, economically valuable office tasks-are now accessible through Sonnet 4.6. The model exhibits significant advancement in computer use abilities compared to earlier Sonnet versions.

Safety Evaluations

Comprehensive safety assessments of Sonnet 4.6 demonstrate safety levels equal to or exceeding Anthropic's other recent Claude models. Safety researchers characterize Sonnet 4.6 as possessing "a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment."

Computer Use Innovation

Many organizations operate software that resists easy automation: legacy systems and tools predating modern API interfaces. Traditional approaches required custom connectors for AI integration. However, models capable of computer use like humans fundamentally alter this dynamic.

Anthropic pioneered general-purpose computer-using models in October 2024. While initially described as "experimental-at times cumbersome and error-prone," rapid improvements have followed. OSWorld, the benchmark for AI computer use, demonstrates substantial progress. This benchmark presents hundreds of tasks across real applications (Chrome, LibreOffice, VS Code, and others) on simulated computers. Models interact through virtual mouse clicks and keyboard inputs, without special APIs or custom connectors.

Across sixteen months, Anthropic's Sonnet models have achieved consistent OSWorld improvements. Early Sonnet 4.6 users report human-level performance in tasks like complex spreadsheet navigation or multi-step web form completion across multiple browser tabs.

While the model hasn't yet matched skilled humans in all computer use scenarios, the progress rate remains exceptional. Computer use has become significantly more practical for various work tasks, with substantially more capable models approaching feasibility.

Security Considerations

Computer use presents risks: malicious actors may attempt model hijacking through website-hidden instructions in prompt injection attacks. Anthropic has enhanced models' prompt injection resistance-safety evaluations indicate Sonnet 4.6 significantly improves upon Sonnet 4.5 and matches Opus 4.6 performance.

Benchmark Performance

Claude Sonnet 4.6 demonstrates improvements across all benchmarks, approaching Opus-level intelligence at a more practical price point for expanded use cases.

In Claude Code, early testing revealed users preferred Sonnet 4.6 over Sonnet 4.5 approximately 70% of the time. Users noted superior context reading before code modification and better logic consolidation without duplication, reducing frustration during extended sessions.

Users preferred Sonnet 4.6 to Opus 4.5 (Anthropic's November frontier model) 59% of the time, rating it significantly less prone to overengineering and "laziness," with markedly improved instruction following. Reports indicated fewer false success claims, reduced hallucinations, and more consistent multi-step task completion.

Extended Context Capabilities

Sonnet 4.6's 1M token context window accommodates entire codebases, extensive contracts, or numerous research papers in single requests. Critically, Sonnet 4.6 maintains effective reasoning across this entire context, enabling superior long-horizon planning. This proved particularly evident in the Vending-Bench Arena evaluation, testing model business simulation abilities with competitive elements.

Sonnet 4.6 employed an innovative strategy: heavy capacity investment during the initial ten simulated months, exceeding competitor spending, then sharp profitability focus in the final phase. This strategic timing secured substantial competitive advantage.

Customer Feedback

Early customers report comprehensive improvements, particularly in frontend code and financial analysis. Customers independently characterize Sonnet 4.6's visual outputs as notably refined, featuring superior layouts, animations, and design sensibility compared to previous models. Production-quality results require fewer iterations.

Multiple enterprise customers highlight specific improvements:

Databricks: Sonnet 4.6 matches Opus 4.6 on OfficeQA, measuring enterprise document comprehension (charts, PDFs, tables) and fact-based reasoning
Replit: Exceptional performance-to-cost ratio with superior orchestration evaluation results and complex agentic workload handling
Cursor: Notable improvements for long-horizon tasks and difficult problems
GitHub: Excellence in complex code fixes, particularly when searching large codebases
Cognition: Meaningful gap closure with Opus on bug detection capabilities
Windsurf: Frontier-level reasoning in smaller, cost-effective form factor
Hebbia: Significant answer match rate improvement in Financial Services Benchmark
Box: 15 percentage point improvement over Sonnet 4.5 in heavy reasoning Q&A
Pace: 94% accuracy on insurance benchmark for computer use
Bolt: Frontier-level results on complex app builds and bug-fixing
Rakuten: Superior iOS code quality with better spec compliance and architecture
Zapier: Strong performance on branched and multi-step tasks
Convey: Clear improvement in complex computer use accuracy
Triple Whale: Perfect design taste for frontend pages and data reports
Harvey: Exceptional responsiveness to direction in legal applications

Product Features and Updates

On the Claude Developer Platform, Sonnet 4.6 supports adaptive thinking, extended thinking, and beta context compaction, which automatically summarizes older context as conversations approach limits.

Anthropic's API now features enhanced web search and fetch tools that automatically write and execute code for filtering and processing search results, improving response quality and token efficiency. Code execution, memory, programmatic tool calling, tool search, and tool use examples have reached general availability.

Sonnet 4.6 delivers strong performance at any thinking effort level, even without extended thinking. Migration from Sonnet 4.5 should include exploration across the performance spectrum to optimize speed and reliability balance.

Opus 4.6 remains optimal for tasks requiring deepest reasoning, including codebase refactoring, multi-agent workflow coordination, and precision-critical problems.

Claude in Excel now supports MCP connectors, enabling Claude integration with tools like S&P Global, LSEG, Daloopa, PitchBook, Moody's, and FactSet. Users can request external context without leaving Excel. MCP connectors configured in Claude.ai automatically function in Excel for Pro, Max, Team, and Enterprise plans.

Implementation and Access

Claude Sonnet 4.6 is currently available across all Claude plans, Claude Cowork, Claude Code, Anthropic's API, and major cloud platforms. The free tier upgrade to Sonnet 4.6 by default includes file creation, connectors, skills, and compaction.

Developers can implement quickly using claude-sonnet-4-6 through the Claude API.

View source Back to news