GPT-5.2 - Advanced Frontier Model for Professional Work

OpenAI has launched GPT-5.2, which it describes as its most capable model series to date for professional knowledge work. The model improves on its predecessors in spreadsheet creation, presentation building, code writing, image perception, long-context understanding, tool use, and the management of complex multi-step projects.

Performance Benchmarks

GPT-5.2 establishes new state-of-the-art performance across multiple benchmarks. On GDPval, which measures well-specified knowledge work tasks across 44 occupations, GPT-5.2 Thinking outperforms or matches industry professionals in 70.9% of comparisons. The model achieves 55.6% on SWE-Bench Pro for software engineering tasks and 80.0% on SWE-bench Verified.

In scientific domains, GPT-5.2 scores 92.4% on GPQA Diamond (science questions), 88.7% on CharXiv Reasoning (scientific figure questions, with Python support), and 100% on AIME 2025 (competition math problems). For advanced mathematics, it achieves 40.3% on FrontierMath Tier 1-3 and 14.6% on Tier 4.

Enhanced Professional Capabilities

The model shows marked improvements in creating professional work products. Early testers report that GPT-5.2 produces outputs 11 times faster than expert professionals and at less than 1% of their cost, which suggests significant potential for accelerating professional work when combined with human oversight.

For coding tasks, GPT-5.2 delivers stronger performance in debugging production code, implementing feature requests, refactoring large codebases, and shipping fixes end-to-end with minimal manual intervention. The model particularly excels at front-end development and complex UI work involving 3D elements.

Improved Accuracy and Reliability

GPT-5.2 produces 30% fewer hallucinations than GPT-5.1 on de-identified ChatGPT queries, making it more dependable for research, writing, analysis, and decision support. The model also achieves near-perfect accuracy on long-context reasoning tasks, handling documents of hundreds of thousands of tokens while remaining coherent and accurate.

In vision tasks, GPT-5.2 cuts error rates approximately in half for chart reasoning and software interface understanding, enabling more accurate interpretation of dashboards, product screenshots, technical diagrams, and visual reports.

Tool Calling Excellence

The model achieves 98.7% on Tau2-bench Telecom, demonstrating exceptional ability to reliably use tools across lengthy, multi-turn tasks. This translates to stronger end-to-end workflows including customer support resolution, multi-system data retrieval, analysis execution, and final output generation with fewer breakdowns between steps.

ChatGPT Integration

GPT-5.2 is available in ChatGPT through three variants:

GPT-5.2 Instant: A fast, capable model for everyday work and learning
GPT-5.2 Thinking: Designed for deeper work on complex tasks requiring greater polish
GPT-5.2 Pro: The smartest option for difficult questions where higher quality justifies longer wait times

Safety Enhancements

GPT-5.2 builds upon OpenAI's safe completion research, teaching the model to provide helpful answers while maintaining safety boundaries. The model shows meaningful improvements in responding to sensitive conversations, including prompts indicating suicide or self-harm, mental health distress, or emotional reliance on the model.

API Availability and Pricing

In the API, GPT-5.2 Thinking is available at $1.75 per million input tokens and $14 per million output tokens, with a 90% discount on cached inputs. GPT-5.2 Pro is priced at $21 per million input tokens and $168 per million output tokens. Although these per-token prices are higher than GPT-5.1's, the model's greater token efficiency often results in a lower overall cost for reaching a given quality level.
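
To make the pricing above concrete, the following is a minimal Python sketch that estimates the cost of a single request from the per-million-token prices quoted in this article. The names `Pricing`, `PRICES`, and `estimate_cost` are hypothetical helpers invented for illustration, not part of any OpenAI SDK. The article states a cached-input discount only for GPT-5.2 Thinking, so the sketch assumes no discount for GPT-5.2 Pro; that is an assumption, not a confirmed price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in US dollars per million tokens (hypothetical helper type)."""
    input_per_million: float
    output_per_million: float
    # Fraction taken off the input price for cached input tokens.
    cached_input_discount: float = 0.0


# Figures taken from the article. The Pro tier's cached-input discount is
# not stated there, so 0.0 is an assumption made for this sketch.
PRICES = {
    "thinking": Pricing(1.75, 14.00, cached_input_discount=0.90),
    "pro": Pricing(21.00, 168.00),
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Return the estimated cost in dollars for one request.

    cached_input_tokens is the portion of input_tokens billed at the
    discounted cached rate; the remainder is billed at the full input rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")

    price = PRICES[tier]
    fresh_input_tokens = input_tokens - cached_input_tokens
    cached_rate = price.input_per_million * (1 - price.cached_input_discount)

    total = (
        fresh_input_tokens * price.input_per_million
        + cached_input_tokens * cached_rate
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 200k input tokens (half of them cached) and 10k output tokens.
    for tier in PRICES:
        cost = estimate_cost(tier, 200_000, 10_000, cached_input_tokens=100_000)
        print(f"{tier}: ${cost:.4f}")
```

A calculation like this compares per-request cost only. It does not capture the token-efficiency effect described above, which depends on how many tokens each model actually needs to reach a given quality level.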

Partner Testimonials

Early adopters including Notion, Box, Shopify, Harvey, Zoom, Databricks, Hex, Triple Whale, Cognition, Warp, Charlie Labs, JetBrains, and Augment Code report exceptional performance in long-horizon reasoning, tool-calling, agentic data science, document analysis, and agentic coding tasks. Several partners note the model enables architectural simplifications, reducing complex multi-agent systems to single mega-agents with superior performance.
