OpenAI Launches GPT-5.5: A New Intelligence Tier for Real-World Tasks

OpenAI has released GPT-5.5, the company's most capable model yet, which significantly advances agentic coding, knowledge work, and scientific research while matching GPT-5.4's latency and using fewer tokens. The model is rolling out across ChatGPT and Codex with enhanced cybersecurity safeguards and will reach the API soon.

OpenAI · Apr 23, 2026

OpenAI is releasing GPT-5.5, described as the company's smartest and most intuitive model to date, representing the next step toward a new way of accomplishing work on a computer.

GPT-5.5 grasps user intent more quickly and can shoulder more of the workload independently. It excels at writing and debugging code, online research, data analysis, document and spreadsheet creation, software operation, and moving across tools until a task is complete. Rather than requiring careful management of every step, users can hand GPT-5.5 a messy, multi-part task and rely on it to plan, use tools, verify its own work, navigate ambiguity, and persist until finished.

The most pronounced improvements appear in agentic coding, computer use, knowledge work, and early scientific research: domains where progress hinges on reasoning across context and taking action over time. GPT-5.5 delivers this intelligence upgrade without sacrificing speed: it matches GPT-5.4's per-token latency in real-world serving while performing at a significantly higher level. It also consumes substantially fewer tokens to complete the same Codex tasks, making it both more efficient and more capable.

OpenAI is releasing GPT-5.5 with the company's strongest set of safeguards to date, designed to reduce misuse while preserving access for beneficial work. The model was evaluated across OpenAI's full suite of safety and preparedness frameworks, tested by internal and external red-teamers, subjected to targeted testing for advanced cybersecurity and biology capabilities, and refined with feedback from nearly 200 trusted early-access partners before release.

GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, while GPT-5.5 Pro is rolling out to Pro, Business, and Enterprise users in ChatGPT. API deployments require different safeguards, and OpenAI is working closely with partners and customers on the safety and security requirements for serving it at scale. GPT-5.5 and GPT-5.5 Pro will come to the API very soon.

Model Capabilities

OpenAI is building global infrastructure for agentic AI, enabling people and businesses worldwide to get work done with AI. Over the past year, AI has dramatically accelerated software engineering. With GPT-5.5 in Codex and ChatGPT, that same transformation is beginning to extend into scientific research and the broader work people do on computers.

Across these domains, GPT-5.5 is not just more intelligent; it is also more efficient in how it works through problems, often reaching higher-quality outputs with fewer tokens and fewer retries. On Artificial Analysis's Coding Index, GPT-5.5 delivers state-of-the-art intelligence at half the cost of competitive frontier coding models.

Agentic Coding

GPT-5.5 is OpenAI's strongest agentic coding model to date. On Terminal-Bench 2.0, which tests complex command-line workflows requiring planning, iteration, and tool coordination, it achieves a state-of-the-art accuracy of 82.7%. On SWE-Bench Pro, which evaluates real-world GitHub issue resolution, it reaches 58.6%, solving more tasks end-to-end in a single pass than previous models. On Expert-SWE, OpenAI's internal frontier eval for long-horizon coding tasks with a median estimated human completion time of 20 hours, GPT-5.5 also outperforms GPT-5.4.

Across all three evals, GPT-5.5 improves on GPT-5.4's scores while using fewer tokens.

The model's coding strengths are especially evident in Codex, where it can take on engineering work ranging from implementation and refactors to debugging, testing, and validation. Early testing suggests GPT-5.5 is better at the behaviors real engineering work depends on: holding context across large systems, reasoning through ambiguous failures, checking assumptions with tools, and carrying changes through the surrounding codebase.

Beyond benchmarks, early testers reported that GPT-5.5 shows a stronger ability to understand the shape of a system: why something is failing, where the fix needs to land, and what else in the codebase would be affected.

Dan Shipper, Founder and CEO of Every, described GPT-5.5 as "the first coding model I've used that has serious conceptual clarity." After launching an app, he spent days debugging a post-launch issue before bringing in one of his best engineers to rewrite part of the system. To test GPT-5.5, he effectively rewound the clock: could the model look at the broken state and produce the same kind of rewrite the engineer eventually decided on? GPT-5.4 could not. GPT-5.5 could.

Pietro Schirano, CEO of MagicPath, saw a similar leap when GPT-5.5 merged a branch with hundreds of frontend and refactor changes into a main branch that had also changed substantially, resolving the work in one shot in about 20 minutes.

Senior engineers who tested the model said GPT-5.5 was noticeably stronger than GPT-5.4 and Claude Opus 4.7 at reasoning and autonomy, catching issues in advance and predicting testing and review needs without explicit prompting. In one case, an engineer asked it to re-architect a comment system in a collaborative markdown editor and returned to a 12-diff stack that was nearly complete. Others said they needed surprisingly little implementation correction and felt more confident in GPT-5.5's plans compared with GPT-5.4.

One engineer at NVIDIA with early access went as far as to say: "Losing access to GPT-5.5 feels like I've had a limb amputated."

Knowledge Work

The same strengths that make GPT-5.5 excel at coding also make it powerful for everyday computer work. Because the model is better at understanding intent, it moves more naturally through the full loop of knowledge work: finding information, understanding what matters, using tools, checking output, and turning raw material into something useful.

In Codex, GPT-5.5 outperforms GPT-5.4 at generating documents, spreadsheets, and slide presentations. Alpha testers said it outperformed past models on work like operational research, spreadsheet modeling, and turning messy business inputs into plans. When combined with Codex's computer use skills, GPT-5.5 brings the experience closer to the feeling that the model can actually use the computer alongside the user: seeing what's on screen, clicking, typing, navigating interfaces, and moving across tools with precision.

Teams at OpenAI are already using these strengths in real workflows. More than 85% of the company uses Codex every week across functions including software engineering, finance, communications, marketing, data science, and product management. The Comms team used GPT-5.5 in Codex to analyze six months of speaking request data, build a scoring and risk framework, and validate an automated Slack agent so low-risk requests could be handled automatically while higher-risk requests still route to human review. The Finance team used Codex to review 24,771 K-1 tax forms totaling 71,637 pages, accelerating the task by two weeks compared to the prior year. On the Go-to-Market team, an employee automated generating weekly business reports, saving 5–10 hours per week.

In ChatGPT, GPT-5.5 Thinking unlocks faster help for harder problems, with smarter and more concise answers to help users move through complex work more efficiently. It excels at professional work like coding, research, information synthesis and analysis, and document-heavy tasks, especially when using plugins.

In GPT-5.5 Pro, early testers are seeing a significant step up in both the difficulty and quality of work ChatGPT can take on, with latency improvements that make it much more practical for demanding tasks. Compared to GPT-5.4 Pro, testers found GPT-5.5 Pro's responses significantly more comprehensive, well-structured, accurate, relevant, and useful, with especially strong performance in business, legal, education, and data science.

GPT-5.5 reaches state-of-the-art performance across multiple benchmarks reflecting this kind of work. On GDPval, which tests agents' abilities to produce well-specified knowledge work across 44 occupations, GPT-5.5 scores 84.9%. On OSWorld-Verified, which measures whether a model can operate real computer environments on its own, it reaches 78.7%. On Tau2-bench Telecom, which tests complex customer-service workflows, it reaches 98.0% without prompt tuning. GPT-5.5 also performs strongly across other knowledge work benchmarks: 60.0% on FinanceAgent, 88.5% on internal investment-banking modeling tasks, and 54.1% on OfficeQA Pro.

Scientific Research

GPT-5.5 also shows gains on scientific and technical research workflows, which require more than answering a hard question. Researchers need to explore an idea, gather evidence, test assumptions, interpret results, and decide what to try next. GPT-5.5 is better at persisting across that loop than other models.

Notably, GPT-5.5 shows a clear improvement over GPT-5.4 on GeneBench, a new eval focusing on multi-stage scientific data analysis in genetics and quantitative biology. These problems require models to reason about potentially ambiguous or error-prone data with minimal supervisory guidance, address realistic obstacles such as hidden confounders or QC failures, and correctly implement and interpret modern statistical methods.

Similarly, on BixBench, a benchmark designed around real-world bioinformatics and data analysis, GPT-5.5 achieved leading performance among models with published scores. The model's scientific capabilities are now strong enough to meaningfully accelerate progress at the frontiers of biomedical research as a genuine co-scientist.

An internal version of GPT-5.5 with a custom harness helped discover a new proof about Ramsey numbers, one of the central objects in combinatorics. GPT-5.5 found a proof of a longstanding asymptotic fact about off-diagonal Ramsey numbers, later verified in Lean. The result is a concrete example of GPT-5.5 contributing not just code or explanation, but a surprising and useful mathematical argument in a core research area.
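The announcement does not identify the specific statement that was proved, but for context, a well-known example of a longstanding asymptotic result about off-diagonal Ramsey numbers is the order of growth of $R(3, t)$:

```latex
% Known asymptotics for the off-diagonal Ramsey number R(3, t):
% the upper bound is due to Ajtai, Komlós, and Szemerédi,
% and a matching lower bound was established by Kim.
R(3, t) = \Theta\!\left(\frac{t^2}{\log t}\right)
```

Results of this kind pin down how fast the smallest $n$ grows such that every red/blue coloring of the complete graph $K_n$ contains either a red triangle or a blue clique of size $t$.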

Early testers used GPT-5.5 Pro in ChatGPT less like a one-shot answer engine and more like a research partner: critiquing manuscripts over multiple passes, stress-testing technical arguments, proposing analyses, and working with code, notes, and PDF context.

Derya Unutmaz, an immunology professor and researcher at the Jackson Laboratory for Genomic Medicine, used GPT-5.5 Pro to analyze a gene-expression dataset with 62 samples and nearly 28,000 genes, producing a detailed research report that surfaced key questions and insights, work he said would have taken his team months.

Bartosz Naskręcki, assistant professor of mathematics at Adam Mickiewicz University in Poznań, Poland, used GPT-5.5 in Codex to build an algebraic-geometry app from a single prompt in 11 minutes, visualizing the intersection of quadratic surfaces and converting the resulting curve into a Weierstrass model.

Next-Generation Inference Efficiency

Serving GPT-5.5 at GPT-5.4 latency required rethinking inference as an integrated system rather than a set of isolated optimizations. GPT-5.5 was co-designed for, trained with, and served on NVIDIA GB200 and GB300 NVL72 systems. Codex and GPT-5.5 were instrumental in achieving OpenAI's performance targets. Codex helped the team move faster from idea to benchmarkable implementation, sketching approaches, wiring experiments, and helping identify which optimizations were worth deeper investment. GPT-5.5 itself helped find and implement key improvements in the stack; effectively, the model helped improve the infrastructure that serves it.

One such improvement involved load balancing and partitioning heuristics. Before GPT-5.5, OpenAI split requests on an accelerator into a fixed number of chunks to balance work across computing cores. However, a pre-determined number of static chunks is not optimal for all traffic shapes. To better utilize GPUs, Codex analyzed weeks' worth of production traffic patterns and wrote custom heuristic algorithms to optimally partition and balance work. The effort had an outsized impact, increasing token generation speeds by over 20%.
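OpenAI has not published the heuristics themselves, but the general idea of replacing a fixed chunk count with load-aware partitioning can be sketched as follows. The function names and the toy traffic trace are illustrative, and the "balanced" scheme shown here is the classic longest-processing-time greedy rule, not OpenAI's production algorithm:

```python
import heapq

def fixed_chunks(costs, n_cores):
    """Static scheme: split the request list into n_cores equal slices,
    regardless of how expensive each individual request is."""
    size = -(-len(costs) // n_cores)  # ceil division
    slices = [costs[i:i + size] for i in range(0, len(costs), size)]
    return max(sum(s) for s in slices)  # makespan = load on the busiest core

def balanced_chunks(costs, n_cores):
    """Load-aware scheme: greedily place each request (largest first)
    onto the currently least-loaded core (longest-processing-time rule)."""
    loads = [0] * n_cores              # current load per core, as a min-heap
    heapq.heapify(loads)
    for c in sorted(costs, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + c)
    return max(loads)                  # makespan = load on the busiest core

# Skewed traffic: a few long prompts mixed with many short ones.
traffic = [100, 90, 5, 5, 5, 5, 5, 5]
print(fixed_chunks(traffic, 4))     # static split piles both long requests
                                    # onto one slice -> makespan 190
print(balanced_chunks(traffic, 4))  # greedy balancing -> makespan 100
```

The gap between the two makespans is the idle GPU time a static split leaves on the table when traffic is skewed, which is why tuning partitioning to observed traffic shapes can recover meaningful throughput.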

Advancing Cybersecurity for Everyone's Safety

Preparing the world for models that are very good at finding and patching security vulnerabilities is a team sport. It will require the entire ecosystem to work hard to build resilience, supported by democratized model access and iterative deployment, for the next era of cyber defense.

Frontier models are becoming increasingly capable in cybersecurity. Those capabilities will become broadly distributed, and OpenAI believes the best path forward is ensuring they can be put to use for accelerating cyber defense and strengthening the ecosystem.

GPT-5.5 is an incremental but important step toward AI that can help solve some of the world's toughest challenges, including cybersecurity. With GPT-5.2, OpenAI proactively deployed the necessary cyber safeguards to limit potential cyber abuse; with GPT-5.5, stricter classifiers for potential cyber risk are being deployed. Some users may initially find these classifiers overly restrictive as they are tuned over time.

Key cybersecurity measures include:

  • Industry-leading safeguards for this level of cyber capability. OpenAI first introduced cyber-specific safeguards with GPT-5.2, which have been continuously tested, refined, and built on in subsequent deployments. For GPT-5.5, tighter controls are in place around higher-risk activity and sensitive cyber requests, with added protections for repeated misuse. External experts collaborated for months to develop, test, and iterate on the robustness of these safeguards.

  • Expanded access to accelerate cyber defense at every level. Cyber-permissive models are available through Trusted Access for Cyber, starting with Codex: verified users who meet certain trust signals receive expanded access to GPT-5.5's advanced cybersecurity capabilities with fewer restrictions at launch. Organizations responsible for defending critical infrastructure can apply for access to cyber-permissive models like GPT-5.4-Cyber.

  • Collaboration with government partners to help protect critical infrastructure. OpenAI is exploring how advanced AI can support the defensive work of trusted officials responsible for systems people rely on, from digital systems securing taxpayer data to the power grid and water supplies in local communities.

OpenAI is treating the biological/chemical and cybersecurity capabilities of GPT-5.5 as High under OpenAI's Preparedness Framework. While GPT-5.5 did not reach the Critical cybersecurity capability level, evaluations and testing showed that its cybersecurity capabilities are a step up compared to GPT-5.4.

Availability and Pricing

GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, and GPT-5.5 Pro is rolling out to Pro, Business, and Enterprise users in ChatGPT. GPT-5.5 and GPT-5.5 Pro will come to the API very soon.

In ChatGPT, GPT-5.5 Thinking is available to Plus, Pro, Business, and Enterprise users. GPT-5.5 Pro, designed for even harder questions and higher-accuracy work, is available to Pro, Business, and Enterprise users.

In Codex, GPT-5.5 is available for Plus, Pro, Business, Enterprise, Edu, and Go plans with a 400K context window. GPT-5.5 is also available in Fast mode, generating tokens 1.5x faster for 2.5x the cost.

For API developers, gpt-5.5 will soon be available in the Responses and Chat Completions APIs at $5 per 1M input tokens and $30 per 1M output tokens, with a 1M context window. Batch and Flex pricing are available at half the standard API rate, while Priority processing is available at 2.5x the standard rate. gpt-5.5-pro will also be released in the API for even higher accuracy, priced at $30 per 1M input tokens and $180 per 1M output tokens.

While GPT-5.5 is priced higher than GPT-5.4, it is both more intelligent and much more token efficient. In Codex, OpenAI has carefully tuned the experience so GPT-5.5 delivers better results with fewer tokens than GPT-5.4 for most users, while continuing to offer generous usage across subscription levels.