OpenAI has released GPT-5.4 across ChatGPT (as GPT-5.4 Thinking), the API, and Codex. It represents OpenAI's most capable and efficient frontier model for professional work. A GPT-5.4 Pro variant is also available in ChatGPT and the API for users who need maximum performance on complex tasks.
GPT-5.4 unifies OpenAI's recent advances in reasoning, coding, and agentic workflows into a single frontier model. It incorporates the industry-leading coding capabilities of GPT-5.3-Codex while enhancing how the model operates across tools, software environments, and professional tasks involving spreadsheets, presentations, and documents. The result is a model that completes complex real work accurately, effectively, and efficiently, delivering what users ask for with less back-and-forth.
In ChatGPT, GPT-5.4 Thinking can now present an upfront plan of its reasoning, allowing users to adjust course mid-response while it works, arriving at a final output more closely aligned with user needs without additional turns. GPT-5.4 Thinking also improves deep web research, particularly for highly specific queries, while better maintaining context for questions requiring longer thinking. These improvements translate to higher-quality answers that arrive faster and stay relevant.
In Codex and the API, GPT-5.4 is the first general-purpose model OpenAI has released with native, state-of-the-art computer-use capabilities, enabling agents to operate computers and carry out complex workflows across applications. It supports up to 1M tokens of context, allowing agents to plan, execute, and verify tasks across long horizons. GPT-5.4 also improves how models work across large ecosystems of tools and connectors with tool search, helping agents find and use the right tools more efficiently without sacrificing intelligence. Additionally, GPT-5.4 is OpenAI's most token-efficient reasoning model yet, using significantly fewer tokens than GPT-5.2 to solve problems, which reduces cost and improves speed.
| Benchmark | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 |
|---|---|---|---|
| GDPval (wins or ties) | 83.0% | 70.9% | 70.9% |
| SWE-Bench Pro (Public) | 57.7% | 56.8% | 55.6% |
| OSWorld-Verified | 75.0% | 74.0% | 47.3% |
| Toolathlon | 54.6% | 51.9% | 46.3% |
| BrowseComp | 82.7% | 77.3% | 65.8% |
Knowledge Work
Building on GPT-5.2's general reasoning capabilities, GPT-5.4 delivers even more consistent and polished results on real-world professional tasks.
On GDPval, which tests agents' abilities to produce well-specified knowledge work across 44 occupations, GPT-5.4 achieves a new state of the art, matching or exceeding industry professionals in 83.0% of comparisons, compared to 70.9% for GPT-5.2.
OpenAI placed particular focus on improving GPT-5.4's ability to create and edit spreadsheets, presentations, and documents. On an internal benchmark of spreadsheet modeling tasks typical of a junior investment banking analyst, GPT-5.4 achieves a mean score of 87.3%, compared to 68.4% for GPT-5.2. On a set of presentation evaluation prompts, human raters preferred presentations from GPT-5.4 68.0% of the time over those from GPT-5.2, citing stronger aesthetics, greater visual variety, and more effective use of image generation.
These capabilities are available in ChatGPT via GPT-5.4 Thinking or Pro. Enterprise customers can also use the newly released ChatGPT for Excel add-in. Updated spreadsheet and presentation skills are available in Codex and the API.
To make GPT-5.4 better at real-world work, OpenAI continued driving down hallucinations and errors. GPT-5.4 is OpenAI's most factual model to date: on a set of de-identified prompts where users flagged factual errors, GPT-5.4's individual claims are 33% less likely to be false and its full responses are 18% less likely to contain any errors, relative to GPT-5.2.
Computer Use and Vision
GPT-5.4 is OpenAI's first general-purpose model with native computer-use capabilities and represents a major step forward for developers and agents. It is the best model currently available for developers building agents that complete real tasks across websites and software systems.
OpenAI designed GPT-5.4 to be performant across a wide range of computer-use workloads. It excels at writing code to operate computers via libraries like Playwright, as well as issuing mouse and keyboard commands in response to screenshots. Its behavior is steerable via developer messages, and developers can configure the model's safety behavior for different risk tolerances by specifying custom confirmation policies.
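The announcement does not specify what a custom confirmation policy looks like, but the idea of mapping computer-use actions to risk tiers and deciding between allowing, confirming, and blocking can be sketched in plain Python. The action names, risk scores, and thresholds below are illustrative assumptions, not the real policy surface:

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"   # pause and ask the user before acting
    BLOCK = "block"

# Hypothetical risk tiers for computer-use actions; the real action
# taxonomy and scoring are not described in this announcement.
RISK = {
    "read_screen": 0,
    "click": 1,
    "type_text": 1,
    "run_shell": 3,
    "submit_payment": 4,
}

@dataclass
class ConfirmationPolicy:
    confirm_at: int = 2   # require user confirmation at or above this risk
    block_at: int = 4     # refuse outright at or above this risk

    def decide(self, action: str) -> Decision:
        risk = RISK.get(action, self.block_at)  # unknown actions: max caution
        if risk >= self.block_at:
            return Decision.BLOCK
        if risk >= self.confirm_at:
            return Decision.CONFIRM
        return Decision.ALLOW

# A stricter policy for a deployment with low risk tolerance
strict = ConfirmationPolicy(confirm_at=1, block_at=3)
print(strict.decide("click"))      # Decision.CONFIRM
print(strict.decide("run_shell"))  # Decision.BLOCK
```

The point of the sketch is the shape of the configuration: one model, different deployments, different thresholds for when the agent must stop and ask.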
On OSWorld-Verified, which measures a model's ability to navigate a desktop environment through screenshots and keyboard/mouse actions, GPT-5.4 achieves a state-of-the-art 75.0% success rate, far exceeding GPT-5.2's 47.3% and surpassing human performance at 72.4%.
On WebArena-Verified, which tests browser use, GPT-5.4 achieves a leading 67.3% success rate using both DOM- and screenshot-driven interaction, compared to GPT-5.2's 65.4%. On Online-Mind2Web, GPT-5.4 achieves a 92.8% success rate using screenshot-based observations alone, improving over ChatGPT Atlas's Agent Mode at 70.9%.
GPT-5.4's improved computer use builds on enhanced general visual perception capabilities. On MMMU-Pro, a test of visual understanding and reasoning, GPT-5.4 achieves 81.2% without tool use, up from GPT-5.2's 79.5%. On OmniDocBench, GPT-5.4 without reasoning effort achieves an average normalized edit distance error of 0.109, improved from GPT-5.2's 0.140.
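For readers unfamiliar with the OmniDocBench metric, a common definition of normalized edit distance is Levenshtein distance divided by the longer string's length (0.0 is an exact match, 1.0 is completely different); OmniDocBench's exact normalization may differ, so this is a generic illustration:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_edit_distance(pred: str, ref: str) -> float:
    """0.0 = exact match, 1.0 = maximally different."""
    if not pred and not ref:
        return 0.0
    return levenshtein(pred, ref) / max(len(pred), len(ref))

print(normalized_edit_distance("GPT-5.4", "GPT-5.2"))  # one substitution out of 7 chars
```

On this scale, the drop from 0.140 to 0.109 means GPT-5.4's document transcriptions require proportionally fewer character edits to match the reference.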
OpenAI is also improving visual understanding for dense, high-resolution images. Starting with GPT-5.4, a new original image input detail level supports full-fidelity perception for images up to 10.24M total pixels with a maximum dimension of 6000 pixels; the high level now supports up to 2.56M total pixels with a 2048-pixel maximum dimension.
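The arithmetic implied by those limits can be made concrete: an image must satisfy both the total-pixel budget and the max-dimension cap, so an oversized image would be scaled down by whichever constraint binds first. The function below is a sketch under those stated limits; the actual preprocessing pipeline is not documented here:

```python
import math

# Pixel budgets stated for GPT-5.4 image input detail levels.
LIMITS = {
    "original": {"max_pixels": 10_240_000, "max_dim": 6000},
    "high":     {"max_pixels": 2_560_000,  "max_dim": 2048},
}

def fitted_size(width: int, height: int, detail: str = "original") -> tuple[int, int]:
    """Scale (width, height) down, preserving aspect ratio, so the image
    satisfies both the total-pixel and the max-dimension limits."""
    lim = LIMITS[detail]
    scale = min(
        1.0,                                            # never upscale
        lim["max_dim"] / max(width, height),            # dimension cap
        math.sqrt(lim["max_pixels"] / (width * height)) # pixel budget
    )
    return max(1, int(width * scale)), max(1, int(height * scale))

print(fitted_size(4000, 2000, "original"))  # fits as-is -> (4000, 2000)
print(fitted_size(8000, 2000, "original"))  # capped by the 6000-px max dimension
```

For the 8000x2000 example the dimension cap (scale 0.75) binds before the pixel budget (scale 0.8), yielding a 6000x1500 image of 9M pixels, comfortably under the 10.24M budget.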
Coding
GPT-5.4 combines the coding strengths of GPT-5.3-Codex with leading knowledge work and computer-use capabilities, which matter most on longer-running tasks where the model can use tools, iterate, and push work further with less manual intervention. It matches or outperforms GPT-5.3-Codex on SWE-Bench Pro while offering lower latency across reasoning efforts.
When toggled on, /fast mode in Codex delivers up to 1.5x faster token velocity with GPT-5.4: the same model and the same intelligence, just faster. Developers can access the same fast speeds via the API using priority processing.
In evaluations and internal testing, OpenAI found that GPT-5.4 excels at complex frontend tasks, producing noticeably more aesthetic and more functional results than any previously launched model.
As a demonstration of the model's improved computer-use and coding capabilities working together, OpenAI is also releasing an experimental Codex skill called "Playwright (Interactive)", which allows Codex to visually debug web and Electron apps; it can even test an app as it builds it.
Tool Use
With GPT-5.4, OpenAI has significantly improved how models work with external tools. Agents can now operate across larger tool ecosystems, choose the right tools more reliably, and complete multi-step workflows with lower cost and latency.
Tool Search
In the API, GPT-5.4 introduces tool search, which allows models to work efficiently when given many tools. Previously, all tool definitions were included in the prompt upfront, potentially adding thousands or tens of thousands of tokens to every request. With tool search, GPT-5.4 instead receives a lightweight list of available tools along with a tool search capability. When the model needs to use a tool, it looks up that tool's definition and appends it to the conversation at that moment.
This approach dramatically reduces the number of tokens required for tool-heavy workflows and preserves the cache, making requests faster and cheaper. It also enables agents to reliably work with much larger tool ecosystems. On 250 tasks from Scale's MCP Atlas benchmark with all 36 MCP servers enabled, the tool-search configuration reduced total token usage by 47% while achieving the same accuracy.
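The token economics of this design can be illustrated with a toy calculation: the full-definition cost grows with every registered tool, while the tool-search cost grows only with the tools actually used. The tool counts and per-definition token sizes below are made-up illustrative numbers, not measurements:

```python
# Toy model of the token savings from tool search: instead of placing
# every full tool definition in the prompt, the model sees lightweight
# names and fetches a full definition only when it needs one.
TOOLS = {f"tool_{i}": 400 for i in range(200)}  # name -> tokens for the full definition

def upfront_cost() -> int:
    """All 200 full definitions included in every request."""
    return sum(TOOLS.values())

def tool_search_cost(used: list[str], name_tokens: int = 5) -> int:
    """A lightweight name list, plus full definitions only for the
    tools actually looked up during the run."""
    return len(TOOLS) * name_tokens + sum(TOOLS[t] for t in used)

full = upfront_cost()                                      # 80,000 tokens
lazy = tool_search_cost(["tool_3", "tool_17", "tool_42"])  # 2,200 tokens
print(f"upfront: {full} tokens, with tool search: {lazy} tokens")
```

Because the name list is stable across turns, it also stays in the prompt cache, which is where the latency win in the MCP Atlas result comes from.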
Agentic Tool Calling
GPT-5.4 also improves tool calling, making it more accurate and efficient when deciding when and how to use tools during reasoning, particularly in the API. Compared to GPT-5.2, it achieves higher accuracy in fewer turns on Toolathlon, a benchmark testing how well AI agents can use real-world tools and APIs to complete multi-step tasks.
Improved Web Search
GPT-5.4 is better at agentic web search. On BrowseComp, which measures how well AI agents can persistently browse the web to find hard-to-locate information, GPT-5.4 leaps 17% absolute over GPT-5.2, and GPT-5.4 Pro sets a new state of the art at 89.3%.
In practice, GPT-5.4 Thinking is stronger at answering questions that require pulling together information from many web sources. It can more persistently search across multiple rounds to identify the most relevant sources, particularly for "needle-in-a-haystack" questions, and synthesize them into clear, well-reasoned answers.
Steerability
GPT-5.4 Thinking in ChatGPT now outlines its work with a preamble for longer, more complex queries, similar to how Codex outlines its approach. Users can also add instructions or adjust direction mid-response, making it easier to guide the model toward exact outcomes without starting over. This feature is available on chatgpt.com and the Android app, with iOS support coming soon.
The model can also think longer on difficult tasks while maintaining stronger awareness of earlier conversation steps, allowing it to handle longer workflows and more complex prompts while keeping answers coherent and relevant.
Safety
OpenAI has continued improving the safeguards introduced with GPT-5.3-Codex while preparing GPT-5.4 for deployment. Similar to GPT-5.3-Codex, OpenAI is treating GPT-5.4 as High cyber capability under its Preparedness Framework, deploying it with corresponding protections as documented in the system card. These include an expanded cyber safety stack with monitoring systems, trusted access controls, and asynchronous blocking of higher-risk requests for customers on Zero Data Retention (ZDR) surfaces.
Because cybersecurity capabilities are inherently dual-use, OpenAI maintains a precautionary approach to deployment while continuing to calibrate policies and classifiers. For certain ZDR surface customers, request-level blocking remains part of the cyber risk mitigation stack; because classifiers are still improving, some false positives may occur. These updates aim to improve how safeguards operate in practice, including reducing unnecessary refusals and overly caveated responses, while preserving strong protections against misuse.
OpenAI has also continued safety research on Chain-of-Thought (CoT) monitorability. As part of this work, OpenAI introduced a new open-source evaluation, CoT controllability, which measures whether models can deliberately obfuscate their reasoning to evade monitoring. GPT-5.4 Thinking's ability to control its CoT was found to be low, a positive safety property suggesting the model cannot reliably hide its reasoning and that CoT monitoring remains effective.
Availability and Pricing
GPT-5.4 is rolling out gradually across ChatGPT and Codex. In the API, GPT-5.4 is available as gpt-5.4. GPT-5.4 Pro is also available as gpt-5.4-pro for developers needing maximum performance.
In ChatGPT, GPT-5.4 Thinking is available to Plus, Team, and Pro users, replacing GPT-5.2 Thinking. GPT-5.2 Thinking will remain available for three months for paid users in the Legacy Models section, after which it will be retired on June 5, 2026. Enterprise and Edu plans can enable early access via admin settings. GPT-5.4 Pro is available to Pro and Enterprise plans.
GPT-5.4 is OpenAI's first mainline reasoning model incorporating the frontier coding capabilities of GPT-5.3-Codex, rolling out across ChatGPT, the API, and Codex. The GPT-5.4 naming reflects that jump and simplifies model choice when using Codex.
GPT-5.4 in Codex includes experimental support for a 1M context window. Requests exceeding the standard 272K context window count against usage limits at 2x the normal rate.
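The usage accounting described above is simple to state as arithmetic: tokens within the standard 272K window count normally, and tokens beyond it count double. A minimal sketch (the exact metering details beyond the 2x rate are not specified here):

```python
def billable_tokens(context_tokens: int, standard_window: int = 272_000,
                    overage_multiplier: float = 2.0) -> float:
    """Tokens counted against Codex usage limits: tokens beyond the
    standard 272K window are counted at 2x the normal rate."""
    overage = max(0, context_tokens - standard_window)
    return (context_tokens - overage) + overage * overage_multiplier

print(billable_tokens(200_000))  # within the standard window: counts as-is
print(billable_tokens(500_000))  # 272K normal + 228K doubled = 728K
```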
In the API, GPT-5.4 is priced higher per token than GPT-5.2 to reflect improved capabilities, while its greater token efficiency helps reduce total tokens required for many tasks. Batch and Flex pricing are available at half the standard rate, and Priority processing at twice the standard rate.
| API model | Input price | Cached input price | Output price |
|---|---|---|---|
| gpt-5.2 | $1.75 / M tokens | $0.175 / M tokens | $14 / M tokens |
| gpt-5.4 | $2.50 / M tokens | $0.25 / M tokens | $15 / M tokens |
| gpt-5.2-pro | $21 / M tokens | - | $168 / M tokens |
| gpt-5.4-pro | $30 / M tokens | - | $180 / M tokens |
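The interaction between higher per-token prices and greater token efficiency is easy to work through with the table above. In the sketch below, the 30% reduction in output tokens is a hypothetical assumption used only to show the mechanism, not a measured figure:

```python
# Per-million-token prices from the table above (USD).
PRICES = {
    "gpt-5.2": {"input": 1.75, "cached": 0.175, "output": 14.0},
    "gpt-5.4": {"input": 2.50, "cached": 0.25,  "output": 15.0},
}

def request_cost(model: str, input_toks: int, cached_toks: int, output_toks: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    p = PRICES[model]
    return (input_toks * p["input"] + cached_toks * p["cached"]
            + output_toks * p["output"]) / 1_000_000

# Same prompt on both models; hypothetically assume GPT-5.4's token
# efficiency yields 30% fewer output/reasoning tokens for the same task.
old = request_cost("gpt-5.2", 50_000, 0, 20_000)
new = request_cost("gpt-5.4", 50_000, 0, 14_000)
print(f"gpt-5.2: ${old:.4f}  gpt-5.4: ${new:.4f}")
```

Under that assumption the GPT-5.4 request comes out cheaper ($0.3350 vs. $0.3675) despite the higher list prices; whether that holds for a given workload depends on its actual input/output token mix.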