OpenAI has released GPT-5.4 mini and nano, its most capable small models to date. These models bring many of the strengths of GPT-5.4 to faster, more efficient architectures designed for high-volume workloads.
GPT-5.4 mini delivers substantial improvements over GPT-5 mini in coding, reasoning, multimodal understanding, and tool use, while running more than 2x faster. It also approaches the performance of the larger GPT-5.4 model on several evaluations, including SWE-Bench Pro and OSWorld-Verified.
GPT-5.4 nano is the smallest and cheapest variant of GPT-5.4, aimed at tasks where speed and cost are the top priorities. It represents a significant upgrade over GPT-5 nano. OpenAI recommends it for classification, data extraction, ranking, and coding subagents that handle simpler supporting tasks.
These models target workloads where latency directly affects the product experience: coding assistants that need to feel responsive, subagents that quickly complete supporting tasks, computer-using systems that capture and interpret screenshots, and multimodal applications that reason over images in real time. In such settings, the best model is often not the largest one; it's the one that can respond quickly, use tools reliably, and still perform well on complex professional tasks.
| Benchmark | GPT-5.4 (xhigh) | GPT-5.4 mini (xhigh) | GPT-5.4 nano (xhigh) | GPT-5 mini (high¹) |
|---|---|---|---|---|
| SWE-bench Pro (Public) | 57.7% | 54.4% | 52.4% | 45.7% |
| Terminal-Bench 2.0 | 75.1% | 60.0% | 46.3% | 38.2% |
| Toolathlon | 54.6% | 42.9% | 35.5% | 26.9% |
| GPQA Diamond | 93.0% | 88.0% | 82.8% | 81.6% |
| OSWorld-Verified | 75.0% | 72.1% | 39.0% | 42.0% |
¹ The highest reasoning_effort available for GPT-5 mini is 'high'.
Coding
GPT-5.4 mini and nano are particularly effective in coding workflows that benefit from fast iteration. The models handle targeted edits, codebase navigation, front-end generation, and debugging loops with low latency, making them well-suited for coding tasks where speed and cost matter most.
In benchmarks, GPT-5.4 mini consistently outperforms GPT-5 mini at similar latencies and approaches GPT-5.4-level pass rates while running much faster, delivering one of the strongest performance-per-latency tradeoffs for coding workflows.
Subagents
GPT-5.4 mini is also well-suited for systems that combine models of different sizes. In Codex, for example, a larger model like GPT-5.4 can handle planning, coordination, and final judgment, while delegating to GPT-5.4 mini subagents that handle narrower subtasks in parallel, such as searching a codebase, reviewing a large file, or processing supporting documents.
This pattern becomes increasingly useful as smaller models get faster and more capable. Rather than using one model for everything, developers can compose systems where larger models decide what to do and smaller models execute quickly at scale. GPT-5.4 mini is OpenAI's strongest mini model yet for that style of workflow.
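The planner/subagent split described above can be sketched as a small orchestration loop. This is an illustrative sketch, not OpenAI's implementation: the `run_model` function below is a stub that echoes its inputs, where a real system would make an API request per call, and the model IDs are simply the names used in this post.

```python
from concurrent.futures import ThreadPoolExecutor

# Stubbed model call -- in a real system this would be an API request;
# here it just echoes its arguments for illustration.
def run_model(model: str, task: str) -> str:
    return f"[{model}] completed: {task}"

def orchestrate(goal: str) -> str:
    # A larger model plans and splits the goal into narrow subtasks.
    plan = [
        f"search codebase for '{goal}'",
        f"review files related to '{goal}'",
        f"summarize docs about '{goal}'",
    ]

    # Cheaper, faster subagents execute the subtasks in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda t: run_model("gpt-5.4-mini", t), plan))

    # The larger model makes the final judgment over the combined results.
    return run_model("gpt-5.4", "merge: " + "; ".join(results))

print(orchestrate("flaky login test"))
```

The design choice the pattern captures: the expensive model runs once for planning and once for judgment, while the cheap model absorbs the fan-out.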
Computer Use
GPT-5.4 mini also performs well on multimodal tasks, particularly those related to computer use. The model can quickly interpret screenshots of dense user interfaces to complete computer-use tasks. On OSWorld-Verified, GPT-5.4 mini approaches GPT-5.4 while substantially outperforming GPT-5 mini.
Availability and Pricing
GPT-5.4 mini is available in the API, Codex, and ChatGPT.
In the API, GPT-5.4 mini supports text and image inputs, tool use, function calling, web search, file search, computer use, and skills. It has a 400k context window and costs $0.75 per 1M input tokens and $4.50 per 1M output tokens.
In Codex, GPT-5.4 mini is available across the Codex app, CLI, IDE extension, and web. It uses only 30% of the GPT-5.4 quota, letting developers handle simpler coding tasks in Codex for about one-third the cost. Codex can also delegate to GPT-5.4 mini subagents so that less reasoning-intensive work runs on the cheaper model.
In ChatGPT, GPT-5.4 mini is available to Free and Go users via the "Thinking" feature in the + menu. For all other users, GPT-5.4 mini is available as a rate limit fallback for GPT-5.4 Thinking.
GPT-5.4 nano is only available in the API and costs $0.20 per 1M input tokens and $1.25 per 1M output tokens.
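Per-request cost is easy to estimate from the per-1M-token rates listed above. A minimal sketch (the token counts in the example are illustrative, not from this post):

```python
# Prices in USD per 1M tokens, as listed in this post.
PRICES = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of a single request, ignoring any cached-token discounts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a request with 50k input tokens and 5k output tokens.
print(f"mini: ${cost_usd('gpt-5.4-mini', 50_000, 5_000):.4f}")
print(f"nano: ${cost_usd('gpt-5.4-nano', 50_000, 5_000):.4f}")
```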
For more information on the models' safeguards, OpenAI directs users to the System Card addendum on the Deployment Safety Hub.
Detailed Benchmarks
Coding
| Benchmark | GPT-5.4 (xhigh) | GPT-5.4 mini (xhigh) | GPT-5.4 nano (xhigh) | GPT-5 mini (high¹) |
|---|---|---|---|---|
| SWE-bench Pro (Public) | 57.7% | 54.4% | 52.4% | 45.7% |
| Terminal-Bench 2.0 | 75.1% | 60.0% | 46.3% | 38.2% |
Tool-calling
| Benchmark | GPT-5.4 (xhigh) | GPT-5.4 mini (xhigh) | GPT-5.4 nano (xhigh) | GPT-5 mini (high¹) |
|---|---|---|---|---|
| MCP Atlas | 67.2% | 57.7% | 56.1% | 47.6% |
| Toolathlon | 54.6% | 42.9% | 35.5% | 26.9% |
| τ2-bench (telecom) | 98.9% | 93.4% | 92.5% | 74.1% |
Intelligence
| Benchmark | GPT-5.4 (xhigh) | GPT-5.4 mini (xhigh) | GPT-5.4 nano (xhigh) | GPT-5 mini (high¹) |
|---|---|---|---|---|
| GPQA Diamond | 93.0% | 88.0% | 82.8% | 81.6% |
| HLE w/ tools | 52.1% | 41.5% | 37.7% | 31.6% |
| HLE w/o tools | 39.8% | 28.2% | 24.3% | 18.3% |
Multimodal / Vision / CUA
| Benchmark | GPT-5.4 (xhigh) | GPT-5.4 mini (xhigh) | GPT-5.4 nano (xhigh) | GPT-5 mini (high¹) |
|---|---|---|---|---|
| OSWorld-Verified | 75.0% | 72.1% | 39.0% | 42.0% |
| MMMU-Pro w/ Python | 81.5% | 78.0% | 69.5% | 74.1% |
| MMMU-Pro | 81.2% | 76.6% | 66.1% | 67.5% |
| OmniDocBench 1.5 (no tools; lower is better)² | 0.109 | 0.1263 | 0.2419 | 0.1791 |
Long Context
| Benchmark | GPT-5.4 (xhigh) | GPT-5.4 mini (xhigh) | GPT-5.4 nano (xhigh) | GPT-5 mini (high¹) |
|---|---|---|---|---|
| OpenAI MRCR v2 8-needle 64K–128K | 86.0% | 47.7% | 44.2% | 35.1% |
| OpenAI MRCR v2 8-needle 128K–256K | 79.3% | 33.6% | 33.1% | 19.4% |
| Graphwalks BFS 0K–128K | 93.1% | 76.3% | 73.4% | 73.4% |
| Graphwalks parents 0K–128K (accuracy) | 89.8% | 71.5% | 50.8% | 64.3% |
¹ The highest reasoning_effort available for GPT-5 mini is 'high'.
² Overall Edit Distance. OmniDocBench was run with reasoning_effort set to 'none' to reflect low-cost, low-latency performance.