OpenAI Launches o3 and o4-mini Advanced Reasoning Models

OpenAI has released o3 and o4-mini, its most advanced reasoning models to date, featuring multimodal capabilities, comprehensive tool integration, and significant performance improvements. The models can now reason directly with images, use multiple tools strategically, and achieve state-of-the-art results across coding, mathematics, science, and visual perception tasks while maintaining cost efficiency.

OpenAI · Apr 16, 2025

OpenAI has announced two new reasoning models, o3 and o4-mini, as successors to its previous o-series models. These represent OpenAI's most intelligent models released so far, featuring enhanced capabilities across multiple domains.

Key Model Features

OpenAI o3 is OpenAI's most powerful reasoning model, excelling in coding, mathematics, science, and visual perception tasks. The model establishes new state-of-the-art performance on benchmarks like Codeforces, SWE-bench, and MMMU. External experts report it makes 20% fewer major errors compared to OpenAI o1 on challenging real-world tasks, with particular strengths in programming, business consulting, and creative ideation.

OpenAI o4-mini offers rapid, cost-effective reasoning optimized for mathematics, coding, and visual tasks. It achieves top performance on the AIME 2024 and 2025 mathematics competitions. When given access to a Python interpreter, o4-mini reached 99.5% pass@1 and 100% consensus@8 on AIME 2025. The model provides significantly higher usage limits than o3, making it suitable for high-volume applications.
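The pass@1 and consensus@8 figures above can be read as follows: pass@1 is the fraction of problems a single sampled answer solves, while consensus@k is commonly computed as a majority vote over k samples. A minimal sketch of that reading (the exact metric definitions are an assumption here, not spelled out in the announcement):

```python
from collections import Counter

def consensus_at_k(answers):
    """Majority vote over k sampled answers for one problem.

    Returns the most common answer (ties broken arbitrarily by Counter).
    """
    answer, _count = Counter(answers).most_common(1)[0]
    return answer

def pass_at_1(answers, correct):
    """Fraction of problems whose single sampled answer matches the key."""
    return sum(a == c for a, c in zip(answers, correct)) / len(correct)

# Eight samples for one AIME-style problem: the majority answer wins.
samples = ["042", "017", "042", "042", "009", "042", "017", "042"]
print(consensus_at_k(samples))           # majority answer across 8 samples
print(pass_at_1(["a", "b"], ["a", "x"])) # 1 of 2 single samples correct
```

Scoring a benchmark run with consensus@8 then reduces to applying the majority vote per problem and comparing against the answer key.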

Multimodal and Tool Integration

For the first time, OpenAI's reasoning models can incorporate images directly into their reasoning process. They can interpret whiteboard photos, textbook diagrams, or hand-drawn sketches, even when images are blurry, reversed, or low quality. The models can manipulate images during reasoning, including rotating, zooming, or transforming them.

Both models have comprehensive access to tools within ChatGPT, including:

  • Web search capabilities
  • File and data analysis using Python
  • Deep reasoning about visual inputs
  • Image generation

The models are trained through reinforcement learning to determine not just how to use tools, but when to employ them strategically for problem-solving.
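The tool-use loop described above can be sketched as a simple harness: at each step the model's policy either emits a final answer or requests a tool call, and the tool's output is fed back into context for the next decision. Everything here is illustrative (the `decide` policy and tool names are hypothetical, not OpenAI's implementation):

```python
def run_agent(decide, tools, question, max_steps=5):
    """Minimal agent loop: the policy chooses to answer or to call a tool.

    decide(context) -> {"type": "answer", "content": ...}
                     | {"type": "tool", "tool": name, "args": ...}
    tools: mapping of tool name -> callable(args) -> observation string
    """
    context = [question]
    for _ in range(max_steps):
        action = decide(context)
        if action["type"] == "answer":
            return action["content"]
        observation = tools[action["tool"]](action["args"])  # execute tool
        context.append(observation)                          # feed result back
    return None  # step budget exhausted without a final answer

# Toy policy: search once, then answer with what was found.
def toy_decide(context):
    if len(context) == 1:
        return {"type": "tool", "tool": "search", "args": context[0]}
    return {"type": "answer", "content": context[-1]}

tools = {"search": lambda query: f"result for {query}"}
print(run_agent(toy_decide, tools, "o3 release date"))
```

In the real models the "decide" step is the learned behavior: reinforcement learning shapes not only the tool arguments but whether invoking a tool is worthwhile at all.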

Performance Improvements

OpenAI observed that large-scale reinforcement learning follows a "more compute equals better performance" trend similar to GPT-series pretraining. By scaling an additional order of magnitude in both training compute and inference-time reasoning, the models continue to show clear performance gains.

At equivalent latency and cost to OpenAI o1, o3 delivers superior performance in ChatGPT. When allowed longer thinking time, its performance continues to improve.

Cost-Efficiency Advances

On the 2025 AIME mathematics competition, o3's cost-performance frontier strictly improves over o1, while o4-mini's frontier strictly improves over o3-mini. For most real-world applications, o3 and o4-mini are expected to be both smarter and cheaper than o1 and o3-mini respectively.

Safety Enhancements

OpenAI completely rebuilt its safety training data for these models, incorporating new refusal prompts for biological threats, malware generation, and jailbreak attempts. The refreshed data enables strong performance on internal refusal benchmarks.

A reasoning LLM monitor was trained to work from human-written safety specifications. When applied to biological risk, this monitor successfully flagged approximately 99% of conversations in human red-teaming campaigns.

Both models underwent OpenAI's most rigorous safety program, with evaluations across biological and chemical, cybersecurity, and AI self-improvement capabilities. Both o3 and o4-mini remain below the "High" threshold in all three categories according to OpenAI's Preparedness Framework.

Codex CLI Experiment

OpenAI introduced Codex CLI, a lightweight coding agent that runs directly from the terminal. It maximizes the reasoning capabilities of models like o3 and o4-mini, with planned support for additional API models including GPT-4.1.

The tool enables multimodal reasoning from the command line by processing screenshots or sketches combined with local code access. Codex CLI is fully open-source and available at github.com/openai/codex.

A $1 million initiative supports projects using Codex CLI and OpenAI models, with grants available in $25,000 USD increments as API credits.

Availability and Access

ChatGPT Plus, Pro, and Team users can access o3, o4-mini, and o4-mini-high in the model selector starting today, replacing o1, o3-mini, and o3-mini-high. Enterprise and Edu users will receive access within one week. Free users can try o4-mini by selecting 'Think' in the composer before submitting queries.

Developers can access both o3 and o4-mini through the Chat Completions API and Responses API. The Responses API supports reasoning summaries and the ability to preserve reasoning tokens around function calls for improved performance.
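A minimal sketch of what a Responses API request for these models might look like, assuming the OpenAI Python SDK. The request is only constructed here, not sent; the `reasoning` summary field and built-in web-search tool type follow the API documentation at launch, but the specific values are illustrative:

```python
# Build the request payload for a Responses API call to o4-mini.
# No network call is made in this sketch; the commented lines show
# the SDK call shape once an API key is configured.
request = {
    "model": "o4-mini",
    "input": "How many prime numbers are below 50?",
    # Ask the API to return a summary of the model's reasoning.
    "reasoning": {"effort": "medium", "summary": "auto"},
    # Built-in web search tool, as exposed by the Responses API.
    "tools": [{"type": "web_search_preview"}],
}

# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# response = client.responses.create(**request)
# print(response.output_text)
print(sorted(request))  # the top-level fields this request sets
```

For multi-turn tool use, the Responses API also lets a follow-up request reference the prior one (for example via `previous_response_id`), which is one way reasoning context can be preserved around function calls.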

OpenAI o3-pro is expected to be released in a few weeks with full tool support. Pro users currently retain access to o1-pro.

Future Direction

These releases signal OpenAI's direction: converging the specialized reasoning capabilities of the o-series with the natural conversational abilities and tool use of the GPT-series. By unifying these strengths, future models are intended to support seamless, natural conversations alongside proactive tool use and advanced problem-solving.