OpenAI has released a research preview of GPT-5.3-Codex-Spark, a compact variant of GPT-5.3-Codex built specifically for real-time coding applications. It is the first model to emerge from the OpenAI-Cerebras partnership announced in January. Codex-Spark is engineered to deliver nearly instantaneous responses on ultra-low-latency infrastructure, generating over 1,000 tokens per second while retaining strong capability on practical coding tasks.
The model is available through Cerebras as a research preview exclusively for ChatGPT Pro users, letting developers begin experimenting while OpenAI and Cerebras expand datacenter capacity, polish the end-to-end user experience, and prepare to deploy larger frontier models.
Speed and Intelligence
Codex-Spark is designed for interactive scenarios where response time matters as much as model capability. Users can work with the model in real time, modifying or redirecting its output mid-execution and iterating quickly with near-instant feedback. Because it is tuned for speed, Codex-Spark defaults to a lean working style: it makes minimal, focused changes and runs tests only when asked.
Coding Performance
Codex-Spark demonstrates robust performance as a compact model optimized for rapid inference. On SWE-Bench Pro and Terminal-Bench 2.0, which assess agentic software engineering abilities, GPT-5.3-Codex-Spark achieves strong results while completing tasks significantly faster than GPT-5.3-Codex.
Latency Enhancements Across All Models
During Codex-Spark's development, it became clear that model speed was only one component of enabling real-time collaboration: latency had to come down across the entire request-response pipeline. OpenAI has implemented system-wide latency optimizations that will benefit all models, including streamlining response streaming between client and server, rebuilding critical components of the inference stack, and redesigning session initialization for faster first-token delivery and better responsiveness during iteration. By moving to a persistent WebSocket connection and optimizing the Responses API, OpenAI achieved an 80% reduction in client/server roundtrip overhead, a 30% decrease in per-token overhead, and a 50% improvement in time-to-first-token.
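The payoff of a persistent connection can be illustrated in miniature. The sketch below stands up a local TCP echo server (an assumed setup, not OpenAI's actual transport) and compares N roundtrips that each open a fresh connection against N roundtrips over one reused socket:

```python
import socket
import threading
import time

HOST, N = "127.0.0.1", 100

def echo_server(srv: socket.socket) -> None:
    # Accept connections and echo bytes back until the listener is closed.
    while True:
        try:
            conn, _ = srv.accept()
        except OSError:
            return  # listener closed; shut down
        with conn:
            while data := conn.recv(1024):
                conn.sendall(data)

srv = socket.socket()
srv.bind((HOST, 0))  # port 0 = pick any free port
srv.listen()
port = srv.getsockname()[1]
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

# Per-request connections: pay TCP setup/teardown on every roundtrip.
t0 = time.perf_counter()
for _ in range(N):
    with socket.create_connection((HOST, port)) as c:
        c.sendall(b"ping")
        c.recv(1024)
per_request = time.perf_counter() - t0

# Persistent connection: pay setup once, reuse it for every roundtrip.
t0 = time.perf_counter()
with socket.create_connection((HOST, port)) as c:
    for _ in range(N):
        c.sendall(b"ping")
        c.recv(1024)
persistent = time.perf_counter() - t0

srv.close()
print(f"per-request: {per_request:.4f}s  persistent: {persistent:.4f}s")
```

On localhost the per-connection setup cost is small but nonzero; over a real network, where every new connection adds at least one handshake roundtrip (plus TLS), the persistent connection's advantage grows accordingly.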
Cerebras Partnership
Codex-Spark operates on Cerebras' Wafer Scale Engine 3, a specialized AI accelerator built for high-velocity inference that provides Codex with a latency-focused serving tier. OpenAI collaborated with Cerebras to integrate this low-latency capability into the existing production serving infrastructure, enabling seamless operation within Codex and preparing for future model deployments.
GPUs continue to serve as the foundation for OpenAI's training and inference operations, providing the most economical tokens for general usage. Cerebras enhances this foundation by specializing in workflows requiring minimal latency, creating a more responsive experience during iterative work. Both GPUs and Cerebras can be utilized together for individual workloads to achieve optimal performance.
Availability and Specifications
Codex-Spark is launching as a research preview for ChatGPT Pro users in the current versions of the Codex application, command-line interface, and VS Code extension. Because it runs on specialized low-latency hardware, usage is governed by separate rate limits that may be adjusted based on demand during the preview. OpenAI is also providing API access to a small group of design partners to learn how developers want to integrate the model, and will expand access progressively as it refines the integration based on real usage patterns.
Currently, Codex-Spark operates with text-only input, features a 128k context window, and is the initial offering in a series of ultra-fast models. As OpenAI learns alongside the developer community about optimal applications for fast models in coding, additional capabilities will be introduced, including larger models, extended context windows, and multimodal input support.
Codex-Spark received the same safety training as OpenAI's flagship models, including cyber-relevant training. OpenAI evaluated it through its standard deployment process, including baseline assessments of cyber and other capabilities, and determined it does not pose a credible risk of reaching the Preparedness Framework threshold for advanced capability in cybersecurity or biology.
Future Directions
Codex-Spark is the first step toward a Codex with two complementary operating modes: extended reasoning and execution, and real-time collaboration for quick iteration. Eventually these modes will merge: Codex will keep an interactive feedback loop going while delegating longer-running tasks to background sub-agents, or fan work out across multiple models in parallel for results that are both thorough and fast.
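The split between a fast interactive loop and slower background work can be sketched with plain asyncio. The `fast_reply` and `slow_agent` coroutines below are hypothetical stand-ins for the two modes, not real Codex APIs:

```python
import asyncio

async def fast_reply(prompt: str) -> str:
    # Stand-in for a low-latency interactive turn.
    await asyncio.sleep(0.01)
    return f"draft for {prompt!r}"

async def slow_agent(task: str) -> str:
    # Stand-in for a longer-running background sub-agent.
    await asyncio.sleep(0.2)
    return f"completed {task!r}"

async def session() -> list[str]:
    # Hand the long task to a background sub-agent...
    background = asyncio.create_task(slow_agent("refactor module"))
    transcript = []
    # ...while the interactive loop keeps iterating with the fast model.
    for prompt in ("rename var", "add test", "fix typo"):
        transcript.append(await fast_reply(prompt))
    # Join the background result once the interactive turns are done.
    transcript.append(await background)
    return transcript

print(asyncio.run(session()))
```

The interactive turns complete while the sub-agent is still running, so the user never waits on the long task to keep iterating; the background result is merged in only at the end.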
As models advance in capability, interaction speed increasingly becomes a limiting factor. Ultra-fast inference addresses this constraint, creating a more intuitive Codex experience and broadening possibilities for converting ideas into functional software.