OpenAI has released a research preview of GPT-5.3-Codex-Spark, a new AI model specifically designed for real-time programming tasks.
GPT-5.3-Codex-Spark, OpenAI's latest coding model, is a lightweight variant of GPT-5.3-Codex and the first result of the collaboration between OpenAI and Cerebras. It is optimized for extremely low latency and can generate more than 1,000 tokens per second, making code adjustments visible almost instantly.
Focused on speed and interaction
While previous Codex models primarily excel at long-running, autonomous tasks, Codex-Spark focuses explicitly on direct collaboration with developers. The model is intended for quick tasks: small code adjustments, refactoring logic, or refining interfaces without wait time. By default, it does not run tests unless the user specifically requests it.
Runs on Cerebras hardware
Codex-Spark is hosted on Cerebras' Wafer Scale Engine 3, an accelerator built for high inference speeds. According to OpenAI, not only the model itself but the entire request-response chain has been accelerated: among other optimizations, a persistent WebSocket connection cuts the time to first token by 50%.
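These figures can be combined into a rough back-of-the-envelope latency model: the perceived response time is the time to first token (TTFT) plus the streaming time for the remaining tokens. In the sketch below, only the 1,000 tokens/s throughput and the 50% TTFT reduction come from the announcement; the baseline TTFT of 0.4 seconds is a purely hypothetical assumption for illustration.

```python
# Rough model of perceived latency when streaming a response.
# Assumption: baseline TTFT of 0.4 s is hypothetical, chosen only
# to illustrate the effect of the reported 50% reduction.

def response_time(ttft_s: float, n_tokens: int, tokens_per_s: float) -> float:
    """Time until the last token arrives: wait for the first token,
    then stream the rest at the given throughput."""
    return ttft_s + n_tokens / tokens_per_s

THROUGHPUT = 1_000              # tokens/s, per OpenAI's stated figure
BASELINE_TTFT = 0.4             # seconds, assumed for this sketch
SPARK_TTFT = BASELINE_TTFT / 2  # the reported 50% reduction

# A small 200-token code edit:
before = response_time(BASELINE_TTFT, 200, THROUGHPUT)
after = response_time(SPARK_TTFT, 200, THROUGHPUT)
print(f"before: {before:.2f}s, after: {after:.2f}s")
# prints: before: 0.60s, after: 0.40s
```

Under these assumed numbers, halving the TTFT shaves a third off the total wait for a short snippet, which is why connection-setup overhead matters as much as raw generation speed for interactive use.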
The preview is available starting today for ChatGPT Pro users via the Codex app, CLI, and VS Code extension. The model has a context window of 128,000 tokens and currently handles text only. Usage falls outside standard limits but may be temporarily restricted during periods of high demand.
