OpenAI News · February 12, 2026, 19:00 · approx. 2 min read

Introducing GPT-5.3-Codex-Spark

#LLM #AICoding #LowLatencyInference #HardwareOptimization

TL;DR

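OpenAI announced GPT-5.3-Codex-Spark, its first real-time coding model. It offers 15x faster generation and a 128k context window, and is launching as a research preview for ChatGPT Pro users.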

AI Deep Analysis · February 24, 2026, 14:40

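Key Points

GPT-5.3-Codex-Spark is a new OpenAI model specialized for real-time coding.

Optimization for ultra-low-latency hardware, through the partnership with Cerebras, enables generation of more than 1000 tokens per second.

It is available to ChatGPT Pro users as a research preview; Codex now supports both long-running tasks and in-the-moment work.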

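Impact Analysis

This announcement is an important step that moves AI-assisted coding from "batch processing" toward "real-time collaboration." It signals a fundamental change in developer experience and a new direction for optimization through hardware partnerships.

Editorial Comment

An attempt to resolve the trade-off between "latency" and "intelligence." A concrete step toward AI that blends into a developer's flow.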

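OpenAI has released a research preview of GPT-5.3-Codex-Spark, a new model specialized for real-time coding. It is a smaller version of the existing GPT-5.3-Codex and the first milestone in OpenAI's partnership with Cerebras (announced in January). The model is optimized to feel "near-instant" when served on ultra-low-latency hardware, and it can generate more than 1000 tokens per second while remaining highly capable.

The main goal of Codex-Spark is to enable real-time collaboration with Codex. Developers can make targeted edits, reshape logic, or refine interfaces while seeing results immediately. Whereas earlier frontier models excelled at long-running autonomous tasks, Codex-Spark emphasizes responsiveness: getting work done in the moment. Users can interrupt or redirect the model as it works and iterate rapidly with near-instant responses.

On the technical side, the model has a 128k-token context window and is currently text-only. Because it is tuned for speed, its default working style is lightweight: it makes minimal, targeted edits and does not run tests automatically unless asked. On SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks that evaluate agentic software engineering capability, it shows strong performance while completing tasks in a fraction of the time taken by GPT-5.3-Codex.

OpenAI states that real-time collaboration requires not only model speed but also lower latency across the full request-response pipeline, and it is working on those improvements as well. The model is currently served on Cerebras as a research preview for ChatGPT Pro users. This lets developers start experimenting early while OpenAI works with Cerebras to ramp up datacenter capacity, harden the end-to-end user experience, and deploy larger frontier models. During the preview, the model has its own rate limits and usage does not count toward standard rate limits, but access may be limited or temporarily queued when demand is high. OpenAI hopes to learn from how developers use the model and to incorporate that feedback as it expands access.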

Original text

Introducing GPT‑5.3‑Codex‑Spark

An ultra-fast model for real-time coding in Codex.

Today, we’re releasing a research preview of GPT‑5.3‑Codex‑Spark, a smaller version of GPT‑5.3‑Codex, and our first model designed for real-time coding. Codex-Spark marks the first milestone in our partnership with Cerebras, which we announced in January. Codex-Spark is optimized to feel near-instant when served on ultra-low latency hardware, delivering more than 1000 tokens per second while remaining highly capable for real-world coding tasks.

We’re sharing Codex-Spark on Cerebras as a research preview to ChatGPT Pro users so that developers can start experimenting early while we work with Cerebras to ramp up datacenter capacity, harden the end-to-end user experience, and deploy our larger frontier models.

Our latest frontier models have shown particular strengths in their ability to do long-running tasks, working autonomously for hours, days or weeks without intervention. Codex-Spark is our first model designed specifically for working with Codex in real-time—making targeted edits, reshaping logic, or refining interfaces and seeing results immediately. With Codex-Spark, Codex now supports both long-running, ambitious tasks and getting work done in the moment. We hope to learn from how developers use it and incorporate feedback as we continue to expand access.

At launch, Codex-Spark has a 128k context window and is text-only. During the research preview, Codex-Spark will have its own rate limits and usage will not count towards standard rate limits. However, when demand is high, you may see limited access or temporary queuing as we balance reliability across users.

Codex-Spark is optimized for interactive work where latency matters as much as intelligence. You can collaborate with the model in real time, interrupting or redirecting it as it works, and rapidly iterate with near-instant responses. Because it’s tuned for speed, Codex-Spark keeps its default working style lightweight: it makes minimal, targeted edits and doesn’t automatically run tests unless you ask it to.

Codex-Spark is a highly capable small model optimized for fast inference. On SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks evaluating agentic software engineering capability, GPT‑5.3‑Codex‑Spark demonstrates strong performance while accomplishing the tasks in a fraction of the time compared to GPT‑5.3‑Codex.

Duration is estimated as the sum of (1) output generation time (output tokens ÷ sampling speed), (2) prefill time (prefill tokens ÷ prefill speed), (3) total tool execution time, and (4) total network overhead.
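
As a rough illustration of the duration estimate above, the four components can be summed in a few lines of Python. This is a minimal sketch: the class name, field names, and every numeric value are assumptions made for the example and do not come from the announcement.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical per-task measurements; all field names are illustrative."""
    output_tokens: int         # tokens the model generated
    sampling_speed: float      # output tokens per second
    prefill_tokens: int        # prompt/context tokens processed
    prefill_speed: float       # prefill tokens per second
    tool_time_s: float         # total tool execution time, in seconds
    network_overhead_s: float  # total network overhead, in seconds


def estimate_duration(p: TaskProfile) -> float:
    """Sum the four components named in the duration definition (seconds)."""
    generation = p.output_tokens / p.sampling_speed  # (1) output generation time
    prefill = p.prefill_tokens / p.prefill_speed     # (2) prefill time
    return generation + prefill + p.tool_time_s + p.network_overhead_s  # + (3) + (4)


# Made-up values for illustration only; not measurements of any real model.
example = TaskProfile(
    output_tokens=20_000,
    sampling_speed=1_000.0,
    prefill_tokens=100_000,
    prefill_speed=10_000.0,
    tool_time_s=30.0,
    network_overhead_s=5.0,
)
print(estimate_duration(example))  # 20 + 10 + 30 + 5 = 65.0 seconds
```

With these made-up inputs, tool execution is the largest single term, which shows why faster sampling alone does not shrink the whole estimate proportionally.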

Latency improvements for all models

As we trained Codex-Spark, it became apparent that model speed was just part of the equation for real-time collaboration—we also needed to reduce latency across the full request-response pipeline. We implemented end-to-end latency improvements in our harness that will benefit all models. Under the hood, we streamlined how responses stream from client to server and back, rewrote key pieces of our inference stack, and reworked how sessions are initialized so that the first visible token appears sooner and Codex stays responsive as you iterate. Through the introduction of a persistent WebSocket connection and targeted optimizations inside of Responses API, we reduced overhead per client/server roundtrip by 80%, per-token overhead by 30%, and time-to-first-token by 50%. The WebSocket path is enabled for Codex-Spark by default and will become the default for all models soon.
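
To make the reported percentages concrete, the sketch below applies them to a set of baseline latencies. Only the three reduction percentages come from the text above; the baseline millisecond values and all names are hypothetical, chosen purely for illustration.

```python
# Reductions reported in the text above, in percent.
REDUCTION_PCT = {
    "roundtrip_overhead": 80,  # overhead per client/server roundtrip
    "per_token_overhead": 30,  # per-token overhead
    "time_to_first_token": 50,
}

# Hypothetical baselines in milliseconds; these are NOT published figures.
BASELINE_MS = {
    "roundtrip_overhead": 100.0,
    "per_token_overhead": 10.0,
    "time_to_first_token": 400.0,
}


def apply_reduction(baseline_ms: float, reduction_pct: int) -> float:
    """Return what remains of a latency after cutting it by reduction_pct percent."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return baseline_ms * (100 - reduction_pct) / 100


after_ms = {
    name: apply_reduction(BASELINE_MS[name], REDUCTION_PCT[name])
    for name in BASELINE_MS
}

for name, value in after_ms.items():
    print(f"{name}: {BASELINE_MS[name]:.1f} ms -> {value:.1f} ms")
```

Under these assumed baselines, the roundtrip overhead would fall from 100 ms to 20 ms, the per-token overhead from 10 ms to 7 ms, and time-to-first-token from 400 ms to 200 ms.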

Codex-Spark runs on Cerebras’ Wafer Scale Engine 3, a purpose-built AI accelerator for high-speed inference giving Codex a latency-first serving tier. We partnered with Cerebras to add this low-latency path to the same production serving stack as the rest of our fleet, so it works seamlessly across Codex and sets us up to support future models.

“What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible—new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning.”

GPUs remain foundational across our training and inference pipelines and deliver the most cost effective tokens for broad usage. Cerebras complements that foundation by excelling at workflows that demand extremely low latency, tightening the end-to-end loop so Codex feels more responsive as you iterate. GPUs and Cerebras can be combined for single workloads to reach the best performance.

Codex-Spark is rolling out today as a research preview for ChatGPT Pro users in the latest versions of the Codex app, CLI, and VS Code extension. Because it runs on specialized low-latency hardware, usage is governed by a separate rate limit that may adjust based on demand during th
