The Decoder · March 9, 2026, 22:41 · approx. 1 min read

OpenAI employees hint at a new omni model

#OpenAI #Multimodal #SpeechRecognition #BidirectionalProcessing #GPT-4o

TL;DR

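OpenAI employees have hinted at a new "omni model," and reports indicate the company is preparing a potential successor to GPT-4o as well as a bidirectional audio model called "BiDi."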

AI deep analysis · March 9, 2026, 23:42


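Key points

Hints at a new omni model

OpenAI employees posted on X about their excitement for what comes next, hinting at a next-generation omni model that would process text, images, audio, and video in a single system.

Prototype of the bidirectional audio model "BiDi"

A prototype of "BiDi," a bidirectional audio model intended to make conversations feel more natural, already exists. It is built to handle interruptions in real time, but it tends to break down after a few minutes of conversation.

GPT-5.4 and the roadmap ahead

The existing GPT-5.4 already integrates computer use natively, while the launch of BiDi could slip to the second quarter or later.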


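Impact analysis

OpenAI's work on "BiDi," a technology aimed at markedly improving the quality of voice conversations, could significantly affect the practicality and user experience of AI assistants. If natural real-time conversation becomes possible, the barriers to conversational interfaces in areas such as in-vehicle systems and smart homes would drop substantially, which could accelerate market adoption.

Editorial comment

OpenAI's focus on the "naturalness" of voice conversation is an important indicator for next-generation HCI (human-computer interaction). If the technical problem of breakdowns in longer conversations is solved, the company is likely to further consolidate its position as a market leader.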


A new omni model from OpenAI? Employee posts and a leaked audio project called "BiDi" suggest the company is working on its next major multimodal upgrade.

The article "OpenAI employees hint at a new omni model" was first published on The Decoder.

Original text

OpenAI appears to be developing a new multimodal model, potentially a successor to GPT-4o.

Recent posts from OpenAI employees are fueling the speculation. Atty Eleti from the Voice team wrote that he's "so excited for what comes next" and asked users what they'd want from a new omni model. Brandon McKinzie, an OpenAI researcher with a multimodal background at Apple, responded that a potential omni model "sounds like a great idea."

OpenAI researcher Brandon McKinzie responds to speculation about a new omni model. | via X

Multimodal, or "omni," means a single model can process different formats like text, image, audio, and video instead of relying on separate models for each task. GPT-4o ("omni") was OpenAI's first model to combine text, image, and audio processing in one system. The company's latest model, GPT-5.4, already integrates "computer use" natively, meaning it can operate computer interfaces designed for humans.

According to The Information, OpenAI is also working on a new audio model called "BiDi" (bidirectional) that's designed to make conversations feel more natural. Current audio models work on a turn-by-turn basis, where the AI waits until the user finishes speaking before responding. BiDi is built to handle interruptions in real time. A prototype already exists, but it tends to break down after a few minutes of conversation. The launch could slip to the second quarter or later.
