Smol AI News · April 23, 2026, 14:44 · approx. 11 min read

GPT 5.5

#LLM #OpenAI #GPT-5.5 #AgentCapabilities #TokenEfficiency

TL;DR

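OpenAI rolled out its new flagship model, GPT-5.5, immediately in ChatGPT and Codex, touting stronger agentic capabilities and better token efficiency, while holding back API access until additional safety requirements are met.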

AI deep analysis · April 27, 2026, 23:08

Importance: highest (5-level scale)

Depth: 40%

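Key points

Immediate rollout of GPT-5.5, API delayed

OpenAI made GPT-5.5 available immediately in ChatGPT and Codex but withheld API access until additional safety requirements are met.

Stronger agentic capabilities for real work

Tool use and self-checking were strengthened to handle "real work" such as coding, computer use, and scientific research, as well as agentic work that needs less micromanagement.

Pricing and improved token efficiency

Pricing for GPT-5.5 and GPT-5.5 Pro was announced. OpenAI and early testers reported high token efficiency: substantially fewer output tokens than the previous model at comparable per-token speed.

Major expansion of Codex

The Codex ecosystem was substantially expanded with browser control, file/PDF handling, spreadsheet and slide support, and OS-level dictation.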


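Impact analysis

OpenAI's rollout of GPT-5.5 marks an important turning point, shifting generative AI use from "conversation" toward "autonomous execution of real work (agents)." The API delay in particular signals caution about the risks of highly autonomous AI and may push enterprises to revisit their governance standards for adoption. Improved token efficiency also changes the cost structure of AI use, making large-scale automation more practical.

Editorial comment

The API delay highlights how important safety assurance becomes as autonomous agents grow more deeply involved in social infrastructure. Companies should look beyond raw performance and treat this "safety grace period" as a reference point for their own risk management.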


a quiet day.

AI News for 4/22/2026-4/23/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Top Story: GPT-5.5 launch

What happened

OpenAI launched GPT-5.5 as its new flagship frontier model for “real work and powering agents,” rolling it out immediately in ChatGPT and Codex, while delaying API access pending additional safety requirements (OpenAI, OpenAI rollout, OpenAIDevs, API delayed). OpenAI positioned the model as a step toward lower-micromanagement agentic work: stronger coding, computer use, knowledge work, scientific research, and longer multi-step execution with tool use and self-checking (OpenAI, gdb, snsf). Pricing landed at $5/$30 per million input/output tokens for GPT-5.5 and $30/$180 for GPT-5.5 Pro (scaling01 pricing, sama pricing). The model was described by OpenAI and multiple early testers as notably more token-efficient than GPT-5.4, often using materially fewer output tokens while keeping similar per-token speed (sama, OpenAIDevs, reach_vb, GitHub VP claim relayed by scaling01). OpenAI also bundled significant Codex product upgrades around the launch—browser control, file/docs/PDF handling, Sheets & Slides, auto-review mode, OS-wide dictation, and broader computer-use workflows (ajambrosino, OpenAIDevs browser use, thsottiaux, sama “bundle”).

Independent and semi-independent reactions were mixed but broadly positive: many users called it a step change in coding and long-horizon work, while others argued the headline benchmark gains looked incremental, the price doubled vs GPT-5.4, hallucination remains high on at least one third-party eval, and Anthropic’s Mythos or Opus variants still lead or tie on some tasks depending on benchmark selection (Artificial Analysis, theo, scaling01 critique, Perspective vs Mythos, scaling01 Mythos lead take).

Release details

Product availability

Rolling out today to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex (OpenAI rollout).

GPT-5.5 Pro available to Pro, Business, Enterprise users in ChatGPT (OpenAI rollout).

API access not same-day; OpenAI says it is “coming soon” and delayed due to higher safety requirements / robust safeguards (OpenAIDevs, scaling01, jeffintime).

Third-party ecosystem support appeared quickly, e.g. Hermes Agent support via ChatGPT/Codex OAuth (Teknium).

Pricing

GPT-5.5: $5 input / $30 output per 1M tokens (scaling01 pricing, sama pricing).

GPT-5.5 Pro: $30 / $180 per 1M tokens (scaling01 pricing).

This is widely noted as 2x GPT-5.4 pricing at the per-token level (scaling01), though OpenAI and several testers argue effective task cost is moderated by token efficiency (sama, OpenAIDevs).
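The gap between per-token and per-task pricing comes down to plain arithmetic. Below is a minimal sketch that assumes a hypothetical task of 200K input and 20K output tokens.

- The GPT-5.4 prices are not quoted in the launch posts. They are derived here only by halving GPT-5.5's prices, per the "2x" observation.
- The helper names (`task_cost`, `net_cost_multiplier`, `break_even_reduction`) are illustrative.
- The multiplier assumes the price increase and the token savings apply evenly across input and output.

```python
# Per-1M-token prices in USD: (input, output).
# GPT-5.5 and GPT-5.5 Pro are the launch prices. The GPT-5.4 row is an
# ASSUMPTION derived by halving GPT-5.5's prices ("2x GPT-5.4 pricing").
PRICES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
    "gpt-5.4-derived": (2.50, 15.00),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one task at list prices only."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def net_cost_multiplier(price_multiplier: float, token_reduction: float) -> float:
    """Per-task cost relative to the previous model.

    price_multiplier: new per-token price / old per-token price.
    token_reduction: fraction of tokens saved per task (0.40 means 40%).
    Assumes both effects apply uniformly to input and output tokens.
    """
    return price_multiplier * (1.0 - token_reduction)


def break_even_reduction(price_multiplier: float) -> float:
    """Token reduction at which per-task cost stays flat."""
    return 1.0 - 1.0 / price_multiplier


# Hypothetical task: 200K input tokens, 20K output tokens.
for name in PRICES:
    print(f"{name}: ${task_cost(name, 200_000, 20_000):.2f}")

# 2x price with ~40% fewer tokens (Artificial Analysis's figure).
print(f"net multiplier: {net_cost_multiplier(2.0, 0.40):.2f}x")
# A 2x price increase is cost-neutral only at a 50% token reduction.
print(f"break-even reduction: {break_even_reduction(2.0):.0%}")
```

Plugging in Artificial Analysis's ~40% token reduction gives 2 × 0.6 = 1.2, in line with the roughly 20% net increase it reports. If the savings fall mostly on output tokens, as testers describe, the real figure depends on each task's input/output mix.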

Context

Swyx, summarizing the launch materials, cites a 1M context window in the API and 400K in Codex (swyx).

Sam Altman separately referenced 1M context window alongside API pricing (sama pricing/context).

Infrastructure / serving

OpenAI-linked commentary says GPT-5.5 is the first model generation co-designed with Nvidia's GB200 and GB300 NVL72 systems (scaling01, swyx).

Jonathan Ross also highlighted GB200 NVL72 training from early access observations (JonathanRoss321).

OpenAI says Codex + GPT-5.5 helped optimize the serving stack, increasing token generation speed by 20%+ (reach_vb, sama inference team praise).

Sam Altman said per-token speed matches GPT-5.4 while using fewer tokens per task (sama).

Codex app changes at launch

New features: browser control, Sheets & Slides, Docs & PDFs, OS-wide dictation, auto-review mode (ajambrosino).

Expanded browser use for testing web flows, taking screenshots, and iterating on what it sees (OpenAIDevs).

OpenAI explicitly framed Codex + 5.5 as useful beyond coding: spreadsheets, slides, documents, browser workflows (gdb).

Technical details and benchmark numbers

OpenAI-reported headline metrics

OpenAI and launch-adjacent posts gave the following benchmark claims:

Terminal-Bench 2.0: 82.7% (OpenAIDevs, reach_vb)

OSWorld-Verified: 78.7% (OpenAIDevs, reach_vb)

Toolathlon: 55.6% (OpenAIDevs)

FrontierMath Tier 4: 35.4%; GPT-5.5 Pro later cited at 39.5% (OpenAIDevs, scaling01)

CyberGym: 81.8% (OpenAIDevs, reach_vb)

SWE-Bench Pro: 58.6% (reach_vb, swyx)

GDPval: 84.9% win/tie (reach_vb)

BrowseComp: 84.4% (reach_vb)

FrontierMath Tier 1–3: 51.7% (reach_vb)

MMMU-Pro without tools: 81.2% (reach_vb)

Investment banking modeling: 88.5% (reach_vb)

Expert-SWE internal eval: 73.1% (swyx)

Tau2-bench Telecom: 98.0% (swyx)

BixBench: 80.5% (swyx)

ARC-AGI-1: 95.0%

ARC-AGI-2: 85.0% (scaling01, ARC Prize verified)

CritPt: 27.1% for xhigh (scaling01, MinyangTian1)

Independent / semi-independent benchmarks

Artificial Analysis

Says GPT-5.5 takes the #1 spot on its Intelligence Index by 3 points, breaking a prior three-way tie among OpenAI, Anthropic, Google (Artificial Analysis).

Claims GPT-5.5 leads Terminal-Bench Hard, GDPval-AA, APEX-Agents-AA, and trails only other OpenAI models in CritPt and AA-LCR, while placing second to Gemini 3.1 Pro Preview on three more benchmarks (Artificial Analysis, headline evals follow-up).

Says GPT-5.5 medium ≈ Claude Opus 4.7 max at ~1/4 the cost on its index, while Gemini 3.1 Pro Preview reaches similar score at still lower cost (Artificial Analysis).

Reports ~40% token-use reduction vs GPT-5.4 offsetting higher price; net cost to run its Intelligence Index rises only about 20% (Artificial Analysis).

Reports AA-Omniscience accuracy of 57% but a hallucination rate of 86%, versus hallucination rates of 36% for Opus 4.7 max and 50% for Gemini 3.1 Pro Preview. This is one of the most important caveats in the entire launch discussion (Artificial Analysis).
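An 86% hallucination rate next to 57% accuracy can look contradictory. Below is a minimal sketch built on two assumptions, neither stated in the summary above:

- The hallucination rate is measured only over responses that were not correct, i.e. incorrect / (incorrect + abstained).
- There is no partial-credit category.

The `decompose` helper is hypothetical.

```python
def decompose(accuracy: float, hallucination_rate: float) -> dict[str, float]:
    """Split responses into correct / incorrect / abstained shares.

    ASSUMPTION: hallucination_rate = incorrect / (incorrect + abstained),
    i.e. it is measured only over responses that were not correct.
    """
    not_correct = 1.0 - accuracy
    incorrect = hallucination_rate * not_correct  # confident wrong answers
    abstained = not_correct - incorrect           # "I don't know" responses
    return {"correct": accuracy, "incorrect": incorrect, "abstained": abstained}


shares = decompose(accuracy=0.57, hallucination_rate=0.86)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

Under that reading, about 37% of all answers would be confident wrong answers and only about 6% abstentions.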

ARC Prize

Verified ARC-AGI-2 SOTA at 85.0% max, with cost/performance ladder:

Max: 85.0%, $1.87

High: 83.3%, $1.45

Med: 70.4%, $0.86

Low: 33%, $0.35

(ARC Prize)
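The ladder shows steeply diminishing returns at the top end. Below is a minimal sketch that computes the extra dollars per extra percentage point between adjacent tiers. It assumes the dollar figures are per-task costs, since the summary does not state the unit. The `marginal_costs` helper is illustrative.

```python
# (tier, ARC-AGI-2 score in %, reported cost in USD) from the ARC Prize ladder.
# ASSUMPTION: costs are per task; the unit is not stated in the summary.
LADDER = [
    ("low", 33.0, 0.35),
    ("med", 70.4, 0.86),
    ("high", 83.3, 1.45),
    ("max", 85.0, 1.87),
]


def marginal_costs(ladder):
    """Return (from_tier, to_tier, extra USD per extra percentage point)."""
    steps = []
    for (t0, s0, c0), (t1, s1, c1) in zip(ladder, ladder[1:]):
        steps.append((t0, t1, (c1 - c0) / (s1 - s0)))
    return steps


for t0, t1, per_point in marginal_costs(LADDER):
    print(f"{t0} -> {t1}: ${per_point:.3f} per extra point")
```

On these figures, moving from high to max costs roughly 18 times as much per extra point as moving from low to med.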

Andon Labs / Vending-Bench Arena

Says GPT-5.5 beats Opus 4.7 in competitive Vending-Bench Arena, and specifically notes GPT-5.5’s tactics were clean, while Opus used deceptive behaviors (andonlabs).

UK AISI / safety testing

The UK AI Security Institute said it conducted pre-deployment testing on cyber, autonomy capabilities, and safeguards, pointing readers to the system card (AISecurityInst).

System-card-derived cyber result

A widely cited number from readers of the system card: GPT-5.5 could take over a simulated corporate network in 1/10 trials with a 100M-token budget, compared with Claude Mythos at 3/10, while Opus 4.6/4.7 failed on the cited task (scaling01).
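Ten trials per model is a small sample. Below is a minimal sketch of 95% Wilson score intervals for the cited 1/10 and 3/10 results. It assumes each trial is an independent success/failure draw; the summary does not describe the trial setup. The wide, overlapping intervals suggest treating this ranking as indicative rather than settled.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials**2))
    return center - half, center + half


# Success counts out of 10 trials, as cited from the system card.
for name, k in [("GPT-5.5", 1), ("Claude Mythos", 3)]:
    lo, hi = wilson_interval(k, 10)
    print(f"{name}: {k}/10 -> 95% CI [{lo:.2f}, {hi:.2f}]")
```

On these numbers, GPT-5.5's interval runs from roughly 0.02 to 0.40 and Mythos's from roughly 0.11 to 0.60.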

LiveBench

scaling01 says GPT-5.5-xhigh placed 1st on LiveBench (scaling01).

Examples of progress in practice

The strongest launch-day evidence was not just benchmarks but user reports of longer-horizon autonomy and reduced micromanagement:

Every early test

Dan Shipper says GPT-5.5 scored 62/100 on Every’s Senior Engineer benchmark vs Opus 4.7 at 33/100, while noting it performs best with an Opus 4.7-generated plan (danshipper).

Reports that one engineer used 900M+ tokens in testing while shipping production features (danshipper).

Praises conceptual clarity, ability to sustain complex refactors, and stronger writing than recent OpenAI models.

Matthew Berman

Calls Codex variant “the absolute frontier” for agentic coding, especially backend and visual inspection loops, while saying Opus remains faster and still better for front-end design in many cases (MatthewBerman).

Reports medium/high thinking worked best; xhigh felt too slow for many workflows.

OpenAI internal user reports

polynoamial (Noam Brown) says GPT-5.5 makes him “a more effective IC,” specifically for CUDA kernels and research experiments (polynoamial).

tszzl says researchers are already letting GPT-5.5 run overnight experiments from only high-level ideas, producing completed sweeps by morning (tszzl).

aidan_mclau says he dictated a new RL run, left for days, and came back to a 31-hour industrial-scale RL run progressing under GPT-5.5 supervision (aidan_mclau, sleeping/babysitting nuance).

johnohallman says 5.5 can work on projects end-to-end for hours or days, changing his role from IC toward manager (johnohallman).

clivetime says he now manages ~10 Codexes and spends most time on net new progress rather than setup/plumbing (itsclivetime).

Skirano examples

Describes GPT-5.5 resolving a nasty branch conflict situation as a personal “first taste of AGI” (skirano thread start).

Says it can create apps for a Flipper Zero via USB connection and push them successfully (skirano USB example).

Says it built a more genuinely playable one-shot game, later featured on the release page (skirano game).

Visual/code synthesis examples

Sebastien Bubeck showed GPT-5.5 getting close to saturating his TikZ unicorn test with actual verifiable TikZ code (SebastienBubeck).

Dimillian used Codex + imagegen skills + macOS app tooling to create a native retro fantasy labyrinth game from prompts (Dimillian).

Enterprise / computer-use angle

OpenAI says users at Ramp are using GPT-5.5 in Codex to test full-stack QA changes end-to-end (OpenAIDevs).

Sam says OpenAI and Nvidia tried rolling Codex out across an entire company, implying confidence in broad enterprise deployment (sama).

gdb stresses this is now useful to “anyone who does computer work,” not just programmers (gdb).

Facts vs opinions

Facts / directly supported claims

GPT-5.5 launched in ChatGPT and Codex, API delayed (OpenAI, OpenAIDevs).

Pricing is $5/$30 and Pro $30/$180 per 1M tokens (sama, scaling01).

OpenAI reported benchmark scores including 82.7 Terminal-Bench 2.0, 78.7 OSWorld-Verified, 81.8 CyberGym, 58.6 SWE-Bench Pro (

