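OpenAI News · June 3, 2026, 22:15 · about a 19-minute read

New capabilities announced for GPT-Rosalind

#LLM #OpenAI #GPT-Rosalind #LifeSciencesResearch #DrugDiscoveryAI

TL;DR

OpenAI announced an updated version of its GPT-Rosalind series, which is purpose-built for life sciences research such as drug discovery and genomics. The update introduces LifeSciBench, a new expert-judged benchmark, and reports broad performance gains and stronger agentic capabilities.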

AI in-depth analysis · June 11, 2026, 01:12

Depth: 40%

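Key points

A stronger model purpose-built for life sciences research

The update builds on GPT-5.5's agentic coding capabilities and improves reasoning and tool use in medicinal chemistry and genomics.

LifeSciBench, a new expert-judged benchmark

Instead of evaluating a single domain in isolation, the benchmark takes an end-to-end view spanning six workflow areas, from evidence handling to communication.

Performance gains across diverse research tasks

Broad improvements were reported on research tasks from biology experts, complex medicinal chemistry queries, quantitative biology, and wet lab troubleshooting.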

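Impact analysis

The announcement suggests that general-purpose AI models are starting to deliver practical value in a specific industry domain, here the life sciences. A rigorous, expert-judged benchmark makes model performance easier to trust, which could accelerate adoption of AI as a decision-support tool in drug discovery and experimental work.

Editorial comment

OpenAI is advancing not only its general-purpose models but also domain-specific evaluation and tooling for the life sciences as a package. That strategy could become a template for industry-specific AI deployment.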

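We're introducing a new model update to our GPT-Rosalind series, purpose-built for life sciences research at enterprise scale. It combines GPT-5.5's agentic coding and tool-use capabilities with stronger model intelligence in core drug-discovery domains such as medicinal chemistry and genomics, while advancing performance across broader life sciences analysis, design, and experimental workflows.

Progress in life sciences depends on synthesizing data and evidence across scales and modalities: molecules, genes, pathways, and living systems. In our evaluations, the updated GPT-Rosalind shows broad performance gains on research tasks from biology experts, complex medicinal chemistry queries, quantitative biology, and wet lab troubleshooting.

GPT-Rosalind is now available in research preview to eligible organizations globally through our trusted-access deployment structure.

Improving performance on scientifically valuable tasks

To measure and continuously improve the real-world impact of GPT-Rosalind, we designed LifeSciBench, an externally expert-judged benchmark focused on foundational aspects of life sciences research. Unlike existing benchmarks that evaluate a single component of model performance or a single biological domain in isolation, LifeSciBench takes an end-to-end view of scientifically valuable work by drawing tasks from six workflow areas central to life sciences research: evidence handling, analysis, design and optimization, scientific reasoning, validation and operations, and translation and communication. We use this benchmark to align progress with the needs and realities of life sciences research.

GPT-Rosalind leads on scientifically valuable tasks identified by industry and academic experts.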

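Extracting, reconciling, and auditing scientific evidence from papers, figures, tables, and experimental records.

Candidate Response

Bottom line: As presented, this package is not strong enough to support accelerated approval on micro-dystrophin expression as a surrogate endpoint reasonably likely to predict clinical benefit. A skeptical FDA reviewer would likely view the biomarker, functional, durability, safety, and generalizability evidence as materially under-supported.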

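Package Item

Key Failure Mode

What Would Be Needed

Western blot quantification

MANEX1A binds an N-terminal epitope shared by endogenous full-length dystrophin and the micro-dystrophin transgene, so the assay does not cleanly distinguish transgene from residual/revertant dystrophin. Quantifying a 138 kDa micro-dystrophin against a healthy full-length dystrophin standard is also invalid.

Use a recombinant micro-dystrophin standard and an orthogonal method that distinguishes transgene from endogenous dystrophin, such as targeted mass spectrometry or a transgene-specific/epitope-specific assay.

Immunofluorescence

The C-terminal polyclonal antibody is poorly suited because the 138 kDa construct lacks the C-terminal domain. Many DMD patients have revertant fibers, and revertant dystrophin can retain C-terminal epitopes. Revertant fibers may expand clonally with age, biasing the immunofluorescence (IF) signal, especially in older boys.

Repeat IF with an antibody against an epitope present in the transgene but absent from revertant dystrophin. Quantify transgene-positive fibers separately from revertant fibers.

Surrogate endpoint validity

The package conflates protein amount with clinical function. "38% of healthy-control protein mass" does not mean 38% of normal dystrophin function because micro-dystrophin is structurally truncated.

Empirically validate the relationship between micro-dystrophin mass-percent, sarcolemmal localization, downstream functional restoration, and clinical benefit before treating expression as a surrogate endpoint.

Biopsy design

Pre- and post-treatment contralateral vastus lateralis biopsies introduce left-right and intramuscular spatial variability. Disease progression and fibro-fatty replacement can also change total-protein-normalized signal.

Standardize biopsy site using consistent anatomical landmarks, normalize to muscle-specific proteins, and measure fibro-fatty composition in parallel.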

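NSAA comparator/statistics

An external natural-history cohort is not a randomized concurrent control. Trial eligibility, supportive care, participation effects, baseline NSAA, steroid regimen, age, and exon class can all bias the comparison. An unpaired t-test is not sufficient. Also, a +1.4 NSAA change is within test-retest variability for this age group.

Run a randomized concurrent placebo-controlled study, or at minimum use adjusted analyses accounting for baseline NSAA, age, steroid regimen, exon class, and other confounders.

Age-window confounding

Boys age 4 to 7 are in a developmental window where untreated ambulatory DMD patients may gain motor function before decline dominates. A 48-week NSAA change mixes developmental gain, disease progression, and possible treatment effect.

Use a concurrent randomized control with age stratification to separate developmental trajectory from treatment effect.

Prior clinical precedent

Open-label micro-dystrophin functional signals have not reliably predicted confirmatory benefit; published precedent includes micro-dystrophin gene therapy confirmatory trials failing to reproduce open-label NSAA (North Star Ambulatory Assessment) improvements.

Do not rely on open-label NSAA change as decisive support. Require controlled functional evidence.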

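Structural limits of the construct

The 138 kDa construct deletes spectrin repeats R16/17, which contain nNOS-binding sites. Loss of nNOS recruitment can impair functional sympatholysis and ischemia protection during exercise, creating a mechanistic ceiling on rescue independent of expression level.

Add mechanistic studies showing whether this specific construct restores relevant dystrophin-associated complex function, nNOS localization, exercise physiology, and muscle protection.

AAV durability

Vector genomes at 12 weeks do not establish durable expression. AAV9 genomes are largely non-integrating episomes and may decline over time. Vector-genome persistence is not the same as persistent protein expression.

Measure longitudinal transgene protein expression and functional biomarker durability beyond 12 weeks.

Immune/safety profile

Transaminitis in 8/12 patients is consistent with immune response to AAV-transduced cells, but the mechanism is not established. One myocarditis case is concerning given AAV9 cardiac tropism.

Provide deeper immune monitoring, liver/cardiac safety characterization, and intensified cardiac follow-up.

Patient selection/generalizability

Excluding anti-AAV9 neutralizing-antibody-positive patients limits generalizability. Excluding exon-44 deletions limits applicability to that DMD subgroup. n=12 is too small to characterize safety and efficacy across the broader DMD population.

Broaden eligibility where possible or pre-specify stratified analyses by antibody status, genotype/exon class, age, and baseline function before using the result to support broad approval.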

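Regulatory conclusion: The package may show biological activity, but it does not yet establish that the measured micro-dystrophin expression is a reliable surrogate reasonably likely to predict clinical benefit. The main gaps are assay specificity, invalid quantification standards, possible revertant-fiber confounding, lack of a randomized control, age-related NSAA confounding, uncertain durability, and unresolved safety/generalizability issues.

To close the gap, the program would need a controlled, age-stratified clinical design with transgene-specific expression assays, orthogonal protein quantification, tissue-composition controls, longitudinal durability data, mechanistic functional assays for the truncated construct, and stronger safety monitoring, especially hepatic and cardiac.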

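Rubric Criteria & Grades

Criterion

Points

Identifies assay/measurement problems in micro-dystrophin quantification, including MANEX1A epitope sharing, invalid full-length dystrophin standards, and need for recombinant or orthogonal transgene-specific measurement.

+24

Explains why micro-dystrophin expression level is not automatically a valid surrogate for functional clinical benefit.

+22

Flags biopsy-site, tissue-composition, and age-window confounding that weaken expression and NSAA (North Star Ambulatory Assessment) interpretation.

+19

Critiques the NSAA comparator/statistics, especially reliance on external natural-history controls.

+12

Addresses AAV durability, immune response, transaminitis, myocarditis, and need for longer-term expression/safety follow-up.

+15

Notes patient-selection/generalizability gaps, including anti-AAV9 exclusion, exon-44 exclusion, and small sample size.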

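Stronger scientific reasoning

GPT-Rosalind achieves industry-leading performance in medicinal chemistry, a field focused on turning molecules into useful drugs. We designed MedChemBench to reflect realistic medicinal chemistry workflows, evaluating multimodal chemical structure understanding; structure-activity relationship (SAR); prediction of drug potency, toxicity, and absorption, distribution, metabolism, excretion (ADME); multiparameter lead-optimization decision-making; and retrosynthesis. GPT-Rosalind outperforms GPT-5.5 at 27.5% vs. 25.1% on MedChemBench, while using 7.2% fewer tokens.

GPT-Rosalind shows strong performance in multimodal synthesis and mechanistic reasoning in medicinal chemistry.

On GeneBench, our agentic evaluation of long-horizon, end-to-end analysis in genomics and quantitative biology, GPT-Rosalind uses 31% fewer tokens than GPT-5.5 while achieving a higher accuracy of 21.6% vs. 20.4%. GeneBench assesses agentic performance on long-horizon quantitative tasks: based on realistic scientific data, can an agent plan valid analysis, QC, modeling, and corrections to arrive at decision-relevant answers? Included problems span a variety of domains, including functional genomics, spatial transcriptomics, proteomics, epigenomics, and applied genetics.

GPT-Rosalind uses 31% fewer tokens than GPT-5.5 while improving accuracy.

We introduce a new evaluation to test GPT-Rosalind's ability to help scientists conducting lab work in the real world. LabWorkBench tests the model's ability to link perturbations to experimental outcomes in real wet lab protocols used by scientists, for purposes ranging from troubleshooting to optimization. The data used by LabWorkBench are proprietary and thus uncontaminated. GPT-Rosalind scores 63.2% vs. GPT-5.5 at 55.8%, while using 5.3% fewer tokens.

In supporting real wet lab protocols, GPT-Rosalind shows a marked improvement over GPT-5.5 while also being more token-efficient.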

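From reasoning to executed workflows

We built the Life Sciences Research and Life Sciences NGS Analysis plugins to extend the increased intelligence of GPT-Rosalind with a practical execution layer for repeatable scientific workflows. Together, these plugins bring sourced evidence retrieval, biological interpretation, and bioinformatics execution into the same workspace, helping researchers connect external evidence with internal omics analyses while preserving artifacts and provenance. All users can now access both plugins through Codex. Qualified GPT-Rosalind enterprise users can additionally use GPT-Rosalind to power these plugins.

To better leverage Codex as a dynamic workbench for scientists, we added interactive viewers for biologically native file types. The initial set of sequence, alignment, and structure viewers is designed to keep scientists close to the evidence as GPT-Rosalind reasons across a workflow, and to support answering follow-up questions directly with the active viewer in context.

The demo shows these capabilities in action, orchestrated by GPT-Rosalind. We follow a scientist reviewing a liquid tumor biopsy to identify mutations and other molecular changes that can inform treatment decisions. The Life Sciences NGS Analysis plugin (NGS: next-generation sequencing) turns the review of processed ctDNA (circulating tumor DNA) records into an interactive notebook, highlighting recurrent variants, low-frequency findings, and a sample timecourse focused on KRAS G12C. From there, the Life Sciences Research plugin adds sourced target, inhibitor, and resistance context, and the native sequence, alignment, and structure viewers let the scientist inspect mutated residue 12 directly, check conservation across the RAS family, and visualize the inhibitor binding pocket. The workflow finishes by translating that evidence into concrete follow-up options, with each step and artifact available for expert review.

Life Sciences NGS Analysis plugin (NGS: next-generation sequencing)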

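scRNA-seq QC & Annotation

Turn a 10x-style matrix bundle into QC-filtered single-cell artifacts, annotations, and UMAPs you can inspect and revise in Codex. The Life Sciences NGS Analysis plugin routes the request to scrna-seq-qc, chooses QC thresholds from the data, preserves provenance around filtering and annotation, and surfaces blockers such as missing doublet-detection dependencies.

Bulk RNA-seq FASTQ QC

Turn a bulk RNA-seq sample sheet, FASTQ bundle, and reference files into a QC-reviewed counts bundle you can inspect and reuse in Codex. The Life Sciences NGS Analysis plugin routes the request, validates the inputs, and returns an auditable run envelope with MultiQC, Salmon matrices, provenance, and explicit caveats.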

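Expanded access for trusted organizations

We are expanding access to the GPT-Rosalind series to eligible organizations globally. GPT-Rosalind will be available in research preview through our trusted-access deployment structure for organizations that are conducting legitimate scientific research with clear public benefit, have strong governance and safety oversight, and maintain controlled access with enterprise-grade security.

As part of this global expansion, we're excited to help support Novo Nordisk's mission of bringing innovative treatment options to patients faster by helping scale their medical research with GPT-Rosalind. Novo Nordisk is leveraging frontier AI capabilities to help researchers analyze complex datasets, uncover useful patterns, and test hypotheses more quickly. GPT-Rosalind's stronger biological understanding will help teams connect evidence across literature, genomics, transcriptomics, sequence, structure, and experimental results, making it easier to move from data to clearer research decisions.

"Life sciences research is complex, data-rich, and interdisciplinary. To deliver meaningful value for researchers, advanced AI models must be grounded in trusted scientific data, connected to validated tools, and integrated into the real-world workflows researchers use every day. We're pleased with our partnership with OpenAI and the opportunity to explore how GPT-Rosalind can support more rigorous, practical approaches to drug discovery."

Mishal Patel, Group Vice President, AI & Digital Innovation, R&D, Novo Nordisk

We are also now offering an OpenAI managed workspace for qualified organizations without an Enterprise account.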

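What's next

The updated GPT-Rosalind is the next step in our broader commitment to building AI systems that can help accelerate scientific discovery while ensuring that advanced biological capabilities are deployed with appropriate safeguards. We will continue improving the model's biological reasoning, expanding support for tool-heavy and long-horizon research workflows, and working with qualified organizations across regions to evaluate real-world impact.

This also means applying life sciences AI to high-impact public-benefit work, from drug discovery and translational medicine to public health, preparedness, and biodefense. Through Rosalind Biodefense and our trusted-access deployment model, we aim to put frontier biological capabilities in the hands of the researchers, institutions, and defenders working to improve human health and strengthen societal resilience.

We will continue building GPT-Rosalind to become a more capable partner across the full life cycle of scientific research, helping scientists move more quickly from the right questions to clearer evidence, better experiments, and ultimately new treatments for patients.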

Original text

We’re introducing a new model update to our GPT‑Rosalind series purpose-built for life sciences research at enterprise scale. It combines GPT‑5.5’s agentic coding and tool-use capabilities with stronger model intelligence in core drug-discovery domains such as medicinal chemistry and genomics, while advancing performance across broader life sciences analysis, design, and experimental workflows.

Progress in life sciences depends on synthesizing data and evidence across scales and modalities: molecules, genes, pathways, and living systems. In our evaluations, the updated GPT‑Rosalind shows broad performance gains on research tasks from biology experts, complex medicinal chemistry queries, quantitative biology, and wet lab troubleshooting.

GPT‑Rosalind is now available in research preview to eligible organizations globally through our trusted-access deployment structure.

Improving performance on scientifically-valuable tasks

In order to measure and continuously improve the real-world impact of GPT‑Rosalind, we designed LifeSciBench, an externally expert-judged benchmark focused on foundational aspects in life sciences research. Unlike existing benchmarks that evaluate a single component of model performance or biological domain in isolation, LifeSciBench takes an end-to-end view of scientifically valuable work by drawing tasks from six workflow areas central to life sciences research: evidence handling, analysis, design and optimization, scientific reasoning, validation and operations, and translation and communication. We use this benchmark to align progress with the needs and realities of life sciences research.

Extracting, reconciling, and auditing scientific evidence from papers, figures, tables, and experimental records.

Candidate Response

Bottom line: As presented, this package is not strong enough to support accelerated approval on micro-dystrophin expression as a surrogate endpoint reasonably likely to predict clinical benefit. A skeptical FDA reviewer would likely view the biomarker, functional, durability, safety, and generalizability evidence as materially under-supported.

Package Item

Key Failure Mode

What Would Be Needed

Western blot quantification

MANEX1A binds an N-terminal epitope shared by endogenous full-length dystrophin and the micro-dystrophin transgene, so the assay does not cleanly distinguish transgene from residual/revertant dystrophin. Quantifying a 138 kDa micro-dystrophin against a healthy full-length dystrophin standard is also invalid.

Use a recombinant micro-dystrophin standard and an orthogonal method that distinguishes transgene from endogenous dystrophin, such as targeted mass spectrometry or a transgene-specific/epitope-specific assay.

Immunofluorescence

The C-terminal polyclonal antibody is poorly suited because the 138 kDa construct lacks the C-terminal domain. Many DMD patients have revertant fibers, and revertant dystrophin can retain C-terminal epitopes. Revertant fibers may expand clonally with age, biasing IF signal, especially in older boys.

Repeat IF with an antibody against an epitope present in the transgene but absent from revertant dystrophin. Quantify transgene-positive fibers separately from revertant fibers.

Surrogate endpoint validity

The package conflates protein amount with clinical function. “38% of healthy-control protein mass” does not mean 38% of normal dystrophin function because micro-dystrophin is structurally truncated.

Empirically validate the relationship between micro-dystrophin mass-percent, sarcolemmal localization, downstream functional restoration, and clinical benefit before treating expression as a surrogate endpoint.

Biopsy design

Pre- and post-treatment contralateral vastus lateralis biopsies introduce left-right and intramuscular spatial variability. Disease progression and fibro-fatty replacement can also change total-protein-normalized signal.

Standardize biopsy site using consistent anatomical landmarks, normalize to muscle-specific proteins, and measure fibro-fatty composition in parallel.

NSAA comparator/statistics

An external natural-history cohort is not a randomized concurrent control. Trial eligibility, supportive care, participation effects, baseline NSAA, steroid regimen, age, and exon class can all bias the comparison. An unpaired t-test is not sufficient. Also, a +1.4 NSAA change is within test-retest variability for this age group.

Run a randomized concurrent placebo-controlled study, or at minimum use adjusted analyses accounting for baseline NSAA, age, steroid regimen, exon class, and other confounders.

Age-window confounding

Boys age 4–7 are in a developmental window where untreated ambulatory DMD patients may gain motor function before decline dominates. A 48-week NSAA change mixes developmental gain, disease progression, and possible treatment effect.

Use a concurrent randomized control with age stratification to separate developmental trajectory from treatment effect.

Prior clinical precedent

Open-label micro-dystrophin functional signals have not reliably predicted confirmatory benefit; published precedent includes micro-dystrophin gene therapy confirmatory trials failing to reproduce open-label NSAA improvements.

Do not rely on open-label NSAA change as decisive support. Require controlled functional evidence.

Structural limits of the construct

The 138 kDa construct deletes spectrin repeats R16/17, which contain nNOS-binding sites. Loss of nNOS recruitment can impair functional sympatholysis and ischemia protection during exercise, creating a mechanistic ceiling on rescue independent of expression level.

Add mechanistic studies showing whether this specific construct restores relevant dystrophin-associated complex function, nNOS localization, exercise physiology, and muscle protection.

AAV durability

Vector genomes at 12 weeks do not establish durable expression. AAV9 genomes are largely non-integrating episomes and may decline over time. Vector-genome persistence is not the same as persistent protein expression.

Measure longitudinal transgene protein expression and functional biomarker durability beyond 12 weeks.

Immune/safety profile

Transaminitis in 8/12 patients is consistent with immune response to AAV-transduced cells, but the mechanism is not established. One myocarditis case is concerning given AAV9 cardiac tropism.

Provide deeper immune monitoring, liver/cardiac safety characterization, and intensified cardiac follow-up.

Patient selection/generalizability

Excluding anti-AAV9 neutralizing-antibody-positive patients limits generalizability. Excluding exon-44 deletions limits applicability to that DMD subgroup. n=12 is too small to characterize safety and efficacy across the broader DMD population.

Broaden eligibility where possible or pre-specify stratified analyses by antibody status, genotype/exon class, age, and baseline function before using the result to support broad approval.

Regulatory conclusion: The package may show biological activity, but it does not yet establish that the measured micro-dystrophin expression is a reliable surrogate reasonably likely to predict clinical benefit. The main gaps are assay specificity, invalid quantification standards, possible revertant-fiber confounding, lack of a randomized control, age-related NSAA confounding, uncertain durability, and unresolved safety/generalizability issues.

To close the gap, the program would need a controlled, age-stratified clinical design with transgene-specific expression assays, orthogonal protein quantification, tissue-composition controls, longitudinal durability data, mechanistic functional assays for the truncated construct, and stronger safety monitoring, especially hepatic and cardiac.
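The comparator critique in the table above says a +1.4 NSAA change falls within test-retest variability. One common way to make that kind of statement checkable is the minimal detectable change at 95% confidence, MDC95 = 1.96 × √2 × SEM, where SEM is the standard error of measurement from a test-retest study. The sketch below is illustrative only: the function names are made up, and the SEM value is a placeholder, not a figure from the package or from published NSAA reliability data.

```python
import math


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """Return the minimal detectable change: z * sqrt(2) * SEM."""
    if sem < 0:
        raise ValueError("SEM must be non-negative")
    return z * math.sqrt(2.0) * sem


def exceeds_measurement_noise(observed_change: float, sem: float) -> bool:
    """True only if the observed change is larger than MDC95 for this SEM."""
    return abs(observed_change) > minimal_detectable_change(sem)


# PLACEHOLDER: a real analysis would take SEM from a test-retest study
# in the same age group; 1.0 is an arbitrary illustrative value.
PLACEHOLDER_SEM = 1.0
observed = 1.4  # the NSAA change quoted in the candidate response

mdc95 = minimal_detectable_change(PLACEHOLDER_SEM)
print(f"MDC95 = {mdc95:.2f} points")
print("change exceeds noise:", exceeds_measurement_noise(observed, PLACEHOLDER_SEM))
```

With any SEM of roughly 0.5 points or more, a 1.4-point change stays below MDC95, which is the shape of the reviewer's objection.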

Rubric Criteria & Grades

Criterion

Points

Identifies assay/measurement problems in micro-dystrophin quantification, including MANEX1A epitope sharing, invalid full-length dystrophin standards, and need for recombinant or orthogonal transgene-specific measurement.

+24

Explains why micro-dystrophin expression level is not automatically a valid surrogate for functional clinical benefit.

+22

Flags biopsy-site, tissue-composition, and age-window confounding that weaken expression and NSAA interpretation.

+19

Critiques the NSAA comparator/statistics, especially reliance on external natural-history controls.

+12

Addresses AAV durability, immune response, transaminitis, myocarditis, and need for longer-term expression/safety follow-up.

+15

Notes patient-selection/generalizability gaps, including anti-AAV9 exclusion, exon-44 exclusion, and small sample size.
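The rubric reads as a weighted checklist. The sketch below shows one way such a grade could be tallied. The aggregation rule (awarded points over available points) and all names are assumptions, since the source lists point values but does not say how they are combined. Each criterion shown with a "+" value is assumed to have been met. The point value for the final criterion is not shown in the source, so it is left as None and excluded from the totals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Criterion:
    name: str
    points: Optional[int]  # None where the source does not show a value
    met: bool


def rubric_score(criteria: list[Criterion]) -> tuple[int, int]:
    """Return (awarded, available), skipping criteria with unknown points."""
    known = [c for c in criteria if c.points is not None]
    awarded = sum(c.points for c in known if c.met)
    available = sum(c.points for c in known)
    return awarded, available


criteria = [
    Criterion("assay/measurement problems", 24, True),
    Criterion("expression is not automatically a valid surrogate", 22, True),
    Criterion("biopsy-site, tissue-composition, age-window confounding", 19, True),
    Criterion("NSAA comparator/statistics", 12, True),
    Criterion("AAV durability, immune response, safety follow-up", 15, True),
    Criterion("patient selection/generalizability", None, True),  # value not shown
]

awarded, available = rubric_score(criteria)
print(f"{awarded}/{available} points across criteria with known values")
```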

Stronger scientific reasoning

GPT‑Rosalind achieves industry-leading performance in medicinal chemistry, a field focused on turning molecules into useful drugs. We designed MedChemBench to reflect realistic medicinal chemistry workflows, evaluating multimodal chemical structure understanding; structure-activity relationship (SAR); prediction of drug potency, toxicity, and absorption, distribution, metabolism, excretion (ADME); multiparameter lead-optimization decision-making; and retrosynthesis. GPT‑Rosalind out-performs GPT‑5.5 at 27.5% vs. 25.1% on MedChemBench, while using 7.2% fewer tokens.

On GeneBench, our agentic evaluation of long-horizon, end-to-end analysis in genomics and quantitative biology, GPT‑Rosalind uses 31% fewer tokens than GPT‑5.5 while achieving a higher accuracy of 21.6% vs. 20.4%. GeneBench assesses agentic performance on long-horizon quantitative tasks: based on realistic scientific data, can an agent plan valid analysis, QC, modeling, and corrections to arrive at decision-relevant answers? Included problems span a variety of domains, including functional genomics, spatial transcriptomics, proteomics, epigenomics, and applied genetics.

We introduce a new evaluation to test GPT‑Rosalind’s ability to help scientists conducting lab work in the real world. LabWorkBench tests the model's ability to link perturbations to experimental outcomes in real wet lab protocols used by scientists, for purposes ranging from troubleshooting to optimization. The data used by LabWorkBench are proprietary and thus uncontaminated. GPT‑Rosalind scores 63.2% vs. GPT‑5.5 at 55.8%, while using 5.3% fewer tokens.
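The three benchmark comparisons above mix absolute scores with token savings, which makes them hard to compare at a glance. The sketch below puts them on the same footing using only the figures quoted in the text. The class and field names are made up for illustration, and the "score per token" ratio is a derived convenience metric, not one the announcement reports. It also assumes the token reductions were measured on the same benchmark runs as the scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    name: str
    rosalind_pct: float         # GPT-Rosalind score, percent
    baseline_pct: float         # GPT-5.5 score, percent
    token_reduction_pct: float  # tokens saved by GPT-Rosalind, percent

    @property
    def absolute_gain(self) -> float:
        """Score difference in percentage points."""
        return self.rosalind_pct - self.baseline_pct

    @property
    def relative_gain_pct(self) -> float:
        """Score difference relative to the baseline score, in percent."""
        return 100.0 * self.absolute_gain / self.baseline_pct

    @property
    def score_per_token_ratio(self) -> float:
        """Score ratio divided by token ratio; above 1 means more score per token."""
        token_ratio = 1.0 - self.token_reduction_pct / 100.0
        return (self.rosalind_pct / self.baseline_pct) / token_ratio


RESULTS = [
    BenchmarkResult("MedChemBench", 27.5, 25.1, 7.2),
    BenchmarkResult("GeneBench", 21.6, 20.4, 31.0),
    BenchmarkResult("LabWorkBench", 63.2, 55.8, 5.3),
]

for r in RESULTS:
    print(f"{r.name:<13} +{r.absolute_gain:.1f} pts "
          f"({r.relative_gain_pct:.1f}% relative), "
          f"score per token x{r.score_per_token_ratio:.2f}")
```

Read this way, the GeneBench gain is the smallest in percentage points but comes with by far the largest token saving, while LabWorkBench shows the largest score gain.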

From reasoning to executed workflows

We built the Life Sciences Research and Life Sciences NGS Analysis plugins to extend the increased intelligence of GPT‑Rosalind with a practical execution layer for repeatable scientific workflows. Together, these plugins bring sourced evidence retrieval, biological interpretation, and bioinformatics execution into the same workspace, helping researchers connect external evidence with internal omics analyses while preserving artifacts and provenance. All users can now access both plugins through Codex. Qualified GPT‑Rosalind enterprise users can additionally use GPT‑Rosalind to power these plugins.

To better leverage Codex as a dynamic workbench for scientists, we added interactive viewers for biologically native file types. The initial set of sequence, alignment, and structure viewers is designed to keep scientists close to the evidence as GPT‑Rosalind reasons across a workflow, and to support answering follow-up questions directly with the active viewer in context.

scRNA-seq QC & Annotation

Turn a 10x-style matrix bundle into QC-filtered single-cell artifacts, annotations, and UMAPs you can inspect and revise in Codex. The Life Sciences NGS Analysis plugin routes the request to scrna-seq-qc, chooses QC thresholds from the data, preserves provenance around filtering and annotation, and surfaces blockers such as missing doublet-detection dependencies.
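The card says the plugin chooses QC thresholds from the data but does not say how. One widely used data-driven approach is to flag cells whose QC metric lies more than a few median absolute deviations (MADs) from the median. The sketch below is an assumption about what "thresholds from the data" could look like, not the plugin's actual logic. The function names, the cutoff of 3 MADs, and the toy values are all illustrative.

```python
from statistics import median


def mad_bounds(values: list[float], n_mads: float = 3.0) -> tuple[float, float]:
    """Return (lower, upper) cutoffs at median -/+ n_mads * MAD."""
    if not values:
        raise ValueError("need at least one value")
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med - n_mads * mad, med + n_mads * mad


def keep_mask(values: list[float], n_mads: float = 3.0,
              upper_only: bool = False) -> list[bool]:
    """True for cells inside the cutoffs; upper_only skips the lower cutoff."""
    lower, upper = mad_bounds(values, n_mads)
    return [v <= upper and (upper_only or v >= lower) for v in values]


# Toy per-cell mitochondrial read percentages (made up for illustration).
pct_mito = [2.1, 3.0, 2.7, 3.4, 2.9, 18.5, 3.1, 2.5]

lower, upper = mad_bounds(pct_mito)
mask = keep_mask(pct_mito, upper_only=True)
print(f"upper cutoff: {upper:.2f}%")
print("cells kept:", sum(mask), "of", len(mask))
```

Because the cutoff comes from the median and MAD rather than the mean and standard deviation, a single extreme cell (18.5% here) does not drag the threshold upward.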

Bulk RNA-seq FASTQ QC

Turn a bulk RNA-seq sample sheet, FASTQ bundle, and reference files into a QC-reviewed counts bundle you can inspect and reuse in Codex. The Life Sciences NGS Analysis plugin routes the request, validates the inputs, and returns an auditable run envelope with MultiQC, Salmon matrices, provenance, and explicit caveats.
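Input validation of the kind described here usually starts with the sample sheet. The sketch below shows a minimal check that returns a list of problems instead of failing on the first one, which suits an auditable report with explicit caveats. The column names (`sample_id`, `fastq_1`, `fastq_2`) and the function name are hypothetical; the source does not describe the plugin's real input schema or checks.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_id", "fastq_1", "fastq_2")  # hypothetical schema
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def validate_sample_sheet(text: str) -> list[str]:
    """Return human-readable problems; an empty list means the sheet passed."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems: list[str] = []
    seen: set[str] = set()
    for line_no, row in enumerate(reader, start=2):  # header is line 1
        sample = (row["sample_id"] or "").strip()
        if not sample:
            problems.append(f"line {line_no}: empty sample_id")
        elif sample in seen:
            problems.append(f"line {line_no}: duplicate sample_id {sample!r}")
        seen.add(sample)
        for col in ("fastq_1", "fastq_2"):
            path = (row[col] or "").strip()
            if not path.endswith(FASTQ_SUFFIXES):
                problems.append(f"line {line_no}: {col} is not a FASTQ path: {path!r}")
    return problems


# Toy sheet with one duplicate ID and one non-FASTQ file (illustrative only).
sheet = """sample_id,fastq_1,fastq_2
ctrl_1,ctrl_1_R1.fastq.gz,ctrl_1_R2.fastq.gz
ctrl_1,ctrl_1b_R1.fastq.gz,ctrl_1b_R2.fastq.gz
treat_1,treat_1_R1.fastq.gz,treat_1_R2.bam
"""

problems = validate_sample_sheet(sheet)
for p in problems:
    print(p)
```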

Expanded access for trusted organizations

We are expanding access to the GPT‑Rosalind series to eligible organizations globally. GPT‑Rosalind will be available in research preview through our trusted-access deployment structure for organizations that are conducting legitimate scientific research with clear public benefit, have strong governance and safety oversight, and maintain controlled access with enterprise-grade security.

As part of this global expansion, we’re excited to help support Novo Nordisk’s mission of bringing innovative treatment options to patients faster by helping scale their medical research with GPT‑Rosalind. Novo Nordisk is leveraging frontier AI capabilities to help researchers analyze complex datasets, uncover useful patterns, and test hypotheses more quickly. GPT‑Rosalind’s stronger biological understanding will help teams connect evidence across literature, genomics, transcriptomics, sequence, structure, and experimental results, making it easier to move from data to clearer research decisions.

“Life sciences research is complex, data-rich, and interdisciplinary. To deliver meaningful value for researchers, advanced AI models must be grounded in trusted scientific data, connected to validated tools, and integrated into the real-world workflows researchers use every day. We’re pleased with our partnership with OpenAI and the opportunity to explore how GPT‑Rosalind can support more rigorous, practical approaches to drug discovery.”

Mishal Patel, Group Vice President, AI & Digital Innovation, R&D - Novo Nordisk

We are also now offering an OpenAI managed workspace for qualified organizations without an Enterprise account.

What’s next

The updated GPT‑Rosalind is the next step in our broader commitment to building AI systems that can help accelerate scientific discovery while ensuring that advanced biological capabilities are deployed with appropriate safeguards. We will continue improving the model’s biological reasoning, expanding support for tool-heavy and long-horizon research workflows, and working with qualified organizations across regions to evaluate real-world impact.

This also means applying life sciences AI to high-impact public-benefit work, from drug discovery and translational medicine to public health, preparedness, and biodefense. Through Rosalind Biodefense and our trusted-access deployment model, we aim to put frontier biological capabilities in the hands of the researchers, institutions, and defenders working to improve human health and strengthen societal resilience.

We will continue building GPT‑Rosalind to become a more capable partner across the full life cycle of scientific research, helping scientists move more quickly from the right questions to clearer evidence, better experiments, and ultimately new treatments for patients.

