The Decoder·2026年3月12日 00:31·約1分

OpenAIの新しいトレーニングデータセットはAIモデルに信頼すべき指示を教える

#AIセキュリティ #トレーニングデータセット #プロンプトインジェクション防御 #信頼性 #OpenAI #モデル安全性

TL;DR

OpenAIは、AIモデルが信頼できる指示と信頼できない指示を区別して優先順位を付けることを学習させるための新しいトレーニングデータセット「IH-Challenge」をリリースし、初期結果ではセキュリティとプロンプトインジェクション防御の両方で大幅な改善が示されている。

AI深層分析2026年3月12日 01:42

重要/ 5段階

深度40%

キーポイント

新データセット「IH-Challenge」のリリース

OpenAIが、AIモデルに信頼できる指示を信頼できない指示よりも優先的に扱うことを教えることを目的とした新しいトレーニングデータセット「IH-Challenge」を公開した。

信頼性の優先付けを学習

このデータセットは、モデルが与えられた指示のうち、どれを信頼し、どれを優先すべきかを判断する能力を向上させるように設計されている。

セキュリティと防御の向上

初期の結果では、このアプローチがAIモデルのセキュリティを強化し、特にプロンプトインジェクション攻撃に対する防御能力を大幅に改善することが示されている。

影響分析・編集コメントを表示

影響分析

この開発は、AIシステムの安全性と信頼性を根本的に向上させる可能性がある。プロンプトインジェクションなどの攻撃に対する防御を強化することで、実世界でのAI応用の安全性が高まり、より複雑なタスクへの信頼できる展開が可能になる。

編集コメント

AIの安全性向上に向けた実践的なアプローチとして注目。データセットの公開が業界全体のセキュリティ基準向上に寄与する可能性がある。

image

OpenAIは、AIモデルが信頼できない指示よりも信頼できる指示を確実に優先するよう学習させるためのトレーニングデータセット「IH-Challenge」を公開しました。初期結果では、セキュリティとプロンプトインジェクション対策の両方で顕著な改善が確認されています。

この記事「OpenAI's new training dataset teaches AI models which instructions to trust」は、The Decoderで最初に公開されました。

原文を表示

OpenAI has released IH-Challenge, a training dataset designed to teach AI models to reliably prioritize trusted instructions over untrusted ones. Early results show significant improvements in both security and prompt injection defense.

AI systems receive instructions from multiple sources at once. System-level security policies, developer settings, user requests, and information from external tools can all contradict each other. When a model makes the wrong call about which instruction to follow, security policies can be bypassed and prompt injection attacks can succeed.

According to OpenAI, many of these problems share the same root cause: the model simply follows the wrong instruction. To address this, the company developed the training dataset "IH-Challenge," which uses reinforcement learning to teach models a clear pecking order: system over developer over user over tool.

OpenAI had already introduced a similar approach based on GPT-3.5 Turbo in 2024, but that version only supported three priority levels and relied on LLM judges for evaluation. IH-Challenge moves past both limitations. The new dataset adds a fourth hierarchy level for developers and replaces error-prone language model evaluations with simple Python scripts for automated verification.

Current training methods fail in three key areas

In the accompanying paper, OpenAI identifies three core pitfalls. First, errors in following complex instructions can be mistakenly flagged as hierarchy failures. Second, instruction conflicts are often subjective, making automated evaluation difficult. Third, models tend to learn shortcuts, such as rejecting harmless requests just to be safe.

IH-Challenge tackles these issues with deliberately simple tasks that can be automatically evaluated by scripts and don't allow for trivial shortcuts.

According to OpenAI, the internal model GPT-5 Mini-R trained on IH-Challenge shows clear improvements across academic and internal benchmarks when it comes to correctly prioritizing instructions. The biggest gains appeared in conflicts between developer and user-level instructions. At the same time, the model's general capabilities remained largely intact.

Prompt injections through tools get caught

The stronger instruction hierarchy translates into two concrete benefits, according to OpenAI. First, the model follows security policies in the system prompt more reliably without becoming less helpful overall. Second, robustness against prompt injection attacks improves significantly, particularly those that hide malicious instructions in tool outputs. OpenAI had previously documented similar vulnerabilities in ChatGPT Atlas.

OpenAI emphasizes that this capability will become a critical security feature as models become more agentic. Models that independently call tools and read untrusted documents need to reliably distinguish between legitimate and manipulative instructions. OpenAI has published the IH-Challenge dataset on Hugging Face to encourage further research.

AI News Without the Hype – Curated by Humans

Subscribe to THE DECODER for ad-free reading, a weekly AI newsletter, our exclusive "AI Radar" frontier report six times a year, full archive access, and access to our comment section.

Subscribe now

この記事をシェア

LY Corp Tech Blog2026年4月20日 11:00

エンジニア以外にもCoding Agent活用を広げる架け橋に ─ 個人開発から始まった、Codex×Electron製GUIエージェント誕生秘話インタビュー

TLDR AI2026年5月19日 09:00

イーロン・マスク氏によるサム・アルトマン CEO に対する訴訟の全請求が棄却される

The Verge AI重要度42026年5月19日 04:00

ムスク対アルトマン裁判は、AI が不適切な人物に導かれていることを証明した

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む