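TLDR AI · June 19, 2026, 09:00 · About a 9-minute read

Securing the future of AI agents (7 minute read)

#AI Agents #Security #Alignment #Defense-in-Depth #Google DeepMind

TL;DR

Google DeepMind has published its AI Control Roadmap, a "defense-in-depth" approach that goes beyond traditional model alignment to prepare for imperfectly aligned AI agents, and sets out a new standard for protecting internal systems.

AI deep analysis · June 19, 2026, 16:06

Importance / 5-point scale

Depth: 40%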

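Key points

Defense-in-depth strategy

The approach adds a system-level security layer designed to work even when model alignment is imperfect, and combines it with traditional safeguards such as sandboxing and prompt injection defenses.

AI agents as insider threats

Advanced AI agents are treated like untrusted insiders (rogue employees), and a new threat-modelling framework built on the MITRE ATT&CK framework is introduced.

Behavior-based permission management

Like the dual controls in a driving instructor's car, the system grants permissions based on an agent's verified behavior and expands access step by step as trust is built.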

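Impact analysis

This article points to a paradigm shift in how security risks from the spread of generative AI are handled: away from relying on prompt engineering and model fine-tuning alone, and toward defensive strategy at the infrastructure level. In particular, a design philosophy built on the premise of staying safe even if alignment is imperfect is likely to influence how the industry improves the reliability of AI agents and establishes implementation standards.

Editorial comment

As AI agents become widespread in practice, the article shows security measures evolving from after-the-fact response to defense in depth at the design stage. Applying MITRE ATT&CK to AI security is an approach that practitioners should find particularly instructive.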

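June 18, 2026 · Responsibility & Safety

Rohin Shah and Four Flynn

How we're securing internal systems against increasingly capable and imperfectly aligned AI

AI agents are transforming our relationship with technology. By autonomously executing complex tasks, from cyber defence to scientific discovery and product development, these systems are unlocking a new era of productivity. In the U.S. alone, AI agents could create $2.9 trillion in economic value by 2030.

As these agents become more capable, they also require more sophisticated safeguards. That's why we developed our AI Control Roadmap: a framework for building and managing the advanced AI we deploy within Google. This "defense-in-depth" approach, which could serve as a model for the wider industry, goes beyond traditional model alignment, adding a crucial layer of system-level security that provides assurance even if alignment is imperfect.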

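Understanding AI Control

Our approach to security starts with a strong foundation, incorporating traditional safeguards like sandboxing, endpoint security, and prompt injection resistance. On top of this, the AI Control Roadmap uses model alignment, i.e. training AI to be inherently safe and helpful, as a primary defense. The roadmap then adds a further layer of security by treating internal agents as potentially misaligned, so that assurance holds even if alignment is imperfect.

Think of it like a driving instructor with dual controls. The instructor trusts the student but stays ready to take the wheel or hit the brakes if a mistake occurs. Similarly, our AI control system grants AI agents permissions based on their verified behavior, allowing us to build trust through controlled, incremental access.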
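The idea of granting access incrementally on the basis of verified behavior can be illustrated with a minimal sketch. The tier names, the `promote_after` threshold, and the `next_tier` function are all assumptions made up for this example; the article does not describe how Google actually implements permission changes.

```python
# Hypothetical permission tiers, ordered from least to most access.
TIERS = ["read-only", "sandboxed-write", "internal-write"]


def next_tier(current: str, verified_runs: int, incidents: int,
              promote_after: int = 100) -> str:
    """Return the tier an agent should hold after its latest review window."""
    i = TIERS.index(current)
    if incidents > 0:
        # Any verified incident steps access back down (the instructor brakes).
        return TIERS[max(i - 1, 0)]
    if verified_runs >= promote_after:
        # Enough verified good behavior earns one more step of access.
        return TIERS[min(i + 1, len(TIERS) - 1)]
    return current


print(next_tier("read-only", verified_runs=150, incidents=0))
print(next_tier("internal-write", verified_runs=150, incidents=1))
```

In this sketch access moves one tier at a time in either direction, which mirrors the "controlled, incremental access" wording; a real policy could of course use different signals and thresholds.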

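Our roadmap is built on a simple but cautious assumption: what if a highly capable AI agent isn't aligned with our goals and acts in unexpected ways? To address this, our plan focuses on three key areas.

First, following best practices in cybersecurity, we have developed a novel threat-modelling framework for AI. The roadmap treats untrusted AI agents as potential "insider threats", similar to the way a company would treat a potential rogue employee who already has access to the office. By building on the industry-standard MITRE ATT&CK framework, a globally accessible knowledge base of adversary tactics and techniques based on real-world observations, we break down potential attacks into smaller tactics and techniques. This allows us to methodically track risks, spot bad behavior before it causes damage, and safely run drills to test whether our defenses actually hold up.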
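Breaking attacks down into tactics and techniques can be pictured as a small two-level data structure. The sketch below is only an illustration of that shape: the tactic names, technique IDs, and the `find_tactic` helper are invented for the example and are not taken from MITRE ATT&CK or from the roadmap.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Technique:
    """One concrete way of achieving a tactic (hypothetical entries only)."""
    technique_id: str
    name: str


@dataclass
class Tactic:
    """A high-level adversary goal, decomposed into techniques."""
    name: str
    techniques: List[Technique] = field(default_factory=list)


# Illustrative taxonomy; IDs and names are made up for this example.
TAXONOMY = [
    Tactic("credential-access", [Technique("X-001", "read secrets from environment")]),
    Tactic("exfiltration", [Technique("X-002", "upload data to external host")]),
]


def find_tactic(technique_id: str) -> Optional[str]:
    """Return the name of the tactic that a technique belongs to, if any."""
    for tactic in TAXONOMY:
        if any(t.technique_id == technique_id for t in tactic.techniques):
            return tactic.name
    return None


print(find_tactic("X-002"))
```

Keeping techniques addressable by ID is what makes it possible to track risks per technique and to check, drill by drill, which ones a defense actually catches.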

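Second, we have developed a plan to deploy AI control mitigations to address these risks.

Identifying the problem (Detection): Monitoring is the backbone of AI control. We use other, trusted AI systems as "supervisors" to constantly review a working agent's reasoning, actions and plans to see if it is going off track.
Managing the problem (Prevention and Response): If a supervisor sees the agent trying to do something harmful, it can step in to block the action before damage occurs.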
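The detection and prevention steps above can be sketched as a supervisor that scores each proposed action and a wrapper that refuses to execute anything scored too high. Everything here is an assumption for illustration: `keyword_monitor` is a trivial stand-in for what the article describes as a trusted AI system, and the names and the 0.5 threshold are hypothetical.

```python
from typing import Callable

# A monitor takes a proposed action and returns a suspicion score in [0, 1].
Monitor = Callable[[str], float]


def keyword_monitor(action: str) -> float:
    """Toy stand-in for a trusted AI supervisor; a real one would be a model."""
    risky_markers = ("rm -rf", "DROP TABLE")
    return 1.0 if any(marker in action for marker in risky_markers) else 0.0


def supervised_execute(action: str, monitor: Monitor,
                       execute: Callable[[str], str],
                       block_threshold: float = 0.5) -> str:
    """Run the action only if the monitor's score is below the threshold."""
    score = monitor(action)
    if score >= block_threshold:
        # Prevention: the action never reaches the executor.
        return f"BLOCKED (score={score:.2f}): {action}"
    return execute(action)


def fake_executor(action: str) -> str:
    """Pretend to run the action; stands in for a real tool call."""
    return f"ran: {action}"


print(supervised_execute("ls /tmp", keyword_monitor, fake_executor))
print(supervised_execute("rm -rf /data", keyword_monitor, fake_executor))
```

The monitor is passed in as a function so that a stronger supervisor can replace the toy one without changing the gating logic.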

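Finally, to ensure our systems are functional, we constantly measure performance across three critical metrics: coverage (the fraction of traffic monitored), recall (the fraction of misaligned behaviors caught), and time-to-response.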
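As a rough illustration of how the three metrics could be computed from labeled episodes, consider the sketch below. The `Episode` record, its field names, the use of a median for time-to-response, and the sample data are all assumptions; the article defines the metrics only in words and does not say how ground-truth labels are obtained.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Episode:
    monitored: bool    # did a monitor review this episode?
    misaligned: bool   # ground-truth label, e.g. from a controlled drill
    flagged: bool      # did the monitor flag it?
    response_seconds: Optional[float] = None  # delay from flag to response


def control_metrics(episodes: List[Episode]) -> Dict[str, float]:
    """Compute coverage, recall and median time-to-response."""
    if not episodes:
        raise ValueError("need at least one episode")
    misaligned = [e for e in episodes if e.misaligned]
    caught = [e for e in misaligned if e.flagged]
    delays = [e.response_seconds for e in episodes
              if e.response_seconds is not None]
    return {
        "coverage": sum(e.monitored for e in episodes) / len(episodes),
        "recall": len(caught) / len(misaligned) if misaligned else float("nan"),
        "median_time_to_response_s": median(delays) if delays else float("nan"),
    }


# Made-up sample data, for illustration only.
episodes = [
    Episode(monitored=True, misaligned=True, flagged=True, response_seconds=30.0),
    Episode(monitored=True, misaligned=False, flagged=False),
    Episode(monitored=False, misaligned=True, flagged=False),
    Episode(monitored=True, misaligned=False, flagged=False),
]
print(control_metrics(episodes))
```

Note that the unmonitored misaligned episode lowers both coverage and recall here, which shows why the two metrics need to be tracked together.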

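Scaling security as AI gets smarter

As AI models continue to advance, our defenses must also strengthen in tandem. In our AI Control Roadmap, we map security protocols to measurable milestones in AI capabilities on two critical fronts:

Ability to evade detection: Today, we monitor a model's visible chain-of-thought. However, as models learn to hide their reasoning, either through oversight awareness (knowing they are being monitored) or opaque reasoning (reasoning without using visible text), simply reading their verbalized reasoning will not be enough. We will need to analyze their behavior and potentially inspect the model's inner workings.

Ability for potential harm: For low-risk and reversible actions, we can rely on reviewing transcripts with a delay and remediating issues quickly after they occur (asynchronous response). For particularly high-risk actions, such as major cyber attacks, we must shift to real-time prevention, blocking harmful actions before they can be executed (synchronous response). In other words, our response strategy must scale with the severity of the threat.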
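The difference between asynchronous and synchronous response can be shown as a routing decision made before an action runs. This is a minimal sketch under stated assumptions: the two-level `Risk` enum, the `approved` flag (standing in for a real-time monitor verdict), and `handle_action` are hypothetical names, not part of the roadmap.

```python
from enum import Enum
from typing import List


class Risk(Enum):
    LOW = 1
    HIGH = 2


# Transcripts waiting for delayed review (the asynchronous path).
review_queue: List[str] = []


def handle_action(action: str, risk: Risk, reversible: bool,
                  approved: bool) -> str:
    """Route an action to the asynchronous or synchronous control path."""
    if risk is Risk.LOW and reversible:
        # Asynchronous response: let it run, review the transcript later.
        review_queue.append(action)
        return "executed; queued for delayed review"
    # Synchronous response: the action runs only if approved beforehand.
    return "executed after approval" if approved else "blocked before execution"


print(handle_action("reformat file", Risk.LOW, reversible=True, approved=False))
print(handle_action("delete prod database", Risk.HIGH, reversible=False, approved=False))
```

In this sketch a low-risk but irreversible action also takes the synchronous path, following the article's pairing of "low-risk and reversible" as the condition for delayed review.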

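The AI Control Roadmap recommends security measures based on the deployed model's capability. The model's detection evasion capability determines the Detection level (D1-D4), and the model's attack execution capability determines the Prevention and Response level (R1-R3).

Analyzing a million agent trajectories to inform live monitoring

To build these systems, we need data. Our team has built an internal prototype of an asynchronous monitoring system that reviews coding agent trajectories and escalates findings for human review. Each flagged event is classified against the threat taxonomy from our roadmap.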
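Once flagged events carry a taxonomy label and a reviewer verdict, simple aggregates become possible. The sketch below shows the kind of tally such a pipeline might produce; the `FlaggedEvent` record, the category labels, and the three sample events are invented for illustration and do not represent the team's data or results.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FlaggedEvent:
    trajectory_id: str
    category: str      # label from a threat taxonomy (hypothetical values)
    adversarial: bool  # verdict recorded after human review


# Made-up events, for illustration only.
events = [
    FlaggedEvent("t1", "data-destruction", adversarial=False),
    FlaggedEvent("t2", "data-destruction", adversarial=False),
    FlaggedEvent("t3", "credential-access", adversarial=True),
]

# How often each taxonomy category was flagged.
by_category = Counter(e.category for e in events)

# Share of flagged events that reviewers judged non-adversarial.
non_adversarial_share = sum(not e.adversarial for e in events) / len(events)

print(by_category.most_common())
print(f"non-adversarial share: {non_adversarial_share:.2f}")
```

Separating the category from the intent verdict lets the same event count as, say, a data-destruction incident while still being recorded as a mistake rather than an attack.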

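We've already analyzed a million coding agent tasks, giving us valuable insights to refine our safety protocols and allowing us to move beyond simple keyword filtering to identifying high-signal behavioral patterns. For example, this research has been instrumental in building a live monitor for the Gemini Spark agent, allowing us to respond in real time to emerging issues, like unintentional data deletion.

It is important to note that our data shows the majority of flagged events do not stem from adversarial intent; instead, they often result from agent misinterpretation or overeagerness to achieve a user's goal. Understanding these nuances is critical for refining our safety and security protocols.

Defending the agent ecosystem is a shared responsibility

The AI Control Roadmap is one vital element of our multilayered approach to agent security. By integrating these protocols with our broader infrastructure, we are scaling our internal security to safely manage our most advanced AI models.

Additionally, this multilayered approach to agent security should be a collaborative priority for industry, policymakers, and academia. By aligning the ecosystem around best practices and standards, we can empower cyber defenders and build societal resilience. That's why today we are also publishing a technical framework for policymakers, "Three Layers of Agent Security". The paper details how we need to improve security at the level of individual agents, in multi-agent systems, and across the broader ecosystem by empowering cyber defenders and building resilience.

We intend to build on these frameworks to confidently deploy capable AI today while we continue to build a secure foundation for the future.

Read the full technical report here:

AI Control Roadmap

Research authors

Mary Phuong, Erik Jenner, Laurent Simon, Lewis Ho, Rohin Shah, Sebastian Farquhar, and Scott Coull.

Acknowledgements

Four Flynn, Anca Dragan, Alan Cooney, Bilal Chughtai, Buck Shlegeris, Cody Wild, David Lindner, Julian Stastny, Kevin Klyman, Li Ding, Myriam Khan, Raluca Ada Popa, Roland Zimmermann, Ryan Greenblatt, Senthooran Rajamanoharan, Victoria Krakovna, Xerxes Dotiwalla.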

