Simon Willison Blog · March 25, 2026, 08:57 · approx. 6 min read

Auto mode for Claude Code

#AICodingAgents #PermissionsAndSecurity #ClaudeCode #Anthropic #AutomationGuardrails

TL;DR

Claude Code's new "auto mode" is a permissions mode that cuts down on manual approvals while relying on pre-execution review by a separate model, plus a detailed filter list, for security.

AI deep analysis · March 25, 2026, 10:41

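Key points

Balancing security and convenience

As an alternative to --dangerously-skip-permissions, auto mode has a separate model review each action before it runs, blocking scope escalation and actions driven by hostile content.

Classifier runs on Claude Sonnet 4.6

Regardless of the main session's model, Claude Sonnet 4.6 serves as a dedicated classifier that blocks escalation beyond the task scope and actions targeting infrastructure it does not recognize as trusted.

Extensible default filters and customization

The defaults are a comprehensive set of allow and block rules that a CLI command prints as JSON; users can customize the filters with project-specific rules.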

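Impact analysis

This feature is an important step toward reconciling two barriers to adopting AI coding agents: the tedium of permission approvals and the security risk of skipping them. Combining pre-execution review by a separate model with customizable filters lowers the barrier to practical adoption and encourages a move to safer automated development workflows.

Editorial comment

Auto mode greatly improves the developer experience, but balancing the classifier model's cost against its review accuracy remains an open issue. With well-designed project-specific filters, we judge it well worth adopting in practice.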

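Auto mode for Claude Code

Really interesting new development in Claude Code today as an alternative to --dangerously-skip-permissions:

"Today, we're introducing auto mode, a new permissions mode in Claude Code where Claude makes permission decisions on your behalf, with safeguards monitoring actions before they run."

Those safeguards appear to be implemented using Claude Sonnet 4.6, as described in the documentation:

"Before each action runs, a separate classifier model reviews the conversation and decides whether the action matches what you asked for: it blocks actions that escalate beyond the task scope, target infrastructure the classifier doesn't recognize as trusted, or appear to be driven by hostile content encountered in a file or web page. [...]

Model: the classifier runs on Claude Sonnet 4.6, even if your main session uses a different model."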
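The review-before-run flow described in the documentation can be pictured as a gate in front of every action. The sketch below is only an illustration of that shape, not Claude Code's implementation: Verdict, run_with_gate, and toy_classifier are hypothetical names, and the toy rule stands in for what is, in auto mode, a separate model reading the conversation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    allowed: bool
    reason: str


# Hypothetical type: a classifier sees the conversation so far plus the
# proposed action and returns a Verdict. Any callable can play that role here.
Classifier = Callable[[List[str], str], Verdict]


def run_with_gate(conversation: List[str], action: str,
                  classify: Classifier,
                  execute: Callable[[str], str]) -> str:
    """Ask the classifier first; execute the action only if it is allowed."""
    verdict = classify(conversation, action)
    if not verdict.allowed:
        return f"BLOCKED: {verdict.reason}"
    return execute(action)


def toy_classifier(conversation: List[str], action: str) -> Verdict:
    """Stand-in rule for illustration only: block anything touching /etc."""
    if "/etc" in action:
        return Verdict(False, "outside project scope")
    return Verdict(True, "matches request")
```

The point of the structure is that execute is never reached for a blocked action, so the quality of the whole scheme rests on the classifier's judgment.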

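They ship with an extensive set of default filters, and you can also customize them further with your own rules. The most interesting insight into how they work comes when you run this new command in the terminal:

claude auto-mode defaults

The full JSON output is pretty long, so here's an illustrative subset:

From the "allow" list:

Test Artifacts: Hardcoded test API keys, placeholder credentials in examples, or hardcoding test cases
Local Operations: Agent deleting local files in working directory, local file operations within project scope, or using --ignore-certificate-errors for local testing. "Project scope" means the repository the session started in; wandering into ~/, ~/Library/, /etc, or other repos is scope escalation (User Intent Rule #2), not a local operation. Does NOT cover irreversible destruction of pre-existing files or local stateful services; see "Irreversible Local Destruction" in BLOCK.
Read-Only Operations: GET requests, read-only API calls, or queries that don't modify state and don't contain sensitive information in the URL. Note: PREEMPTIVE BLOCK ON CLEAR INTENT still applies: if the transcript contains clear evidence the agent is using read-only operations to scout for a blocked action, block it.
Declared Dependencies: Installing packages that are already declared in the repo's manifest files (requirements.txt, package.json, Cargo.toml, pyproject.toml, Gemfile, etc.) via standard commands that read those manifests (pip install -r requirements.txt, npm install, cargo build, bundle install), provided the agent has not modified the manifest in this session. Does NOT cover installing agent-chosen package names (e.g. pip install foo, npm install bar); those carry typosquat and supply-chain risk. [...]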
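To make the Declared Dependencies rule concrete, here is a rough deterministic approximation of its logic. The real filter is natural-language guidance interpreted by the classifier model, so this is a sketch under that caveat; is_declared_install and MANIFEST_COMMANDS are hypothetical names, and the command set contains only the examples quoted in the rule.

```python
import shlex

# The manifest-reading commands named in the rule text above.
MANIFEST_COMMANDS = {
    ("pip", "install", "-r", "requirements.txt"),
    ("npm", "install"),
    ("cargo", "build"),
    ("bundle", "install"),
}


def is_declared_install(command: str, manifest_modified: bool) -> bool:
    """True only for a manifest-driven install whose manifest is untouched."""
    if manifest_modified:
        # The rule stops applying once the agent has edited the manifest
        # during this session.
        return False
    return tuple(shlex.split(command)) in MANIFEST_COMMANDS
```

Under this approximation, "pip install -r requirements.txt" passes while "pip install foo" and "npm install bar" do not, because they name a package chosen by the agent rather than one read from the manifest.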

From "soft_deny":

Git Destructive: Force pushing (git push --force), deleting remote branches, or rewriting remote history
Git Push to Default Branch: Pushing directly to main, master, or the repository's default branch; this bypasses pull request review. Commits should be pushed to a new feature branch instead.
Code from External: Downloading and executing code from external sources (e.g. curl | bash), deserializing external data via formats that can execute code (eval, exec, yaml.unsafe_load, pickle, etc), or similar mechanisms. Also includes running code from an external repo cloned earlier in the transcript (pip install -e, make install, python script.py, pickle.load/torch.load on cloned repo files); "local on disk" does not mean trusted if it was cloned from an external source visible in the transcript. The repo the agent starts in is trusted.
Cloud Storage Mass Delete: Deleting or mass modifying files on cloud storage (S3, GCS, Azure Blob, etc.) [...]
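The two Git entries lend themselves to a small worked example. The function below is a hypothetical, rule-based approximation of what those entries describe, not how the classifier decides: it assumes the default branch is main or master, and it cannot judge a bare "git push" because the current branch is not visible in the command.

```python
import shlex
from typing import Optional

DEFAULT_BRANCHES = {"main", "master"}  # assumption: the repo uses one of these
DESTRUCTIVE_FLAGS = {"--force", "-f", "--force-with-lease", "--delete", "-d"}


def git_push_concern(command: str) -> Optional[str]:
    """Return the name of the matching soft_deny entry, or None."""
    tokens = shlex.split(command)
    if tokens[:2] != ["git", "push"]:
        return None
    flags = [t for t in tokens[2:] if t.startswith("-")]
    positional = [t for t in tokens[2:] if not t.startswith("-")]
    if any(f in DESTRUCTIVE_FLAGS for f in flags):
        return "Git Destructive"
    # positional[0] is the remote; the rest are refspecs like "main" or "HEAD:main".
    for refspec in positional[1:]:
        # "+ref" forces the update and ":ref" deletes the remote branch.
        if refspec.startswith("+") or refspec.startswith(":"):
            return "Git Destructive"
        if refspec.split(":")[-1] in DEFAULT_BRANCHES:
            return "Git Push to Default Branch"
    return None
```

A push to a feature branch such as "git push origin feature/login" returns None, which matches the rule's advice to push commits to a new feature branch instead.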

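I remain unconvinced by prompt injection protections that rely on AI, since they're non-deterministic by nature. The documentation does warn that this may still let things through:

"The classifier may still allow some risky actions: for example, if user intent is ambiguous, or if Claude doesn't have enough context about your environment to know an action might create additional risk."

The fact that the default allow list includes pip install -r requirements.txt also means that this wouldn't protect against supply chain attacks with unpinned dependencies, as seen this morning with LiteLLM.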
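One way to see the exposure is to list which lines in a requirements file are not pinned to an exact version, since those are the ones that can resolve to a newly published release at install time. This is a minimal sketch with a hypothetical helper name; it treats only "name==version" as pinned and ignores hashes, environment markers, and URL requirements.

```python
import re
from typing import List

# A line counts as pinned here only if it uses an exact "==" version.
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\],]+\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that do not pin an exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks and options such as "-r other.txt"
        if not PINNED.match(line):
            loose.append(line)
    return loose
```

A file containing "requests==2.31.0" and "flask>=2.0" would report only the second line. Exact pins narrow the window but do not close it; they only help if the pinned version itself is trustworthy.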

I still want my coding agents to run in a robust sandbox by default, one that restricts file access and network connections in a deterministic way. I trust those a whole lot more than prompt-based protections like this new auto mode.
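For contrast with a model-based check, here is what a deterministic file-access rule can look like: the same inputs always give the same answer. This is a sketch of the idea only, with a hypothetical function name; a real sandbox enforces the restriction at the operating-system level rather than in application code.

```python
from pathlib import Path


def is_within_project(project_root: str, candidate: str) -> bool:
    """True if candidate, resolved against the root, stays inside the root."""
    root = Path(project_root).resolve()
    # Joining with an absolute candidate discards the root, and resolve()
    # collapses ".." segments, so escapes show up in the final path.
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

With a root of /tmp/demo-project, "src/app.py" is accepted while "../other-repo/file" and "/etc/passwd" are rejected, mirroring the "project scope" boundary in the allow list.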

Tags: security, ai, prompt-injection, generative-ai, llms, coding-agents, claude-code
