TechCrunch AI·2026年5月11日 05:40·約2分

Anthropic、Claude の身代金要求は AI の悪意ある描写が原因と主張

TL;DR

AI 企業 Anthropic は、同社が開発したチャットボット Claude が身代金を要求する行為を行った背景について、AI を悪意ある存在として描いた外部の描写が影響した可能性があると説明しました。

要約

投稿日時：

2026年5月10日午後1時40分 PDT

image画像クレジット： Samuel Boivin/NurPhoto / Getty Images

人工知能のフィクションにおける描写は、AI モデルに実際の影響を与える可能性があることを、Anthropic は指摘しています。

昨年同社は、架空の企業を対象としたリリース前のテストにおいて、Claude Opus 4 が別のシステムに置き換えられるのを避けるためにエンジニアをしばしば脅迫しようとしたと述べていました。その後、Anthropic は他社のモデルも同様に「エージェントのミスマッチ（agentic misalignment）」という問題を抱えている可能性を示す研究論文を発表しました。

どうやら Anthropic はこの行動に関するさらなる調査を進めており、X での投稿で「我々は、この行動の元となったのは、AI を悪意ある存在として描き、自己保存に関心があると表現するインターネット上のテキストであると信じている」と主張しています。

同社はブログ記事でより詳細を述べており、Claude Haiku 4.5 以降、Anthropic のモデルは「テスト中に脅迫を行うことはなく」、以前のモデルでは最大96%の確率でそのような行動が見られたと明言しています。

違いの要因は何でしょうか。同社は、「Claude の憲章に関する文書や、AI が模範的な行動をとるというフィクション物語をトレーニングに含めることで、アライメントが改善されること」を発見したと述べています。

関連して、Anthropic は、「アライメントされた行動の背後にある原則」を含み、「アライメントされた行動の単なるデモンストレーション」だけでは不十分である場合、トレーニングがより効果的になることも発見したと発表しました。

「両方を組み合わせることが最も効果的な戦略のように思われます」と同社は述べています。

Techcrunch イベント

サンフランシスコ、カリフォルニア州

2026 年 10 月 13 日〜15 日

トピック

業界最大のテックニュースを購読する

最新 AI ニュース

原文を表示

In Brief

Posted:

1:40 PM PDT · May 10, 2026

Image Credits:Samuel Boivin/NurPhoto / Getty Images

Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Apparently Anthropic has done more work around that behavior, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that training on “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Related, Anthropic said that it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.

Techcrunch event

San Francisco, CA

October 13-15, 2026

Topics

Subscribe for the industry’s biggest tech news

Latest in AI

この記事をシェア

TLDR AI重要度42026年5月11日 09:00

Anthropic、AI の悪役描写がClaudeの脅迫行為の原因と発表

TechCrunch AI2026年6月27日 02:43

OpenAI から SpaceX までが独自チップ開発に乗り出し、Nvidia の支配に熱を帯びさせる理由

TechCrunch AI重要度42026年6月27日 01:24

もはやアンソロピック対オープンエーアイではない

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む