
Ars Technica AI · June 10, 2026, 04:20 · approx. 2 min read

Anthropic discloses topics its Fable 5 model is barred from discussing

#LLM #AI safety #Anthropic #Claude Fable 5 #Misuse prevention

TL;DR

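With the release of its new model, Claude Fable 5, Anthropic has introduced strict safeguards that restrict answers on certain sensitive topics in order to limit misuse risks in cybersecurity, biology, and chemistry.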

AI deep analysis · June 10, 2026, 06:02

Importance (5-point scale)

Depth: 40%

Key points

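Fable 5 announced with improved performance

Anthropic released Claude Fable 5, its first "Mythos-class" model, which it says surpasses its previous frontier Opus models in overall capabilities.

Strict safety filters to prevent misuse

To avoid giving "uplift" to malicious actors in fields such as cybersecurity, biology, and chemistry, Anthropic introduced safeguards that restrict answers on these topics.

Trade-off between user experience and safety

The deliberately strict safeguards may also refuse harmless requests. Anthropic says such false positives occurred in fewer than 5% of sessions in testing and that it accepts this cost to prevent serious harm.

Fallback mechanism for sensitive topics

Queries on sensitive topics are routed to the earlier Claude Opus 4.8 model, and the user is warned when this happens.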

Key quotes

surpasses its previous frontier Opus models in overall capabilities

potential impact to 'uplift' malicious actors

stricter than ideal, meaning the system may occasionally refuse 'harmless requests'

avoid situations where Mythos could give malicious actors assistance in 'causing serious harm that they couldn't have received from other sources'


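Impact analysis

This announcement is an important indicator that large language model (LLM) development is shifting from a pure performance race toward treating safety and ethical constraints as the top priority. It reinforces the industry-wide view that more capable models need stricter governance against misuse, and it could significantly influence future AI regulation and development guidelines.

Editorial comment

Touting a leap in performance while deliberately limiting both access and answer content is a pragmatic response to the challenge the AI industry faces in managing powerful AI.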


Anthropic Tuesday publicly released Claude Fable 5, its first "Mythos-class" model that it says surpasses its previous frontier Opus models in overall capabilities. But the model's launch today comes with safeguards designed to prevent it from answering queries on topics like cybersecurity, biology, and chemistry, where the company has publicly worried about its potential impact to "uplift" malicious actors.

Anthropic says Fable 5 operates on the "same underlying model" as Mythos 5, which is coming out of its monthslong "Mythos Preview" period today, but only for "a small group of cyberdefenders" judged trustworthy through the existing Project Glasswing. Unlike Mythos 5, though, the publicly accessible Fable 5 is designed to funnel queries on certain sensitive topics to the earlier Claude Opus 4.8 model and to warn the user when this is happening.
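As a rough illustration of the routing behavior described above, the following Python sketch sends queries that match a sensitive-topic check to a fallback model and attaches a notice for the user. Everything in it is an assumption made for illustration: the `route_query` function, the `RoutingDecision` type, the keyword list, and the model identifier strings are hypothetical, and a simple keyword match stands in for whatever classifier Anthropic actually uses, which the article does not describe.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers: the article names the models but gives no API strings.
PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4.8"

# Toy stand-in for a real sensitive-topic classifier (assumption, not Anthropic's method).
SENSITIVE_KEYWORDS = {"exploit", "malware", "pathogen", "nerve agent"}


@dataclass
class RoutingDecision:
    model: str                      # which model should handle the query
    warning: Optional[str] = None   # notice shown to the user when rerouted


def route_query(query: str) -> RoutingDecision:
    """Pick a model for the query and warn the user if it was rerouted."""
    text = query.lower()
    if any(keyword in text for keyword in SENSITIVE_KEYWORDS):
        return RoutingDecision(
            model=FALLBACK_MODEL,
            warning="This request touches a sensitive topic and was routed to an earlier model.",
        )
    return RoutingDecision(model=PRIMARY_MODEL)
```

A crude check like this also hints at how harmless requests can get caught: a question such as "How do I patch an exploit on my own server?" would be rerouted here simply because it contains a flagged word.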

Among the many claimed benchmark improvements for Fable 5, the one related to cybersecurity was a particularly large jump.

Credit:

Anthropic

Anthropic said it has tuned these safeguards to be "stricter than ideal," meaning the system may occasionally refuse "harmless requests" in a way that it acknowledges may be frustrating for regular users. But Anthropic says such false positives come up in less than five percent of all sessions in testing, and were worth it to avoid situations where Mythos could give malicious actors assistance in "causing serious harm that they couldn’t have received from other sources."

