The Zvi·2026年6月30日 00:05·約9分

WSJ の「中国が Anthropic に追いついた」という主張は明らかなデマであるとの指摘

#LLM #サイバーセキュリティ #自律的エクスプロイト #Anthropic #GLM

TL;DR

Zvi は WSJ の「中国が Anthropic の Mythos をサイバーセキュリティで追いついた」という報道を根拠のない虚偽と断じ、Mythos が持つ自律的な脆弱性発見・結合能力の独自性を強調して誤報を糾弾した。

AI深層分析2026年6月30日 01:05

重要/ 5段階

深度40%

キーポイント

WSJ の報道は重大な誤りである

著者は WSJ の「中国が Anthropic の Mythos を追いついた」という見出しと記事内容を、根拠のない虚偽であり、批判的に検証されないまま拡散された誤報であると断じている。

Mythos の真の独自性：自律的な脆弱性の発見

Mythos が他モデルと決定的に異なる点は、特定の脆弱性を指し示されることなく、自律的に大規模なコードから脆弱性を特定し、それらを組み合わせて完全なエクスプロイトを構築できる点にある。

他社モデルとの明確な性能差

GPT-5.6 Sol、Opus 4.8、GLM-5.2、Fable など他の主要モデルは、Mythos が行うような自律的な脆弱性の発見と結合を同等の難易度で実行できないと指摘されている。

公共の利益とリスク管理

著者は Fable や Sol の早期公開は公衆に利益をもたらすが、Mythos を現時点で一般公開することは重大な誤りであり、ホワイトハウスの対応にも類似した過ちが見られると懸念している。

条件付きの性能同等性

WSJ の主張する中国 AI が Anthropic のモデルと同等であるという点は、特定のセキュリティバグ発見タスクにおいて、十分なリソースを与えられ、コードの特定部分に指示された場合のみ成立する限定的な事実に基づくものである。

使用量増加の要因はコスト

中国製 AI システムの利用が急増している主な理由は技術的優位性ではなく、Microsoft などの企業が Copilot のような製品のコスト削減（enshittification）を求めて安価な代替案を探しているためである。

能力格差の相対的な縮小

中国モデルの最新バージョンが以前のバージョンより大幅に改善されたことで「格差が縮小した」と言えるが、これは米国の最新モデルとの絶対的な差がなくなったことを意味するものではなく、技術進歩に伴って常に上下動する相対的な状態である。

影響分析・編集コメントを表示

影響分析

この記事は、主要メディアによる誤った技術評価が公衆の認識や政策決定に与える影響を警告しており、特に「自律的エクスプロイト生成」という高度な AI 能力の実態とリスクについて、業界関係者への重要な警鐘となっている。WSJ のような権威あるメディアの誤報が拡散されることで、技術競争の現状が歪曲され、適切なセキュリティ対策や規制議論が阻害される可能性を指摘している。

編集コメント

技術的な詳細を無視したセンセーショナルなメディア報道に対し、実態に基づいた厳格な分析が求められている状況を示す重要な指摘です。特に「自律的攻撃能力」の定義とリスク評価は、今後の AI ガバナンスにおいて極めて重要な論点となります。

ウォール・ストリート・ジャーナルは、このように主張する完全に誤った見出しと、極めて誤解を招く記事を掲載し、もちろんこれはいつもの常套句によって批判もされずに増幅されました。

私は状況を説明するためのリンク先として使える場所を作るために、これを単独で投稿します。

ヘッドラインニュース

WSJの見出し（明白なナンセンス）：中国がサイバーセキュリティ分野でAnthropicに追いつき、AI競争をリセットした。

image

そんなことは。ありえ。ない。

この投稿はさらに、Claude Opus 4.8 も同様に Claude Mythos と『同等』であると明示的に主張しており、これはより明白に誤った主張です。

ウォール・ストリート・ジャーナルよ、恥を知れ。私はゲルマン・アムネシア（Gell-Mann Amnesia）を恐れています。彼らがこれほど重要なことをこれほど完全に間違えるなら、他のことについてはどうなるのでしょうか？

正確な報道や些細な異議に関する部分は省略します。

中心的な誤った主張を明確に否定することに焦点を当てることは重要と思われます。

哀れにも、ここで犯された過ちは、ホワイトハウス全体で現在行われている過ちと非常に似ており、それが特定の悪意ある行為者によって利用され、彼らが「Mythos Moment」に対して我々を準備不足のままにした大きな要因となっています。

GLM-5.2（これは確かに印象的なオープンモデルです）を完全に理解するためには、これを適切な文脈に位置づけた私の完全な解説記事をご覧ください。

Mythos が特別である理由を理解することが重要です。これはそれではありません。

Mythos が特別である理由

Mythos が特別なのは、選ばれた者だけがコード内の特定の脆弱性を特定できるからではありません。

Mythos が特別なのは、それが自律的に、大規模に、かつ指向されることなく脆弱性を特定でき、さらに自律的にさまざまな一見無関係な脆弱性を組み合わせて完全な動作するエクスプロイト（exploit）を構築できるからです。これは GPT-5.6 Sol や Opus 4.8、GPT-5.5、GLM-5.2 が同じレベルの難易度でできないことです。また、Fable もできません。この行為を「 Jailbreak（脱獄）」と呼ぶかどうかにかかわらず、既知のプロンプト戦略を用いても不可能です。

Mythos は、最終的にはより能力の低いモデルでも達成できることを行うことはできません。これは無限の数の猿がシェイクスピアを書き上げる可能性と同じ意味を持ちます。

Fable と Sol が現在の形で可能な限り早く一般利用可能になることは、サイバーセキュリティの問題を含め、公衆にとって正味の利益をもたらすでしょう。一方、現時点で Mythos を一般公開することは、期待される結果として重大な誤りとなります。

詳細な主張の検討

ロバート・マクミラン、ラファエル・フアング、アンリット・ラムクマー（WSJ、極めて誤解を招く報道）：中国の人工知能システムは、サイバーセキュリティの特定のシナリオにおいて、Anthropic の強力なモデル「Mythos」と同等のパフォーマンスを発揮した。この進展は、グローバルな技術競争の再設定と、ホワイトハウスによる米国 AI 政策の見直しに対する圧力をもたらす可能性がある。

神よ、私は「限定された不信」のルールが嫌いだ。なぜなら、見出しは嘘をついても許される（ここではまさにそうしている）し、この最初の段落自体はジャーナリズムの規範を破っていないからだ。

なぜか？技術的に言えば、「特定のサイバーセキュリティのシナリオにおいて」という条件付きで、Z.ai の GLM-5.2 は Mythos と同等のパフォーマンスを発揮できる。これらのシナリオは「簡単なもの」と呼べるだろう。

セキュリティ研究者たちは、今月中国の智譜 AI（別名 Z.ai）がリリースした新しい AI モデルが、セキュリティバグの発見においては最新の米国モデルと同等である一方、他のタスクでは Anthropic や OpenAI の製品にはまだ劣っていると述べている。

より正確に言えば、これは以前から繰り返し指摘されていた発見と同じだ。GPT-5.5 や Opus 4.8 だけでなく、いくつかのオープンソースモデルであっても、十分なリソースを与え、コードの特定の正しいサブセクションを指し示せば、あらゆるセキュリティ問題を特定できるという内容である。

私は、実務において GLM-5.2 が Mythos と同等にセキュリティバグを特定できるとする考え方に異議を唱えるが、再び、もし条件を十分に狭く定義すれば、Mythos が見つせるもののほとんどは、GLM-5.2 も適切に指し示されれば同様に発見できることになる。

さらに境界線ギリギリの話題へと進みます。ここではもはや一種の芸術形式となっています。

ロバート・マクミラン、ラファエル・フアング、アンリット・ラムクマー（WSJ）：全体として、米国トップモデルと中国企業が構築したモデルとの間の能力格差は著しく縮小しており、企業 runaway コスト（制御不能なコスト増大）を抑制しようとする動きの中で、中国製 AI システムの利用が急増しています。マイクロソフトを含む多数の企業が、自社のプラットフォーム上で中国製モデルを提供する方法を検討しており、この動向はテック企業の間のパワーバランスを変化させることが予想されています。

あらゆる種類の AI システムの利用が急増しているため、中国製モデルの利用も同様に急増しています。また、マイクロソフトが Copilot に中国製モデルの使用を検討したのは事実です。Copilot は彼らが提供する「エンシティフィード（腐敗した）」製品であり、コスト削減を図っているからです。「パワーバランスを変化させることが予想されている」という表現には規模の程度を示す数値が付随していないため、技術的には真実と言えます。

さらに、「能力格差が著しく縮小した」という主張についても、GLM-5.2 が以前の中国製モデルに比べて大幅に改善されたという意味では真実です。つまり、リリース当日に格差が一気に縮まったことになります。観測される格差は上下に連続的に変動し、その過程で現在予測（nowcast）における格差の谷間もいくつか現れるでしょう。

ここでの印象、および後々の印象として「時間とともにギャップは予測可能に縮小している」というのは、単なる誤りです。GLM-5.2 以前においては、DeepSeek の R1 や DeepSeek モーメント（これ自体も特に近いものではない）以降、サイクル数においても絶対的なクロックタイムにおいても、明らかにギャップは拡大していたと言えます。

ロバート・マクミラン、ラファエル・フアング、アンリット・ラムクマー（WSJ）：「中国は時間とともにギャップを小さくし続けることを確実にしている」と、サイバーセキュリティ企業 7AI の最高経営責任者であるリオール・ディブは述べています。

他人の発言を引用することは常に許されますが、ここではリオール・ディブの主張は誤りです。

ロバート・マクミラン、ラファエル・フアング、アンリット・ラムクマー（WSJ）：水曜日、中国のサイバーセキュリティ企業 360 セキュリティテクノロジーは、バグ発見ツール「Tulongfeng」という新しいツールをリリースしました。同社は、このツールのバグ発見能力が Mythos に匹敵するものであると述べています。これらの機能に多くの国家安全保障関係者や CEO が警戒感を抱いています。

再び、おそらく同社はそのように発言したのでしょう。しかし、なぜそれを信じる必要があるのでしょうか？このような誤りあるいは過度に誤解を招く主張は、常に繰り返されています。

有益な注記

私は、言われた有益な点を一つだけ指摘しておきたいです：

ロバート・マクミラン、ラファエル・フアング、アンリット・ラムクマー（WSJ）：「中国が必要とするチップを販売しながら Fable を禁止することは、中国が独自バージョンを開発するための贈り物になる」と、バイデン政権で輸出規制に関わったインスティテュート・フォー・プログレスのシンクタンクにおける著名な技術フェローであるサイフ・カーンは語った。彼はまた、米国は自国のサイバー防御を強化できるうちに、Mythos や同等のモデルの使用を最大化する必要があると付け加えた。

なるほど、その段落は正確で重要なので、少なくともそこだけは正しいですね。

全体的な印象は極めて誤りである

しかし、見出しを除いたとしても記事全体の印象としては、中国が着実に追いついているか、すでに追いついてしまっているというものであり、ホワイトハウスがリリースの延期と制限を決定した理由となる能力において Mythos に匹敵しているどころか、中国のオープンモデルは AI 利用を確実に席巻しつつあるという内容です。これらすべては完全に誤りです。

ティム・フアがここで非常に辛辣な言葉を浴びせた人物の一人だったのには理由があります。

これは過去にも起こり、再び起こる出来事である

これは WSJ による AI に関する重要な誤った見出しが初めてではありませんでした。

2026 年 6 月 4 日付の「Anthropic が AI 開発のグローバル一時停止を要請、『自己改善リスク』を警告」という記事をご覧ください。Anthropic はそのような提案に極めて近いものを出したわけではありません。過去の事例では、本文自体は問題なく、誤りがあったのは見出しのみでしたが、読者に与える印象としては非常に悪い誤解を生むものでした。

私たちはまた、さまざまなソースから「中国のモデルが追いついた」という見出しにかなり苦しめられてきました。いつかそれが真実になる日もあるかもしれません。しかし、今日はその日ではありません。

この件に関心を持っていただき、ありがとうございます。

原文を表示

The Wall Street Journal printed an outright false headline and heavily misleading story claiming this, which of course was uncritically amplified by the usual suspects.

I post this now on its own so that we have a place to link to, to explain the situation.

Headline News

WSJ Headline (Obvious Nonsense): China Has Matched Anthropic in Cybersecurity, Resetting AI Race.

That. Did. Not. Happen.

The post even claims, explicitly, that Claude Opus 4.8 similarly ‘matches’ Claude Mythos, a claim which is even more obviously false.

Shame upon the Wall Street Journal. I fear Gell-Mann Amnesia. If they can get something as important as this so completely wrong, what about everything else?

I am skipping over the parts that involve accurate reporting, or minor quibbles.

It seems important to focus on clearly debunking the central false claims.

Alas, the mistakes made here very much rhyme with mistakes being made throughout all this by the White House, and that get latched onto by certain bad actors, who have played a large part in leaving us unprepared for the Mythos Moment.

For a full understanding of GLM-5.2, which is indeed an impressive open model, here is my full coverage of that release, placing it in proper context.

It is important to understand what makes Mythos special. This is not it.

What Makes Mythos Special

What makes mythos special is not that only the chosen one can identify any given vulnerability in code.

What makes Mythos special is that it can identify vulnerabilities autonomously, at scale, without being pointed at them, and can then autonomously string together a variety of seemingly unrelated vulnerabilities into full working exploits.

This is the thing that GPT-5.6 Sol cannot do, and that Opus 4.8 and GPT-5.5 cannot do, and that GLM-5.2 also cannot do, at anything like the same level of difficulty. It is also a thing that Fable cannot do, not with any known prompting strategy, whether or not you call this a ‘jailbreak.’

Mythos cannot do anything that you could not eventually do with a less capable model, in the same sense that an infinite number of monkeys can write Shakespeare.

The public would net benefit if both Fable and Sol were generally available in their current forms as soon as possible, including on the question of cybersecurity. Whereas releasing Mythos to the public at this time would in expectation be a serious error.

Going Over The Detailed Claims

Robert McMillan, Raffaele Huang and Amrith Ramkumar (WSJ, being super misleading): Chinese artificial-intelligence systems have matched the performance of Anthropic’s powerful model Mythos in some cybersecurity scenarios, a development poised to reset the global tech race and pressure the White House in its overhaul of U.S. AI policy.

My lord, I hate the rules of bounded distrust, because the headline is allowed to lie (as it does here) and this first paragraph does not actually break journalistic norms.

Why? Technically speaking, yes, ‘in some cybersecurity scenarios’ Z.ai’s GLM-5.2 can match the performance of Mythos. Those scenarios could be called ‘the easy ones.’

Security researchers said that a new AI model, released this month by China’s Zhipu AI, also known as Z.ai, can match the latest U.S. models when it comes to finding security bugs, although it still lags behind Anthropic’s and OpenAI’s products in other tasks.

More precisely, this is the same finding that was harped on previously about how various models, not only GPT-5.5 and Opus 4.8 but also some open ones, could always identify any given security issue in code, provided you give them sufficient resources and point them at the particular correct subsection of code.

I would challenge the idea that GLM-5.2 can match Mythos in security bug identification in practice, but again if we sufficiently narrowly define it, then most of the things that Mythos can find GLM-5.2 can, if pointed correctly, also find.

Onward to more skirting of the line. It’s almost an art form here.

Robert McMillan, Raffaele Huang and Amrith Ramkumar (WSJ): Overall, the capability gap between top U.S. models and those built by Chinese companies has narrowed significantly, and use of Chinese AI systems has surged as businesses seek to rein in runaway costs. A host of companies, including Microsoft, are weighing how they can offer Chinese models on their platforms, a development that is set to alter the balance of power among tech companies.

Use of AI systems of all types has surged, so Chinese model usage has as well. And yes, Microsoft considered using a Chinese model for Copilot, because Copilot is the enshittified product they offer and they want to save money. And ‘set to alter the balance of power’ does not have a magnitude attached, so it technically is true.

Then there’s the claim that the ‘capability gap has narrowed significantly,’ which is true in the sense that GLM-5.2 is considerably better than previous Chinese models, which means the gap shrunk a bunch the day of its release. The observed gap is going to continuously move both up and down, and there will be some troughs along the way in terms of nowcast gaps.

The impression here and later, that the gap is predictably shrinking over time, is simply false. Before GLM-5.2, I would say the gap had clearly gotten larger, both in terms of number of cycles and also absolute clock time, since DeepSeek’s R1 and the DeepSeek moment, which itself was not especially close.

Robert McMillan, Raffaele Huang and Amrith Ramkumar (WSJ): “China is making sure that the gap becomes smaller and smaller over time,” said Lior Div, chief executive officer of the cybersecurity company 7AI.

You’re always allowed to quote someone else, but Lior Div is incorrect here.

Robert McMillan, Raffaele Huang and Amrith Ramkumar (WSJ): On Wednesday, the Chinese cybersecurity company 360 Security Technology released a new bug-finding tool called Tulongfeng. The company said it was comparable to Mythos in finding bugs. Those capabilities have alarmed many national-security officials and CEOs.

Again, the company presumably did say this. But why would you believe them? This kind of false or heavily misleading claim is made all the time.

One Helpful Note

I do want to note one helpful thing that was said:

Robert McMillan, Raffaele Huang and Amrith Ramkumar (WSJ): “Banning Fable while selling chips China needs to develop its own version is a gift to China,” said Saif Khan, a distinguished technology fellow at the Institute for Progress think tank who worked on export restrictions in the Biden administration. The U.S. needs to maximize the use of Mythos and comparable models to harden its cyber defenses while it can, he added.

Okay, yeah, that paragraph is accurate and important, so at least there’s that.

The Overall Impression Is Extremely Wrong

But the overall impression of the article, even excluding the headline, is that China is steadily catching up if it has not already done so, and even that it has matched Mythos where it counts and in the capabilities that caused the White House to delay and restrict its release, and that Chinese open models are steadily taking over AI usage. All of this is completely wrong.

There is a reason that Tim Hua was among those having very unkind words here.

All Of This Has Happened Before And Will Happen Again

This was not the first rather importantly false headline in the WSJ about AI.

See ‘Anthropic Urges Global Pause in AI Development, Flags ‘Self-Improvement Risk,’ from June 4, 2026. Anthropic came very short of any such suggestion. In that past case, the body of the article was fine, it was only the headline that was false, but that is one hell of a false impression to put onto readers.

We also have had to suffer quite a lot of ‘Chinese models have caught up’ headlines, from various sources. Some day one of them might be true. Today is not that day.

Thank you for your attention to this matter.

この記事をシェア

The Verge AI重要度42026年6月27日 09:33

Anthropic の「Mythos 5」が政府との交渉を経て一部組織向けに再稼働

TechCrunch AI重要度42026年6月30日 03:10

Anthropic とカリフォルニア州政府が契約を結び、Claude を半額で使用可能に

The Verge AI重要度42026年6月30日 01:00

議員らが AI 企業による健康データ販売を禁止する法案を提案

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む