Anthropic Red Team·2026年3月6日 09:00·約1分で読める

Mozillaとの提携でFirefoxのセキュリティを向上

#AIセキュリティ #脆弱性発見 #LLM応用 #Anthropic #Claude #企業協業

TL;DR

AnthropicのAIモデルClaude OpusがMozilla研究者との協業で、2週間でFirefoxの22件の脆弱性を発見した。

AI深層分析2026年4月13日 06:41

注目/ 5段階

深度40%

キーポイント

AIによるセキュリティ脆弱性発見

Claude OpusがFirefoxのセキュリティ脆弱性を22件発見し、AIのセキュリティ分野での実用性を示した。

企業間協業の成果

AnthropicとMozillaの研究者が協力し、AIを活用したセキュリティテストを実施した。

短期間での成果

わずか2週間という短期間で多数の脆弱性を発見し、AIの効率性を実証した。

影響分析・編集コメントを表示

影響分析

この記事は、AIがセキュリティ分野で実用的な価値を発揮する可能性を示しており、AIの応用範囲拡大に寄与する。一方で、具体的な脆弱性の詳細や修正状況が不明なため、即時的な業界への影響は限定的と言える。

編集コメント

AIのセキュリティ分野での実用性を示す好例だが、脆弱性の重大度や修正状況などの詳細情報が不足しているため、より深い分析には追加情報が必要。

Mozillaの研究者との共同研究において、Claude Opus 4.6は2週間で22件のFirefox脆弱性を発見しました。

原文を表示

March 6, 2026

Evyatar Ben Asher, Keane Lucas, Nicholas Carlini, Newton Cheng, and Daniel Freeman

AI models can now independently identify high-severity vulnerabilities in complex software. As we recently

documented, Claude found more than 500 zero-day

vulnerabilities (security flaws that are unknown to the software’s maintainers) in well-tested

open-source software.

In this post, we share details of a collaboration with researchers at Mozilla in which Claude Opus 4.6

discovered 22 vulnerabilities over the course of two weeks. Of these, Mozilla assigned 14 as high-severity

vulnerabilities—almost a fifth of all high-severity Firefox vulnerabilities that were remediated in

In other words: AI is making it possible to detect severe security vulnerabilities at highly

accelerated speeds.

As part of this collaboration, Mozilla fielded a large number of reports from us, helped us understand what

types of findings warranted submitting a bug report, and shipped fixes to hundreds of millions of users in

Firefox 148.0. Their partnership,

and the technical lessons we learned, provides a model for how AI-enabled security researchers and

maintainers can work together to meet this moment.

From model evaluations to a security partnership

In late 2025, we noticed that Opus 4.5 was close to solving all tasks in CyberGym, a benchmark that tests whether LLMs can reproduce known

security vulnerabilities. We wanted to construct a harder and more realistic evaluation that contained a

higher concentration of technically complex vulnerabilities, like those present in modern web browsers. So

we built a dataset of prior Firefox common vulnerabilities and exposures (CVEs) to see if Claude could

reproduce those.

We chose Firefox because it’s both a complex codebase and one of the most well-tested and secure open-source

projects in the world. This makes it a harder test of AI’s ability to find novel security vulnerabilities

than the open-source software we previously used to test our models. Hundreds of millions of users rely on

it daily, and browser vulnerabilities are particularly dangerous because users routinely encounter untrusted

content and depend on the browser to keep them safe.

Our first step was to use Claude to find previously identified CVEs in older versions of the Firefox

codebase. We were surprised that Opus 4.6 could reproduce a high percentage of these historical CVEs, given

that each of them took significant human effort to uncover. But it was still unclear how much we should

trust this result because it was possible that at least some of those historical CVEs were already in

Claude’s training data.

So we tasked Claude with finding novel vulnerabilities in the current version of Firefox—bugs that by definition, can’t have been

reported before. We focused first on Firefox’s JavaScript engine but then expanded to other areas of the

browser. The JavaScript engine was a convenient first step: it’s an independent slice of Firefox’s codebase

that can be analyzed in isolation, and it’s particularly important to secure, given its wide attack

surface (it processes untrusted external code when users browse the web).

After just twenty minutes of exploration, Claude Opus 4.6 reported that it had identified a Use After Free (a

type of memory vulnerability that could allow attackers to overwrite data with arbitrary malicious content)

in the JavaScript engine. One of our researchers validated this bug in an independent virtual machine with

the latest Firefox release, then forwarded it to two other Anthropic researchers, who also validated the

bug. We then filed a bug report in Bugzilla,

Mozilla’s issue tracker, along with a description of the vulnerability and a proposed patch (written by

Claude and validated by the reporting team) to help triage the root cause.

In the time it took us to validate and submit this first vulnerability to Firefox, Claude had already

discovered fifty more unique crashing inputs. While we were triaging these crashes, a researcher from

Mozilla reached out to us. After a technical discussion about our respective processes and sharing a few

more vulnerabilities we had manually validated, they encouraged us to submit all of our findings in

bulk without validating each one, even if we weren’t confident that all of the crashing test cases had

security implications. By the end of this effort, we had scanned nearly 6,000 C++ files and submitted

a total of 112 unique reports, including the high- and moderate-severity vulnerabilities mentioned above.

Most issues have been fixed in Firefox 148, with the remainder to be fixed in upcoming releases.

When doing this kind of bug hunting in external software, we’re always conscious of the fact that we may have

missed something critical about the codebase that would make the discovery a false positive. We try to do

the due diligence of validating the bugs ourselves, but there’s always room for error. We are extremely

appreciative of Mozilla for being so transparent about their triage process, and for helping us adjust our

approach to ensure we only submitted test cases they cared about (even if not all of them ended up being

relevant to security). Mozilla researchers have since started experimenting with Claude for security

purposes internally.

From identifying vulnerabilities to writing primitive exploits

To measure the upper limits of Claude’s cybersecurity abilities, we also developed a new evaluation to

determine whether Claude was able to exploit any of the bugs we

discovered. In other words, we wanted to understand whether Claude could also develop the sorts of tools

that a hacker would use to take advantage of these bugs to execute malicious code.

To do this, we gave Claude access to the vulnerabilities we’d submitted to Mozilla and asked Claude to create

an exploit focusing on each one. To prove it had successfully exploited a vulnerability, we asked Claude to

demonstrate a real attack. Specifically, we required it to read and write a local file in a target system,

as an attacker would.

We ran this test several hundred times with different starting points, spending approximately $4,000 in

API credits. Despite this, Opus 4.6 was only able to actually turn the vulnerability into an exploit in two

cases. This tells us two things. One, Claude is much better at finding these bugs than it is at exploiting

them. Two, the cost of identifying vulnerabilities is an order of magnitude cheaper than creating an exploit

for them. However, the fact that Claude could succeed at

automatically developing a crude browser exploit, even if only in a few cases, is concerning.

“Crude” is an important caveat here. The exploits Claude wrote only worked on our testing environment, which

intentionally removed some of the security features found in modern browsers. This includes, most

importantly, the sandbox, the purpose of which is to

reduce the impact of these types of vulnerabilities. Thus, Firefox’s “defense in depth” would have been

effective at mitigating these particular exploits. But vulnerabilities that escape the sandbox are not

unheard of, and Claude’s attack is one necessary component of an end-to-end exploit. You can read more about

how Claude developed one of these Firefox exploits on our Frontier Red Team blog.

What’s next for AI-enabled cybersecurity

These early signs of AI-enabled exploit development underscore the importance of accelerating the

find-and-fix process for defenders. Towards that end, we want to share a few technical and procedural best

practices we’ve found while performing this analysis.

First, when researching “patching agents,” which use LLMs to develop and validate bug fixes, we have

developed a few methods we hope will help maintainers use LLMs like Claude to triage and address security

reports faster.[1]

In our experience, Claude works best when it's able to check its own work with another tool. We refer to this

class of tool as a “task verifier”: a trusted method of confirming whether an AI agent’s output actually

achieves its goal. Task verifiers give the agent real-time feedback as it explores a codebase, allowing it

to iterate deeply until it succeeds.

Task verifiers helped us discover the Firefox vulnerabilities described above,[2] and in separate research, we’ve found that they’re also useful for fixing bugs. A good patching agent needs to verify at least two

things: that the vulnerability has actually been removed, and that the program’s intended functionality has

been preserved. In our work, we built tools that automatically tested whether the original bug could still

be triggered after a proposed fix, and separately ran test suites to catch regressions (a change that

accidentally breaks something else). We expect maintainers will know best how to build these verifiers for

their own codebases; the key point is that giving the agent a reliable way to check both of these properties

dramatically improves the quality of its output.

We can’t guarantee that all agent-generated patches that pass these tests are good enough to merge

immediately. But task verifiers give us increased confidence that the produced patch will fix the specific

vulnerability while preserving program functionality—and therefore achieve what’s considered to be the

minimum requirement for a plausible patch. Of course, when reviewing AI-authored patches, we recommend that

maintainers apply the same scrutiny they’d apply to any other patch created by an external author.

Zooming out to the process of submitting bugs and patches: we know that maintainers are underwater.

Therefore, our approach is to give maintainers the information they need to trust and verify reports. The

Firefox team highlighted three components of our submissions that were key for trusting our results:

accompanying minimal test cases,

detailed proofs-of-concept, and

candidate patches.

We strongly encourage researchers who use LLM-powered vulnerability research tools to include similar

evidence of verification and reproducibility when submitting reports based on the output of such tooling.

We’ve also published our

Coordinated Vulnerability Disclosure operating principles for vulnerabilities discovered by

Anthropic, where we describe the procedures

we will use when working with maintainers. Our processes here follow standard industry norms for the time

being, but as models improve we may need to adjust our processes to keep pace with capabilities.

The urgency of the moment

Frontier language models are now world-class vulnerability researchers. On top of the 22 CVEs we identified

in Firefox, we’ve used Claude Opus 4.6 to discover vulnerabilities in other important software projects like

the Linux kernel. Over the coming weeks and months, we will continue to report on how we’re using our models

and working with the open-source community to improve security.

Opus 4.6 is currently far better at identifying and fixing vulnerabilities than at exploiting them. This

gives defenders the advantage. And with the recent release of Claude Code Security in limited research

preview, we’re bringing vulnerability-discovery (and patching) capabilities directly to customers and

open-source maintainers.

But looking at the rate of progress, it is unlikely that the gap between frontier models’ vulnerability

discovery and exploitation abilities will last very long. If and when future language models break through

this exploitation barrier, we will need to consider additional safeguards or other actions to prevent our

models from being misused by malicious actors.

We urge developers to take advantage of this window to redouble their efforts to make their software more

secure. For our part, we plan to significantly expand our cybersecurity efforts, including by working with

developers to search for vulnerabilities (following the CVD process outlined above), developing tools to

help maintainers triage bug reports, and directly proposing patches.

If you’re interested in supporting our security efforts—writing new

scaffolds to identify vulnerabilities in open-source software; triaging, patching, and reporting

vulnerabilities; and developing a robust CVD process for the AI

era—apply to work at Anthropic

here.

Footnotes

[1] All the advice shared here is based on our use of Claude, but it

should apply to whichever LLM you prefer.

[2] Which Mozilla patched independently.

この記事をシェア

The Decoder★42026年4月14日 22:53

Claude Mythosは欧州のAI安全体制への警告である

Anthropic社は、人間よりも優れたセキュリティ脆弱性発見能力を持つAIモデル「Claude Mythos」へのアクセスを制限している。欧州当局は同システムをほぼ監視できず、英国は独自テストを実施中で、構造的問題が露呈している。

The Decoder★42026年4月22日 21:54

不正アクセス者がAnthropicの制限付きAIモデル「Mythos」に侵入

不正アクセス者グループが、Anthropicの制限付きAIモデル「Mythos」に不正アクセスしたとBloombergが報じた。

The Register AI/ML★32026年4月27日 17:30

アンストロピックの魔法のようなコード検査ツール：現時点ではチェダーよりスイスチーズの方が穴が多い

AnthropicはAIコードセキュリティモデル「Mythos」を公開した。しかし、このシステムは人間が教えた脆弱性しか検出できず、その能力には限界がある。名称の期待と現実のギャップが指摘されている。

ニュース一覧に戻る元記事を読む