The Verge AI · May 6, 2026, 01:52 · approx. 3 min read

Meta hit with class action copyright lawsuit by major publishers

#LLM #CopyrightLaw #TrainingData #Meta #LegalRegulation

TL;DR

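Five major publishers and one author have filed a class action lawsuit against Meta, alleging large-scale copyright infringement in the training of its Llama models. The case extends legal risk across the AI industry.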

AI Deep Analysis · May 6, 2026, 02:05

Importance (5-point scale)

Depth: 40%

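Key points

A large-scale copyright infringement lawsuit

Five major publishers (Macmillan, McGraw Hill, Elsevier, Hachette, and Cengage) and one author have filed a class action lawsuit against Meta, alleging “one of the most massive infringements of copyrighted materials in history” in the training of its Llama models.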

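The legal status of generative AI training data

The lawsuit, first reported by The New York Times, is an important test of the legal limits on using copyrighted material without permission to train large language models (LLMs).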

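Spillover risk for the whole industry

A lawsuit against a major tech company like Meta could affect other companies' AI development models and data collection policies, and may force an industry-wide review of compliance practices.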


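Impact analysis

This case could become the biggest legal barrier to the development of generative AI. It marks a major turning point that may force fundamental changes to AI model development costs and to how training data is obtained. Not only Meta but also other companies that build models by similar methods face litigation risk, making stronger copyright clearance an urgent priority across the industry.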

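Editorial comment

This is a landmark lawsuit that tests the balance between the data use that accelerates the AI industry's growth and the protection of copyright. Depending on how Meta responds, the rules for developing generative AI could be redefined.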

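Meta is facing a class action lawsuit filed by five major book publishers and one author over claims the company “engaged in one of the most massive infringements of copyrighted materials in history” when training its Llama AI models, as reported earlier by The New York Times. In their suit, Macmillan, McGraw Hill, Elsevier, Hachette, Cengage, and author Scott Turow allege that Meta “repeatedly copied” their books and journal articles without permission.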

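The lawsuit accuses Meta of knowingly ripping copyrighted work from “notorious pirate sites,” such as LibGen, Anna’s Archive, Sci-Hub, Sci-Mag, and others, and then feeding that material into its AI model. It also claims that Meta trained Llama with information inside the Common Crawl dataset, which is allegedly “full of unauthorized copies of copyrighted works.” As a result, the suit says, Llama “outputs verbatim and near-verbatim substitutes” of copyrighted material: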

For example, when prompted with two brief sentences from Cengage’s best-selling textbook, Calculus: Early Transcendentals, 9th edition, by James Stewart, Llama begins reproducing word-for-word the continuation of the section.

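Several authors have already sued Meta for alleged copyright infringement, which brought to light the company’s internal discussions about how to handle “media coverage suggesting we have used a dataset we know to be pirated.” Last year, a federal judge ruled in favor of Meta in one of these lawsuits, though he pointed out that his ruling “does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful.”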

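A group of authors also sued Anthropic over copyright infringement. While a federal judge ruled that training AI models on legally purchased books without permission is considered fair use, he allowed the authors to move forward with a class action lawsuit over the “millions” of works Anthropic allegedly pirated. Anthropic agreed to pay writers $1.5 billion last year to settle the class action lawsuit.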

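Turow and the group of publishers are suing Meta for damages, and ask that the court order the company to stop its allegedly unlawful activities. They also ask the court to require the company to provide a list of books, journal articles, and other copyrighted works that it trained its Llama AI models on.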

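“AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use,” Meta spokesperson Dave Arnold said in an emailed statement to The Verge. “We will fight this lawsuit aggressively.”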


Emma Roth

