The Verge AI · June 21, 2026, 03:46 · 3 min read

The Atlantic Builds a Searchable Database of Music Used to Train AI

TL;DR

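A searchable database built by The Atlantic covers millions of tracks used to train AI models. It shows how songs that were never supposed to be freely available have ended up in training datasets, and it highlights serious problems with copyright infringement and a lack of transparency.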

AI Deep Analysis · June 21, 2026, 06:01

Importance: Important (5-point scale)

Depth: 40%

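Key Points

Making tracks that should not be public visible and searchable

The Atlantic's database makes the millions of tracks in AI training datasets searchable. These include songs that rights holders never meant to release and songs that should not be publicly available at all.

Copyright infringement and training-data transparency

The initiative shows concretely how much music generative AI models have been trained on without permission. It also points to a lack of transparency in the industry's data-collection practices.

The importance of protecting creators' rights

For artists and copyright holders, the database matters a great deal as a way to check whether their work has been used for AI training. It could also serve as a foundation for legal action.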


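Impact Analysis

The article is an important step in exposing the "black box" data collection behind generative AI training. It is likely to have a significant effect on debates in the music industry and in copyright law. For AI developers, it underlines the importance of curating training data and managing licenses, and it will push them to address legal risk and ethical criticism.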

Editor's Comment

Copyright in AI training data is one of the hottest topics right now. This database is a groundbreaking tool that makes the risks facing developers concrete.

Millions of tracks are freely available in datasets, even if they're not supposed to be.

by Terrence O'Brien

Jun 21, 2026, 3:46 AM GMT+9

Image: Cath Virginia / The Verge

Terrence O'Brien is The Verge's weekend editor. He has covered the tech industry for over 18 years and knows a thing or two about synths.

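*Atlantic* reporter Alex Reisner recently uncovered four datasets of music being used to train AI models and made them fully searchable for the public. Two of the sets are enormous, at 12 million and 9 million tracks. The other two are much smaller but still represent a significant amount of training data, at over 100,000 songs each.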

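According to Reisner, the sets have been downloaded thousands of times. It's impossible to know exactly who has used them, but both Google and Stability have confirmed in research papers that they did. Some of the sources, such as the Free Music Archive dataset, are free to stream for personal use but require a license for commercial applications.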

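Although the datasets are, in theory, freely available on the internet, using them as training data is not as simple as downloading a ZIP file and feeding it to an AI model. As Reisner explains:

> Three of the datasets I found are distributed as a list of links to songs on YouTube or Spotify. AI developers download the actual audio using tools that automate the job, some of which allow developers to bypass logins, advertisements, and mechanisms that might earn money or subscribers for creators. Such tools violate the terms of service of these platforms.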

Names that pop up in the datasets range from pop stars like Lady Gaga and Fred Again.. to Radiohead, Aphex Twin, Wu-Tang Clan, Bruce Springsteen, and experimental composer Hainbach. You can head over to the *Atlantic's* AI Watchdog site and search the songs, books, and other media used to train the world's AI models yourself.



