Simon Willison Blog · March 6, 2026 01:49 · about 7 min read

Can coding agents relicense open source through a "clean room" implementation?

#Open Source Licensing #Coding Agents #Legal Compliance #Clean Room Implementation #Python

TL;DR

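"Clean-room" style code generation by coding agents has surfaced legal and ethical disputes over relicensing chardet, an existing open source library. The case suggests significant implications for the copyright ownership of AI-generated code and for open source license compliance.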

AI deep analysis · April 26, 2026 04:21

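Key points

Clean-room implementation, accelerated by coding agents

Clean-room implementation means separating the writing of a specification from the writing of the new code. Done by human engineers, this traditionally took weeks to months. With coding agents it is increasingly achievable in hours.

The chardet relicensing dispute

The maintainer of chardet, Python's character encoding detection library, rewrote it and re-released it under the MIT license. He had maintained the original LGPL-licensed code for more than a decade, and that move is being challenged.

The original author alleges a license violation

Original author Mark Pilgrim makes three arguments:
- The maintainers had ample exposure to the original code, so the rewrite does not satisfy "clean room" requirements.
- The rewrite violates the LGPL's terms, which require modified code to be released under the same license.
- The maintainers therefore have no right to relicense the project.

AI-generated code meets open source licensing

It is not settled whether using a "fancy code generator" (AI) counts legally as a clean-room implementation or merely as a modification of the original. This legal gray area is now coming into view.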

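Impact analysis

This article points to a structural impact that AI coding agents have on the open source ecosystem. The issue goes beyond productivity gains: collisions with existing intellectual property rights and license terms are surfacing as real legal disputes, especially with copyleft licenses such as the LGPL.

When developers use AI tools to generate or modify code, the fact that the code was "rewritten" is not enough. They will need to check their history of access to the original code and how the legal "clean room" requirement is defined. Companies will likely need to revisit their legal review processes as well.

Editorial comment

This is an important case because it shows that AI code generation is not just a productivity tool. It can also become a legal dispute over who owns intellectual property. Developers using AI will need to approach license compliance from a different angle than the traditional notion of "rewriting" code.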

Original text

Over the past few months it's become clear that coding agents are extraordinarily good at building a weird version of a "clean room" implementation of code.

The most famous version of this pattern is when Compaq created a clean-room clone of the IBM BIOS back in 1982. They had one team of engineers reverse engineer the BIOS to create a specification, then handed that specification to another team to build a new ground-up version.

This process used to take multiple teams of engineers weeks or months to complete. Coding agents can do a version of this in hours - I experimented with a variant of this pattern against JustHTML back in December.

There are a *lot* of open questions about this, both ethically and legally. These appear to be coming to a head in the venerable chardet Python library.

chardet was created by Mark Pilgrim back in 2006 and released under the LGPL. Mark retired from public internet life in 2011 and chardet's maintenance was taken over by others, most notably Dan Blanchard who has been responsible for every release since 1.1 in July 2012.

Two days ago Dan released chardet 7.0.0 with the following note in the release notes:

Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x. Just way faster and more accurate!
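The release note promises the "same public API", and that kind of claim can be checked mechanically. Below is a minimal sketch that assumes you only care about public callables and their signatures. `public_api` and `api_differences` are hypothetical helpers written for this illustration. They are not part of chardet or of any tool Dan mentions.

```python
import inspect
import types


def public_api(module: types.ModuleType) -> dict[str, str]:
    """Map each public callable of `module` to its signature as a string."""
    names = getattr(module, "__all__", None)
    if names is None:
        names = [n for n in vars(module) if not n.startswith("_")]
    api = {}
    for name in names:
        obj = getattr(module, name, None)
        if not callable(obj):
            continue  # skip constants, submodules and names listed but missing
        try:
            api[name] = str(inspect.signature(obj))
        except (TypeError, ValueError):
            api[name] = "(<no signature>)"  # some builtins expose no signature
    return api


def api_differences(old: types.ModuleType, new: types.ModuleType) -> list[str]:
    """Report public callables of `old` that `new` drops or re-signs."""
    old_api, new_api = public_api(old), public_api(new)
    problems = []
    for name, sig in sorted(old_api.items()):
        if name not in new_api:
            problems.append(f"missing: {name}{sig}")
        elif new_api[name] != sig:
            problems.append(f"changed: {name}{sig} -> {name}{new_api[name]}")
    return problems
```

One interpreter cannot import the old and new releases under the same name `chardet`. In practice, you would dump `public_api(chardet)` in two separate environments and diff the results. The check is deliberately strict: adding an optional parameter is reported as "changed", even though existing callers are unaffected.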

Yesterday Mark Pilgrim opened #327: No right to relicense this project:

[...] First off, I would like to thank the current maintainers and everyone who has contributed to and improved this project over the years. Truly a Free Software success story.
However, it has been brought to my attention that, in the release 7.0.0, the maintainers claim to have the right to "relicense" the project. They have no such right; doing so is an explicit violation of the LGPL. Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a "clean room" implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights.

Dan's lengthy reply included:

You're right that I have had extensive exposure to the original codebase: I've been maintaining it for over a decade. A traditional clean-room approach involves a strict separation between people with knowledge of the original and people writing the new implementation, and that separation did not exist here.
However, the purpose of clean-room methodology is to ensure the resulting code is not a derivative work of the original. It is a means to an end, not the end itself. In this case, I can demonstrate that the end result is the same — the new code is structurally independent of the old code — through direct measurement rather than process guarantees alone.

Dan goes on to present results from the JPlag tool - which describes itself as "State-of-the-Art Source Code Plagiarism & Collusion Detection" - showing that the new 7.0.0 release has a max similarity of 1.29% with the previous release and 0.64% with the 1.1 version. Other release versions had similarities more in the 80-93% range.
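What does a figure like "1.29% similarity" measure? JPlag turns programs into token streams and looks for long shared runs using Greedy String Tiling. The sketch below is a simplified, assumed reconstruction of that idea for Python source only. The function names are hypothetical, and the token normalization is an illustrative choice rather than JPlag's own. The average and maximum formulas are assumed to follow JPlag's conventions: covered tokens over the combined length, and covered tokens over the shorter program.

```python
import io
import keyword
import tokenize


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, abstracting away identifiers and literal values."""
    skip = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
        else:
            out.append(tokenize.tok_name[tok.type])  # NEWLINE, INDENT, DEDENT, ...
    return out


def greedy_string_tiling(a: list[str], b: list[str], min_match: int = 3) -> int:
    """Return how many tokens are covered by non-overlapping shared tiles."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    covered = 0
    while True:
        # Find all maximal-length unmarked common runs of at least min_match.
        longest, matches = min_match, []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and not marked_a[i + k] and not marked_b[j + k]
                       and a[i + k] == b[j + k]):
                    k += 1
                if k > longest:
                    longest, matches = k, [(i, j, k)]
                elif k == longest:
                    matches.append((i, j, k))
        if not matches:
            return covered
        for i, j, k in matches:
            # Skip matches that overlap a tile placed earlier in this round.
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for x in range(k):
                marked_a[i + x] = marked_b[j + x] = True
            covered += k


def similarity(src_a: str, src_b: str, min_match: int = 3) -> tuple[float, float]:
    """Return (average, maximum) similarity in [0, 1] under the assumed formulas."""
    a, b = normalized_tokens(src_a), normalized_tokens(src_b)
    if not a or not b:
        return 0.0, 0.0
    covered = greedy_string_tiling(a, b, min_match)
    return 2 * covered / (len(a) + len(b)), covered / min(len(a), len(b))
```

Every identifier is normalized to a single `ID` token, so renaming variables does not lower the score. That is the main reason tools of this kind compare token streams rather than raw text.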

He then shares critical details about his process, highlights mine:

For full transparency, here's how the rewrite was conducted. I used the superpowers brainstorming skill to create a design document specifying the architecture and approach I wanted based on the following requirements I had for the rewrite [...]
I then started in an empty repository with no access to the old source tree, and explicitly instructed Claude not to base anything on LGPL/GPL-licensed code. I then reviewed, tested, and iterated on every piece of the result using Claude. [...]
I understand this is a new and uncomfortable area, and that using AI tools in the rewrite of a long-standing open source project raises legitimate questions. But the evidence here is clear: 7.0 is an independent work, not a derivative of the LGPL-licensed codebase. The MIT license applies to it legitimately.

Since the rewrite was conducted using Claude Code there are a whole lot of interesting artifacts available in the repo. 2026-02-25-chardet-rewrite-plan.md is particularly detailed, stepping through each stage of the rewrite process in turn - starting with the tests, then fleshing out the planned replacement code.

There are several twists that make this case particularly hard to confidently resolve:

Dan has been immersed in chardet for over a decade, and has clearly been strongly influenced by the original codebase.

There is one example where Claude Code referenced parts of the codebase while it worked, as shown in the plan - it looked at metadata/charsets.py, a file that lists charsets and their properties expressed as a dictionary of dataclasses.

More complicated: Claude itself was very likely trained on chardet as part of its enormous quantity of training data - though we have no way of confirming this for sure. Can a model trained on a codebase produce a morally or legally defensible clean-room implementation?

How significant is the fact that the new release of chardet used the same PyPI package name as the old one? Would a fresh release under a new name have been more defensible?
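The plan describes metadata/charsets.py as a file listing charsets and their properties as a dictionary of dataclasses. For readers unfamiliar with that shape, here is a minimal sketch of the pattern. The `Charset` class, its fields, the entries and the `decode_with` helper are all hypothetical and are not taken from chardet's actual file.

```python
import codecs
from dataclasses import dataclass


@dataclass(frozen=True)
class Charset:
    """Properties of one character set (hypothetical fields, for illustration)."""
    name: str           # display name used as the dictionary key
    python_codec: str   # codec name accepted by codecs.lookup()
    is_multibyte: bool  # True if a single character may span several bytes


# A dictionary of dataclasses keyed by charset name (illustrative entries only).
CHARSETS: dict[str, Charset] = {
    c.name: c
    for c in (
        Charset("ascii", "ascii", False),
        Charset("utf-8", "utf_8", True),
        Charset("iso-8859-1", "latin_1", False),
        Charset("windows-1252", "cp1252", False),
        Charset("shift_jis", "shift_jis", True),
    )
}

# Fail fast if any entry names a codec this Python build does not know.
for _entry in CHARSETS.values():
    codecs.lookup(_entry.python_codec)


def decode_with(charset_name: str, data: bytes) -> str:
    """Decode `data` with the codec recorded for `charset_name`."""
    return data.decode(CHARSETS[charset_name].python_codec)
```

Freezing the dataclass keeps the table read-only. Keying the dictionary by name makes each lookup a single dictionary access.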

I have no idea how this one is going to play out. I'm personally leaning towards the idea that the rewrite is legitimate, but the arguments on both sides of this are entirely credible.

I see this as a microcosm of the larger question around coding agents for fresh implementations of existing, mature code. This question is hitting the open source world first, but I expect it will soon start showing up in Compaq-like scenarios in the commercial world.

Once commercial companies see that their closely held IP is under threat I expect we'll see some well-funded litigation.

Tags: licensing, mark-pilgrim, open-source, ai, generative-ai, llms, ai-assisted-programming, ai-ethics, coding-agents
