The Register AI/ML · April 29, 2026 02:51 · approx. 10 min read

A vintage chatbot living in a bygone era is like an elderly relative

#LLM #AGI #Benchmark #Dataset

TL;DR

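Talkie, a large language model trained only on data published up to 1930, has been released, prompting discussion of how it could be used to probe AI's understanding of history and to test for AGI.

AI in-depth analysis · May 8, 2026 00:12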

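Key points

Talkie model overview and constraints

Talkie is a 13-billion-parameter "vintage" LLM trained only on digital scans of English-language books, newspapers, and other material published up to 1930, so its training data predates later harmful content such as Nazi propaganda.

Research goals and AGI testing

The project is not mere nostalgia. It is an experiment that asks whether a model limited to past knowledge can rediscover scientific theories such as relativity, with the aim of probing AI reasoning ability and establishing criteria for evaluating AGI.

Comparison with modern models and limitations

Compared with a model trained on modern data on tasks such as Python programming problems, Talkie could generate simple solutions but showed clear limits on anything more complex.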

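Impact analysis

This story points to a new approach to selecting LLM training data, one built around historical context and bias removal, and it could prompt a rethink of the basic criteria used to evaluate AI safety and reasoning ability. However, its main significance is as an AGI verification tool for research institutions rather than as something with immediate business use, so its direct impact on the industry as a whole is limited.

Editor's comment

A unique attempt to gauge future intelligence by having an AI learn only from the past, and an interesting case study in which research value outweighs practicality.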

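If you're tired of interacting with a bot that spews Nazi propaganda or refers to itself as MechaHitler, you could sign off of Elon Musk's xAI. Or, just to be sure, use a large language model (LLM) whose training data ends in 1930, three years before the Nazis took power in Germany and nine years before World War II started.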

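A trio of AI researchers has released a 13-billion-parameter "vintage" language model they call Talkie, which has been trained solely on digital scans of English-language books, newspapers, periodicals, scientific journals, patents, and case law that were published before the end of 1930. Pre-1931 works were chosen because 1930 is the current public domain year in the United States. In other words, if you're looking for information on World War II, the election of Franklin D. Roosevelt, Amelia Earhart's solo Atlantic flight, or how a microwave oven works, you're out of luck. Ask it about Betty Boop, flappers, the state of the US economy as the Great Depression began, or the sociological effects of the introduction of car radios, and you've come to the right place.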

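This isn't the first vintage AI model to appear, mind you, with others trained on Victorian literature and pre-1900 scientific texts already out in the world. It is, according to its creators, the largest they are aware of. "Talkie is the largest vintage language model we are aware of, and we plan to continue scaling significantly," the team behind it noted.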

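Neat, but … why?

Sure, it could be neat to chat with an AI that would only hallucinate things in the style of a flapper, or the early works of philosopher Bertrand Russell, but with every query to an AI potentially burning up the planet, one really has to ask why an AI with a knowledge cutoff of 1930 is necessary. We reached out to Talkie's creators, and while we didn't hear back, the writeup gives plenty of explanations. "These models are fascinating conversation partners … but we are also excited by the possibility that the careful study of the behaviors and capabilities of vintage LMs will advance our understanding of AI in general," the Talkie team wrote.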

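As one example, they cite testing an AI's ability to predict the future. In another, they propose undertaking what Google DeepMind co-founder and CEO Demis Hassabis has said would be a good test of artificial general intelligence (AGI): cut a model's knowledge off at 1911, and have it try to come up with general relativity with the same information Einstein had when he developed the theory in 1915.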

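In other words, can this AI make accurate scientific discoveries using only the knowledge available to the people who made them? It's not clear whether Talkie has been put to a test as tough as coming up with general relativity, but it was pushed to see if it could solve Python programming test problems against a model with identical architecture but trained on modern data. It did generate some correct solutions, but with considerable limits. "All correct solutions generated by the vintage models are simple one-line programs (such as adding two inputs), or small modifications to in-context example programs," the Talkie team said. In other words, "There is still a long way to go before this capability is notable," per the team.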

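While improving LLM performance is a major part of Talkie's objective, the team behind it told us that it wasn't the only goal. David Duvenaud, associate professor in computer science and statistics at the University of Toronto and one of the three people behind Talkie, told The Register in an email that he hopes Talkie will also be able to help evaluate long-term forecasting methods, given that all its predictions will concern things that have already happened. Duvenaud also explained that his team is interested in using Talkie to study cultural change. "For instance, we can use these models to try to understand how a law would have been interpreted at the time it was written, based on the implicit assumptions and meaning of language at the time," Duvenaud told us. "A third motivation is understanding how models form their own self-conception," Duvenaud added. "'How an LLM acts' is a self-fulfilling prophecy in some senses, so we can learn about this by talking to models who don't even know what an LLM is."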

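Still, with just 13 billion parameters, Duvenaud admitted that there's a big capability gap between Talkie and AI models trained on modern data. "As an amateur research effort, we never expect to be able to fully close this gap, in data or compute," the compsci prof told us.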

Okay, so aside from those limitations, how does it perform otherwise?

Don't blame me, blame your non-digital training data

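Going back again to the same-architecture model trained on modern data used to benchmark Talkie, it looks like it's not just scientific discovery or proof of AGI that's lacking from the vintage version. "On average, talkie underperforms its modern counterpart in standard LM evaluations, even after correcting for question anachronism, despite being trained with the same number of FLOPs," the team wrote. It did do similarly well to the modern model on core language understanding and numeracy tests, Talkie's creators noted, and they suspect they know what's to blame for its subpar performance elsewhere: optical character recognition (OCR).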

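"Because there was no digital publishing in 1930, all text in our dataset had to be transcribed from a physical source, which introduces a form of noise not seen in natively digital text," team Talkie said. Anyone who's had to deal with OCR'ed documents knows how easily such computer vision tools can get things wrong, which can in turn cause an AI like Talkie to regurgitate bad, or even nonsensical, responses.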

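Through their work on Talkie, the team determined that training a language model on OCR'ed pre-1931 texts only gave it 30 percent of the performance of a model trained on human-transcribed copies of the same documents. Regex data cleansing increases the performance of OCR'ed texts to 70 percent of human-transcribed copies, but that's too large a discrepancy for Talkie's creators, who are working on their own OCR engine for generating more training data for Talkie.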
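To make "regex data cleansing" concrete, here is a minimal Python sketch of the kind of repair such a pass might attempt on OCR output. The function name `clean_ocr_text` and every rule in it are illustrative assumptions; the article does not describe the Talkie team's actual cleaning rules.

```python
import re

def clean_ocr_text(text: str) -> str:
    """Apply a few illustrative regex repairs to OCR output (hypothetical rules)."""
    # Rejoin words hyphenated across a line break: "gov-\nernment" -> "government".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop lines that contain nothing but a short number (likely page numbers).
    text = re.sub(r"(?m)^[ \t]*\d{1,4}[ \t]*$\n?", "", text)
    # Turn single line breaks inside a paragraph into spaces; keep blank lines.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()

sample = "The gov-\nernment  announced\n42\nnew tariffs.\n\nMarkets fell."
cleaned = clean_ocr_text(sample)
print(cleaned)
```

Rules this crude fix layout noise but cannot repair misread characters, and the page-number rule would also delete a legitimate line consisting only of a number. Limits of that kind are consistent with the gap the team reports even after cleaning.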

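Talkie also has a problem with "temporal leakage," said the team: it was able to identify FDR as the president in 1936 and list some of his legislative accomplishments despite its training data supposedly cutting off at 1931. According to the team, that's just "an example of imperfect filtering of the pre-training corpus" and something they're still working on.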
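One simple way to picture a filter against temporal leakage is a check for explicit years later than the cutoff. The Python sketch below is an assumption made for illustration, including the names `years_after_cutoff` and `looks_anachronistic`; the article does not say how the Talkie team filters its corpus.

```python
import re

CUTOFF_YEAR = 1930  # last publication year allowed, per the article

# Four-digit numbers from 1000 to 2099, matched as whole tokens.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")

def years_after_cutoff(text: str, cutoff: int = CUTOFF_YEAR) -> list[int]:
    """Return the distinct year-like numbers in text that exceed the cutoff."""
    years = {int(match) for match in YEAR_PATTERN.findall(text)}
    return sorted(year for year in years if year > cutoff)

def looks_anachronistic(text: str, cutoff: int = CUTOFF_YEAR) -> bool:
    """Flag a document that mentions any year after the cutoff."""
    return bool(years_after_cutoff(text, cutoff))

docs = [
    "The stock market crash of 1929 shook Wall Street.",
    "In 1936 the President signed further New Deal legislation.",
]
kept = [doc for doc in docs if not looks_anachronistic(doc)]
print(kept)
```

A heuristic like this is easy to fool in both directions. A 1920s text that speculates about "1950" would be dropped, while a later reprint, footnote, or mislabeled scan that never names a year would pass. That may be part of why filtering stays imperfect in practice.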

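Talkie is far from a perfect example of a chatbot in a time capsule, in other words, but the team behind it says that they're intent on scaling the model in the coming months. Tasks will include moving beyond English-language texts, re-OCR'ing as much of its training data as possible, strengthening anachronism detection methods, and working with historians to input better post-training data.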

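If you're wondering whether it takes an accurate view of Nazism, just know that its view is stuck in the 1920s. It knows that the Nazis "are" an antisemitic, authoritarian political party in Germany, but it thinks they are led by someone named Hermann Joseph von Hitler, a person who was born in 1870 (nearly 20 years before Adolf Hitler, who was born in 1889).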

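If all goes according to plan, a GPT-3-level version of Talkie should be out by this summer. "A preliminary estimate also suggests we can grow our corpus to well over a trillion tokens of historical text, which should be sufficient to create a GPT-3.5 level model - similar in capability to the original ChatGPT," Talkie's creators added.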

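In the meantime, the current version of Talkie is available to download from GitHub and Hugging Face, and can also be chatted with via a web interface for those curious - just mind the warning. "Talkie reflects the culture and values of the texts it was trained on … It can produce outputs that are inaccurate or offensive," reads an advisory on Talkie's web client. "Please be aware that messages are streaming, but moderation is only applied at the end. As a result, you may see objectionable content briefly before it is flagged."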
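The advisory describes a common ordering: text is displayed as it streams, and the moderation check runs only once the full message exists. The Python sketch below illustrates why objectionable text can appear briefly under that ordering. The function `stream_then_moderate`, the event names, and the toy check are hypothetical and are not taken from Talkie's web client.

```python
from typing import Callable, Iterable, Iterator, Tuple

def stream_then_moderate(
    chunks: Iterable[str],
    is_objectionable: Callable[[str], bool],
) -> Iterator[Tuple[str, str]]:
    """Yield display events for each chunk, then one flag event if needed."""
    shown = ""
    for chunk in chunks:
        shown += chunk
        # The user already sees this text; no check has run yet.
        yield ("display", shown)
    # Moderation is applied only after the stream has finished.
    if is_objectionable(shown):
        yield ("flag", "")

def toy_check(text: str) -> bool:
    """Stand-in for a real moderation model (assumption for illustration)."""
    return "rude" in text

events = list(stream_then_moderate(["A ", "rude ", "reply"], toy_check))
print(events)
```

In this sketch two display events containing the flagged word are emitted before the flag arrives, which matches the behavior the warning describes.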
