TechCrunch AI · February 24, 2026, 02:53 · about 7 min read

Guide Labs debuts a new kind of interpretable LLM

#LLM #Interpretability #OpenSource #AISafety

TL;DR

Guide Labs has open-sourced Steerling-8B, an 8 billion parameter LLM trained with a new architecture designed to make its behavior easy to interpret.

AI Deep Analysis · February 24, 2026, 03:40


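Key Points

Guide Labs has open-sourced Steerling-8B, an LLM built on a new architecture with interpretability designed in from the start.

The model introduces a "concept layer" that lets each output token be traced back to its origins in the training data.

Rather than interpreting a model after the fact ("neuroscience on a model"), the approach engineers interpretability into the design.

"Discovered concepts" such as quantum computing can also be tracked, so the model keeps emergent behavior while staying transparent.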


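Impact Analysis

The work offers a fundamental approach to the LLM black-box problem and could contribute to AI safety, accountability, and trustworthiness. It could become an important foundation for bias detection, fact verification, and regulatory compliance in particular.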

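Editor's Comment

The company proposes a paradigm shift from "neuroscience to understand a model" to "engineering a model that can be understood." If it reaches practical use, corporate AI governance and regulatory compliance could change dramatically.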

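The challenge of wrangling a deep learning model is often understanding why it does what it does. Whether it's xAI's repeated struggle sessions to fine-tune Grok's odd politics, ChatGPT's struggles with sycophancy, or run-of-the-mill hallucinations, plumbing through a neural network with billions of parameters isn't easy.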

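Guide Labs, a San Francisco start-up founded by CEO Julius Adebayo and chief science officer Aya Abdelsalam Ismail, is offering an answer to that problem today. On Monday, the company open-sourced an 8 billion parameter LLM, Steerling-8B, trained with a new architecture designed to make its actions easily interpretable: every token produced by the model can be traced back to its origins in the LLM's training data.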

That can be as simple as determining the reference materials for facts cited by the model, or as complex as understanding the model's understanding of humor or gender.

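"If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I've encoded, and then you have to be able to reliably turn that on, turn them off," Adebayo told TechCrunch. "You can do it with current models, but it's very fragile … It's sort of one of the holy grail questions."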

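Adebayo began this work while earning his PhD at MIT, co-authoring a widely cited 2020 paper that showed existing methods of understanding deep learning models were not reliable. That work ultimately led to the creation of a new way of building LLMs: developers insert a concept layer in the model that buckets data into traceable categories. This requires more up-front data annotation, but by using other AI models to help, the team was able to train this model as its largest proof of concept yet.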
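The article does not describe how the concept layer is built, so the following is only a minimal sketch of the general idea, assuming a simple bottleneck in which every output score is a sum of contributions from named concepts. The `ConceptLayer` class, the concept names, and all weights are hypothetical illustrations, not Guide Labs' architecture or API.

```python
from dataclasses import dataclass


@dataclass
class ConceptLayer:
    """Toy concept bottleneck: hidden state -> named concepts -> output scores.

    Hypothetical illustration only; not the Steerling-8B design.
    """
    concept_names: list  # hypothetical human-readable concept labels
    encoder: list        # one weight row per concept (hidden -> concept)
    decoder: list        # one weight row per output (concept -> output score)

    def activations(self, hidden):
        # ReLU(encoder @ hidden): how strongly each named concept fires
        return [max(0.0, sum(w * h for w, h in zip(row, hidden)))
                for row in self.encoder]

    def forward(self, hidden, disabled=frozenset()):
        # Zero out any concept the caller has switched off
        acts = [0.0 if name in disabled else a
                for name, a in zip(self.concept_names, self.activations(hidden))]
        # Each output score is a sum of per-concept contributions,
        # so every score can be attributed back to named concepts
        contributions = [
            {name: w * a for name, w, a in zip(self.concept_names, row, acts)}
            for row in self.decoder
        ]
        scores = [sum(c.values()) for c in contributions]
        return scores, contributions


# Made-up weights and input, purely for demonstration
layer = ConceptLayer(
    concept_names=["finance", "humor"],
    encoder=[[1.0, 0.0], [0.0, 1.0]],
    decoder=[[2.0, 1.0], [0.0, 3.0]],
)
hidden = [1.0, 2.0]
scores, contributions = layer.forward(hidden)
steered_scores, _ = layer.forward(hidden, disabled={"humor"})
```

Because all information passes through the named concepts, `contributions` shows which concept produced each score, and passing `disabled={"humor"}` removes that concept's influence in one place instead of hunting for it across many parameters.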

"The kind of interpretability people do is … neuroscience on a model, and we flip that," Adebayo said. "What we do is actually engineer the model from the ground up so that you don't need to do neuroscience."

Image Credits: Guide Labs

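One concern with this approach is that it might eliminate some of the emergent behaviors that make LLMs so intriguing: their ability to generalize in new ways about things they haven't been trained on yet. Adebayo says that still happens in his company's model. His team tracks what it calls "discovered concepts" that the model found on its own, like quantum computing.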


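Adebayo argues this interpretable architecture will be something everyone needs. For consumer-facing LLMs, these techniques should allow model builders to do things like block the use of copyrighted materials, or better control outputs around subjects like violence or drug abuse. Regulated industries will require more controllable LLMs. In finance, for example, a model evaluating loan applicants needs to consider things like financial records but not race. There's also a need for interpretability in scientific work, another area where Guide Labs has developed technology. Protein folding has been a big success of deep learning models, but scientists need more insight into why their software figured out successful combinations.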

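"What this model demonstrates is that training interpretable models is no longer a sort of science; it's now an engineering problem," Adebayo said. "We figured out the science and we can scale them, and there is no reason why this kind of model wouldn't match the performance of the frontier level models," which have many more parameters.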

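Guide Labs says that Steerling-8B can achieve 90% of the capability of existing models while using less training data, thanks to its novel architecture. The next step for the company, which emerged from Y Combinator and raised a $9 million seed round from Initialized Capital in November 2024, is to build a larger model and begin offering API and agentic access to users.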

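"The way we're currently training models is super primitive, and so democratizing inherent interpretability is actually going to be a long-term good thing for the human race," Adebayo told TechCrunch. "As we're going after these models that are going to be super intelligent, you don't want something to be making decisions on your behalf that's sort of mysterious to you."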

Tim Fernholz is a journalist who writes about technology, finance and public policy. He has closely covered the rise of the private space industry and is the author of Rocket Billionaires: Elon Musk, Jeff Bezos and the New Space Race. Formerly, he was a senior reporter at Quartz, the global business news site, for more than a decade, and began his career as a political reporter in Washington, D.C. You can contact or verify outreach from Tim by emailing tim.fernholz@techcrunch.com or via an encrypted message to tim_fernholz.21 on Signal.



