TechCrunch AI · April 17, 2026, 05:26 · About 9 min read

Physical Intelligence, a closely watched robotics startup, unveils a new robot brain that can figure out tasks it was never taught

#RoboticsAI #ArtificialGeneralIntelligence #AutonomousRobots #Physical Intelligence #UntrainedTaskReasoning

TL;DR

Robotics startup Physical Intelligence's new model, π0.7, is positioned as an early but meaningful step toward a general-purpose robot brain that can reason its way through tasks it was never taught.

AI In-Depth Analysis · April 17, 2026, 06:42

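Key Points

An early step toward a general-purpose robot brain

The new model, π0.7, is described as an early but meaningful advance toward the long-sought goal of a general-purpose robot brain.

Reasoning about untrained tasks

The model is said to be able to work out tasks it was never taught.

Technical progress from a startup

This technical announcement from Physical Intelligence, a closely watched robotics startup, signals the direction of the company's research and development.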

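Impact Analysis

This announcement marks an important milestone toward general-purpose AI that makes robots more autonomous and adaptable. Although practical deployment still faces hurdles, it clarifies where technical progress at the intersection of robotics and AI is heading.

Editorial Comment

The model stands out as a concrete step toward the ambitious goal of a general-purpose robot brain, but the article does not present detailed validation data or technical specifics, so further validation and progress will be worth watching.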

Physical Intelligence, the two-year-old, San Francisco-based robotics startup that has quietly become one of the most closely watched AI companies in the Bay Area, published new research Thursday showing that its latest model can direct robots to perform tasks they were never explicitly trained on — a capability the company’s own researchers say caught them off guard.

The new model, called π0.7, represents what the company describes as an early but meaningful step toward the long-sought goal of a general-purpose robot brain: one that can be pointed at an unfamiliar task, coached through it in plain language, and actually pull it off. If the findings hold up to scrutiny, they suggest that robotic AI may be approaching an inflection point similar to what the field saw with large language models — where capabilities begin compounding in ways that outpace what the underlying data would seem to predict.

But first: The core claim in the paper is compositional generalization — the ability to combine skills learned in different contexts to solve problems the model has never encountered. Until now, the standard approach to robot training has been essentially rote memorization — collect data on a specific task, train a specialist model on that data, then repeat for every new task. π0.7, Physical Intelligence says, breaks that pattern.
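
To make the contrast concrete, here is a deliberately simplified Python sketch. It is not Physical Intelligence's method, and π0.7 is a learned model rather than a lookup table; the skill functions, task strings, and air-fryer plan are all hypothetical. A "specialist" registry can only run the exact tasks it was trained on, while a compositional runner reuses primitives learned in different contexts to attempt a task no specialist covers.

```python
from typing import Callable


# Hypothetical primitive skills, each imagined as learned in a different context
# (e.g. one robot closing an appliance, another placing an object inside one).
def open_lid(obj: str) -> str:
    return f"opened {obj}"


def place_inside(item: str, container: str) -> str:
    return f"placed {item} in {container}"


def press_button(obj: str) -> str:
    return f"pressed button on {obj}"


# Specialist approach: one model per task, looked up by the exact task it was trained on.
SPECIALISTS: dict[str, Callable[[], list[str]]] = {
    "fold laundry": lambda: ["folded laundry"],
    "make coffee": lambda: ["made coffee"],
}


def run_specialist(task: str) -> list[str]:
    """Succeeds only on tasks that have a dedicated specialist."""
    if task not in SPECIALISTS:
        raise KeyError(f"no specialist trained for {task!r}")
    return SPECIALISTS[task]()


# Compositional approach: a new task is expressed as a new sequence of known primitives.
PRIMITIVES: dict[str, Callable[..., str]] = {
    "open": open_lid,
    "place": place_inside,
    "press": press_button,
}


def run_composed(plan: list[tuple[str, tuple[str, ...]]]) -> list[str]:
    """Executes each (primitive name, arguments) step in order."""
    return [PRIMITIVES[name](*args) for name, args in plan]


# A hypothetical task with no specialist, built from primitives seen in other contexts.
air_fryer_plan = [
    ("open", ("air fryer",)),
    ("place", ("sweet potato", "air fryer")),
    ("press", ("air fryer",)),
]

if __name__ == "__main__":
    try:
        run_specialist("cook sweet potato in air fryer")
    except KeyError:
        print("specialist: no model trained for this task")
    print("composed:", run_composed(air_fryer_plan))
```

In this toy the plan is written by hand; the open question the paper addresses is how a learned model arrives at such a combination on its own.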

“Once it crosses that threshold where it goes from only doing exactly the stuff that you collect the data for to actually remixing things in new ways,” says Sergey Levine, a co-founder of Physical Intelligence and a UC Berkeley professor focused on AI for robotics, “the capabilities are going up more than linearly with the amount of data. That much more favorable scaling property is something we’ve seen in other domains, like language and vision.”
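
Levine's "more than linearly" can be made precise with a toy power law. This is an illustrative assumption, not a curve reported in the paper: C is some capability measure, D is the amount of training data, and k and α are hypothetical constants. Superlinear scaling means α > 1, so each doubling of data multiplies capability by more than two.

```latex
\begin{aligned}
C(D) &= k\,D^{\alpha}, \quad k > 0 \\
\frac{C(2D)}{C(D)} &= \frac{k\,(2D)^{\alpha}}{k\,D^{\alpha}} = 2^{\alpha} \\
\alpha = 1 &\;\Rightarrow\; 2^{1} = 2 \quad \text{(linear: doubling data doubles capability)} \\
\alpha = 1.5 &\;\Rightarrow\; 2^{1.5} \approx 2.83 \quad \text{(superlinear: doubling data more than doubles it)}
\end{aligned}
```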

The paper’s most striking demonstration involves an air fryer the model had essentially never seen in training. When the research team investigated, they found only two relevant episodes in the entire training dataset: one where a different robot merely pushed the air fryer closed, and one from an open source dataset where yet another robot placed a plastic bottle inside one on someone’s instructions. The model had somehow synthesized those fragments, plus broader web-based pretraining data, into a functional understanding of how the appliance works.

“It’s very hard to track down where the knowledge is coming from, or where it will succeed or fail,” says Lucy Shi, a Physical Intelligence researcher and Stanford computer science Ph.D. student. Still, with zero coaching, the model made a passable attempt at using the appliance to cook a sweet potato. With step-by-step verbal instructions — essentially, a human walking the robot through the task the way you might explain something to a new employee — it performed successfully.

That coaching capability matters because it suggests robots could be deployed in new environments and improved in real time without additional data collection or model retraining.

So what does it all mean? The researchers aren’t shy about the model’s limitations and are careful not to get ahead of themselves. In at least one case, they point the finger squarely at their own team.

“Sometimes the failure mode is not on the robot or on the model,” says Shi. “It’s on us. Not being good at prompt engineering.” She describes an early air fryer experiment that produced a 5% success rate. After spending about half an hour refining how the task was explained to the model, it jumped to 95%, she says.

Image Credits: Physical Intelligence

The model also isn’t yet capable of executing complex multi-step tasks autonomously from a single high-level command. “You can’t tell it, ‘Hey, go make me some toast’,” Levine says. “But if you walk it through — ‘for the toaster, open this part, push that button, do this’ — then it actually tends to work pretty well.”
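
The difference Levine describes, a single high-level command versus a walk-through, can be sketched as an interaction loop. This is a hypothetical Python illustration, not Physical Intelligence's interface: `stub_policy`, `CoachingSession`, and the toaster steps are invented for the example, and the stub's keyword check stands in for a learned model. What it shows is structural: coaching changes only the instructions fed to a fixed policy, so nothing is retrained or re-collected between steps.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical set of concrete, single-action verbs the stub "policy" can handle.
KNOWN_ACTIONS = {"open", "push", "press", "place"}


def stub_policy(instruction: str) -> bool:
    """Stand-in for a robot policy: succeeds only on one concrete action.

    A real model would act in the world; this stub just checks whether the
    instruction starts with a known action verb.
    """
    words = instruction.lower().split()
    return bool(words) and words[0] in KNOWN_ACTIONS


@dataclass
class CoachingSession:
    """A human coach issues one low-level instruction at a time, with no retraining."""

    policy: Callable[[str], bool]
    log: list[tuple[str, bool]] = field(default_factory=list)

    def coach(self, steps: list[str]) -> bool:
        # Stop at the first failed step so the coach can rephrase and retry it.
        for step in steps:
            ok = self.policy(step)
            self.log.append((step, ok))
            if not ok:
                return False
        return True


if __name__ == "__main__":
    high_level = CoachingSession(stub_policy)
    print(high_level.coach(["go make me some toast"]))  # False

    walked = CoachingSession(stub_policy)
    print(walked.coach([
        "open the bread slot",
        "place the bread in the slot",
        "push the lever down",
    ]))  # True
```

Returning at the first failure, rather than pressing on, leaves the coach a natural point to rephrase the step and try again.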

The team also acknowledged that standardized benchmarks for robotics don’t really exist, which makes external validation of their claims difficult. Instead, the company measured π0.7 against its own previous specialist models — purpose-built systems trained on individual tasks — and found that the generalist model matched their performance across a range of complex work, including making coffee, folding laundry, and assembling boxes.

What may be most notable about the research — if you take the researchers at their word — is not any single demo but the degree to which the results surprised them, people whose job it is to know exactly what is in the training data and therefore what the model should and shouldn’t be able to do.

“My experience has always been that when I deeply know what’s in the data, I can kind of just guess what the model will be able to do,” says Ashwin Balakrishna, a research scientist at Physical Intelligence. “I’m rarely surprised. But the last few months have been the first time where I’m genuinely surprised. I just bought a gear set randomly and asked the robot, ‘Hey, can you rotate this gear?’ And it just worked.”

Levine recalled the moment researchers first encountered GPT-2 generating a story about unicorns in the Andes. “Where the heck did it learn about unicorns in Peru?” he says. “That’s such a weird combination. And I think that seeing that in robotics is really special.”

Naturally, critics will point to an uncomfortable asymmetry here: Language models had the entire internet to learn from. Robots don’t, and no amount of clever prompting fully closes that gap. But when asked where he expects the skepticism, Levine points somewhere else entirely.

“The criticism that can always be leveled at any robotic generalization demo is that the tasks are kind of boring,” he says. “The robot is not doing a backflip.” He pushes back on that framing, arguing that the distinction between an impressive robot demo and a robotic system that actually generalizes is precisely the point. Generalization, he suggests, will always look less dramatic than a carefully choreographed stunt — but it is considerably more useful.

The paper itself uses careful hedging language throughout, describing π0.7 as showing “early signs” of generalization and “initial demonstrations” of new capabilities. These are research results, not a deployed product.

When asked directly when a system based on these findings might be ready for real-world deployment, Levine declines to speculate. “I think there’s good reason to be optimistic, and certainly it’s progressing faster than I expected a couple of years ago,” he says. “But it’s very hard for me to answer that question.”

Physical Intelligence has raised over $1 billion to date and was most recently valued at $5.6 billion. A significant part of the investor enthusiasm around the company traces to Lachy Groom, a co-founder who spent years as one of Silicon Valley’s most well-regarded angel investors — backing Figma, Notion, and Ramp, among others — before deciding that Physical Intelligence was the company he’d been looking for. That pedigree has helped the startup attract serious institutional money even as it has refused to offer investors a commercialization timeline.

The company is now said to be in discussions for a new round that would nearly double that valuation figure to $11 billion. The team declined to comment.
