TechCrunch AI · June 18, 2026 00:00 · 7 min read

Collecting robot training data is dirty, unglamorous work. Some AI labs are already outsourcing it to XDOF

#Robotics #DataCollection #XDOF #AIInfrastructure

TL;DR

Some AI labs are paying XDOF, a company that specializes in this work, to handle the unglamorous and difficult job of collecting robot training data.

AI In-Depth Analysis · June 17, 2026 16:08

Attention: / 5

Depth: 40%

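Key Points

Specialization and outsourcing of data collection

Collecting the data needed to train robots is described as "dirty, unglamorous work," and some AI labs are outsourcing it to XDOF, a company that specializes in it.

Moves to ease a bottleneck

In response to the shortage of high-quality data for AI development, a market for dedicated data-collection services is taking shape.

Signs of a shift in industry structure

By handing work such as infrastructure build-out and data collection to outside firms, large AI labs may be able to concentrate their development resources more tightly on core research.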


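Impact Analysis

The article shows that, in the advancement of robotics and general-purpose AI, securing high-quality training data, and not only developing algorithms, has become a key bottleneck. As outsourcing to specialist firms spreads, data collection is likely to become more efficient and less costly, letting AI labs focus their resources on core research and development.

Editor's Comment

The article highlights that data collection, an unglamorous but indispensable step behind AI development, is becoming a specialized field of its own. This is an important signal of the industry's overall maturity.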


Two weeks ago, OpenAI said it would relaunch the robotics program it shuttered in 2021 — the latest signal that the biggest AI labs are racing to teach machines to operate in the physical world. But building capable robots requires something the AI industry doesn’t yet have, which is the training data to match that used for language models.

That gap is creating a new kind of infrastructure business. Unlike LLMs that were trained on a vast sea of publicly available text, robots need data that captures physical interaction, and that kind of data barely exists. YouTube videos and footage captured by gig workers are low-fidelity and hard to reconcile with the physical world.

XDOF (pronounced “ecks-doff”), emerging from stealth today, is betting that the next great bottleneck in AI isn’t models or chips, but the data feedback loop needed to teach robots how to interact with the physical world.

The startup aims to build the data pipelines, collection tools, and annotation systems that frontier labs and robotics companies can’t easily build themselves — and has raised $70 million from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo to do it. Co-founder and CEO Philipp Wu says XDOF, which has about 60 employees, is already working with 20 customers, including several frontier AI labs, but cannot name them.

“All of the top labs are trying to pursue robotics,” Wu said. “We’ve already seen some of the downfalls of falling a little bit behind in the language model race … you don’t want to be in this type of situation where you pursue this technology too late, and everyone is in this boat where physical AI is the next frontier.”

Wu ran into this problem himself as a PhD student at UC Berkeley. His focus was on enabling robots to learn skills from large-scale datasets. There was just one problem.

“We didn’t have large-scale data to work with,” he told TechCrunch. “There was this chicken-and-egg problem — we first needed to actually collect data before we could even ask how to train a foundation model for robotics.”

Wu and his future XDOF co-founder and CTO, Fred Shentu, worked on a project called GELLO, a low-cost teleoperation system that lets a human operator control a robotic arm to generate training data. “It ended up becoming a very influential paper in robotics, because a lot of people had similar needs and bottlenecks, and many started leveraging this type of device for data collection,” Wu said.
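GELLO is described here only as a low-cost teleoperation system in which a human drives a robot arm to produce training data. The Python sketch below shows the general shape of such a recording loop: on each control tick it reads the human-held leader device, maps that reading to joint targets for the follower arm, sends the command, and logs an (observation, action) pair into a trajectory, the unit in which datasets like ABC are counted. It assumes a leader device whose joints correspond one-to-one with the follower's, so calibration reduces to a per-joint sign and offset. `record_episode`, `leader_to_follower`, `FakeFollower`, and the calibration values are hypothetical names for illustration, not GELLO's code or API, and a real system would also log camera images and timestamps.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Step:
    observation: List[float]  # follower joint angles read before the command (radians)
    action: List[float]       # joint targets sent to the follower (radians)


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)


def leader_to_follower(leader: Sequence[float],
                       offsets: Sequence[float],
                       signs: Sequence[int]) -> List[float]:
    """Map leader joint angles to follower joint targets, joint by joint.

    Assumption: the leader is kinematically matched to the follower, so
    per-joint calibration is just a sign flip plus an angle offset.
    """
    return [s * q + o for q, o, s in zip(leader, offsets, signs)]


def record_episode(read_leader: Callable[[], Sequence[float]],
                   read_follower: Callable[[], Sequence[float]],
                   send_command: Callable[[List[float]], None],
                   offsets: Sequence[float],
                   signs: Sequence[int],
                   num_steps: int) -> Trajectory:
    """Run one teleoperation episode and log (observation, action) pairs."""
    traj = Trajectory()
    for _ in range(num_steps):
        obs = list(read_follower())  # state a learned policy would later see
        action = leader_to_follower(read_leader(), offsets, signs)
        send_command(action)         # follower tracks the operator's motion
        traj.steps.append(Step(observation=obs, action=action))
    return traj


class FakeFollower:
    """Hypothetical stand-in for hardware: reaches each target instantly."""

    def __init__(self, num_joints: int) -> None:
        self.q = [0.0] * num_joints

    def read(self) -> List[float]:
        return list(self.q)

    def command(self, target: List[float]) -> None:
        self.q = list(target)


# Usage: scripted leader poses stand in for a human operator.
_leader_poses = iter([[0.1, 0.2], [0.2, 0.4], [0.3, 0.6]])
_follower = FakeFollower(num_joints=2)
demo_traj = record_episode(
    read_leader=lambda: next(_leader_poses),
    read_follower=_follower.read,
    send_command=_follower.command,
    offsets=[0.0, 0.5],  # hypothetical calibration values
    signs=[1, -1],
    num_steps=3,
)
```

Logging the follower's state before each command, rather than after, keeps every step in the order a learned policy would see it: observe first, then act.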

Spotting the opportunity, Wu, Shentu, and third co-founder and Chief Operating Officer Nemo Jin launched XDOF in October 2024 to provide a data ecosystem for companies pursuing robotics models. Mindful that data provision alone can be a dead-end business, the company is also focused on data cleaning, tooling, and annotation — creating a self-reinforcing feedback loop for robot trainers.

As a starting point, the company is partnering with UC Berkeley’s AI Research lab to release what it believes is the largest collection of high-quality robot training data ever assembled, dubbed ABC. It includes 130,000 trajectories of robot manipulation data, 300 hours of simulation, and 100 hours of evaluations. That kind of scaled-up pre-training data has never been available to academia before.

“We’ve seen in language, image generation, and other fields, that when models and data are released, the community achieves things that you wouldn’t necessarily have expected,” David McAllister, a Berkeley PhD student who helped organize the release, told TechCrunch.

The team has already used the data to train robots on benchmark tasks like folding T-shirts and flattening boxes, or loading AirPods into their cases.

Unlimited degrees of freedom

The company plans to work across three tiers of a data pyramid. The most valuable tier is teleoperation data collected on the actual robot being deployed; next come teleoperated robots gathering more general data, as with GELLO; and finally “egocentric” data gathered by humans performing everyday tasks, for which XDOF plans to build its own wearable sensors.
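The article ranks the three tiers by value but does not say how a trainer combines them. As a hedged sketch, one might tag each recorded episode with its tier and bias sampling toward the top of the pyramid. The `DataTier` and `Episode` names, the 4:2:1 weights, and weighted sampling as the mixing strategy are all assumptions for illustration, not XDOF's method.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class DataTier(IntEnum):
    """Tiers of the data pyramid; a larger value means more valuable."""
    EGOCENTRIC_HUMAN = 1     # wearable-sensor recordings of people doing everyday tasks
    GENERAL_TELEOP = 2       # teleoperated robots gathering more general data (e.g. GELLO)
    TARGET_ROBOT_TELEOP = 3  # teleoperation on the actual robot being deployed


@dataclass(frozen=True)
class Episode:
    episode_id: str
    tier: DataTier


# Hypothetical relative weights, not from the article; chosen for illustration.
TIER_WEIGHTS: Dict[DataTier, float] = {
    DataTier.TARGET_ROBOT_TELEOP: 4.0,
    DataTier.GENERAL_TELEOP: 2.0,
    DataTier.EGOCENTRIC_HUMAN: 1.0,
}


def most_valuable_first(episodes: List[Episode]) -> List[Episode]:
    """Sort from the top of the pyramid down; order within a tier is kept."""
    return sorted(episodes, key=lambda e: e.tier, reverse=True)


def sampling_probabilities(episodes: List[Episode],
                           weights: Dict[DataTier, float] = TIER_WEIGHTS) -> Dict[str, float]:
    """Probability of drawing each episode, proportional to its tier weight."""
    total = sum(weights[e.tier] for e in episodes)
    if total == 0:
        raise ValueError("no episodes, or all tier weights are zero")
    return {e.episode_id: weights[e.tier] / total for e in episodes}


# Usage: one hypothetical episode per tier.
demo_episodes = [
    Episode("ego-001", DataTier.EGOCENTRIC_HUMAN),
    Episode("gello-001", DataTier.GENERAL_TELEOP),
    Episode("target-001", DataTier.TARGET_ROBOT_TELEOP),
]
demo_ranked = most_valuable_first(demo_episodes)
demo_probs = sampling_probabilities(demo_episodes)
```

Using `IntEnum` makes the pyramid's ordering explicit in the type itself, so sorting by tier needs no separate lookup table. The weights stay a separate, swappable dictionary because the right mix is an empirical question the article leaves open.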

“Your camera choice is going to affect the quality of your data — which is going to affect how your hand-tracking algorithm performs,” Wu said. “If you don’t design the hardware well from the start, the data you collect might have very specific problems that you didn’t anticipate.”

The company plans to hire and train armies of teleoperators and egocentric data operators around the world — a labor-intensive model that raises an obvious question: Why aren’t the major labs doing this data production work themselves?

“You need a warehouse of hundreds of thousands of square feet with hundreds of robots,” Wu said. “You need to maintain these robots, calibrate their physical parameters, and properly train operators.”

It’s a build-out that requires focus, capital, and operational scale that most AI labs would rather outsource — which is precisely the market XDOF is betting on.

The name XDOF is a play on the robotics term “degrees of freedom,” which describes the number of independent motions a robot can perform. Your arm, from shoulder to wrist, has seven degrees of freedom. Humanoid robotics company Figure AI’s latest robot has 30. The X in the company’s name captures its ambition: “Arbitrary degrees of freedom, unlimited degrees of freedom,” Wu says.

*When you purchase through links in our articles, we may earn a small commission. This doesn’t affect our editorial independence.*

