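Hugging Face Blog · March 17, 2026, 06:58 · approx. 6 min read

The Rise of Physical AI in Healthcare Robotics

#Physical AI #Robotics #Medical AI #Open Dataset #Vision-Language-Action #Surgical Robotics

TL;DR

A Hugging Face Blog post announces Open-H-Embodiment, the first large-scale open dataset for healthcare robotics, built jointly by 35 organizations, together with GR00T-H, the first policy model for surgical robotics, which is trained on that dataset. Together they lay a foundation for Physical AI that shifts the focus of medical AI from perception to physical action.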

AI deep analysis · March 17, 2026, 07:41

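Key points

Construction of Open-H-Embodiment, the first large-scale open dataset for healthcare robotics

Thirty-five organizations collaborated to release 778 hours of healthcare robotics training data under the CC-BY-4.0 license, spanning simulation, benchtop exercises, and real clinical procedures.

The need for Physical AI and the limits of conventional perception-based AI

Healthcare requires "doing" as well as "perceiving". The article argues that conventional static, perception-only datasets fall short because they lack embodiment, contact dynamics, and closed-loop control.

Development of GR00T-H, the first policy model for surgical robotics

GR00T-H is the first Vision-Language-Action (VLA) policy model for surgical robotics tasks, trained on roughly 600 hours of data from the Open-H-Embodiment dataset.

Accelerating research and development through a standardized foundation

The initiative aims to build the foundation for Physical AI around standardized robot bodies, synchronized vision-force-kinematics data, sim-to-real pairing, and cross-embodiment benchmarks.

GR00T-H design choices

To meet the high-precision demands of surgical robotics, GR00T-H adopts four design choices: unique embodiment projectors, state dropout, relative end-effector actions, and metadata in task prompts.

Capabilities of Cosmos-H-Surgical-Simulator

It handles the real-world complexity that conventional simulators struggle with, generates physically plausible surgical video, and serves as an efficient tool for narrowing the sim-to-real gap.

Direction for next-generation surgical robotics

The goal is reasoning-capable autonomy beyond perceptual control, so that systems can explain, plan, and adapt across long procedures.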

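Impact analysis

This effort builds key infrastructure for AI research in healthcare robotics and promotes a paradigm shift from perception-centric AI to Physical AI that acts in the physical world. Open datasets and models are expected to democratize research and speed up development, with significant real-world impact anticipated in areas such as surgical automation and improved access to care.

Editorial comment

This news marks an important turning point for medical AI. The open foundation is likely to drive progress across the industry, and the concrete path toward practical use is a notable strength.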


The Rise of Physical AI for Healthcare Robotics

Introducing Open-H-Embodiment: The first healthcare robotics open dataset, built by a community collaboration

Authors: Nigel Nelson, Lukas Zbinden, Mostafa Toloui, Sean Huver

Healthcare AI has mainly been perception-based, focusing on models that interpret signals and classify or segment pathology/anatomy. However, healthcare involves "doing," making the static, perception-only datasets of the past—which lack embodiment, contact dynamics, and closed-loop control—insufficient. The field needs standardized robot bodies, synchronized vision–force–kinematics data, sim-to-real pairing, and cross-embodiment benchmarks to build the foundation for Physical AI.

Open-H-Embodiment

Open-H-Embodiment is a community‑driven dataset initiative building the open, shared foundation needed to train and evaluate AI autonomy and world foundation models for surgical robotics and ultrasound. Started by a steering committee including Prof. Axel Krieger (Johns Hopkins), Prof. Nassir Navab (Technical University of Munich), and Dr. Mahdi Azizian (NVIDIA), the effort now spans 35 organizations.

Participants from around the world came together to build the first large-scale dataset to advance physical AI in healthcare robotics.

Open-H-Embodiment Sample data

Balgrist, CMR Surgical, The Chinese University of Hong Kong, Great Bay University, Hong Kong Baptist University, Hamlyn, ImFusion, Johns Hopkins University, Leeds University, Mohamed bin Zayed University of Artificial Intelligence, Moon Surgical, NVIDIA, Northwell Health, Obuda University, The Hong Kong Polytechnic University, Qilu Hospital of Shandong University, Rob Surgical, Sanoscience, Surgical Data Science Collective, Semaphor Surgical, Stanford, Dresden University of Technology, Technical University of Munich, Tuodao, Turin, University of British Columbia, UC Berkeley, UC San Diego, University of Illinois Chicago, University of Tennessee, University of Texas, Vanderbilt, and Virtual Incision.

Comprises 778 hours of CC-BY-4.0 healthcare robotics training data, largely surgical robotics, but also ultrasound and colonoscopy autonomy data.

Spans simulation, benchtop exercises (e.g., suturing), and real clinical procedures.

Uses commercial robots (CMR Surgical, Rob Surgical, Tuodao) and research robots (dVRK, Franka, Kuka).

Released alongside two new, permissively open-source models post-trained on this data.

GR00T-H: Vision Language Action Model for Surgical Robotics

First is GR00T-H, a derivative of the Isaac GR00T N series of Vision-Language-Action (VLA) models. Trained on roughly 600 hours of Open-H-Embodiment data, GR00T-H is the first policy model for surgical robotics tasks.

Building on NVIDIA’s open-source ecosystem, Isaac GR00T-H leverages Cosmos Reason 2 2B as its Vision-Language Model (VLM) backbone.

Architectural Design Choices

Surgical robotics requires high precision, but specialized hardware (like cable-driven systems) makes imitation learning (IL) difficult. To handle this, GR00T-H uses four key design choices:

Unique Embodiment Projectors: A unique, learnable MLP maps each robot's specific kinematics to a shared, normalized action space.

State Dropout (100%): Proprioceptive input is dropped during inference to create a learned bias term for each system, yielding better real-world results.

Relative EEF Actions: Training uses a common relative End-Effector (EEF) action space to overcome kinematic inconsistencies.

Metadata in Task Prompts: Instrument names and control index mapping are injected directly into the VLM task prompt.
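
The first two design choices can be illustrated together with a small sketch. It is not the GR00T-H implementation: it assumes a state-side projector per embodiment and assumes that state dropout is realized by zeroing the proprioceptive input. The function names, the state sizes (14 and 7), and the shared width of 8 are hypothetical, and random weights stand in for learned ones. Under those assumptions, a fully dropped state makes each projector output a constant that depends only on that embodiment's parameters, which is one way to read "a learned bias term for each system".

```python
import random


def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Build random two-layer MLP parameters (a stand-in for learned weights)."""
    def matrix(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    def vector(size):
        return [rng.uniform(-1.0, 1.0) for _ in range(size)]

    return {
        "w1": matrix(hidden_dim, in_dim), "b1": vector(hidden_dim),
        "w2": matrix(out_dim, hidden_dim), "b2": vector(out_dim),
    }


def linear(weights, bias, x):
    """Compute weights @ x + bias with plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp_forward(params, x):
    """Linear -> ReLU -> Linear."""
    hidden = [max(0.0, h) for h in linear(params["w1"], params["b1"], x)]
    return linear(params["w2"], params["b2"], hidden)


def project_state(projectors, embodiment, state, drop_state=True):
    """Map a robot-specific state into the shared space.

    With drop_state=True the state is zeroed (assumed dropout mechanism),
    so the output depends only on the embodiment's own parameters.
    """
    params = projectors[embodiment]
    if drop_state:
        state = [0.0] * len(state)
    return mlp_forward(params, state)


rng = random.Random(0)
SHARED_DIM = 8  # hypothetical shared width, not a figure from the article

# One projector per embodiment; the input sizes are illustrative only.
projectors = {
    "dvrk": make_mlp(in_dim=14, hidden_dim=16, out_dim=SHARED_DIM, rng=rng),
    "franka": make_mlp(in_dim=7, hidden_dim=16, out_dim=SHARED_DIM, rng=rng),
}

state_a = [rng.uniform(-1.0, 1.0) for _ in range(14)]
state_b = [rng.uniform(-1.0, 1.0) for _ in range(14)]

# Two different dVRK states give the same output once the state is dropped.
out_a = project_state(projectors, "dvrk", state_a)
out_b = project_state(projectors, "dvrk", state_b)
out_franka = project_state(projectors, "franka", [0.3] * 7)
```

Because each embodiment owns its parameters, the constant produced for "dvrk" is independent of the one produced for "franka", while both land in the same shared width.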

A prototype of GR00T-H has demonstrated the ability to execute a complete, end-to-end suture in the SutureBot benchmark, highlighting robust long-horizon dexterity.

GR00T-H performing end-to-end suturing.

Cosmos-H-Surgical-Simulator

Cosmos-H-Surgical-Simulator is a World Foundation Model (WFM) for action-conditioned surgical robotics. Traditional simulators struggle with real-world complexities such as soft tissue, reflections, blood, and smoke.

Key Capabilities

Overcoming the Sim-to-Real Gap: Fine-tuned from NVIDIA Cosmos Predict 2.5 2B, it generates physically plausible surgical video directly from kinematic actions.

Efficiency Gains: For 600 rollouts, it took only 40 minutes in simulation versus 2 days using real-world benchtop methods.

WFM as a Physics Simulator: Implicitly learns tissue deformation and tool interaction from data.

Synthetic Data Generation: Generates realistic synthetic video-action pairs to augment underrepresented datasets.
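
To put the efficiency figures above in perspective, the arithmetic can be worked through as follows. The sketch assumes that "2 days" means 48 hours of continuous wall-clock time. If it refers to working days, the ratio would be smaller.

```python
ROLLOUTS = 600
SIM_MINUTES = 40
REAL_DAYS = 2  # assumption: two full 24-hour days of wall-clock time

real_minutes = REAL_DAYS * 24 * 60

# How many times faster the simulated rollouts are overall.
speedup = real_minutes / SIM_MINUTES

# Average time per rollout in each setting, in seconds.
sim_seconds_per_rollout = SIM_MINUTES * 60 / ROLLOUTS
real_seconds_per_rollout = real_minutes * 60 / ROLLOUTS

print(f"speedup: {speedup:.0f}x")
print(f"simulation: {sim_seconds_per_rollout:.0f} s per rollout")
print(f"benchtop:   {real_seconds_per_rollout:.0f} s per rollout")
```

Under that assumption the simulator is about 72 times faster, averaging roughly 4 seconds per rollout against roughly 288 seconds on the benchtop.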

Fine-Tuning Details

The model was fine-tuned on the Open-H-Embodiment dataset (9 robot embodiments, 32 datasets) using 64 A100 GPUs for approximately 10,000 GPU-hours. It uses a unified 44-dimensional action space.
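
As a rough guide to what that compute budget means in wall-clock terms, the following sketch converts GPU-hours to elapsed time. It assumes all 64 GPUs ran concurrently for the whole job, which the article does not state, so the result is an estimate rather than a reported figure.

```python
GPU_HOURS = 10_000  # approximate total from the article
NUM_GPUS = 64

# Assumption: every GPU is busy for the full duration of the run.
wall_clock_hours = GPU_HOURS / NUM_GPUS
wall_clock_days = wall_clock_hours / 24

print(f"{wall_clock_hours:.2f} hours, about {wall_clock_days:.1f} days")
```

On that assumption the fine-tuning run would take about 156 hours, or roughly six and a half days.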

What is Next: Towards Reasoning For Surgical Robotics

The goal for version 2 of the Open-H-Embodiment effort is to move beyond perceptual control to reasoning-capable autonomy, a "ChatGPT moment" for surgical robotics, where systems can explain, plan, and adapt across long procedures. This requires extending Open-H-Embodiment into reasoning-ready data with annotated task traces capturing intents, outcomes, and failure modes. This effort needs community engagement, and we invite you to get involved. Visit our Open-H GitHub Repo to help shape the future of healthcare robotics.

Get started today

Access the following resources to start working with the Open-H-Embodiment dataset and models:

Open-H-Embodiment: HF Dataset / GitHub Repo

NVIDIA Isaac GR00T-H model: HF Model / GR00T-H GitHub Repo

NVIDIA Cosmos-H-Surgical-Simulator: HF Model / GitHub Repo

Cosmos Cookbook: Step-by-step workflows to build your own WFM for your embodiment

Explore on Hugging Face: Check out new open Cosmos models and datasets on Hugging Face and GitHub or try models on build.nvidia.com.

