Apple Machine Learning · February 23, 2026, 09:00 · approx. 6 min read

Apple Workshop on Reasoning and Planning 2025

#ReasoningAndPlanning #AutonomousAgents #LLM #Multimodal #Apple #Benchmarks

TL;DR

Apple is pursuing research to improve the reasoning and planning capabilities of AI, with the aim of developing autonomous AI systems.

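AI in-depth analysis · February 25, 2026, 22:41

Importance: rated on a 5-point scale

Key points

Apple has published the content of the reasoning and planning workshop it hosted

Reasoning and planning are positioned as the foundation of autonomous AI systems

Key research questions were discussed, including grounding in diverse modalities and collaboration among multiple systems

Research presentations covered topics ranging from LLMs to embodied agents, benchmarks, and efficient reasoning methods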


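Impact analysis

An important signal that Apple is actively involved in building the research community around AI reasoning and planning. It strengthens Apple's position as a leader of industry-wide research trends and clarifies its strategic direction in the race to develop autonomous AI agents.

Editorial comment

The workshop content Apple has published makes clear both the importance of reasoning and planning techniques for next-generation AI systems and Apple's intent to lead industry-wide research trends.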


Apple Workshop on Reasoning and Planning 2025

Apple ML researcher Iman Mirzadeh presenting at the workshop.

Reasoning and planning are the bedrock of intelligent AI systems, enabling them to plan, interact, adapt, and ultimately operate independently. At Apple, understanding and advancing reasoning capabilities in AI systems has long been an area of active research and has resulted in numerous publications that both explore new techniques to advance the frontier of reasoning and further the field’s understanding of the capabilities (and limitations) of current approaches.

Last year, Apple hosted the Workshop on Reasoning and Planning, bringing together Apple researchers and members of the broader research community for a two-day event focused on advancing the state of the art in this area. The workshop focused on three key areas: Reasoning and Planning, Applications to Agents, and Model Development.

The presentations and discussions of these topics explored how reasoning and planning models process and complete complex tasks based on simple instructions. Workshop participants discussed questions such as:

how to ground the reasoning process in various modalities and embodiments;

what the interplay is between search/test-time compute and the capabilities of foundation models;

how multiple reasoning systems can collaborate.

The group also explored architectures that leverage memory and adaptation and how to plan and reason in a trustworthy, safe, and efficient manner. In addition, workshop attendees discussed model development for reasoning systems using environments and simulators, along with data generation techniques specifically designed for scalable training and reliable benchmarking. Together, progress in these research areas will enable the creation of adaptable, efficient, and intelligent systems that are capable of tackling complex, dynamic challenges.

In this post, we share recordings of selected talks and a recap of the publications discussed at the workshop.

Apple Workshop on Reasoning and Planning 2025 Videos

From LLMs to Embodied AI Agents: Lessons and Methods

Presented by Alexander Toshev

MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse…

Presented by Yanchao Sun

More Efficient and Accurate Reasoning with Adaptive Parallel…

Presented by Alane Suhr

On Evaluating and Improving Robustness in Visual Reasoning…

Presented by Melanie Mitchell

Open-Ended and AI-Generating Algorithms in the Era of Foundation…

Presented by Jeff Clune

Reasoning, Intelligence & Large Language Models

Presented by Iman Mirzadeh

Reinforcement Learning for Long-Horizon Interactive LLM Agents

Presented by Philipp Krähenbühl

Towards Internet-Scale Training For Agents

Presented by Ruslan Salakhutdinov

Workshop Resources

Published Work Presented at the Workshop

AbstRaL: Augmenting LLMs’ Reasoning by Reinforcing Abstract Thinking by Silin Gao (work done while at Apple), Antoine Bosselut (EPFL), Samy Bengio, Emmanuel Abbe

Adaptable Logical Control for Large Language Models by Honghua Zhang (UCLA), Po-Nien Kung (UCLA), Masahiro Yoshida, Guy Van den Broeck (UCLA), Nanyun Peng (UCLA)

Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment by Annie S. Chen (Stanford University), Govind Chada (Stanford University), Laura Smith (UC Berkeley), Archit Sharma (Stanford University), Zipeng Fu (Stanford University), Sergey Levine (UC Berkeley), Chelsea Finn (Stanford University)

Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models by Annie S. Chen (Stanford University), Alec M. Lessing (Stanford University), Andy Tang (Stanford University), Govind Chada (Stanford University), Laura Smith (UC Berkeley), Sergey Levine (UC Berkeley), Chelsea Finn (Stanford University)

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making by Manling Li (Stanford University, Northwestern University), Shiyu Zhao (Stanford University), Qineng Wang (Stanford University, Northwestern University), Kangrui Wang (Stanford University, Northwestern University), Yu Zhou (Stanford University), Sanjana Srivastava (Stanford University), Cem Gokmen (Stanford University), Tony Lee (Stanford University), Li Erran Li (Amazon), Ruohan Zhang (Stanford University), Weiyu Liu (Stanford University), Percy Liang (Stanford University), Li Fei-Fei (Stanford University), Jiayuan Mao (MIT), Jiajun Wu (Stanford University)

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs by Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms by Zhangheng Li, Keen You, Haotian Zhang, Di Feng, Harsh Agrawal, Xiujun Li, Mohana Prasad Sathya Moorthy, Jeff Nichols, Yinfei Yang, Zhe Gan

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons by Andrew Szot, Bogdan Mazoure, Omar Attia, Aleksei Timofeev, Harsh Agrawal, Devon Hjelm, Zhe Gan, Zsolt Kira, Alexander Toshev

Grounding Multimodal Large Language Models in Actions by Andrew Szot, Bogdan Mazoure, Harsh Agrawal, Devon Hjelm, Zsolt Kira, Alexander Toshev

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Language Models by Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar

How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad by Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Colin Sandon, Omid Saremi

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity by Parshin Shojaee (work done during an internship at Apple), Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar

Large Language Models as Generalizable Policies for Embodied Tasks by Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Rin Metcalf Susa, Natalie Mackraz, Devon Hjelm, Alexander Toshev

Learning Adaptive Parallel Reasoning with Language Models by Jiayi Pan (UC Berkeley), Xiuyu Li (UC Berkeley), Long Lian (UC Berkeley), Charlie Snell (UC Berkeley), Yifei Zhou (UC Berkeley), Adam Yala (UC Berkeley, UCSF), Trevor Darrell (UC Berkeley), Kurt Keutzer (UC Berkeley), Alane Suhr (UC Berkeley)

Learning to Reason without External Rewards by Xuandong Zhao (UC Berkeley), Zhewei Kang (UC Berkeley), Aosong Feng (Yale University), Sergey Levine (UC Berkeley), Dawn Song (UC Berkeley)

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving by Aniket Didolkar (Mila, University of Montreal), Anirudh Goyal (Mila, University of Montreal), Nan Rosemary Ke (Google DeepMind), Siyuan Guo (The University of Cambridge), Michal Valko (Google DeepMind), Timothy Lillicrap (Google DeepMind), Danilo Rezende (Google DeepMind), Yoshua Bengio (Mila, University of Montreal), Michael Mozer (Google DeepMind), Sanjeev Arora (Princeton University)

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge by Boyu Gou (The Ohio State University), Zanming Huang (The Ohio State University), Yuting Ning (The Ohio State University), Yu Gu (The Ohio State University), Michael Lin (The Ohio State University), Weijian Qi (The Ohio State University), Andrei Kopanev (The Ohio State University), Botao Yu (The Ohio State University), Bernal Jiménez Gutiérrez (The Ohio State University), Yiheng Shu (The Ohio State University), Chan Hee Song (The Ohio State University), Jiaman Wu (The Ohio State University), Shijie Chen (The Ohio State University), Hanane Nour Moussa (The Ohio State University), Tianshu Zhang (The Ohio State University), Jian Xie (The Ohio State University), Yifei Li (The Ohio State University), Tianci Xue (The Ohio State University), Zeyi Liao (The Ohio State University), Kai Zhang (The Ohio State University), Boyuan Zheng (The Ohio State University), Zhaowei Cai (Amazon AGI), Viktor Rozgic (Amazon AGI), Morteza Ziyadi (Amazon AGI), Huan Sun (The Ohio State University), Yu Su (The Ohio State University)

MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains by Guoli Yin, Haoping Bai, Shuang Ma, Feng Nan, Yanchao Sun, Zhaoyang Xu, Shen Ma, Jiarui Lu, Xiang Kong, Aonan Zhang, Dian Ang Yap, Yizhe Zhang, Karsten Ahnert, Vik Kamath, Mathias Berglund, Dominic Walsh, Tobias Gindele, Juergen Wiest, Zhengfeng Lai, George Horrell, Xiaoming Wang, Jiulong Shan, Meng Cao, Ruoming Pang, Zirui Wang

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization by Yiyou Sun (UC Berkeley), Shawn Hu (dmodel.ai), Georgia Zhou (UC Berkeley), Ken Zheng (UC Berkeley), Hannaneh Hajishirzi (Ai2, University of Washington), Nouha Dziri (Ai2), Dawn Song (UC Berkeley)

On the Modeling Capabilities of Large Language Models for Sequential Decision Making by Martin Klissarov, Devon Hjelm, Alexander Toshev, Bogdan Mazoure

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments by Tianbao Xie (The University of Hong Kong), Danyang Zhang (The University of Hong Kong), Jixuan Chen (The University of Hong Kong), Xiaochuan Li (The University of Hong Kong), Siheng Zhao (The University of Hong Kong), Ruisheng Cao (The University of Hong Kong), Toh Jing Hua (The University of Hong Kong), Zhoujun Cheng (The University of Hong Kong), Dongchan Shin (The University of Hong Kong), Fangyu Lei (The University of Hong Kong), Yitao Liu (The University of Hong Kong), Yiheng Xu (The University of Hong Kong), Shuyan Zhou (Carnegie Mellon University), Silvio Savarese (Salesforce Research), Caiming Xiong (Salesforce Research), Victor Zhong (University of Waterloo), Tao Yu (The University of Hong Kong)

Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting by Xiong-Hui Chen (Nanjing University), Ziyan Wang (King’s College London), Yali Du (King’s College London), Shengyi Jiang (The University of Hong Kong), Meng Fang (University of Liverpool), Yang Yu (Nanjing University), Jun Wang (University College London)

RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning by Zihan Wang (Northwestern University), Kangrui Wang (Northwestern University), Qineng Wang (Northwestern University), Pingyue Zhang (Northwestern University), Linjie Li (University of Washington), Zhengyuan Yang (Microsoft), Xing Jin (University of British Columbia), Kefan Yu (Northwestern University), Minh Nhat Nguyen (Singapore Management University), Licheng Liu (Northwestern University), Eli Gottlieb (Northwestern University), Yiping Lu (Northwestern University), Kyunghyun
