TLDR AI · May 4, 2026, 09:00 · approx. 3 min read

Synthetic Computer Environments for Agent Training

#Agentic AI #Reinforcement Learning #Synthetic Data #Long-horizon Planning

TL;DR

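This paper proposes "Synthetic Computers at Scale," a method for training agents on long-horizon productivity tasks, and shows that the experiential learning data generated by simulations on 1,000 synthetic computers improves agent performance.

AI In-Depth Analysis · July 5, 2026, 08:12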


Key Points

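Building scalable synthetic computer environments

The paper proposes a method for generating "Synthetic Computers" at scale: environments with realistic folder hierarchies and content-rich artifacts such as documents, spreadsheets, and presentations.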

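Long-horizon role-play simulations

One agent sets productivity objectives that would take a human about a month of work, and a second agent acts as the computer's user and keeps working until those objectives are completed. Each simulation averages more than 2,000 turns.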

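Demonstrated performance gains and scalability

Learning signals from the 1,000 environments produced significant improvements on both in-domain and out-of-domain evaluations, and the authors argue that, given sufficient compute, the approach could scale to millions or even billions of synthetic user worlds.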

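A potential foundation for agentic reinforcement learning

The authors conclude that synthetic user worlds covering diverse professions and roles could serve as a key foundation for agent self-improvement and reinforcement learning in long-horizon productivity scenarios.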
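The synthetic-computer idea in the key points can be pictured as a data structure: a persona-specific folder tree populated with documents, spreadsheets, and presentations. The Python sketch below is only an illustration under stated assumptions. This summary does not describe how the authors actually generate folders or content, so a fixed template and a seeded random generator stand in for their generator, and every name here (`SyntheticComputer`, `Artifact`, `build_synthetic_computer`, the `/home/user` root, the file extensions) is hypothetical.

```python
from dataclasses import dataclass, field
import random

# Hypothetical mapping from artifact kind to file extension.
EXTENSIONS = {"document": ".docx", "spreadsheet": ".xlsx", "presentation": ".pptx"}


@dataclass
class Artifact:
    path: str     # absolute path inside the synthetic filesystem
    kind: str     # "document", "spreadsheet", or "presentation"
    content: str  # placeholder body; a real generator would write rich content


@dataclass
class SyntheticComputer:
    persona: str
    folders: list[str] = field(default_factory=list)
    artifacts: list[Artifact] = field(default_factory=list)

    def list_dir(self, folder: str) -> list[str]:
        """Return paths of artifacts stored directly in `folder` (not in subfolders)."""
        prefix = folder.rstrip("/") + "/"
        return sorted(
            a.path
            for a in self.artifacts
            if a.path.startswith(prefix) and "/" not in a.path[len(prefix):]
        )


def build_synthetic_computer(persona: str, projects: list[str], seed: int = 0) -> SyntheticComputer:
    """Hypothetical generator: one folder per project, one artifact of every kind in each."""
    rng = random.Random(seed)  # seeded so the same inputs always give the same computer
    root = "/home/user"
    computer = SyntheticComputer(persona=persona, folders=[root])
    for project in projects:
        folder = f"{root}/{project}"
        computer.folders.append(folder)
        for kind, ext in EXTENSIONS.items():
            version = rng.randint(1, 3)  # mimic the draft versions found on real machines
            computer.artifacts.append(
                Artifact(
                    path=f"{folder}/{project}_{kind}_v{version}{ext}",
                    kind=kind,
                    content=f"{kind.title()} prepared by a {persona} for {project}",
                )
            )
    return computer


if __name__ == "__main__":
    pc = build_synthetic_computer("financial analyst", ["q3_budget", "vendor_review"])
    for path in pc.list_dir("/home/user/q3_budget"):
        print(path)
```

For example, `build_synthetic_computer("financial analyst", ["q3_budget", "vendor_review"])` yields three folders (the root plus one per project) and six artifacts; `list_dir` then gives an agent a simple way to navigate the tree for grounding.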


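Impact Analysis

This research could mark a major shift in how the "experience" that AI agents need to master complex, real-world productivity tasks is generated. By going beyond text-only data generation and simulating concrete environment context such as file systems and document manipulation, it could become a foundational technology for substantially improving agents' ability to carry out real-world work autonomously.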

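Editor's Comment

A groundbreaking approach that artificially generates, at large scale, the training environments agents need in order to carry out complex work over long periods as humans do. It can be considered an important step toward autonomous real-world deployment.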


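Abstract: Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at Scale, a scalable methodology for creating such environments with realistic folder hierarchies and content-rich artifacts (e.g., documents, spreadsheets, and presentations). Conditioned on each synthetic computer, we run long-horizon simulations: one agent creates productivity objectives that are specific to the computer's user and require multiple professional deliverables and about a month of human work; another agent then acts as that user and keeps working across the computer (for example, navigating the filesystem for grounding, coordinating with simulated collaborators, and producing professional artifacts) until these objectives are completed.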

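In preliminary experiments, we create 1,000 synthetic computers and run long-horizon simulations on them; each run requires over 8 hours of agent runtime and spans more than 2,000 turns on average. These simulations produce rich experiential learning signals, whose effectiveness is validated by significant improvements in agent performance on both in-domain and out-of-domain productivity evaluations. Given that personas are abundant at billion scale, this methodology can in principle scale to millions or even billions of synthetic user worlds with sufficient compute, enabling broader coverage of diverse professions, roles, contexts, environments, and productivity needs. We argue that scalable synthetic computer creation, together with at-scale simulations, is highly promising as a foundational substrate for agent self-improvement and agentic reinforcement learning in long-horizon productivity scenarios.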
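The role-play loop in the abstract (one agent proposes month-scale objectives with deliverables, another acts as the user until they are done) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: both agents are replaced by deterministic stubs (`propose_objectives`, `user_agent_step`), the three action types mirror the examples the abstract gives (navigating the filesystem, coordinating with collaborators, producing artifacts), and `max_turns` is an assumed safety cap. The recorded `trajectory` stands in for the kind of experiential data the abstract says these simulations produce.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    description: str
    deliverables: list[str]  # files that must exist before the objective counts as done
    done: bool = False


@dataclass
class Turn:
    index: int
    action: str  # "navigate", "message", or "produce"
    detail: str


def propose_objectives(persona: str) -> list[Objective]:
    """Hypothetical stub for the objective-setting agent (goals are hard-coded here)."""
    return [
        Objective(f"Quarterly report for the {persona}", ["report.docx", "figures.xlsx"]),
        Objective(f"Team briefing by the {persona}", ["briefing.pptx"]),
    ]


def user_agent_step(objective: Objective, files: set[str], turn: int) -> Turn:
    """Hypothetical stub for the agent playing the user: grounding, coordination, production."""
    phase = turn % 3
    if phase == 0:
        return Turn(turn, "navigate", f"list files relevant to: {objective.description}")
    if phase == 1:
        return Turn(turn, "message", "coordinate with a simulated collaborator")
    missing = [d for d in objective.deliverables if d not in files]
    files.add(missing[0])  # "produce" the next missing deliverable
    return Turn(turn, "produce", missing[0])


def run_simulation(persona: str, max_turns: int = 5000) -> tuple[list[Objective], list[Turn]]:
    """Work through the objectives until all are done or the hypothetical turn cap is hit."""
    files: set[str] = set()
    objectives = propose_objectives(persona)
    trajectory: list[Turn] = []
    turn = 0
    for objective in objectives:
        while not objective.done and turn < max_turns:
            trajectory.append(user_agent_step(objective, files, turn))
            turn += 1
            objective.done = all(d in files for d in objective.deliverables)
    return objectives, trajectory


if __name__ == "__main__":
    objectives, trajectory = run_simulation("project manager")
    print(len(trajectory), [o.done for o in objectives])
```

With these stub objectives, `run_simulation("project manager")` completes both objectives in 9 turns; a real run, according to the abstract, averages more than 2,000 turns and over 8 hours of agent runtime.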

Comments:

Preview version; work in progress

Subjects:

Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Cite as:

arXiv:2604.28181 [cs.AI]

(or arXiv:2604.28181v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2604.28181

arXiv-issued DOI via DataCite

Submission history

From: Tao Ge

[v1] Thu, 30 Apr 2026 17:58:02 UTC (6,309 KB)

