Qwen-AgentWorld (29-minute read)

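Summary: A world model predicts environment dynamics from current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we explore how world modeling based on language models can further push the boundaries of general agents. (i) We first focus on building foundation models for agentic environment simulation. We introduce Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B, the first language world models capable of simulating agentic environments across 7 domains via long chain-of-thought reasoning. Leveraging more than 10 million environment interaction trajectories from 7 real-world domains, we develop Qwen-AgentWorld through a three-stage training pipeline: CPT (continual pre-training) injects general-purpose world modeling capabilities from state transition dynamics and augmented professional corpora, SFT (supervised fine-tuning) activates next-state-prediction reasoning, and RL (reinforcement learning) sharpens simulation fidelity through a tailored framework with hybrid rubric-and-rule rewards. To evaluate language world models, we present AgentWorldBench, a comprehensive benchmark constructed from the real-world interactions of 5 frontier models on 9 established benchmarks. Empirical results show that Qwen-AgentWorld significantly outperforms existing frontier models. (ii) Beyond foundation models, we further investigate two complementary paradigms through which world modeling enhances general agents. First, as a decoupled environment simulator, Qwen-AgentWorld supports scalable and controllable simulation of thousands of real-world environments for agentic RL, yielding gains that surpass training in real environments alone. Second, as a unified agent foundation model, world-model training acts as a highly effective warm-up that improves downstream performance across 7 agentic benchmarks. Code: this https URL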
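The core idea in the summary above, a model that predicts the next environment state from the current observation and an action and can then stand in for the real environment, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `toy_world_model`, `scripted_policy`, and `rollout` are hypothetical names, and the hand-written rules merely stand in for a learned language world model. This is not Qwen-AgentWorld's actual interface.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases: observations and actions are plain text here.
Observation = str
Action = str

# A world model maps (current observation, action) -> predicted next observation.
WorldModel = Callable[[Observation, Action], Observation]
Policy = Callable[[Observation], Action]


def toy_world_model(obs: Observation, action: Action) -> Observation:
    """Stand-in for a learned world model (NOT Qwen-AgentWorld).

    The observation is a comma-separated list of file names, and the
    hand-written rules imitate a tiny shell-like environment.
    """
    files = set(filter(None, obs.split(",")))
    verb, _, arg = action.partition(" ")
    if verb == "touch" and arg:
        files.add(arg)
    elif verb == "rm":
        files.discard(arg)
    return ",".join(sorted(files))


def scripted_policy(obs: Observation) -> Action:
    """Toy agent: create a.txt if it is missing, otherwise create b.txt."""
    return "touch a.txt" if "a.txt" not in obs.split(",") else "touch b.txt"


def rollout(
    model: WorldModel, policy: Policy, start: Observation, steps: int
) -> List[Tuple[Observation, Action, Observation]]:
    """Roll a policy forward using only the model's predictions."""
    trajectory = []
    obs = start
    for _ in range(steps):
        action = policy(obs)
        next_obs = model(obs, action)  # predicted state, no real environment
        trajectory.append((obs, action, next_obs))
        obs = next_obs
    return trajectory


trajectory = rollout(toy_world_model, scripted_policy, start="", steps=2)
final_state = trajectory[-1][2]
print(final_state)
```

Because `rollout` only needs a callable with the `WorldModel` signature, the same loop could in principle be driven by a learned simulator instead of the toy rules, which is the sense in which a simulator is decoupled from the agent being trained.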

Subjects:

Computation and Language (cs.CL)

Cite as:

arXiv:2606.24597 [cs.CL]

(or arXiv:2606.24597v1 [cs.CL] for this version)

arXiv-issued DOI via DataCite

## Submission history

From: Fei Huang

[v1] Tue, 23 Jun 2026 13:53:55 UTC (3,883 KB)

Authors: Yuxin Zuo, Zikai Xiao, Li Sheng, Fei Huang, Jianhong Tu, Yuxuan Liu, Tianyi Tang, Xiaomeng Hu, Yang Su, Qingfeng Lan, Yantao Liu, Qin Zhu, Yinger Zhang, Bowen Yu, Haiquan Zhao, Haiyang Xu, Jianxin Yang, Jiayang Cheng, Junyang Wang, Lianghao Deng, Mingfeng Xue, Tianyi Bai, Yang Fan, Yubo Ma, Yucheng Li, Zeyu Cui, Zhihai Wang, Zhihui Xie, Zhuorui Ye, An Yang, Dayiheng Liu, Jingren Zhou, Ning Ding

Abstract: A world model predicts environment dynamics based on current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we investigate how world modeling based on language models can further push the boundaries of general agents. (i) We first focus on building foundation models for agentic environment simulation. We introduce Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B, the first language world models capable of simulating agentic environments covering 7 domains via long chain-of-thought reasoning. Leveraging more than 10M environment interaction trajectories of 7 domains in real-world environments, we develop Qwen-AgentWorld through a three-stage training pipeline: CPT injects general-purpose world modeling capabilities from the state transition dynamics and augmented professional corpora, SFT activates next-state-prediction reasoning, and RL sharpens simulation fidelity through a tailored framework with hybrid rubric-and-rule rewards. To evaluate language world models, we present AgentWorldBench, a comprehensive benchmark constructed from real-world interactions of 5 frontier models on 9 established benchmarks. Empirical results demonstrate that Qwen-AgentWorld significantly outperforms existing frontier models. (ii) Beyond foundation models, we further investigate two complementary paradigms through which world modeling enhances general agents. First, as a decoupled environment simulator, Qwen-AgentWorld supports scalable and controllable simulation of thousands of real-world environments for agentic RL, yielding gains that surpass real-environment training alone. Second, as a unified agent foundation model, world-model training acts as a highly effective warm-up that improves downstream performance across 7 agentic benchmarks. Code: this https URL
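The abstract mentions "hybrid rubric-and-rule rewards" for the RL stage without giving details. As a rough illustration only, the sketch below combines a strict rule check (normalized exact match against a reference next state) with a softer rubric score (fraction of criteria satisfied). The function names, the `rule_weight` parameter, the weighting scheme, and the use of simple Python predicates as rubric items are all assumptions; the paper's actual reward design may differ substantially (for example, rubric items might be scored by a judge model).

```python
from typing import Callable, List

# Hypothetical rubric item: a predicate over the predicted next state.
RubricItem = Callable[[str], bool]


def rule_reward(predicted: str, reference: str) -> float:
    """Rule check: 1.0 on exact match after whitespace normalization, else 0.0."""
    normalize = lambda text: " ".join(text.split())
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


def rubric_reward(predicted: str, rubric: List[RubricItem]) -> float:
    """Rubric score: fraction of rubric criteria the prediction satisfies."""
    if not rubric:
        return 0.0
    return sum(1 for check in rubric if check(predicted)) / len(rubric)


def hybrid_reward(
    predicted: str,
    reference: str,
    rubric: List[RubricItem],
    rule_weight: float = 0.5,  # assumed mixing weight, not taken from the paper
) -> float:
    """Weighted mix of the rule reward and the rubric reward."""
    rule = rule_reward(predicted, reference)
    soft = rubric_reward(predicted, rubric)
    return rule_weight * rule + (1.0 - rule_weight) * soft


# Toy example: the simulator predicted the wrong file name.
reference = "exit code: 0\ncreated b.txt"
predicted = "exit code: 0\ncreated a.txt"
rubric = [
    lambda s: "exit code: 0" in s,  # reports the right exit status
    lambda s: "created" in s,       # reports that a file was created
    lambda s: "b.txt" in s,         # names the right file
]
score = hybrid_reward(predicted, reference, rubric)
print(round(score, 3))
```

In this toy case the rule check fails while two of the three rubric items pass, so the prediction still receives partial credit. That is one plausible motivation for mixing the two signals, though the source does not state the authors' rationale.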



TLDR AI · June 25, 2026 09:00 · about a 3-minute read

#LLM #Agents #Alibaba #Qwen #Automation

AI in-depth analysis · June 26, 2026 00:05

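Key points

Release of Qwen-based world models for simulating agent environments

Qwen-AgentWorld, a pair of language world models centered on Alibaba's Tongyi Qianwen (Qwen) models, was introduced. The models simulate agentic environments and are intended to support agent development and experimentation.

Feasibility of building multifunctional agents

Practical guidelines for developers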

