
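TLDR AI · June 4, 2026, 09:00 · approx. 2 min

A "Sleep" Approach to Continual Learning (24-minute read)

#ContinualLearning #LLM #ReinforcementLearning #KnowledgeDistillation #AutonomousLearning

TL;DR

Inspired by the human learning process, the proposed "Sleep" paradigm consolidates a model's short-term memories into long-term knowledge and uses reinforcement learning for self-directed learning, with the goal of improving continual learning and generalization.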

AI Deep Analysis · June 5, 2026, 19:12

Importance: / 5

Depth: 40%

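Key Points

Introduction of the Sleep paradigm

A new framework inspired by the human learning process. It transfers a model's short-term, in-context memory into its long-term parameters to enable continual learning and knowledge transfer.

Memory consolidation through Knowledge Seeding

In the Memory Consolidation phase, an upward distillation process called Knowledge Seeding distills the memories of a smaller model into a larger network. As a proof of concept, the authors present a Generalized Distillation process that combines on-policy distillation with reinforcement learning (RL)-based imitation learning.

Autonomous improvement through Dreaming

In the Dreaming phase, the model itself uses reinforcement learning to generate a curriculum of synthetic data, rehearsing new knowledge and refining existing capabilities without human supervision.

Validation on long-horizon learning tasks

Experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.

Publication timeline and version

A version of this paper has been publicly available on OpenReview since September 2025. The current arXiv version (v1) was submitted on June 2, 2026.

Classification and identifiers

The paper is classified under Machine Learning (cs.LG) and Artificial Intelligence (cs.AI), and is identified by arXiv ID 2606.03979 and the corresponding DOI.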

Key Quotes

existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters

inspired by human learning process, we introduce a "Sleep" paradigm that allows the models to continually learn, distill their short-term fragile memories into stable long-term knowledge with replay

Dreaming: a self-improvement phase, where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision

A version of this work has been publicly available from September 2025 on OpenReview

arXiv:2606.03979 [cs.LG]


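Impact Analysis

This paper presents an approach to continual learning, in which an LLM retains what it has learned and preserves its existing capabilities while absorbing new information. In particular, the autonomous mechanism in which the model uses reinforcement learning to generate and refine its own training data could, in the future, lower the cost of large models and speed up their adaptation.

Editorial Comment

This approach mirrors in AI the way the human brain organizes and consolidates information during sleep. It is a noteworthy candidate solution to two fundamental problems of current LLMs: forgetting and inefficient learning.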




Abstract: The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by human learning process, we introduce a "Sleep" paradigm that allows the models to continually learn, distill their short-term fragile memories into stable long-term knowledge with replay, and recursively improve themselves with "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, where the memories of a smaller-self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.
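The abstract names on-policy distillation as one ingredient of Knowledge Seeding but gives no algorithmic detail. As a rough illustration only, the sketch below shows generic on-policy distillation on a toy problem: a student samples tokens from its own distribution and is nudged toward a fixed teacher by a score-function (REINFORCE-style) estimate of the reverse KL gradient. It is not the paper's Generalized Distillation process. The function names, the four-token vocabulary, and every hyperparameter are assumptions made for this example, and the "teacher" here simply stands in for the smaller model whose memories are being distilled.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student, teacher):
    """KL(student || teacher) for two categorical distributions."""
    return sum(p * (math.log(p) - math.log(q))
               for p, q in zip(student, teacher) if p > 0)


def on_policy_distill(teacher, steps=300, batch=64, lr=0.3, seed=0):
    """Toy on-policy distillation (hypothetical helper, not from the paper).

    The student is a softmax over free logits. Each step it samples tokens
    from ITS OWN distribution (on-policy), scores each sample with the cost
    log p_student(x) - log p_teacher(x), and takes a gradient step on the
    Monte Carlo estimate of the reverse KL.
    """
    rng = random.Random(seed)
    vocab = len(teacher)
    logits = [0.0] * vocab  # the student starts uniform
    for _ in range(steps):
        probs = softmax(logits)
        samples = rng.choices(range(vocab), weights=probs, k=batch)
        costs = [math.log(probs[x]) - math.log(teacher[x]) for x in samples]
        baseline = sum(costs) / batch  # batch-mean baseline to reduce variance
        grad = [0.0] * vocab
        for x, cost in zip(samples, costs):
            advantage = cost - baseline
            for j in range(vocab):
                # d log p(x) / d logit_j = 1[j == x] - p_j
                indicator = 1.0 if j == x else 0.0
                grad[j] += advantage * (indicator - probs[j]) / batch
        logits = [z - lr * g for z, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    teacher_probs = [0.7, 0.2, 0.05, 0.05]  # assumed toy teacher
    student_probs = on_policy_distill(teacher_probs)
    print("student:", [round(p, 3) for p in student_probs])
    print("reverse KL:", reverse_kl(student_probs, teacher_probs))
```

Sampling from the student rather than from a fixed dataset is what makes the procedure on-policy: the student is corrected on the outputs it actually produces. A full system would apply the same idea per token over sequences generated by a language model, which this toy does not attempt.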

Comments:

A version of this work has been publicly available from September 2025 on OpenReview

Subjects:

Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as:

arXiv:2606.03979 [cs.LG]

(or arXiv:2606.03979v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2606.03979

arXiv-issued DOI via DataCite

Submission history

From: Ali Behrouz

[v1] Tue, 2 Jun 2026 17:56:55 UTC (2,961 KB)

