Sebastian Raschka · July 1, 2025, 20:11 · about 5 min read

LLM Research Papers: The 2025 List (January to June)

#LLM #Reasoning Models #Reinforcement Learning #Multimodal #Research Papers

TL;DR

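Sebastian Raschka has organized more than 200 LLM research papers from the first half of 2025 by topic, such as reasoning models and multimodal models, providing a comprehensive list for learners along with free study material.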

AI in-depth analysis · May 3, 2026, 01:09

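Key points

Topic-based organization

In response to reader requests, the papers are organized by category, such as "reasoning models," "efficient training," and "multimodal," rather than by date.

Focus on reasoning

Research in 2025 is concentrated on reasoning models, and training strategies based on reinforcement learning (RL) with verifiable rewards have become the dominant approach.

Switch to regular updates

To keep up with the pace of research, the list moves from an annual format to twice-yearly updates, which keeps it timely and readable.

Large gains in reasoning ability from reinforcement learning

Work such as DeepSeek-R1, Kimi k1.5, and R1-Searcher indicates that reinforcement learning (RL) with rule-based rewards or process reward models can markedly strengthen LLM capabilities in mathematics, software development, and search.

More efficient and more structured thinking

Papers such as "LIMO: Less is More for Reasoning" and "Towards System 2 Reasoning in LLMs" suggest that meta-level thinking (Meta Chain-of-Thought), effective learning from a small number of tokens, and demonstrations that emphasize structure are all important for reasoning.

Domain adaptation and generalization

Studies on the transferability of reasoning ability to specific domains, such as finance (Fino1), multimodal models (LMM-R1), and competitive programming, together with learning from failures (Learning from Failures), are improving the specialization and robustness of LLMs.

Integrating search with reinforcement learning

Work such as Search-R1 and ReSearch shows that reinforcement learning can strengthen an LLM's ability to call a search engine during reasoning and retrieve external knowledge dynamically.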

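Impact analysis

In the rapidly evolving field of LLM research, this article serves as a useful compass for researchers and engineers who want to track specific topics, reasoning in particular, efficiently. The move to a twice-yearly update cycle also keeps the information fresh and makes the list more practical as a learning resource.

Editor's comment

Amid the flood of research papers, this list organizes specific technical trends and is a very useful resource for summer study and interview preparation. RL for reasoning deserves particular attention because it is likely to be a key factor in how LLMs evolve.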

LLM Research Papers: The 2025 List (January to June)

A topic-organized collection of 200+ LLM research papers from 2025

Sebastian Raschka, PhD∙ Jul 01, 2025

As some of you know, I keep a running list of research papers I (want to) read and reference.

About six months ago, I shared my 2024 list, which many readers found useful. So, I was thinking about doing this again. However, this time, I am incorporating one piece of feedback that kept coming up: "Can you organize the papers by topic instead of date?"

The categories I came up with are:

1. Reasoning Models

1a. Training Reasoning Models

1b. Inference-Time Reasoning Strategies

1c. Evaluating LLMs and/or Understanding Reasoning

2. Other Reinforcement Learning Methods for LLMs

3. Other Inference-Time Scaling Methods

4. Efficient Training & Architectures

5. Diffusion-Based Language Models

6. Multimodal & Vision-Language Models

7. Data & Pre-training Datasets

Also, as LLM research continues to be shared at a rapid pace, I have decided to break the list into twice-yearly updates. This way, the list stays digestible, timely, and hopefully useful for anyone looking for solid summer reading material.

Please note that this is just a curated list for now. In future articles, I plan to revisit and discuss some of the more interesting or impactful papers in larger topic-specific write-ups. Stay tuned!

It's summer! And that means internship season, tech interviews, and lots of learning. To support those brushing up on intermediate to advanced machine learning and AI topics, I have made all 30 chapters of my Machine Learning Q and AI book freely available for the summer: 🔗 https://sebastianraschka.com/books/ml-q-and-ai/#table-of-contents Whether you are just curious and want to learn something new or prepping for interviews, hopefully this comes in handy. Happy reading, and best of luck if you are interviewing!

Reasoning Models

This year, my list is very reasoning model-heavy. So, I decided to subdivide it into 3 categories: Training, inference-time scaling, and more general understanding/evaluation.

1a. Training Reasoning Models

This subsection focuses on training strategies specifically designed to improve reasoning abilities in LLMs. As you may see, much of the recent progress has centered around reinforcement learning (with verifiable rewards), which I covered in more detail in a previous article.

The State of Reinforcement Learning for LLM Reasoning

Annotated figure from Reinforcement Pre-Training, https://arxiv.org/abs/2506.08007
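To make "verifiable rewards" concrete, here is a minimal Python sketch of a rule-based reward for math-style answers. It is an illustration under stated assumptions, not the method of any particular paper in the list: the function names are hypothetical, and it assumes the model is prompted to put its final answer inside `\boxed{...}` and that correctness can be checked by exact string match against a reference.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the completion, if any.

    Assumption: final answers are wrapped in \\boxed{...} without nested braces.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the extracted answer matches, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        # No parseable final answer counts as incorrect.
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0


# Score a small group of sampled completions for the same prompt.
completions = [
    "6 * 7 = 42, so the answer is \\boxed{42}.",
    "I believe the result is \\boxed{41}.",
    "The answer is probably forty-two.",
]
rewards = [verifiable_reward(c, "42") for c in completions]
print(rewards)  # [1.0, 0.0, 0.0]
```

The point of the sketch is that the reward comes from a deterministic check rather than from a learned reward model; real setups typically use more robust answer normalization or, for code, unit tests.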

8 Jan, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682

13 Jan, The Lessons of Developing Process Reward Models in Mathematical Reasoning, https://arxiv.org/abs/2501.07301

16 Jan, Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models, https://arxiv.org/abs/2501.09686

20 Jan, Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223

22 Jan, Kimi k1.5: Scaling Reinforcement Learning with LLMs, https://arxiv.org/abs/2501.12599

22 Jan, DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, https://arxiv.org/abs/2501.12948

3 Feb, Competitive Programming with Large Reasoning Models, https://arxiv.org/abs/2502.06807

5 Feb, Demystifying Long Chain-of-Thought Reasoning in LLMs, https://arxiv.org/abs/2502.03373

5 Feb, LIMO: Less is More for Reasoning, https://arxiv.org/abs/2502.03387

5 Feb, Teaching Language Models to Critique via Reinforcement Learning, https://arxiv.org/abs/2502.03492

6 Feb, Training Language Models to Reason Efficiently, https://arxiv.org/abs/2502.04463

10 Feb, Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning, https://arxiv.org/abs/2502.06781

10 Feb, On the Emergence of Thinking in LLMs I: Searching for the Right Intuition, https://arxiv.org/abs/2502.06773

11 Feb, LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!, https://arxiv.org/abs/2502.07374

12 Feb, Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance, https://arxiv.org/abs/2502.08127

13 Feb, Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging - An Open Recipe, https://arxiv.org/abs/2502.09056

20 Feb, Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning, https://arxiv.org/abs/2502.14768

25 Feb, SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution, https://arxiv.org/abs/2502.18449

4 Mar, Learning from Failures in Multi-Attempt Reinforcement Learning, https://arxiv.org/abs/2503.04808

4 Mar, The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models, https://arxiv.org/abs/2503.02875

10 Mar, R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning, https://arxiv.org/abs/2503.05592

10 Mar, LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL, https://arxiv.org/abs/2503.07536

12 Mar, Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, https://arxiv.org/abs/2503.09516

16 Mar, Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models, https://arxiv.org/abs/2503.13551

20 Mar, Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't, https://arxiv.org/abs/2503.16219

25 Mar, ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning, https://arxiv.org/abs/2503.19470

26 Mar, Understanding R1-Zero-Like Training: A Critical Perspective, https://arxiv.org/abs/2503.20783

30 Mar, RARE: Retrieval-Augmented Reasoning Modeling, https://arxiv.org/abs/2503.23513

31 Mar, Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model, https://arxiv.org/abs/2503.24290

31 Mar, JudgeLRM: Large Reasoning Models as a Judge, https://arxiv.org/abs/2504.00050

7 Apr, Concise Reasoning via Reinforcement Learning, https://arxiv.org/abs/2504.05185

10 Apr, VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning, https://arxiv.org/abs/2504.08837

11 Apr, Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning, https://arxiv.org/abs/2504.08672

13 Apr, Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability, https://arxiv.org/abs/2504.09639

21 Apr, Learning to Reason under Off-Policy Guidance, https://arxiv.org/abs/2504.14945

22 Apr, Tina: Tiny Reasoning Models via LoRA, https://arxiv.org/abs/2504.15777

29 Apr, Reinforcement Learning for Reasoning in Large Language Models with One Training Example, https://arxiv.org/abs/2504.20571

30 Apr, Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math, https://arxiv.org/abs/2504.21233

2 May, Llama-Nemotron: Efficient Reasoning Models, https://arxiv.org/abs/2505.00949

5 May, RM-R1: Reward Modeling as Reasoning, https://arxiv.org/abs/2505.02387

6 May, Absolute Zero: Reinforced Self-play Reasoning with Zero Data, https://arxiv.org/abs/2505.03335

12 May, INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning, https://arxiv.org/abs/2505.07291

12 May, MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining, https://arxiv.org/abs/2505.07608

14 May, Qwen3 Technical Report, https://arxiv.org/abs/2505.09388

15 May, Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models, https://arxiv.org/abs/2505.10554

19 May, AdaptThink: Reasoning Models Can Learn When to Think, https://arxiv.org/abs/2505.13417

19 May, Thinkless: LLM Learns When to Think, https://arxiv.org/abs/2505.13379

20 May, General-Reasoner: Advancing LLM Reasoning Across All Domains, https://arxiv.org/abs/2505.14652

21 May, Learning to Reason via Mixture-of-Thought for Logical Reasoning, https://arxiv.org/abs/2505.15817

21 May, RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning, https://arxiv.org/abs/2505.15034

23 May, QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning, https://arxiv.org/abs/2505.17667

26 May, Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles, https://arxiv.org/abs/2505.19914

26 May, Learning to Reason without External Rewards, https://arxiv.org/abs/2505.19590

29 May, Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents, https://arxiv.org/abs/2505.22954

30 May, Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning, https://arxiv.org/abs/2505.24726

30 May, ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models, https://arxiv.org/abs/2505.24864

2 Jun, Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning, https://arxiv.org/abs/2506.01939

3 Jun, Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening, https://arxiv.org/abs/2506.02355

9 Jun, Reinforcement Pre-Training, https://arxiv.org/abs/2506.08007

10 Jun, RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling, https://arxiv.org/abs/2506.08672

10 Jun, Reinforcement Learning Teachers of Test Time Scaling, https://arxiv.org/abs/2506.08388

12 Jun, Magistral, https://arxiv.org/abs/2506.10910

12 Jun, Spurious Rewards: Rethinking Training Signals in RLVR, https://arxiv.org/abs/2506.10947

16 Jun, AlphaEvolve: A coding agent for scientific and algorithmic discovery, https://arxiv.org/abs/2506.13131

17 Jun, Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs, https://arxiv.org/abs/2506.14245

23 Jun, Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training, https://arxiv.org/abs/2506.18777

26 Jun, Bridging Offline and Online Reinforcement Learning for LLMs, https://arxiv.org/abs/2506.21495

1b. Inference-Time Reasoning Strategies

This part of the list covers methods that improve reasoning dynamically at test time, without requiring retraining. Often, these papers focus on trading off computational performance for modeling performance, that is, spending more compute at inference to get better answers.
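As a rough illustration of that trade-off, the Python sketch below uses majority voting over several sampled answers (often called self-consistency). This technique is chosen here only as a familiar example and is not attributed to any specific paper in the list. The names `majority_vote`, `self_consistency`, and `make_toy_sampler` are hypothetical, and the sampler is a stand-in for a real LLM call.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0][0]


def self_consistency(
    sample_answer: Callable[[str], str], prompt: str, n_samples: int = 5
) -> str:
    """Sample n_samples answers for the prompt and return the majority answer.

    Cost grows linearly with n_samples: more inference compute is spent
    in the hope of a more reliable final answer.
    """
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return majority_vote(answers)


def make_toy_sampler(canned_answers: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: replays canned answers in order."""
    iterator = iter(canned_answers)

    def sample(prompt: str) -> str:
        return next(iterator)

    return sample


sampler = make_toy_sampler(["4", "5", "4", "4", "3"])
print(self_consistency(sampler, "What is 2 + 2?", n_samples=5))  # 4
```

With a real model, `sample_answer` would call the model with a nonzero temperature and extract the final answer from each completion; `n_samples` is the knob that trades inference cost for accuracy.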
