TLDR AI · June 2, 2026, 09:00 · about a 3-minute read

JetBrains' Mellum 2 (49-minute read)

#LLM #Mixture-of-Experts #open-source #code-generation #JetBrains #inference-efficiency

TL;DR

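JetBrains has released Mellum 2, an open-weight Mixture-of-Experts (MoE) model specialized for software engineering, under the Apache 2.0 license. The model has 12B parameters in total but activates only 2.5B per token, so it handles code generation and reasoning at roughly the per-token compute of a 2.5B dense model.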

AI Deep Analysis · June 12, 2026, 23:18

Depth: 40%

Key points

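MoE architecture and efficiency

The model uses a Mixture-of-Experts (MoE) structure that activates 8 of its 64 experts per token. Although it has 12B parameters in total, its per-token compute stays on par with a 2.5B dense model.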
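To make the "8 of 64 experts" idea concrete, here is a minimal sketch of top-k expert routing for a single token. Only the expert count (64) and the number of active experts (8) come from the report; the function name `route_token`, the random router logits, and the softmax over the selected logits are assumptions for illustration and may differ from Mellum 2's actual router.

```python
import math
import random

NUM_EXPERTS = 64  # from the report
TOP_K = 8         # from the report


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only. This normalization
    is an assumption, not a detail confirmed by the report.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract the max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
print(f"{len(selected)} of {NUM_EXPERTS} experts run for this token")
```

Only the selected experts' feed-forward blocks would run for the token, which is why per-token compute tracks the active parameter count rather than the total.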

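Specialized for software engineering

The model targets developer assistance, covering code generation and editing, debugging, tool use, and agentic coding, and it offers a 128K context window.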

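Explicit reasoning

Two variants were released: an Instruct model that answers directly and a Thinking model that emits an explicit reasoning trace before its final answer. The Thinking variant is aimed at complex reasoning tasks.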

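Training data and methods

Pre-training covers approximately 10.6 trillion tokens, with the data mixture shifting from web data toward code and math. The recipe uses the Muon optimizer, FP8 hybrid precision, and YaRN-based context extension.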
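Among the training details, the paper's abstract also names a Warmup-Hold-Decay learning-rate schedule with linear decay to zero. The sketch below shows one way such a schedule could be written. The function name `whd_lr`, the linear shape of the warmup, and all step counts and the peak learning rate are hypothetical; the report's actual values are not given here.

```python
def whd_lr(step, peak_lr, warmup_steps, hold_steps, decay_steps):
    """Warmup-Hold-Decay learning rate for a 0-indexed training step.

    Assumes a linear warmup (the source only specifies that the decay is
    linear and ends at zero) and decay_steps > 0.
    """
    if step < warmup_steps:
        # Ramp up linearly, reaching peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + hold_steps:
        # Hold at the peak learning rate.
        return peak_lr
    # Decay linearly, reaching zero after decay_steps and staying there.
    decay_step = step - warmup_steps - hold_steps
    remaining = max(0.0, 1.0 - decay_step / decay_steps)
    return peak_lr * remaining


# Hypothetical settings, chosen only to make the three phases visible.
for s in (0, 9, 20, 35, 40):
    print(s, whd_lr(s, peak_lr=1.0, warmup_steps=10, hold_steps=20, decay_steps=10))
```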

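Impact analysis

This release gives the open-weight community a code-specialized model that, according to the report, is competitive with open-weight baselines in the 4B-14B range while running at the per-token compute of a 2.5B dense model. The Thinking variant and the efficiency gained from MoE could make local inference and assistance with complex debugging more practical for development teams.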

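Editorial comment

A capable, developer-focused model released with open weights under Apache 2.0 at a modest compute cost is a notable development. The Thinking model in particular targets complex reasoning tasks that go beyond plain code completion.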

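Abstract: We present Mellum 2, an open-weight 12B-parameter Mixture-of-Experts (MoE) language model with 2.5B active parameters per token. Mellum 2 is a general-purpose language model specialized in software engineering, spanning code generation and editing, debugging, multi-step reasoning, tool use and function calling, agentic coding, and conversational programming assistance, and it is the successor to the completion-focused 4B dense Mellum model. The architecture builds on the Mixture-of-Experts (64 experts, 8 active) and combines Grouped-Query Attention with 4 KV heads, Sliding Window Attention on three of every four layers, and a single Multi-Token Prediction head that doubles as both an auxiliary pre-training objective and a built-in draft model for speculative decoding; each choice was validated by ablation with inference efficiency on commodity GPUs as a design constraint. Pre-training spans approximately 10.6 trillion tokens through a three-phase curriculum that progressively shifts the mixture from diverse web data toward curated code and mathematical content, optimized with Muon under FP8 hybrid precision and a Warmup-Hold-Decay schedule with linear decay to zero. The pre-trained base is extended to a 128K context window via a layer-selective YaRN and then post-trained in two stages (supervised fine-tuning followed by RLVR), yielding two released variants: an Instruct model that answers directly and a Thinking model that emits an explicit reasoning trace before its final answer. Across code generation, math and reasoning, tool use, knowledge, and safety benchmarks, Mellum 2 is competitive with open-weight baselines in the 4B-14B range while running at the per-token compute of a 2.5B dense model. We release the base, instruct, and thinking checkpoints, together with this report on the architecture decisions, data pipeline, and training recipe behind them, under the Apache 2.0 license.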
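The attention layout in the abstract, with Sliding Window Attention on three of every four layers, can be pictured with a small sketch. The abstract does not say which layer in each group of four uses full attention, nor the window size or layer count, so the choice of the fourth layer as the global one and the helper names `layer_attention_types` and `can_attend` are assumptions for illustration only.

```python
def layer_attention_types(num_layers, group=4):
    """Label each layer as 'sliding' or 'global'.

    Assumes the last layer in every group of four uses full (global)
    attention; the abstract only states the three-in-four ratio.
    """
    return ["global" if (i + 1) % group == 0 else "sliding"
            for i in range(num_layers)]


def can_attend(query_pos, key_pos, window=None):
    """Causal attention rule, optionally limited to a sliding window.

    With window=None the query sees every earlier position (global layer).
    With a window, it sees only the most recent `window` positions,
    including itself.
    """
    if key_pos > query_pos:
        return False  # causal: never attend to future tokens
    return window is None or query_pos - key_pos < window


# Hypothetical 8-layer stack and a window of 4 tokens.
print(layer_attention_types(8))
print(can_attend(10, 3, window=4), can_attend(10, 3))
```

Because a sliding-window layer only needs keys and values for the most recent positions, a layout like this would be expected to reduce KV-cache memory at long context lengths, which fits the report's stated focus on inference efficiency on commodity GPUs.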

Subjects:

Computation and Language (cs.CL)

Cite as:

arXiv:2605.31268 [cs.CL]

(or arXiv:2605.31268v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2605.31268

arXiv-issued DOI via DataCite

## Submission history

From: Nikiita Pavlichenko

[v1] Fri, 29 May 2026 13:01:11 UTC (1,508 KB)
