読み込み中…

NVIDIA Developer Blog·2026年6月4日 22:02·約14分

NVIDIA Nemotron 3 Ultra が長時間実行型エージェントの推論を高速化・効率化

#LLM #AI Agents #Reasoning #Mixture-of-Experts #NVIDIA Nemotron

TL;DR

NVIDIA は、長期間実行される AI エージェントの推論コストを削減し効率を向上させるため、550B パラメータの MoE モデル「Nemotron 3 Ultra」を発表した。

AI深層分析2026年6月5日 10:30

重要/ 5段階

深度40%

キーポイント

エージェントワークフローにおける課題と解決策

長期間実行されるマルチエージェントシステムでは、トークン数の増加に伴いコスト増や目標の逸脱（goal drift）が懸念されており、NVIDIA は「前線推論モデル」と「効率的な実行モデル」を組み合わせるアーキテクチャを提案している。

Nemotron 3 Ultra の技術的特徴

550B パラメータの MoE モデルであり、推論時に 55B パラメータが活性化される設計で、複雑な計画や矛盾する証拠の統合など、深い推論を要するタスクに特化している。

競合モデルとのベンチマーク比較

PinchBench におけるエージェント生産性で GLM 5.1 や Qwen3.5 を上回る 91% のスコアを記録し、特に指示従順性や専門業務タスクにおいて高い性能を示した。

高速スループットによる効率化

同クラスのオープンモデルと比較して 5 倍の高いスループットを達成しており、これにより長時間稼働するエージェントがタスクをより迅速かつ効率的に完了できるようになります。

推論速度の劇的な向上

Nemotron 3 Ultra は、Artificial Analysis のベンチマークにおいて、競合する最先端モデルと比較して5倍高速な推論を実現しています。

精度と速度の両立による優位性

このモデルは高い推論速度を維持しつつも、Artificial Analysis Intelligence Index において最高クラスの精度を達成しており、最も魅力的なパフォーマンス領域（クアドラント）に位置しています。

ベンチマークでの効率性向上

SWE-bench および Terminal bench 2.0 の実験において、同等モデルと比較して必要なトークン数とターンあたりのトークン数が削減されました。

重要な引用

Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete complex workflows.

Nemotron 3 Ultra is a 550B-parameter Mixture-of-Experts model with 55B active parameters, built for frontier reasoning and orchestration in agentic systems.

Developers can solve this using a system of models: frontier reasoning models for orchestration and complex planning, and efficient models for high-volume execution, validation, and tool calling.

Nemotron 3 Ultra delivers frontier accuracy in a smaller model

It achieves 5x higher throughput compared to other open models in its class, enabling long-running agents to complete tasks faster and more efficiently.

Nemotron 3 Ultra achieves 5x faster inference while delivering leading accuracy on the Artificial Analysis Intelligence Index leaderboard

影響分析・編集コメントを表示

影響分析

この発表は、AI エージェントが単なるチャットボットから複雑なワークフローを実行する自律的な存在へと進化していく中で、発生するトークンコストと推論の安定性という最大の課題に対する具体的な解決策を示すものです。特にオープンソースモデルとして高性能を維持しつつ、特定のタスクに特化したアーキテクチャを採用した点は、開発者が大規模エージェントシステムを構築・運用する際の基準となる可能性があります。

編集コメント

長期的なエージェント運用におけるコストと精度のトレードオフを解決する、実用的なアーキテクチャ提案が含まれています。特に MoE モデルの活用により、複雑な推論タスクを効率的に処理できる点は注目すべき進展です。

単発チャットボットは、推論を行い、文脈を維持し、ツールを使用し、多くのターンにわたって効率的に動作して複雑なワークフローを完了できる長期間実行型のエージェントへと進化しています。

しかし、これらのマルチエージェントワークフローではトークン数が急速に増加します。エージェントは計画を立て、ツールを呼び出し、サブエージェントを起動し、情報を取得した後、履歴、出力、推論ステップをモデルに継続的に戻します。タスクが長期間実行されるにつれて、この絶え間ない通信によりコストが増大し、目標の逸脱リスクも高まります。

開発者は、オーケストレーションと複雑な計画のためのフロンティア推論モデルと、大量の実行、検証、ツール呼び出しのための効率的なモデルというモデルシステムを用いてこれを解決できます。

NVIDIA は、長期間実行型エージェントがタスクをより高速に完了し、コストを削減できるよう支援するオープンモデル「NVIDIA Nemotron 3 Ultra」をリリースします。

エージェントオーケストレーションのための Nemotron 3 Ultra

Nemotron 3 Ultra は、550B パラメータの Mixture-of-Experts モデルであり、アクティブなパラメータ数は 55B です。これは、エージェントシステムにおけるフロンティア推論とオーケストレーションのために構築されたモデルです。

エージェントワークフロー内では、ほとんどの呼び出しは日常的なものですが、深い推論を必要とする重要なサブセットが存在します。Nemotron 3 Ultra はこれらの困難な呼び出しを処理するために構築されています：コーディングセッション全体にわたるアーキテクチャ決定の維持、数百の研究ソースにわたる矛盾する証拠の統合、あるいは数千の制約にわたるチップ設計の検証などです。

Nemotron 3 Ultra (550B) GLM 5.1 (744B) Kimi K2.6 (1T) Qwen3.5 (397B)

エージェント生産性PinchBench91%84%91%89%

長期計画EnterpriseOps-Gym33%40%29%30%

コーディングTerminal-Bench 2.054%64%67%53%

指示従順性IFBench82%77%74%78%

知識労働GDPVal-AA1,4481,5941,5081,192

専門業務タスクProfBench (Search)56%46%56%53%

長期コンテキストRuler @1M95%N/A (最大 256K)N/A (最大 256K)90%

*表 1. Nemotron 3 Ultra は、より小さなモデルで最先端の精度を実現します*

Nemotron 3 Ultra はまた高速です。同クラスの他のオープンモデルと比較してスループットが 5 倍向上しており、長時間実行されるエージェントがタスクをより迅速かつ効率的に完了できるようになります。

image*図 2. Nemotron 3 Ultra は、Artificial Analysis Intelligence Index リーダーボードで最先端の精度を提供しながら、推論速度を 5 倍向上させます*

Nemotron 3 Ultra は、効率性も重視して構築されています。SWE-bench や Terminal bench 2.0 における実験では、同等のモデルと比較して、必要なトークン総数とターンあたりのトークン数を削減し、ベンチマークを完了しました。これにより、エージェントタスクのコストは最大 30% 削減されます。

image*図 3. Nemotron 3 Ultra は、タスク完了までのコストを 30% 削減します*

Nemotron 3 Ultra を支える画期的な技術

高容量推論モデルにありがちな効率性と精度のトレードオフを緩和するため、Nemotron モデルはアーキテクチャ上の革新を導入しました:

エージェントハネス向けにポストトレーニング**Nemotron Ultra は、主要なハネス全体で一貫した精度を発揮するようにポストトレーニングされています。このモデルは、世界中でも最大規模の長期間実行・タスク解決・ツール使用データセットの一つを備え、NVIDIA の NeMo RL および Gym というオープンライブラリを使用してトレーニングされています。

Ultra は単発のチャットだけでなく、エージェント主導のオープンハネスに最適化されており、エージェントが計画を立て、ツールを呼び出し、観測結果を読み取り、サブエージェントへ委任し、出力を検証し、多くのターンにわたってエラーから回復するワークフロー内で動作するように設計されています。

ハイブリッド Mamba TransformerMamba レイヤーは、長文コンテキストの負荷に対するシーケンス効率を向上させます。一方、Transformer レイヤーは、エージェントが大きなコンテキストウィンドウから特定の事実を正確に検索する必要がある場合に、精密な想起能力を維持します。

NVFP4 精度****同じ NVFP4 チェックポイントは、NVIDIA Hopper、NVIDIA Blackwell、および Ampere GPU で実行可能です。専用の NVFP4 量子化カーネルにより、開発者はすべての NVIDIA GPU アーキテクチャで単一のチェックポイントを使用できます。また、Blackwell 上では、BF16 と比較して同等の応答性において、GPU あたりスループットが最大 5 倍向上します。

LatentMoE****LatentMoE はより効率的なエキスパートルーティングをサポートし、推論、コード生成、ツール呼び出し、ドメイン固有ロジックにわたるワークフローをモデルが処理可能にします。

Multi-token prediction****マルチトークン予測（MTP）は、単一の順伝播パスで複数の未来トークンを予測することで生成時間を短縮し、長い出力や多段階のワークフローにおけるスループットを向上させます。

Nemotron 3 Ultra が追加するマルチティーチャーオンポリシー蒸留

マルチティーチャーオンポリシー蒸留（MOPD）は、Ultra モデルがトレーニング中に独自の試行を生成しながら、複数の専門化された教師モデルから学習するトレーニング手法です。10 以上の専門化された教師モデルが訓練されており、それぞれが独自にドメイン固有のトレーニングパイプラインを持っています。各教師は自身の専門分野においてモデルを評価し、Ultra がドメイン横断的な推論能力をより効率的に向上させるのを支援します。

image*図 4. MOPD および Nemotron 3 Ultra で使用される特定の流れの視覚的ガイド*

MOPD では、学生モデルが各ドメインでロールアウトを生成し、対応する教師モデルから密な報酬信号を受信します。効率を最大化するため、MOPD は非同期で実行され、学生のロールアウト生成、教師によるスコアリング、学生の最適化が完全にパイプライン化されています。

MOPD は反復処理でもあります。MOPD 訓練済みチェックポイントの生成後、更新された学生モデルから新しいラウンドの教師訓練が開始され、その改善点は次の MOPD ステージに統合されます。

この学生と教師との共進化により、継続的な能力向上とドメイン横断での段階的な専門性の強化が可能になります。ユーザーは、Ultra モデルを訓練したライブラリである NeMo-RL を通じて MOPD レシピを試すことができます。

より強力なエージェント推論のための訓練データ

Nemotron オープンモデルのリリースにおいて、すべてのケースと同様に、訓練データパイプラインの多くは可能な限り自由に公開されています。企業や主権 AI 開発におけるパートナーにとって、訓練データの透明性と出所（プロヴェナンス）は、能力と同等に重要です。

ドメイン固有の前訓練データ **

10T トークンの前訓練基盤を踏まえ、Nemotron 3 Ultra は、3 つの主要な高価値ドメインギャップを対象とした 212B の新規トークンを追加します：

合成法務データ 4B トークンにより、プロキシ指標である LegalBench の平均値が 64.6% から 74.7% に向上

Wiki ベースの合成データ 35B トークンにより、プロキシ指標である SimpleQA が 40.2% から 50.2% に向上

2025 年 9 月 30 日までの 173B の更新された GitHub トークン

ポストトレーニングデータと RL エージェント環境

今回のリリースでは、10M の新しい SFT サンプル、複数のドメインにわたる 1M の新しい RL タスク、および 15 の新規 RL エージェント環境が追加され、累積的な Nemotron オープンデータの総量は、SFT サンプルで 50M、RL タスクで 2M、RL エージェント環境で 55 に達します。

その結果、Pi、OpenHands、Hermes、OpenCode、Mini SWE Agent において SWEBench Verified のスコアは 65% から 70.4% の範囲となり、どのフレームワークを採用しても一貫したパフォーマンスを発揮します。

ドメイン向けにファインチューニング

Nemotron 3 Ultra は、NVIDIA NeMo ライブラリを使用して LoRA、SFT、および強化学習（Reinforcement Learning: RL）によるファインチューニングが可能です。開発者は以下のレシピから始められます。

Nemotron 3 Ultra レシピ:

SFT LoRA: NeMo Automodel (H100 Recipe, GB200 Recipe)

Full SFT: NeMo Megatron Bridge Recipes

Reinforcement Learning: NeMo RL GRPO recipe、GRPO LoRA recipe、MOPD recipe

Deployment: Dynamo Recipe

実際の動作を確認する

このウォークスルーでは、build.nvidia.com で Nemotron 3 Ultra に搭載された Hermes Agent を使用して、自動研究フローを起動して実行する方法を示します。

*ビデオ 1. Hermes Agent と Nemotron 3 Ultra を用いた自律型アシスタントの構築チュートリアルウォークスルー*

NVIDIA NemoClaw および NVIDIA OpenShell でエージェントをより安全に実行する

Nemotron モデルは、主要なオープンエージェントフレームワークと統合されています。安全で常時稼働するエージェントシステムを構築するには、参照スタックを理解することが重要です。

Hermes Agent と OpenClaw: これらは多ターンワークフローのためのオーケストレーションループ、メモリ、およびツールを提供する人気のあるエージェントハーンです。Hermes Agent は現在公式に利用可能であり、Nemotron での使用が完全にサポートされています。

NVIDIA OpenShell: 現在は早期プレビュー版として利用可能です。OpenShell は、自律型エージェントとその生成コードが実行される安全なランタイム環境（NVIDIA Agent Toolkit の一部）です。

NVIDIA NemoClaw: これは環境を統合するオープンソースの設計図です。単一のコマンドで NemoClaw を実行すると、OpenShell ランタイムがインストールされ、Hermes Agent などの自律型エージェントを Nemotron などのオープンソースモデルとより安全に並列して実行するための安全な環境が提供されます。

より安全で音声対応のエージェントの構築

2 つの新規 Nemotron モデルも同時にリリースされます:

Nemotron 3.5 コンテンツセーフティ

より安全なエンタープライズ AI を構築するチーム向けに、Nemotron 3.5 Content Safety は、テキスト、画像、および複合入力全体にわたって不安全、禁止、またはポリシー違反のコンテンツを分類するためのオープンで効率的な 4B のガードレールモデルです。

23 の安全カテゴリと 12 の言語をカバーし、推論時のガードレールとして使用したり、LLM（大規模言語モデル）の安全性テストおよび評価のためのジャッジとして機能させたり、あるいは付随するトレーニングデータセットを用いてより安全な動作を行うようにモデルをポストトレーニングしたりできます。カスタムポリシーサポートと推論トレイルにより、企業はドメイン固有のルールに合わせた意思決定の適応、監査分類、そしてグローバルな AI ワークフロー全体での安全性制御の展開が可能になります。詳しくは Hugging Face の投稿をご覧ください。

Nemotron 3.5 ASR**

音声ネイティブエージェント向けに、Nemotron 3.5 ASR は、英語版の先行モデルである Nemotron 3 ASR と同じキャッシュ対応ストリーミングアーキテクチャを採用し、オーディオデルタを即座に処理します。冗長なバッファ計算を排除することで、エージェント群のための自然でリアルタイムな音声オーケストレーションにおいて 100 ミリ秒未満のレイテンシを実現します。

英語モデルはすでに強力な開発者による採用が進んでおり、2,000 万人以上の開発者に利用されている Microsoft GitHub Copilot CLI の音声入力機能を支えています。50 以上のオンデバイス ASR 構成に関する独立したベンチマークでは、リソース制約のあるハードウェア上でのリアルタイム英語ストリーミングにおいて Nemotron 3 ASR が最も有力な候補であると特定されました。現在、この同じアーキテクチャがマルチリンガル化され、単一のチェックポイントで 40 以上の言語をサポートするようになりました。

より広い採用に向けた更新されたオープンライセンス

Nemotron モデルのリリースは、オープン AI モデル配布用に特別に設計された Linux Foundation の包括的ライセンスである OpenMDW-1.1 へ移行しています。OpenMDW は、アーキテクチャ、パラメータ、ドキュメント、ソフトウェア、およびその他の関連アーティファクトを含むモデル資料の完全セットを単一のフレームワークの下でカバーするように設計されています。

これにより、開発者や企業は Nemotron モデルの使用、修正、再配布、デプロイに関する明確な利用規約を得ることができ、オープンモデルの評価と採用を遅らせる可能性のあるライセンスの不確実性が軽減されます。

今日から構築を開始する

Nemotron 3 Ultra は、重み（weights）、データ、レシピを含め完全にオープン化されており、開発者はドメイン固有のワークフローに合わせてモデルを適応させ、どこでもデプロイできます。主要な推論プラットフォーム全体で利用可能であり、NVIDIA NIM マイクロサービスとしてパッケージ化されているため、あらゆる場所で実行可能です。Pro サブスクリプションまたは API を通じて Perplexity で試すか、OpenRouter、Anaconda、または build.nvidia.com を通じてお試しください。重みは Hugging Face からダウンロードし、NVIDIA NIM を介して最適化されたインスタンスを起動するか、クックブックから始めて数分で実行を開始できます。

Nemotron 3 Ultra は、AWS JumpStart、Amazon EKS、Baseten、Bitdeer AI、CoreWeave、Crusoe、DeepInfra、Dell Enterprise Hub、DigitalOcean、Eigen AI、fal (ASR)、Fireworks AI、FriendliAI、GMI Cloud、Google Cloud、Lightning AI、Microsoft Foundry、Modal、Nebius Token Factory、Prime Intellect、Simplismart、Together AI (ASR と併用)、Vultr からも利用可能です。

エージェントハッチの入門ガイドについては、GitHub リポジトリをご覧ください。ここでは BlackBox AI、Cline、CrewAI、Factory AI、Hermes Agent、Kilo Code、LangChain Deep Agents、OpenClaw、OpenCode、OpenHands、Pi などの利用方法も紹介されています。

詳細な技術情報については、Nemotron 3 Ultra 技術報告書をお読みください。

*NVIDIA Nemotron* について最新情報をお知りになりたい方は、*NVIDIA のニュースに購読していただき、*LinkedIn*、*X*、*Discord*、そして *YouTube* で NVIDIA AI をフォローしてください。

*スタートアップのためのリソースについては、*Nemotron 開発者ページ* をご訪問ください。*Hugging Face* および *Blueprints*（*build.nvidia.com* 上）で、オープンな Nemotron モデルとデータセットを検索してください。

Nemotron ライブストリーム*、*チュートリアル*、および *NVIDIA フォーラム](https://forums.developer.nvidia.com/c/ai-data-science/nvidia-nemotron/669)* や*Discord* の開発者コミュニティと交流してください。

原文を表示

Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete complex workflows.

However, these multi-agent workflows cause token counts to grow quickly. Agents plan, call tools, invoke sub-agents, receive information, and then pass history, outputs, and reasoning steps back into the model continuously. As tasks run longer, this constant communication increases costs and the risk of goal drift.

Developers can solve this using a system of models: frontier reasoning models for orchestration and complex planning, and efficient models for high-volume execution, validation, and tool calling.

NVIDIA is releasing NVIDIA Nemotron 3 Ultra, an open model built to help long-running agents complete tasks faster while lowering cost.

Nemotron 3 Ultra for agent orchestration

Nemotron 3 Ultra is a 550B-parameter Mixture-of-Experts model with 55B active parameters, built for frontier reasoning and orchestration in agentic systems.

Within any agent workflow, most calls are routine, but a critical subset demands deeper reasoning. Nemotron 3 Ultra is built to handle these hard calls: sustaining architectural decisions across coding sessions, synthesizing contradictory evidence across hundreds of research sources, or verifying chip designs across thousands of constraints.

Nemotron 3 Ultra is also fast. It achieves 5x higher throughput compared to other open models in its class, enabling long-running agents to complete tasks faster and more efficiently.

Figure 2. Nemotron 3 Ultra achieves 5x faster inference while delivering leading accuracy on the Artificial Analysis Intelligence Index leaderboard

Nemotron 3 Ultra is also built for efficiency. In experiments on the SWE-bench and Terminal bench 2.0, it completed benchmarks using fewer total tokens and fewer tokens per turn than comparable models. This lowers the cost for agentic tasks by up to 30%.

Figure 3. Nemotron 3 Ultra lowers the cost to task completion by 30%

Breakthroughs powering Nemotron 3 Ultra

To mitigate the typical efficiency-accuracy tradeoffs for high-capacity reasoning models, the Nemotron models introduce architectural innovations:

Post-trained for agent harness**Nemotron Ultra is post-trained to deliver consistent accuracy across top harnesses. The model is trained using the NVIDIA NeMo RL and Gym open libraries with one of the largest suites of long-running, task-solving, tool-using datasets in the world.

Ultra is optimized for agent-led open harnesses, not just single-turn chat, and is designed to work within workflows where agents plan, call tools, read observations, delegate to sub-agents, validate outputs, and recover from errors across many turns.

Hybrid Mamba transformer****Mamba layers improve sequence efficiency for long-context workloads, while Transformer layers preserve precise recall when agents need to retrieve specific facts from large context windows.

NVFP4 precision****The same NVFP4 checkpoint runs on NVIDIA Hopper, NVIDIA Blackwell, and Ampere GPUs. Developers can use one checkpoint across all NVIDIA GPU architectures thanks to specialized NVFP4 quantization kernels. NVFP4 also delivers up to 5x higher throughput per GPU at the same interactivity compared to BF16 on Blackwell.

LatentMoE****LatentMoE supports more efficient expert routing, enabling the model to handle workflows spanning reasoning, code generation, tool calls, and domain-specific logic.

Multi-token prediction****Multi-token prediction (MTP) helps reduce generation time by predicting multiple future tokens in a single forward pass, improving throughput for long outputs and multi-turn workflows.

Nemotron 3 Ultra adds Multi-Teacher On-Policy Distillation

Multi-Teacher On-Policy Distillation (MOPD) is a training method in which Ultra learns from multiple specialized teacher models while generating its own attempts during training. More than 10 specialized teacher models are trained, each with its own domain-specific training pipeline. Each teacher scores the model in its area of expertise, helping Ultra improve reasoning across domains more efficiently.

Figure 4. A visual guide to MOPD and the specific flow used for Nemotron 3 Ultra

During MOPD, the student model generates rollouts across domains and receives dense reward signals from the corresponding teacher models. To maximize efficiency, MOPD runs asynchronously, with student rollout generation, teacher scoring, and student optimization fully pipelined.

MOPD is also iterative. After producing an MOPD-trained checkpoint, new rounds of teacher training are initialized from the updated student model, and the improvements are merged into the next MOPD stage.

This co-evolution between students and teachers enables continuous capability improvement and progressively stronger specialization across domains. Users can try MOPD recipes through NeMo-RL, the library that trained the Ultra model.

Training data for stronger agent reasoning

As with all Nemotron open model launches, much of the training data pipeline is released as permissively as possible. For partners in enterprise and sovereign AI development, training data transparency and provenance matter as much as capability.

Domain-specific pre-training data **

Building on a 10T token pre-training foundation, Nemotron 3 Ultra adds 212B new tokens targeting three high-value domain gaps:

4B tokens of synthetic legal data, increasing the proxy LegalBench average from 64.6% to 74.7%

35B tokens of synthesized Wiki-based data, boosting proxy SimpleQA from 40.2% to 50.2%

173B refreshed GitHub tokens through Sept. 30, 2025

Post-training data and RL environments

This launch is also releasing 10M new SFT samples, 1M new RL tasks across multiple domains, and 15 net-new RL environments, bringing the cumulative Nemotron open data totals to 50M SFT samples, 2M RL tasks, and 55 RL environments.

The result is SWEBench Verified scores between 65% and 70.4% across Pi, OpenHands, Hermes, OpenCode, and Mini SWE Agent—consistent performance regardless of which framework you deploy.

Finetune for your domain

Nemotron 3 Ultra can be fine-tuned using LoRA, SFT, and reinforcement learning using the NVIDIA NeMo libraries. Developers can get started with the following recipes.

Nemotron 3 Ultra Recipes:

SFT LoRA: NeMo Automodel (H100 Recipe, GB200 Recipe)

Full SFT: NeMo Megatron Bridge Recipes

Reinforcement Learning: NeMo RL GRPO recipe, GRPO LoRA recipe, MOPD recipe

Deployment: Dynamo Recipe

See it in action

This walkthrough shows how to spin up and run an autoresearch flow using Hermes Agent powered by Nemotron 3 Ultra on build.nvidia.com.

Run agents more safely with NVIDIA NemoClaw and NVIDIA OpenShell

Nemotron models integrate with leading open agent frameworks. To build a secure, always-on agentic system, it is important to understand the reference stack:

Hermes Agent and OpenClaw: These are popular agent harnesses that provide the orchestration loops, memory, and tools for multi-turn workflows. Hermes Agent is now officially available and fully supported for use with Nemotron.

NVIDIA OpenShell: Available now in early preview, OpenShell is the secure runtime environment (part of the NVIDIA Agent Toolkit) where autonomous agents and their generated code execute.

NVIDIA NemoClaw: This is the open-source blueprint that ties the environment together. With a single command, NemoClaw installs the OpenShell runtime—providing a secure environment for running autonomous agents like Hermes Agent more safely alongside open-source models like Nemotron.

Build safer and voice-enabled agents

Two new Nemotron models are also launching:

Nemotron 3.5 Content Safety**For teams building safer enterprise AI, Nemotron 3.5 Content Safety is an open, efficient 4B guardrail model for classifying unsafe, disallowed, or policy-violating content across text, images, and combined inputs.

Covering 23 safety categories and 12 languages, it can be used as an inference-time guardrail, as a judge for LLM safety testing and evaluation, or with the accompanying training dataset to post-train models for safer behavior. Custom policy support and reasoning trails help enterprises adapt safety decisions to domain-specific rules, audit classifications, and deploy safety controls across global AI workflows. Read the Hugging Face post to learn more.

Nemotron 3.5 ASR**

For voice-native agents, Nemotron 3.5 ASR uses the same cache-aware streaming architecture as its English predecessor, Nemotron 3 ASR, to process audio deltas instantly. Eliminating redundant buffered compute ensures sub-100 ms latency for natural, real-time voice orchestration for your agentic swarms.

The English model has seen strong developer adoption, including powering the voice input feature in Microsoft GitHub Copilot CLI, used by more than 20M developers. An independent benchmark of 50+ on-device ASR configurations identified Nemotron 3 ASR as the strongest candidate for real-time English streaming on resource-constrained hardware. Now, that same architecture goes multilingual with support for 40+ languages in a single checkpoint.

Updated open licensing for broader adoption

Nemotron model releases are moving to OpenMDW-1.1, the Linux Foundation’s permissive license purpose-built for open AI model distributions. OpenMDW is designed to cover the full set of model materials, including architecture, parameters, documentation, software, and other related artifacts, under a single framework.

This gives developers and enterprises clearer terms for using, modifying, redistributing, and deploying Nemotron models, while reducing the licensing ambiguity that can slow evaluation and adoption of open models.

Start building today

Nemotron 3 Ultra is fully open—including weights, data, and recipes—so developers can adapt the models to domain-specific workflows and deploy them anywhere. It is available across leading inference platforms and packaged as an NVIDIA NIM microservice, it can run anywhere. Try it on Perplexity with a Pro subscription or through API, OpenRouter, Anaconda, or build.nvidia.com. Download the weights from Hugging Face, launch an optimized instance through NVIDIA NIM, or start with the cookbooks to get running in minutes.

Nemotron 3 Ultra is also available through AWS JumpStart, Amazon EKS, Baseten, Bitdeer AI, CoreWeave, Crusoe, DeepInfra, Dell Enterprise Hub, DigitalOcean, Eigen AI, fal (ASR), Fireworks AI, FriendliAI, GMI Cloud, Google Cloud, Lightning AI, Microsoft Foundry, Modal, Nebius Token Factory, Prime Intellect, Simplismart, Together AI (along with ASR), and Vultr.

Check out the GitHub repository for getting-started instructions for agent harness, including BlackBox AI, Cline, CrewAI, Factory AI, Hermes Agent, Kilo Code, LangChain Deep Agents, OpenClaw, OpenCode, OpenHands, and Pi.

For the full technical details, read the Nemotron 3 Ultra technical report.

*Stay up to date on* NVIDIA Nemotron* by subscribing to *NVIDIA news* and** following NVIDIA AI on *LinkedIn*, *X*, *Discord*, and *YouTube*.*

*Visit the *Nemotron developer page* for resources to get started. Explore open Nemotron models and datasets on *Hugging Face* and *Blueprints* on *build.nvidia.com*.*

*Engage with *Nemotron livestreams*, *tutorials*, and the developer community on the *NVIDIA forum *and** *Discord*.*

この記事をシェア

MarkTechPost2026年7月20日 10:56

コミュニティが MiniCPM5-1B を微調整し、657MB の思考モデルを公開

TLDR AI2026年7月20日 09:00

Fable5とGPT-5.6のNP困難問題比較

MarkTechPost重要度42026年7月20日 07:20

Feyn AI が DB 事前検査型 Text-to-SQL モデル「SQRL」発表

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

NVIDIA Developer Blog·2026年6月4日 22:02·約14分

NVIDIA Nemotron 3 Ultra が長時間実行型エージェントの推論を高速化・効率化

#LLM #AI Agents #Reasoning #Mixture-of-Experts #NVIDIA Nemotron

TL;DR

NVIDIA は、長期間実行される AI エージェントの推論コストを削減し効率を向上させるため、550B パラメータの MoE モデル「Nemotron 3 Ultra」を発表した。

AI深層分析2026年6月5日 10:30

重要/ 5段階

深度40%

キーポイント

エージェントワークフローにおける課題と解決策

Nemotron 3 Ultra の技術的特徴

競合モデルとのベンチマーク比較

高速スループットによる効率化

推論速度の劇的な向上

Nemotron 3 Ultra は、Artificial Analysis のベンチマークにおいて、競合する最先端モデルと比較して5倍高速な推論を実現しています。

精度と速度の両立による優位性

ベンチマークでの効率性向上

SWE-bench および Terminal bench 2.0 の実験において、同等モデルと比較して必要なトークン数とターンあたりのトークン数が削減されました。

重要な引用

Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete complex workflows.

Nemotron 3 Ultra is a 550B-parameter Mixture-of-Experts model with 55B active parameters, built for frontier reasoning and orchestration in agentic systems.

Developers can solve this using a system of models: frontier reasoning models for orchestration and complex planning, and efficient models for high-volume execution, validation, and tool calling.

Nemotron 3 Ultra delivers frontier accuracy in a smaller model

It achieves 5x higher throughput compared to other open models in its class, enabling long-running agents to complete tasks faster and more efficiently.

Nemotron 3 Ultra achieves 5x faster inference while delivering leading accuracy on the Artificial Analysis Intelligence Index leaderboard

影響分析・編集コメントを表示

影響分析

編集コメント

エージェントオーケストレーションのための Nemotron 3 Ultra

Nemotron 3 Ultra (550B) GLM 5.1 (744B) Kimi K2.6 (1T) Qwen3.5 (397B)

エージェント生産性PinchBench91%84%91%89%

長期計画EnterpriseOps-Gym33%40%29%30%

コーディングTerminal-Bench 2.054%64%67%53%

指示従順性IFBench82%77%74%78%

知識労働GDPVal-AA1,4481,5941,5081,192

専門業務タスクProfBench (Search)56%46%56%53%

長期コンテキストRuler @1M95%N/A (最大 256K)N/A (最大 256K)90%

*表 1. Nemotron 3 Ultra は、より小さなモデルで最先端の精度を実現します*

image*図 2. Nemotron 3 Ultra は、Artificial Analysis Intelligence Index リーダーボードで最先端の精度を提供しながら、推論速度を 5 倍向上させます*

image*図 3. Nemotron 3 Ultra は、タスク完了までのコストを 30% 削減します*

Nemotron 3 Ultra を支える画期的な技術

高容量推論モデルにありがちな効率性と精度のトレードオフを緩和するため、Nemotron モデルはアーキテクチャ上の革新を導入しました:

Nemotron 3 Ultra が追加するマルチティーチャーオンポリシー蒸留

image*図 4. MOPD および Nemotron 3 Ultra で使用される特定の流れの視覚的ガイド*

より強力なエージェント推論のための訓練データ

ドメイン固有の前訓練データ **

10T トークンの前訓練基盤を踏まえ、Nemotron 3 Ultra は、3 つの主要な高価値ドメインギャップを対象とした 212B の新規トークンを追加します：

合成法務データ 4B トークンにより、プロキシ指標である LegalBench の平均値が 64.6% から 74.7% に向上

Wiki ベースの合成データ 35B トークンにより、プロキシ指標である SimpleQA が 40.2% から 50.2% に向上

2025 年 9 月 30 日までの 173B の更新された GitHub トークン

ポストトレーニングデータと RL エージェント環境

ドメイン向けにファインチューニング

Nemotron 3 Ultra レシピ:

SFT LoRA: NeMo Automodel (H100 Recipe, GB200 Recipe)

Full SFT: NeMo Megatron Bridge Recipes

Reinforcement Learning: NeMo RL GRPO recipe、GRPO LoRA recipe、MOPD recipe

Deployment: Dynamo Recipe

実際の動作を確認する

このウォークスルーでは、build.nvidia.com で Nemotron 3 Ultra に搭載された Hermes Agent を使用して、自動研究フローを起動して実行する方法を示します。

*ビデオ 1. Hermes Agent と Nemotron 3 Ultra を用いた自律型アシスタントの構築チュートリアルウォークスルー*

NVIDIA NemoClaw および NVIDIA OpenShell でエージェントをより安全に実行する

Hermes Agent と OpenClaw: これらは多ターンワークフローのためのオーケストレーションループ、メモリ、およびツールを提供する人気のあるエージェントハーンです。Hermes Agent は現在公式に利用可能であり、Nemotron での使用が完全にサポートされています。

NVIDIA OpenShell: 現在は早期プレビュー版として利用可能です。OpenShell は、自律型エージェントとその生成コードが実行される安全なランタイム環境（NVIDIA Agent Toolkit の一部）です。

NVIDIA NemoClaw: これは環境を統合するオープンソースの設計図です。単一のコマンドで NemoClaw を実行すると、OpenShell ランタイムがインストールされ、Hermes Agent などの自律型エージェントを Nemotron などのオープンソースモデルとより安全に並列して実行するための安全な環境が提供されます。

より安全で音声対応のエージェントの構築

2 つの新規 Nemotron モデルも同時にリリースされます:

Nemotron 3.5 コンテンツセーフティ

Nemotron 3.5 ASR**

より広い採用に向けた更新されたオープンライセンス

今日から構築を開始する

詳細な技術情報については、Nemotron 3 Ultra 技術報告書をお読みください。

原文を表示

Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete complex workflows.

Developers can solve this using a system of models: frontier reasoning models for orchestration and complex planning, and efficient models for high-volume execution, validation, and tool calling.

NVIDIA is releasing NVIDIA Nemotron 3 Ultra, an open model built to help long-running agents complete tasks faster while lowering cost.

Nemotron 3 Ultra for agent orchestration

Nemotron 3 Ultra is a 550B-parameter Mixture-of-Experts model with 55B active parameters, built for frontier reasoning and orchestration in agentic systems.

Nemotron 3 Ultra is also fast. It achieves 5x higher throughput compared to other open models in its class, enabling long-running agents to complete tasks faster and more efficiently.

Breakthroughs powering Nemotron 3 Ultra

To mitigate the typical efficiency-accuracy tradeoffs for high-capacity reasoning models, the Nemotron models introduce architectural innovations:

LatentMoE****LatentMoE supports more efficient expert routing, enabling the model to handle workflows spanning reasoning, code generation, tool calls, and domain-specific logic.

Nemotron 3 Ultra adds Multi-Teacher On-Policy Distillation

Training data for stronger agent reasoning

Domain-specific pre-training data **

Building on a 10T token pre-training foundation, Nemotron 3 Ultra adds 212B new tokens targeting three high-value domain gaps:

4B tokens of synthetic legal data, increasing the proxy LegalBench average from 64.6% to 74.7%

35B tokens of synthesized Wiki-based data, boosting proxy SimpleQA from 40.2% to 50.2%

173B refreshed GitHub tokens through Sept. 30, 2025

Post-training data and RL environments

The result is SWEBench Verified scores between 65% and 70.4% across Pi, OpenHands, Hermes, OpenCode, and Mini SWE Agent—consistent performance regardless of which framework you deploy.

Finetune for your domain

Nemotron 3 Ultra can be fine-tuned using LoRA, SFT, and reinforcement learning using the NVIDIA NeMo libraries. Developers can get started with the following recipes.

Nemotron 3 Ultra Recipes:

SFT LoRA: NeMo Automodel (H100 Recipe, GB200 Recipe)

Full SFT: NeMo Megatron Bridge Recipes

Reinforcement Learning: NeMo RL GRPO recipe, GRPO LoRA recipe, MOPD recipe

Deployment: Dynamo Recipe

See it in action

This walkthrough shows how to spin up and run an autoresearch flow using Hermes Agent powered by Nemotron 3 Ultra on build.nvidia.com.

Run agents more safely with NVIDIA NemoClaw and NVIDIA OpenShell

Nemotron models integrate with leading open agent frameworks. To build a secure, always-on agentic system, it is important to understand the reference stack:

Hermes Agent and OpenClaw: These are popular agent harnesses that provide the orchestration loops, memory, and tools for multi-turn workflows. Hermes Agent is now officially available and fully supported for use with Nemotron.

NVIDIA OpenShell: Available now in early preview, OpenShell is the secure runtime environment (part of the NVIDIA Agent Toolkit) where autonomous agents and their generated code execute.

NVIDIA NemoClaw: This is the open-source blueprint that ties the environment together. With a single command, NemoClaw installs the OpenShell runtime—providing a secure environment for running autonomous agents like Hermes Agent more safely alongside open-source models like Nemotron.

Build safer and voice-enabled agents

Two new Nemotron models are also launching:

Nemotron 3.5 ASR**

Updated open licensing for broader adoption

Start building today

For the full technical details, read the Nemotron 3 Ultra technical report.

*Stay up to date on* NVIDIA Nemotron* by subscribing to *NVIDIA news* and** following NVIDIA AI on *LinkedIn*, *X*, *Discord*, and *YouTube*.*

*Visit the *Nemotron developer page* for resources to get started. Explore open Nemotron models and datasets on *Hugging Face* and *Blueprints* on *build.nvidia.com*.*

*Engage with *Nemotron livestreams*, *tutorials*, and the developer community on the *NVIDIA forum *and** *Discord*.*

この記事をシェア

MarkTechPost2026年7月20日 10:56

コミュニティが MiniCPM5-1B を微調整し、657MB の思考モデルを公開

TLDR AI2026年7月20日 09:00

Fable5とGPT-5.6のNP困難問題比較

MarkTechPost重要度42026年7月20日 07:20

Feyn AI が DB 事前検査型 Text-to-SQL モデル「SQRL」発表

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

キーポイント

重要な引用

影響分析

編集コメント

エージェントオーケストレーションのための Nemotron 3 Ultra

Nemotron 3 Ultra を支える画期的な技術

Nemotron 3 Ultra が追加するマルチティーチャーオンポリシー蒸留

より強力なエージェント推論のための訓練データ

実際の動作を確認する

NVIDIA NemoClaw および NVIDIA OpenShell でエージェントをより安全に実行する

より安全で音声対応のエージェントの構築

より広い採用に向けた更新されたオープンライセンス

今日から構築を開始する

Nemotron 3 Ultra for agent orchestration

Breakthroughs powering Nemotron 3 Ultra

Nemotron 3 Ultra adds Multi-Teacher On-Policy Distillation

Training data for stronger agent reasoning

See it in action

Run agents more safely with NVIDIA NemoClaw and NVIDIA OpenShell

Build safer and voice-enabled agents

Updated open licensing for broader adoption

Start building today

関連記事

キーポイント

重要な引用

影響分析

編集コメント

エージェントオーケストレーションのための Nemotron 3 Ultra

Nemotron 3 Ultra を支える画期的な技術

Nemotron 3 Ultra が追加するマルチティーチャーオンポリシー蒸留

より強力なエージェント推論のための訓練データ

実際の動作を確認する

NVIDIA NemoClaw および NVIDIA OpenShell でエージェントをより安全に実行する

より安全で音声対応のエージェントの構築

より広い採用に向けた更新されたオープンライセンス

今日から構築を開始する

Nemotron 3 Ultra for agent orchestration

Breakthroughs powering Nemotron 3 Ultra

Nemotron 3 Ultra adds Multi-Teacher On-Policy Distillation

Training data for stronger agent reasoning

See it in action

Run agents more safely with NVIDIA NemoClaw and NVIDIA OpenShell

Build safer and voice-enabled agents

Updated open licensing for broader adoption

Start building today

関連記事