NVIDIA Developer Blog · May 29, 2026, 09:07 · about a 6-minute read

Enterprise-ready multimodal AI "Step 3.7 Flash" now runs on NVIDIA GPUs

#Multimodal AI #Mixture-of-Experts #NVFP4 #LLM #StepFun #NVIDIA

TL;DR

Step 3.7 Flash, a large MoE model from StepFun optimized for the NVIDIA ecosystem, was announced on the NVIDIA Developer Blog, enabling real-time multimodal processing for enterprise use.

AI deep analysis · June 12, 2026, 00:09


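Key points

High-Performance MoE Architecture

Of 198B total parameters, about 11B are active per forward pass. With a 256K context window and three reasoning levels, the model is designed for high throughput and low latency.

Full Integration with the NVIDIA Ecosystem

An NVFP4-quantized checkpoint is available on Hugging Face; it reduces memory bandwidth and storage requirements on NVIDIA accelerators, which speeds up inference.

Multi-Capability Support for Enterprises

Supports native image and video input and targets real business use cases that require complex reasoning and search, such as financial analysis and concurrent coding agents.

Step 3.7 Flash Architecture

Text and image processing are integrated through a vision encoder and a core language model, which together produce text output.

Integrated Multimodal Processing

The design links visual and language information so that the model can produce high-quality multimodal responses for enterprise use.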

Open Source Framework Support

Step 3.7 Flash is deployable via SGLang, NVIDIA TensorRT-LLM, and vLLM to leverage kernels optimized for NVIDIA hardware.

Rapid Prototyping with Endpoints

Developers can prototype using GPU-accelerated endpoints on build.nvidia.com, featuring a demo notebook for multi-step document intelligence pipelines.


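Impact analysis

This announcement marks a shift of large multimodal models from the research stage to something that can be built into real business workflows. In particular, optimizing StepFun's model for NVIDIA hardware with NVFP4 quantization should help spread cost-efficient, highly capable AI agents.

Editorial comment

The appearance of the latest model from StepFun, a Chinese AI startup, in hardware-optimized form on NVIDIA's developer blog illustrates the growing diversity and interconnection of the global AI ecosystem.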

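AI applications are moving beyond text generation to multimodal systems that can perceive, search, and reason across images, documents, video, and language in real time, turning fragmented information into actionable insights.

Step 3.7 Flash, the latest model from StepFun, brings these capabilities to production and enterprise scale and is available on NVIDIA-accelerated infrastructure. It is a 198B-parameter Mixture-of-Experts (MoE) vision-language model with approximately 11B activated parameters per forward pass, optimized for agentic workflows that combine perception, search, and multi-step reasoning.

With native image and video input, three configurable reasoning levels (low, medium, and high), and a 256k context window, it is designed for enterprise use cases such as financial analysis, concurrent coding agents, and other high-throughput multimodal workloads. Developers can use StepFun's NVFP4-quantized checkpoint, available through Hugging Face, for faster inference thanks to reduced memory bandwidth and storage requirements.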
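To make the storage saving concrete, the following is a minimal sketch of block-scaled 4-bit quantization in the style of NVFP4. It assumes the commonly described layout (E2M1 magnitudes, one scale per 16-element block) and, for simplicity, keeps each scale in full precision, whereas the real format stores scales in FP8 alongside a per-tensor scale. The function names are hypothetical, and this is not StepFun's or NVIDIA's quantization code.

```python
# Illustrative sketch only: block-scaled 4-bit quantization in the style of NVFP4.
# Assumptions (not from the article): 16-element blocks, the E2M1 magnitude grid,
# and a full-precision per-block scale (real NVFP4 stores the scale in FP8).

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitudes representable in E2M1
BLOCK_SIZE = 16


def quantize_block(block):
    """Return (scale, codes); each code is a signed value from E2M1_GRID."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest grid value.
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - target))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate values from a scale and its grid codes."""
    return [scale * c for c in codes]


def quantize(values):
    """Quantize a flat list whose length is a multiple of BLOCK_SIZE."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of BLOCK_SIZE")
    return [
        quantize_block(values[i:i + BLOCK_SIZE])
        for i in range(0, len(values), BLOCK_SIZE)
    ]


if __name__ == "__main__":
    weights = [0.01 * (i - 8) for i in range(BLOCK_SIZE)]
    scale, codes = quantize_block(weights)
    restored = dequantize_block(scale, codes)
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Each weight is stored as one of eight magnitudes plus a sign, so the per-weight cost is 4 bits plus the amortized cost of the block scale; the rounding error within a block is bounded by the block's scale.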

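Model: Step 3.7 Flash

Total parameters: 198B

Visual encoder parameters: 1.8B

Activated parameters: 11B

Context length: 256K

Experts: 288 (8 active)

*Table 1. Overview of the key specifications of Step 3.7 Flash, including parameter counts, context length, and MoE configuration*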
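As a quick check on what the Table 1 figures imply, the short calculation below derives the share of parameters and the share of experts that are active. The numbers come from the table; the interpretation in the note that follows is a general property of MoE models rather than something the article states.

```python
# Figures taken from Table 1 of the article.
TOTAL_PARAMS_B = 198    # total parameters, in billions
ACTIVE_PARAMS_B = 11    # activated parameters, in billions
NUM_EXPERTS = 288
ACTIVE_EXPERTS = 8


def fraction(part, whole):
    """Return part / whole as a float."""
    return part / whole


param_fraction = fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
expert_fraction = fraction(ACTIVE_EXPERTS, NUM_EXPERTS)

print(f"active parameters: {param_fraction:.1%}")  # 5.6%
print(f"active experts:    {expert_fraction:.1%}")  # 2.8%
```

The active-parameter share is higher than the active-expert share, presumably because components outside the expert layers (attention, embeddings, the vision encoder) run for every input regardless of routing.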

*Figure 1. A high-level diagram of the Step 3.7 Flash components for text and vision processing*

Step 3.7 Flash can be deployed with open source frameworks such as SGLang, NVIDIA TensorRT-LLM, and vLLM to use kernels optimized for NVIDIA hardware.

Build with NVIDIA endpoints

Developers can use GPU-accelerated endpoints available through build.nvidia.com to prototype and evaluate Step 3.7 Flash. A demo notebook combines Step 3.7 Flash with NVIDIA Nemotron Parse in a multi-step document intelligence pipeline: it extracts structured insights, with bounding boxes, from large, complex documents such as financial reports, slide decks, and scientific papers (including PDFs), and organizes the output.

*Video 1. See how the document intelligence pipeline extracts usable data, and follow the workflow in the JupyterLab notebook*
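The pipeline described above can be pictured as parse, order, then prompt. The sketch below shows only the middle steps in plain Python. The `ParsedElement` shape and both helper functions are hypothetical: they are not the Nemotron Parse output schema or the notebook's code, and the model call itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedElement:
    """Hypothetical shape of one parser output item (not the Nemotron Parse schema)."""
    page: int
    kind: str    # e.g. "title", "text", "table"
    bbox: tuple  # (x0, y0, x1, y1), origin at the top-left of the page
    text: str


def reading_order(elements):
    """Order elements by page, then top-to-bottom, then left-to-right."""
    return sorted(elements, key=lambda e: (e.page, e.bbox[1], e.bbox[0]))


def build_prompt(elements, question):
    """Turn ordered elements into a text prompt for the reasoning model."""
    lines = [
        f"[p{e.page} {e.kind} bbox={e.bbox}] {e.text}"
        for e in reading_order(elements)
    ]
    return "Document elements:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


if __name__ == "__main__":
    sample = [
        ParsedElement(1, "text", (0, 50, 100, 60), "Revenue grew year over year."),
        ParsedElement(1, "title", (0, 5, 100, 15), "Annual Report"),
    ]
    print(build_prompt(sample, "What is the document title?"))
```

Keeping the bounding box in each prompt line lets the model's answer be traced back to a region of the source page, which is one plausible reason a pipeline would carry the boxes through.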

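Production-ready deployment with NVIDIA NIM

NVIDIA NIM makes it easy to take Step 3.7 Flash from development into production. Available as optimized, containerized inference microservices, NIM packages the model with the performance tuning, standardized APIs, and deployment flexibility that enterprises need. It can be downloaded and run on-premises, in the cloud, or across hybrid environments. NIM exposes a standard OpenAI-compatible inference interface for sending requests to the NIM server.

Download the NIM container from the NVIDIA container registry (enterprise license required).
Start the server and connect to it with the OpenAI client.
Send either text or image input to the endpoint.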

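from openai import OpenAI

# Assumption: the NIM server is reachable locally. The source omits base_url,
# so replace this address with your own deployment's endpoint.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="no-key-required",  # placeholder value from the source
)

# Request a streamed chat completion from the model.
completion = client.chat.completions.create(
    model="stepfun/step-3.7-flash",
    messages=[{"role": "user", "content": "Explain particle physics?"}],
    temperature=0.5,
    top_p=1,
    max_tokens=1024,
    stream=True,
)

# Print text as it arrives; skip chunks that carry no content.
for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")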
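The request above sends text only, while the steps also mention image input. A minimal sketch of building an image message in the OpenAI chat format follows. The helper name `image_message` is hypothetical, and whether the Step 3.7 Flash endpoint accepts inline base64 data URLs in this form is an assumption to verify against the NIM documentation.

```python
import base64


def image_message(prompt, image_bytes, mime_type="image/png"):
    """Build one user message with text plus an inline base64 image.

    Uses the OpenAI chat "content parts" layout; support for this layout on a
    given endpoint is an assumption, not something the article confirms.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


if __name__ == "__main__":
    # Tiny placeholder payload; in practice read the bytes from an image file.
    message = image_message("Describe this image.", b"abc")
    print(message["content"][1]["image_url"]["url"])  # data:image/png;base64,YWJj
```

The returned dictionary can be passed as an element of the `messages` list in `client.chat.completions.create`, in place of the text-only message.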

Day 0 fine-tuning with NVIDIA NeMo Framework

Step 3.7 Flash can be customized with domain-specific data using open libraries from the NVIDIA NeMo framework. The NVIDIA NeMo Automodel library combines native PyTorch n-D parallelism with optimized performance and supports Day 0 fine-tuning directly from Hugging Face model checkpoints, without checkpoint conversion. The Automodel fine-tuning recipe for Step 3.7 supports techniques such as supervised fine-tuning (SFT) and memory-efficient LoRA at 600 tokens/sec on Hopper GPUs.
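To show why LoRA is described as memory-efficient, the calculation below compares the trainable parameters of a low-rank adapter with those of the full weight matrix it adapts. The layer size and rank are hypothetical values chosen for illustration; they are not Step 3.7 Flash dimensions or settings from the Automodel recipe.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in, d_out):
    """Parameters in the full weight matrix that the adapter augments."""
    return d_in * d_out


# Hypothetical layer size and rank, for illustration only.
d_in = d_out = 4096
rank = 16

lora = lora_trainable_params(d_in, d_out, rank)
full = full_params(d_in, d_out)

print(lora)                   # 131072
print(full)                   # 16777216
print(f"{lora / full:.2%}")   # 0.78%
```

Because only the adapter factors receive gradients and optimizer state, the training memory tied to trainable weights shrinks by roughly the same ratio, while the frozen base weights still have to be held in memory.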

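For advanced large-scale training, teams can also use the NeMo Megatron-Bridge fine-tuning recipe, which provides additional performance optimizations.

From data center deployments on NVIDIA Blackwell to deskside systems with NVIDIA DGX Station, managed NIM microservices, and Day 0 fine-tuning workflows, NVIDIA provides a range of options for integrating Step 3.7 Flash across different stages of development and deployment. With 748 GB of coherent memory, DGX Station is well suited to running Step 3.7 Flash, with headroom for the full 256k context length and faster local developer iteration.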
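A rough estimate of the weight footprint gives a sense of that headroom. The sketch below assumes every one of the 198B parameters is stored at the stated precision, takes 4.5 bits per weight for NVFP4 (4-bit values plus one 8-bit scale per 16 weights), and uses decimal gigabytes. Real checkpoints usually keep some layers at higher precision, and the KV cache and activations are not counted, so treat the result as a lower bound on usage.

```python
TOTAL_PARAMS = 198e9            # from the article
DGX_STATION_MEMORY_GB = 748     # coherent memory, from the article


def weight_footprint_gb(num_params, bits_per_param):
    """Approximate weight storage in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


bf16_gb = weight_footprint_gb(TOTAL_PARAMS, 16)
# Assumption: 4-bit values plus one 8-bit scale per 16-weight block = 4.5 bits.
nvfp4_gb = weight_footprint_gb(TOTAL_PARAMS, 4.5)
headroom_gb = DGX_STATION_MEMORY_GB - nvfp4_gb

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"NVFP4 weights: {nvfp4_gb:.1f} GB")
print(f"Left for KV cache, activations, and runtime: {headroom_gb:.1f} GB")
```

Under these assumptions the quantized weights take a little over 110 GB, leaving most of the 748 GB for long-context KV cache and runtime overhead.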

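NVIDIA is an active contributor to the open source ecosystem and has released several hundred projects under open source licenses. NVIDIA is committed to open models such as Step 3.7 Flash that promote AI transparency and enable users to share their AI safety and resilience work.

To get started, check out Step 3.7 Flash on Hugging Face, test it with your own data on build.nvidia.com, or run it locally on DGX Station using the vLLM Playbook.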


