NVIDIA Developer Blog · February 18, 2026, 03:00 · about 6 min read

Build AI-Ready Knowledge Systems with 5 Essential Multimodal RAG Capabilities

#MultimodalRAG #Retrieval-Augmented Generation #EnterpriseAI #NVIDIA #DataInfrastructure #UnstructuredDataProcessing

TL;DR

This post explains how to build AI-ready knowledge systems effectively using five multimodal RAG capabilities proposed by NVIDIA.

AI deep analysis · February 25, 2026, 23:42


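Key points

Multimodal RAG (retrieval-augmented generation) is essential for handling the complexity of enterprise data.

The NVIDIA Enterprise RAG Blueprint provides a reference architecture that processes diverse data formats, including text, tables, charts, and images, in an integrated way.

Five key configurations (including the baseline pipeline, query decomposition, metadata filtering, and visual reasoning) improve accuracy and contextual relevance.

Embedding the blueprint into an AI data platform can turn a traditional data repository into an AI-ready knowledge system.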


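Impact analysis

This article makes clear why multimodal RAG, which can handle diverse data formats in an integrated way, matters now that enterprise AI adoption has reached the practical stage. NVIDIA's reference architecture lays out a concrete path for extracting as much value as possible from a company's unstructured data, and it is likely to accelerate the integration of AI with data infrastructure.

Editor's comment

An important technology trend that closes the "last mile" in putting real enterprise data to work. Multimodal support stands out as the point that substantially raises the practical usefulness of RAG.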

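Enterprise data is inherently complex: real-world documents are multimodal, spanning text, tables, charts and graphs, images, diagrams, scanned pages, forms, and embedded metadata. Financial reports carry critical insights in tables, engineering manuals rely on diagrams, and legal documents often include annotated or scanned content.

Retrieval-augmented generation (RAG) was created to ground LLMs in trusted enterprise knowledge: by retrieving relevant source data at query time, it reduces hallucinations and improves accuracy. But if a RAG system processes only the surrounding text, it misses key signals embedded in tables, charts, and diagrams, which leads to incomplete or incorrect answers.

An intelligent agent is only as good as the data foundation it is built on. To achieve enterprise-grade accuracy, modern RAG must therefore be inherently multimodal, able to understand both visual and textual context. The NVIDIA Enterprise RAG Blueprint is built for this purpose, providing a modular reference architecture that connects unstructured enterprise data to the intelligent systems built on top of it.

The blueprint also serves as a foundational layer for the NVIDIA AI Data Platform, helping to bridge the traditional gap between compute and data. By enabling retrieval and reasoning closer to the data layer, it preserves governance, reduces operational friction, and makes enterprise knowledge immediately usable by intelligent systems. The result is a modern AI data stack: storage that can retrieve, enrich, and reason alongside your models.

While the Enterprise RAG Blueprint provides many configurable options, this post highlights the following five key configurations that most directly improve accuracy and contextual relevance across enterprise use cases: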

Baseline multimodal RAG pipeline

Reasoning

Query decomposition

Metadata filtering for faster and more precise retrieval

Visual reasoning for multimodal data

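The post also explains how the blueprint can be embedded into AI data platforms to transform traditional repositories into AI-ready knowledge systems.

Accuracy metrics in this blog are measured with the RAGAS framework on well-known public datasets. Learn more about evaluating your NVIDIA RAG Blueprint system.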

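Document ingestion and understanding

Before an agent can deliver insights, it must be fully grounded in your data. This foundational configuration focuses on intelligent document ingestion and core RAG functionality.

The Enterprise RAG Blueprint uses NVIDIA NeMo Retriever to extract multimodal enterprise content (text, tables, charts and graphs, and infographics), then embeds that content as text for indexing in a vector database. At query time, the blueprint runs semantic retrieval, reranking, and a Nemotron LLM to generate a grounded answer.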
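The ingestion and query flow described above (index extracted chunks, retrieve by similarity, rerank, then assemble a grounded prompt) can be sketched in a few lines. This is a minimal illustration only, not the blueprint's implementation: the bag-of-words `embed` function, the term-overlap `rerank` function, the chunk dictionaries, and their contents are all hypothetical stand-ins for the neural embedding, reranking, and extraction models a real deployment would use.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a neural embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Ingestion: store each extracted chunk next to its vector."""
    return [(chunk, embed(chunk["text"])) for chunk in chunks]


def retrieve(index, query, k=3):
    """Semantic retrieval: top-k chunks by cosine similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def rerank(query, chunks, k=1):
    """Toy reranker: fraction of query terms that appear in the chunk."""
    terms = set(query.lower().split())

    def score(chunk):
        return len(terms & set(chunk["text"].lower().split())) / len(terms)

    return sorted(chunks, key=score, reverse=True)[:k]


def build_prompt(query, chunks):
    """Grounding: the LLM is asked to answer only from the retrieved context."""
    context = "\n".join(f"[{c['modality']}] {c['text']}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical chunks, as if extracted from text, a table, and a chart.
chunks = [
    {"modality": "text", "text": "the company sells widgets"},
    {"modality": "table", "text": "widget revenue 2017 total 120"},
    {"modality": "chart", "text": "chart of headcount by region"},
]
index = build_index(chunks)
query = "total revenue 2017"
top = rerank(query, retrieve(index, query, k=3), k=1)
prompt = build_prompt(query, top)
print(prompt)
```

The point of the sketch is that the answer to the query lives in the table chunk, so a pipeline that indexed only the prose chunk would have nothing relevant to retrieve.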

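To maximize performance, this baseline intentionally avoids image captioning and heavy reasoning, making it the ideal starting point for production deployments. Deploy this baseline on Docker.

Benefits of document ingestion and understanding

This foundational configuration is the blueprint's highest-efficiency pipeline, optimized for accuracy and throughput while keeping GPU cost and time to first token (TTFT) low. It establishes your baseline performance for retrieval quality and LLM grounding.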

Table 1 summarizes the overall impact across a few datasets.

Accuracy (v2.3 default). MM = multimodal, TO = text-only

Table 1. Accuracy impact of the baseline configuration (higher is better)

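Reasoning

When you turn on reasoning in the RAG blueprint, you enable the LLM to interpret the retrieved evidence and synthesize logically grounded answers. For many applications, this is the easiest change that yields an accuracy boost. Enable reasoning for the NVIDIA Enterprise RAG Blueprint.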

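Table 2 summarizes the overall impact across several sample datasets.

Accuracy (v2.3 default) plus reasoning. MM = multimodal, TO = text-only

Table 2. Accuracy impact of enabling reasoning versus the baseline configuration (higher is better)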

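Benefits of reasoning

For any use case involving mathematical operations or complex data comparison, a typical simple similarity or hybrid search will not suffice. Reasoning is required to correct errors and ensure precise contextual understanding. Accuracy improvements across datasets averaged about 5%, with several cases showing dramatic reasoning-driven corrections.

In the FinanceBench dataset, the baseline configuration incorrectly computed the Adobe FY2017 operating cash flow ratio as 2.91. After reasoning was enabled, the model produced the correct answer, 0.83. In addition, the Ragbattle dataset demonstrates the accuracy improvement from enabling a VLM.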
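To see why a question like this needs more than retrieval, consider what the model has to do: find two separate figures and then divide one by the other. The sketch below assumes the common definition of the operating cash flow ratio (cash from operations divided by current liabilities); the article does not state which definition was used, and the input figures here are hypothetical, not Adobe's reported numbers.

```python
def operating_cash_flow_ratio(cash_from_operations, current_liabilities):
    """Assumed definition: cash from operations / current liabilities.

    Both inputs must be expressed in the same unit (for example, millions).
    """
    if current_liabilities <= 0:
        raise ValueError("current liabilities must be positive")
    return cash_from_operations / current_liabilities


# Hypothetical figures, both in millions of dollars.
cash_from_operations = 500.0
current_liabilities = 625.0

ratio = round(operating_cash_flow_ratio(cash_from_operations, current_liabilities), 2)
print(ratio)  # 0.8
```

Retrieving either figure alone does not answer the question; the division is a reasoning step that happens after retrieval.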

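Query decomposition

Answering complex user questions often requires pulling facts from multiple places in the data foundation. Query decomposition breaks a single question into smaller subqueries, retrieves evidence for each, and recombines the results into a complete, grounded response. Turn on query decomposition for the NVIDIA Enterprise RAG Blueprint.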

Figure 1. RAG pipeline

Figure 2. Response accuracy before and after query decomposition
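The three steps of query decomposition described above (split, retrieve per subquery, recombine) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the blueprint's method: the rule-based `decompose` function, which simply splits on " and ", stands in for what would more plausibly be an LLM-driven step, and the word-overlap `retrieve` function and the sample corpus are hypothetical.

```python
def decompose(question):
    """Hypothetical splitter: breaks a compound question on ' and '."""
    parts = [part.strip(" ?") for part in question.split(" and ")]
    return [part + "?" for part in parts if part]


def retrieve(subquery, corpus):
    """Toy retriever: the passage sharing the most words with the subquery."""
    terms = set(subquery.lower().strip("?").split())
    return max(corpus, key=lambda passage: len(terms & set(passage.lower().split())))


def gather_evidence(question, corpus):
    """Retrieve once per subquery and keep each passage only once."""
    subqueries = decompose(question)
    evidence = []
    for subquery in subqueries:
        passage = retrieve(subquery, corpus)
        if passage not in evidence:
            evidence.append(passage)
    return subqueries, evidence


# Hypothetical corpus: the two needed facts live in different passages.
corpus = [
    "plant a produced 40 units in march",
    "plant b produced 55 units in march",
    "the cafeteria menu changes weekly",
]
question = "How many units did plant a produce and how many units did plant b produce?"

subqueries, evidence = gather_evidence(question, corpus)
combined_context = "\n".join(evidence)  # recombined context for the final answer
print(subqueries)
print(combined_context)
```

Each subquery pulls in the passage for one plant, so the recombined context covers both halves of the original question, which a single top-1 retrieval over the full question would not guarantee.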

