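NVIDIA Developer Blog · June 23, 2026 00:00 · about 13 min read

Enabling Real-Time AI in High-Speed Data Acquisition with DAQIRI

#RealTimeAI #DataAcquisition #EdgeComputing #NVIDIA

TL;DR

NVIDIA has announced DAQIRI, a technology that enables real-time AI processing during high-speed data acquisition.

AI in-depth analysis · June 23, 2026 01:01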

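Depth: 40%

Key points

DAQIRI announced

DAQIRI, a new technology announced by NVIDIA, enables real-time AI processing within the data acquisition process in place of conventional batch processing.

Integration with high-speed data acquisition

It targets high-speed data acquisition environments in particular, aiming to extract insights immediately while minimizing latency.

Edge and cloud integration

Using NVIDIA's technology stack improves processing capability at the point where data is generated and raises overall system efficiency.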

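Impact analysis

This announcement could dramatically shorten decision latency in fields that handle high-speed data streams, such as manufacturing and scientific instrumentation. However, because the article does not give detailed technical specifications or concrete benchmark figures, its immediate industry-wide impact is judged to be limited for now.

Editorial comment

Because this is a press-release-style announcement that lacks concrete technical detail and validation data, it is better viewed for now as part of NVIDIA strengthening its product lineup than as an immediate transformation of the industry.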

When AlphaFold2 revolutionized drug discovery in 2020, its success relied entirely on the roughly 170,000 protein structures collected by scientists since 1971 and preserved in the Protein Data Bank. Measured data is the backbone of all AI models and workflows that process data as it’s created, act on what matters in real time, and analyze data for deep insights. With the current rise of modern sensors and detectors, nobody needs to wait 50 years to collect enough data for groundbreaking AI models.

From large scientific facilities such as Linac Coherent Light Source II (LCLS-II), which generates photon pulses at a 1 MHz repetition rate, to industrial CT scanners and high-bandwidth software-defined radios, output rates continue to increase. This shifts the bottleneck away from missing data and onto the current “collect, store, analyze” architecture, which was never designed to deal with high data rates on short time scales.

Moving to an adaptable data acquisition pipeline and preprocessing data at the source opens up opportunities to mitigate information loss from the data deluge while accelerating the path from data collection to discovery.

NVIDIA DAQIRI (Data Acquisition for Integrated Real-time Instruments) shifts data acquisition from an inflexible, hardware-centric design to an adaptable, software-centric architecture. A high-performance networking library that is part of the NVIDIA Holoscan Platform, DAQIRI directly connects existing high-bandwidth streaming detectors and sensors to the NVIDIA software ecosystem. Examples include Holoscan for real-time multi-modal, multi-rate processing; NVIDIA TensorRT for real-time inference; and NVIDIA nvCOMP for streaming compression. In addition to the NVIDIA ecosystem, DAQIRI can also stream directly into custom instrument-specific software platforms.

DAQIRI enables developers and instrument builders to create data acquisition workflows that process data in stream for tasks such as filtering, inference, compression, event selection, and adaptive control without having to modify their instruments.

By streaming instrument outputs directly into edge supercomputing systems equipped with an NVIDIA ConnectX Network Interface Card (NIC), instruments can monitor the experiment continuously, respond to changing conditions, and trigger actions in real time. These edge supercomputing systems can range in size from the NVIDIA DGX Spark to the NVIDIA IGX Platform to node and rack-based solutions like NVIDIA RTX Pro Server or VR200, depending on the required compute for the specific instrument.

This approach gives researchers immediate insight into incoming data while also preparing selected outputs for downstream processing at supercomputing facilities using AI and other computationally intensive methods.

A-GHOST: Making unsavable data searchable

The High-Luminosity Large Hadron Collider (HL-LHC) upgrade at the European Organization for Nuclear Research (CERN) will increase the luminosity by a factor of 10 compared to the original design. To process the much higher data rates, the ATLAS detector will upgrade its current selection system. The new design will still use a two-stage selection system, but the bandwidth of selected events rises to 1 MHz (up from 100 kHz) after the first stage and to as much as 10 kHz (up from 1 kHz) after the second stage, which goes to storage. Even at this increased rate, the online system must still reject more than 99% of all collisions.
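As a rough check on these rates, the following C++ sketch computes the fraction of events each stage discards. The 1 MHz and 10 kHz figures come from the paragraph above; the 40 MHz bunch-crossing rate is an assumption (the nominal LHC value, which the article does not state), and `rejected_fraction` is a hypothetical helper name.

```cpp
#include <cassert>

// Fraction of events discarded when a selection stage reduces input_hz to output_hz.
// Hypothetical helper for illustration; not part of any ATLAS or DAQIRI API.
inline double rejected_fraction(double input_hz, double output_hz) {
    assert(input_hz > 0.0 && output_hz >= 0.0 && output_hz <= input_hz);
    return 1.0 - output_hz / input_hz;
}

// Rates quoted in the article for the upgraded ATLAS selection system.
constexpr double kStage1OutHz = 1.0e6;  // selected events after the first stage
constexpr double kStage2OutHz = 1.0e4;  // selected events after the second stage (to storage)

// Assumption: nominal LHC bunch-crossing rate; the article does not state it.
constexpr double kCrossingHz = 40.0e6;
```

Under these assumptions, the second stage keeps about 1% of what the first stage accepts, and `rejected_fraction(kCrossingHz, kStage2OutHz)` comes to roughly 0.99975 of all bunch crossings.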

The A-GHOST project uses DAQIRI to apply more powerful AI-driven searches to the stream that the nominal selection path discards, employing efficient networking to bring the GPUs closer to the raw detector data.

The R&D effort focuses on exploring a streaming link between the custom Field-Programmable Gate Array (FPGA)-based hardware boards planned for use during HL-LHC and a high-performance GPU-enabled processing farm. With this architecture, the effort, led by scientists from CERN Openlab, the University of Chicago, and UCL, will allow real-time analysis of the full data stream by deploying powerful models such as Convolutional Auto-Encoders (CAEs), temporal Convolutional Neural Networks (TCCN), and transformer-based models, which are planned to be tested with the prototype hardware.

How DAQIRI works under the hood

DAQIRI is designed to handle high-bandwidth Ethernet data, including UDP and RoCE v2 traffic, at line rates of hundreds of Gbps and higher. To achieve this, the architecture completely bypasses the Linux kernel.

By leveraging the Data Plane Development Kit (DPDK), DAQIRI provides zero-copy access, routing data directly from the NIC to the GPU’s Direct Memory Access (DMA) buffers. This kernel-bypass mechanism reduces the latency and CPU overhead typically associated with traditional network stacks, ensuring that massive instrument data streams arrive at the GPU ready for immediate processing.
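To see why per-packet kernel overhead matters at these speeds, the sketch below estimates the time budget per packet at a given line rate. It is back-of-the-envelope arithmetic only: the function names are hypothetical, and Ethernet framing overhead (preamble and inter-frame gap) is ignored.

```cpp
#include <cassert>
#include <cstddef>

// Packets per second needed to saturate a link, ignoring framing overhead.
inline double packets_per_second(double line_rate_bps, std::size_t packet_bytes) {
    assert(line_rate_bps > 0.0 && packet_bytes > 0);
    return line_rate_bps / (8.0 * static_cast<double>(packet_bytes));
}

// Average time available to handle one packet, in nanoseconds.
inline double nanoseconds_per_packet(double line_rate_bps, std::size_t packet_bytes) {
    return 1.0e9 / packets_per_second(line_rate_bps, packet_bytes);
}
```

For example, at 100 Gbps with 8,192-byte packets, `packets_per_second(100e9, 8192)` is about 1.53 million packets per second, which leaves roughly 655 ns per packet.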

NVIDIA DAQIRI Key Features:

High Throughput, Low Latency: Achieve line rate on any interface with proper hardware and CPU/NUMA tuning.

Customized Receive Processing: Automated packet reordering, data type conversion, and hardware-based flow steering.

Zero Memory Copy to GPU: Direct NIC ring-buffer access (Batched and Header Data Split) to GPU tensor keeps latency at PCIe transit time.

YAML-Driven Configuration: Optimized and customizable boilerplate network configurations for ease of deployment.

Flexible Data Movement Backends: Linux Sockets, DPDK, and RoCEv2 support for varying application and hardware demands.

Plug and Play C++ and Python APIs: Build a real-time application and interface with other GPU libraries in minutes, not hours.

Figure 1. DAQIRI acts as a software-defined bridge from high-bandwidth detector streams to NVIDIA accelerated computing, removing “store first” from the critical path so instruments can process data in stream, run AI inference, and respond in real time

While the underlying data movement relies on low-level networking optimizations, DAQIRI abstracts this complexity for instrument builders. Developers can orchestrate the data acquisition pipeline using accessible C++ and Python APIs, configured via readable YAML files.

Instead of managing individual network packets or manual memory allocations, DAQIRI automatically batches incoming network packets directly into GPU tensors. This allows developers to focus entirely on writing their custom inference or filtering logic rather than managing network protocols.

The walkthrough below shows how an instrument designer configures a high-speed data stream in small, inspectable pieces, then uses a short C++ loop to receive GPU-ready tensors.

DAQIRI applications start with a configuration file. The config describes the data path before the application runs: which NIC to use, which GPU owns the packet buffers, how packets should be filtered, and how received packet payloads should be assembled for downstream processing.

This is the handoff point most instrument pipelines want: a batch-shaped tensor already resident on the GPU rather than individual network packets. The reorder stage can also convert payload data while assembling the tensor. For example, a sensor frontend may send compact int4 samples on the wire, while the GPU processing or AI inference stage expects fp16.

DAQIRI can perform that conversion as part of the GPU reorder step, avoiding a separate unpacking pass in the application. These files are easily editable for any hardware configuration, and more examples can be found in the Cocktail Book (code examples).
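To illustrate the kind of conversion described above, here is a CPU-side C++ sketch that unpacks packed int4 samples into floating-point values. It is not DAQIRI's GPU implementation: `float` stands in for fp16 (standard C++ lacks a portable half type), and the nibble order (low nibble first) and two's-complement sign convention are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sign-extend a 4-bit two's-complement value (0..15) to the range -8..7.
inline int sign_extend_int4(std::uint8_t nibble) {
    assert(nibble < 16);
    return (nibble & 0x8) ? static_cast<int>(nibble) - 16 : static_cast<int>(nibble);
}

// Unpack packed int4 samples into floats (a stand-in for fp16).
// Assumption: the low nibble of each byte holds the earlier sample.
inline std::vector<float> unpack_int4(const std::vector<std::uint8_t>& packed) {
    std::vector<float> out;
    out.reserve(packed.size() * 2);  // two samples per input byte
    for (std::uint8_t byte : packed) {
        const auto low  = static_cast<std::uint8_t>(byte & 0x0F);
        const auto high = static_cast<std::uint8_t>(byte >> 4);
        out.push_back(static_cast<float>(sign_extend_int4(low)));
        out.push_back(static_cast<float>(sign_extend_int4(high)));
    }
    return out;
}
```

Each input byte yields two output samples, so a separate application-side pass like this one is exactly the work that folding the conversion into the reorder step avoids.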

DAQIRI walkthrough

Start with the top-level DAQIRI settings. This establishes the raw streaming path, assigns a CPU core to manage DAQIRI, and keeps logs concise enough for deployment.

    %YAML 1.2
    ---
    daqiri:
      cfg:
        version: 1
        stream_type: "raw"
        master_core: 3
        log_level: "info"

Next, define the GPU memory regions DAQIRI will use. rx_packets is where raw packet buffers land through GPUDirect, while rx_tensor is the completed, reordered tensor consumed by the GPU workload. The buffer sizes also make the int4-to-fp16 expansion explicit.

    memory_regions:
    - name: "rx_packets"
      kind: "device"
      affinity: 0
      num_bufs: 16384
      buf_size: 8192
    - name: "rx_tensor"
      kind: "device"
      affinity: 0
      num_bufs: 128
      buf_size: 32768000
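The relationship between the two buffer sizes can be checked with a few lines of arithmetic. The sketch below assumes an 8,000-byte int4 payload per packet, a value inferred from the buffer sizes rather than stated in the article; `payload_byte_offset` (68) and `packets_per_batch` (1024) are taken from the reorder configuration later in this walkthrough, and `tensor_bytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one reordered batch after int4 -> fp16 conversion.
// payload_bytes: packed int4 payload bytes per packet (two samples per byte).
constexpr std::size_t tensor_bytes(std::size_t payload_bytes,
                                   std::size_t packets_per_batch) {
    return payload_bytes * 2   // int4 samples per payload byte
                         * 2   // bytes per fp16 sample
                         * packets_per_batch;
}

// Values from the configuration in this walkthrough.
constexpr std::size_t kPacketBufBytes  = 8192;      // rx_packets buf_size
constexpr std::size_t kTensorBufBytes  = 32768000;  // rx_tensor buf_size
constexpr std::size_t kHeaderBytes     = 68;        // payload_byte_offset
constexpr std::size_t kPacketsPerBatch = 1024;      // packets_per_batch

// Assumption: 8000 payload bytes per packet, inferred from the buffer sizes.
constexpr std::size_t kPayloadBytes = 8000;

static_assert(kHeaderBytes + kPayloadBytes <= kPacketBufBytes,
              "a full packet must fit in one rx_packets buffer");
static_assert(tensor_bytes(kPayloadBytes, kPacketsPerBatch) == kTensorBufBytes,
              "one batch of fp16 samples exactly fills one rx_tensor buffer");
```

Each int4 payload byte carries two samples and each fp16 sample takes two bytes, so the tensor is four times the size of the packed payload.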

Then bind the configuration to the physical receive interface. This names the NIC by PCIe address, enables flow isolation, and assigns a polling queue. The queue batches 1024 packets at a time but can flush after 2 ms so downstream processing is not waiting indefinitely.

    interfaces:
    - name: "rx0"
      address: "0000:00:00.0"
      rx:
        flow_isolation: true
        queues:
        - name: "q0"
          id: 0
          cpu_core: 9
          batch_size: 1024
          timeout_us: 2000
          memory_regions: ["rx_packets"]
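The batch-or-timeout behavior described above can be modeled in a few lines. This is a hypothetical model for illustration, not DAQIRI's implementation; in particular, it assumes that an empty queue is never flushed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a receive queue's flush rule.
struct BatchPolicy {
    std::size_t   batch_size;  // packets per full batch (batch_size in the YAML)
    std::uint64_t timeout_us;  // maximum wait before a partial flush (timeout_us)

    // True when the queue should hand its packets downstream.
    bool should_flush(std::size_t queued_packets, std::uint64_t elapsed_us) const {
        assert(batch_size > 0);
        if (queued_packets >= batch_size) return true;          // batch is full
        return queued_packets > 0 && elapsed_us >= timeout_us;  // partial batch timed out
    }
};

// Packet rate below which the timeout, not the batch size, triggers the flush.
inline double min_rate_for_full_batch(const BatchPolicy& p) {
    assert(p.timeout_us > 0);
    return static_cast<double>(p.batch_size) * 1.0e6 /
           static_cast<double>(p.timeout_us);
}
```

With `batch_size: 1024` and `timeout_us: 2000`, `min_rate_for_full_batch` gives 512,000 packets per second; below that rate the timeout, rather than the batch size, determines when data moves downstream.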

The flow rule narrows the accepted traffic to the expected UDP stream and sends matching packets into queue 0. That keeps the rest of the instrument or network traffic out of this receive path.

    flows:
    - name: "data_flow"
      id: 100
      action: {type: queue, id: 0}
      match: {udp_src: 4096, udp_dst: 4096}
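Conceptually, the rule acts as a filter followed by a queue assignment. The C++ sketch below models that logic in software for illustration only; DAQIRI lists hardware-based flow steering as a feature, so the matching is not expected to run in application code like this. `FlowRule`, `steer`, and `kNoQueue` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical software model of one flow rule from the YAML.
struct FlowRule {
    int           flow_id;   // id
    int           queue_id;  // action.id
    std::uint16_t udp_src;   // match.udp_src
    std::uint16_t udp_dst;   // match.udp_dst
};

// Returned when no rule matches, so the packet stays out of this receive path.
constexpr int kNoQueue = -1;

// Returns the queue of the first rule whose UDP ports both match.
inline int steer(const std::vector<FlowRule>& rules,
                 std::uint16_t udp_src, std::uint16_t udp_dst) {
    for (const FlowRule& rule : rules) {
        if (rule.udp_src == udp_src && rule.udp_dst == udp_dst) {
            return rule.queue_id;
        }
    }
    return kNoQueue;
}
```

With the single rule from the configuration, a packet with both ports set to 4096 lands in queue 0 and everything else is rejected by this model.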

Finally, the reorder configuration turns packet payloads into the tensor shape the application actually wants. It skips packet headers, uses sequence numbers to restore ordering, batches 1024 packets, and converts compact int4 payloads into fp16 values as part of the GPU reorder step.

    reorder_configs:
    - name: "packets_to_tensor"
      reorder_type: "gpu"
      memory_region: "rx_tensor"
      payload_byte_offset: 68
      flow_ids: [100]
      data_types:
        input_type: "int4"
        output_type: "fp16"
        endianness: "host"
      method:
        seq_packets_per_batch:
          sequence_number:
            bit_offset: 512
            bit_width: 32
          packets_per_batch: 1024
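The offsets in this configuration line up: a 32-bit sequence number at bit offset 512 occupies bytes 64 to 67, so the payload starts at byte 68. The CPU-side sketch below shows the same idea in miniature. It is an illustration under assumptions, not DAQIRI's GPU kernel: it reads the sequence number in host byte order and places each payload in the slot given by the sequence number modulo `packets_per_batch`. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kSeqByteOffset     = 512 / 8;  // sequence_number.bit_offset
constexpr std::size_t kSeqByteWidth      = 32 / 8;   // sequence_number.bit_width
constexpr std::size_t kPayloadByteOffset = 68;       // payload_byte_offset

// The sequence number ends exactly where the payload begins.
static_assert(kSeqByteOffset + kSeqByteWidth == kPayloadByteOffset, "header layout");

// Read the 32-bit sequence number. Assumption: host byte order.
inline std::uint32_t read_sequence(const std::vector<std::uint8_t>& packet) {
    assert(packet.size() >= kPayloadByteOffset);
    std::uint32_t seq = 0;
    std::memcpy(&seq, packet.data() + kSeqByteOffset, sizeof seq);
    return seq;
}

// Copy each packet's payload into the slot chosen by its sequence number.
// Assumption: slot = sequence number modulo packets_per_batch.
inline std::vector<std::uint8_t> reorder(
        const std::vector<std::vector<std::uint8_t>>& packets,
        std::size_t payload_bytes, std::size_t packets_per_batch) {
    assert(payload_bytes > 0 && packets_per_batch > 0);
    std::vector<std::uint8_t> batch(payload_bytes * packets_per_batch, 0);
    for (const auto& pkt : packets) {
        assert(pkt.size() >= kPayloadByteOffset + payload_bytes);
        const std::size_t slot = read_sequence(pkt) % packets_per_batch;
        std::memcpy(batch.data() + slot * payload_bytes,
                    pkt.data() + kPayloadByteOffset, payload_bytes);
    }
    return batch;
}
```

Because every payload has a fixed destination computed from its own header, packets can arrive in any order and still produce one contiguous, ordered buffer.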

With those details in YAML, the application code becomes intentionally small. First, initialize DAQIRI from the configuration file and declare the burst handle that will receive completed packet batches.

    // Initialize DAQIRI with a config file.
    daqiri::daqiri_init("rx_reorder.yaml");
    daqiri::BurstParams* burst = nullptr;

Then ask DAQIRI for the next completed burst. By the time the application receives the pointer, DAQIRI has already reordered the packets into contiguous GPU memory, so the code can hand the tensor directly to a model or CUDA kernel.

    // Receive packets; DAQIRI reorders them into contiguous GPU memory.
    daqiri::get_rx_burst(&burst);

    // The reordered batch is now a GPU tensor ready for processing/inference.
    auto* tensor = daqiri::get_packet_ptr(burst, 0);
    run_model_or_kernel(tensor);

Once the GPU job is done, return the buffers to DAQIRI so they can be reused for the next burst.

Bring DAQIRI to your sensor or detector

By shifting to a software-defined, AI-enabled architecture from an inflexible, hardware-centric collection paradigm, DAQIRI removes the traditional bottlenecks of scientific data acquisition. Developers can now process data in stream, run real-time AI inference at the edge, and ensure that only high-quality, AI-ready data is sent to HPC facilities for deeper analysis.

Start integrating real-time processing into your real-time streaming workflows with DAQIRI today!

Explore the DAQIRI GitHub repository

Read the DAQIRI Getting Started Docs for tutorials and documentation

Visit the DAQIRI Landing Page on GitHub for Benchmarks and Examples

Acknowledgments

*We’d like to thank David Miller and Ioannis Xiotidis from CERN for their work and collaboration on using DAQIRI in the A-GHOST data acquisition pipeline, as well as for their technical review of this blog post. We also acknowledge Alexis Girault, whose ANO documentation helped inform this effort. Special thanks to NVIDIA contributors Cliff Burdick, Chloe Crozier, Jay Carlson, Mahdi Azizian, Julien Jomier, and the broader Holoscan team for their expertise, guidance, and support.*
