Hugging Face Blog·2026年5月10日 03:09·約26分

プライバシーを保護する腫瘍学臨床意思決定支援のための二層マルチエージェントフレームワーク「OncoAgent」の提案

#RAG #LangGraph #QLoRA #医療 AI #プライバシー保護 #マルチエージェント

TL;DR

OncoAgent は、プライバシーを保護しつつ腫瘍学臨床意思決定を支援するオープンソースの二層型マルチエージェントシステムであり、LangGraph と RAG を活用して医療ガイドラインに基づく高精度な判断を可能にする。

AI深層分析2026年7月5日 05:13

重要/ 5段階

深度40%

キーポイント

二層型マルチエージェントアーキテクチャ

LLM の微調整と LangGraph を基盤とした最先端のトポロジーを組み合わせた、プライバシー保護に特化した臨床意思決定支援システムを構築している。

4 段階の補正型 RAG パイプライン

70 以上の医師グレードの NCCN および ESMO ガイドラインを対象に、4 つの段階からなる補正型 RAG（Retrieval-Augmented Generation）パイプラインを実装し、情報の正確性と信頼性を高めている。

プライバシー保護とオープンソース化

患者データの機密性を厳格に維持しつつ、QLoRA などの技術を用いた軽量な微調整モデルを公開しており、医療現場での導入可能性を示唆している。

Corrective RAG with Document Grading

Retrieved documents are graded for clinical relevance before processing, automatically reformulating queries if irrelevant to eliminate hallucinations caused by semantically mismatched content.

Reflexion Safety Loop (Critic Node)

A deterministic code-based critic node performs three-layer validation (formatting, safety rules, and LLM entailment) before output release, ensuring safety enforcement cannot be bypassed by adversarial prompting.

Specialized Medical Embeddings

The system rejects general-purpose models in favor of PubMedBert for embeddings to accurately capture clinical terminology semantics within a zero-cloud ChromaDB vector store.

反ハルシネーション閾値の確立

NCCNコーパスとの距離計算により、医療クエリ（0.06-0.09）とドメイン外入力（0.11-0.15）を明確に区別し、0.10 を閾値として設定することで、ドメイン外の臨床入力に対するハルシネーションをゼロに保証している。

影響分析・編集コメントを表示

影響分析

この発表は、AI が医療現場で実用化される際の最大の障壁であった「プライバシー保護」と「信頼性（ハルシネーションの防止）」を同時に解決する具体的なアーキテクチャを示した点で画期的です。特に、厳格なガイドラインに基づく RAG パイプラインとマルチエージェントによる相互検証は、臨床医が AI を安全に活用するための新たな基準となる可能性があります。

編集コメント

医療 AI の実用化において「信頼性」と「プライバシー」の両立は長年の課題でしたが、このフレームワークはその解決策を具体的な技術スタックで提示しており、今後の臨床導入への期待が高まります。

記事に戻る

thumbnail: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/oncoagent-thumbnail.png

authors:

user: oncoagent-research

tags:

oncology

multi-agent

LangGraph

QLoRA

open-source

clinical-ai

healthcare

OncoAgent: プライバシーを保護する腫瘍学臨床意思決定支援のための二層型マルチエージェントフレームワーク

*技術的プレプリント · 2026 年 5 月 · OncoAgent リサーチグループ*

抄録

私たちは、腫瘍学向けのオープンソースかつプライバシーを保護する臨床意思決定支援システムであるOncoAgentを発表します。OncoAgent は、二層型の微調整済み大規模言語モデル（LLM）アーキテクチャと、最先端のマルチエージェント LangGraph トポロジー、70 以上の医師グレードの NCCN および ESMO ガイドラインを対象とした 4 つの段階からなる補正型 RAG（Retrieval-Augmented Generation：検索拡張生成）パイプライン、そして厳格な Zero-PHI（患者個人識別情報なし）ポリシーを強制する 3 層のリフレクション安全性検証器を組み合わせたものです。

本システムは、臨床クエリを加算的複雑度スコアラーを経由してルーティングし、9B パラメータの速度最適化モデル（Tier 1）または 27B の深層推論モデル（Tier 2）のいずれかに振り分けます。両モデルとも、Unsloth フレームワークを用いて AMD Instinct MI300X ハードウェア（HBM3 192 GB）上で微調整され、266,854件の実在および合成生成された腫瘍学症例のコーパスに基づいています。

MI300X 上のシーケンスパッキングにより、フルデータセットのファインチューニングが約50 分で可能となりました。これは API ベースの生成と比較してスループットが 56 倍加速されたことを意味します。修正後、CRAG（Corrective RAG）によるドキュメント評価は 100% の成功率を達成し、平均 RAG 信頼度スコアは 2.3+ となりました。本システム全体は 100% オープンソースであり、オンプレミスでのデプロイが可能です。これにより、専有クラウド API への依存が排除され、患者データの主権が保持されます。

キーワード: クリニカル・ディシジョン・サポート（臨床意思決定支援）、腫瘍学 AI、マルチエージェントシステム、検索拡張生成（RAG）、QLoRA、AMD ROCm、オープンソース医療 AI、ヒューマン・イン・ザ・ループ（HITL）安全性、LangGraph、Corrective RAG

1. はじめに

腫瘍学は、臨床医学において最も情報密度が高く、認知的負荷の大きい分野の一つです。米国国立総合がんネットワーク（NCCN）から欧州腫瘍内科学会（ESMO）に至るまで、エビデンスに基づくガイドラインの量、多様性、そして急速な進化が、公表されたエビデンスとベッドサイドでの実践との間に持続的な知識格差を生み出しています。

AI を活用した臨床意思決定支援システムは、この格差を埋めるための変革的な可能性を秘めていますが、現在市販されているシステムの多くは、以下の 3 つの重要な点で失敗しています:

検証済みのガイドラインに基づかない幻覚（ハルシネーション）に基づく推奨事項
プライバシーが敏感な病院環境におけるオンプレミスデプロイを妨げるクラウド API への依存
複雑な多疾患併存の症例において文脈飽和を起こしやすいモノリス型 LLM アーキテクチャ

OncoAgent は、以下の 3 つの中核原則に基づいて設計されています:

アーキテクチャの分解：臨床推論は、それぞれに制限された監査可能な機能を備えた 8 つの専門化された LangGraph ノードに分解されます。

根拠に基づく生成：すべてのモデル出力は、明示的な関連性ゲートを持つ 4 段階の検索パイプラインを通じて、キュレーションされたベクトル知識ベースにアンカーされています。

ハードウェア主権：完全な推論およびトレーニングスタックは、ROCm およびオープンソースフレームワークを使用して AMD Instinct MI300X でネイティブに実行され、データ流出なしでの病院展開を可能にします。

2. 関連研究

2.1 臨床用大規模言語モデルと意思決定支援

大規模言語モデルは、診断コード化、文献の要約、患者とのコミュニケーションを含む臨床自然言語処理タスクにおいて大きな可能性を示しています。BioMedLM、Med-PaLM 2、ClinicalBERT に代表されるドメイン固有のファインチューニングアプローチは、汎用モデルよりも医療ベンチマークでのパフォーマンスを一貫して向上させます。OncoAgent はこの研究の流れを拡張し、ハルシネーション（幻覚）の結果が最も深刻となる腫瘍学トリアージおよび治療経路推奨という特定のサブドメインを対象としています。

2.2 マルチエージェントアーキテクチャ

分解されたマルチエージェントシステムは、複雑な推論タスクに対する原理的なアプローチとして登場しました。OncoAgent は、4 つの代表的な SOTA（State-of-the-Art）パターンを統合しています：

Claude Code パターン — 大規模言語モデルの推論から分離された決定論的安全装置

Hermes Agent パターン — セッションごとのメモリ隔離を備えた構造化ツール呼び出し

修正型 RAG (Shi et al., 2024) — ドキュメントの関連性評価とクエリ再構築

Reflexion (Shinn et al., 2023) — フィードバック強化による試行ループを通じた自己修正生成

2.3 医療分野における検索拡張生成 (Retrieval-Augmented Generation)

標準的なバイエンコーダー型検索は、用語の精度が極めて重要な臨床領域（例：「チロシンキナーゼ阻害剤」対「TKI」）には不向きです。OncoAgent は、クロスエンコーダーによる再ランク付けを備えた多段階パイプラインを実装しており、さらに仮説的ドキュメント埋め込み (HyDE; Gao et al., 2022) を統合することで、自然言語クエリをガイドライン埋め込み空間へ投影し、医学的な同義語の不一致を解消します。

3. システムアーキテクチャ

3.1 概要

OncoAgent は、LangGraphを用いた状態保持型有向グラフとして実装されています。システムの状態は、11 の論理セクションと約 30 の型付きキーを含む不変の AgentState TypedDict で表現されます。各ノードは特定のキーに追加を行うだけで、上流データを改変しないため、完全な監査証跡が保持されます。

8 ノードのトポロジは以下の通りです：

Router → Ingestion → Corrective RAG → Specialist ↔ Critic → HITL Gate → Formatter → END

↓

Fallback → END

主要な特性：

5 つの条件付きエッジ

1 つのリフレクション再試行ループ（最大 2 回）

高複雑度または低信頼度の出力に対する必須のヒューマン・イン・ザ・ループ (HITL) 割り込み

症例の複雑さは、専門家の招致前に加重加算モデルを用いて定量化されます：

S = w_cancer + w_stage + w_mutations + w_treatment

ここで：

Factor

Condition

Weight

Cancer type

Rare

+0.40

Cancer type

Unknown primary

+0.30

Stage

Stage IV

+0.25

Stage

Stage III

+0.15

Mutations

≥2 identified

+0.30

Mutations

Single

+0.15

Prior treatment

Any keyword match

+0.10

決定境界： S ≥ 0.5 → Tier 2（Qwen 3.6-27B による深層推論） · S < 0.5 → Tier 1（Qwen 3.5-9B による高速トリアージ）

検証結果： KRAS および BRCA2 変異を有するステージ IV の膵癌症例において、正しく S = 0.80 が算出され、Tier 2 にルーティングされました。✅

臨床医はまた、ユーザーインターフェース（UI）を通じて手動で階層選択を上書きすることも可能です。

3.3 ドキュメント評価を伴う修正型 RAG

CRAG ノードは、各取得ドキュメントを専門家に転送する前に臨床的関連性について評価します。二値分類による関連性判定に失敗したドキュメントは、自動的にクエリ再構成（最大 1 回の再試行）トリガーします。これにより、RAG パイプラインにおける主要なハルシネーション（幻覚）の原因である、「題名は妥当だが意味的に無関係なコンテンツの取得」を排除します。

評価ステップにおいて Qwen 3.5 から Qwen 2.5 Instruct へ移行した後、成功率が0% → 100%に向上し、子宮がんトリアージテストにおける RAG の信頼度スコアは2.3+に達しました。

3.4 リフレクション安全ループ（クリティックノード）

クリティックノードは、あらゆる出力が HITL ゲートに到達する前に、3 層の検証カスケードを実行します：

フォーマットチェック — OncoCoT 出力スキーマへの構造的適合性を検証
セーフティチェック — 禁止された出力パターン（ガイドライン引用のない絶対用量、薬物相互作用の省略など）に対する決定論的ルールベースのスキャン
LLM 推論チェック — スペシャリストの推奨が取得された RAG コンテキストによって完全に支持されていることを確認

FAIL の場合、クリティックからの具体的なフィードバックはスペシャリストのコンテキストに再注入され、リトライが行われます（最大 2 回）。重要なのは、クリティックが LLM に制御されるロジックではなく決定論的コードとして実行される点です。これにより、敵対的なプロンプトによるセーフティ強制の回避を防ぎます。

3.5 ヒューマン・イン・ザ・ループゲートとフォールバック

HITL ゲートは、すべての Tier 2 ケースおよび rag_confidence < 0.3 の出力に対して、必須の臨床医による中断機能を提供します。専用の Fallback ノードが回復不能な失敗を捕捉し、いかなる故障モードにおいても幻覚的な代替案を回避して、臨床的に安全な拒否応答「Información no concluyente en las guías provistas」を返します。

3.6 患者ごとのメモリ分離

PatientMemoryStore モジュールは、各患者セッションに一意の thread_id（形式：PT-XXXX）を割り当てます。これは LangGraph のネイティブチェックポイントシステムへ設定可能なパラメータとして渡されます。これにより、セッション内での反復的な多回対話型相談を可能にしつつ、厳格な患者ごとのメモリ分離が保証されます。

4. 知識ベースの構築と RAG パイプライン

4.1 ガイドラインの取り込みとサニタイゼーション

知識ベースは、138 の NCCN 詳細ページを 60 秒未満で処理した並行ウェブスクレイパーによって特定された77 の直接医師ガイドライン PDFから構築されました。テキスト抽出には PyMuPDF (fitz) がブロックレベルの構文解析に使用され、多段臨床レイアウトの意味的な読み順が保持されています。

取り込み前に、正規表現ベースのサニタイゼーションステップにより機関のブランディングが除去されます。患者向け資料はヒューリスティックフィルタリングによって除外されます。その結果得られたコーパスには、HCC、非小細胞肺癌 (NSCLC)、乳がん、大腸癌、神経内分泌腫瘍を含む主要ながん種すべての70 以上の専門的腫瘍ガイドラインが含まれています。

4.2 医療埋め込みとベクトルストア

一般的な汎用埋め込みモデル（例：all-MiniLM-L6-v2）は、臨床用語の意味論が不十分であるため採用されませんでした。OncoAgent では以下の構成を使用します:

埋め込み: pritamdeka/S-PubMedBert-MS-MARCO — PubMed および MS-MARCO で非対称医療意味検索用にファインチューニング済み

ベクトルストア: ローカル ChromaDB 永続インデックス — クラウド不要、Zero-PHI (患者情報なし) に準拠

4.3 4 段階検索パイプライン

ステージ	コンポーネント	機能	設定
1. リコール	PubMedBERT Bi-Encoder	ワイドネット検索	トップ 15 候補
2. 距離ゲート	コサイン距離フィルタ	幻覚防止の下限	しきい値 = 0.10
3. リランキング	クロスエンコーダ (MS-MARCO MiniLM)	照合クエリとドキュメントの関連性	トップ 5 を返却
4. コンテキストトリミング	文字予算制限器	LLM コンテキストウィンドウ内に収める	最大 6,000 文字

反幻覚ポリシー: ステージ 2 で失敗したクエリについては、スペシャリストを呼び出すことなく「Información no concluyente en las guías provistas」と返答します。これにより、ドメイン外の臨床入力に対する幻覚に基づく推奨がゼロに保証されます。

NCCN コーパスに対する距離閾値の較正結果は以下の通りです：

医療クエリの距離：約 0.06–0.09

ドメイン外データの距離：約 0.11–0.15

ハード閾値：0.10

オプションのHyDEモジュールは、仮説的なガイドライン段落を生成し、それをステージ 1 の検索における埋め込みアンカーとして使用します。これにより、「neoplasia pulmonar」と「lung carcinoma」のような同義語の不整合が解消されます。

5. デュアルティア QLoRA 微調整

5.1 訓練コーパス：OncoCoT（266,854 サンプル）

ソース

タイプ

サンプル数

備考

PMC-Patients

実際の臨床症例

約 85,000

PubMed Central の患者報告書

Asclepius

実際の臨床データ

約 85,000

キュレーションされた医療 QA コーパス

OncoCoT Synthetic

合成データ（Qwen 3.6-27B）

96,941

MI300X で生成、時速約 6,800 サンプル・拒否率 0.65%

合計

—

266,854

90/10 の訓練/評価分割・SHA-256 ハッシュ化・重複除去済み

すべての症例は Qwen 互換性のための ChatML テンプレートを使用しています。JSON パースの破損を防ぐため、思考トークンは無効化されています（chat_template_kwargs: {enable_thinking: False}）。

5.2 QLoRA 設定

両ティアとも BitsAndBytes を介して 4 ビット NormalFloat4 (NF4) 量子化を使用し、LoRA アダプターは主要な投影モジュールすべて（q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj）を対象としています。

パラメータ

ティア 1 (Qwen 3.5-9B)

ティア 2 (Qwen 3.6-27B)

デバイスごとのバッチサイズ

勾配累積

実効バッチサイズ

学習率

2×10⁻⁴

1×10⁻⁴

LoRA 階層 (*r*)

シーケンスパッキング

True, 2048 トークン

早期打ち切り

忍耐値 = 3

量子化

NF4 4-bit (NF4 4 ビット)

5.3 Unsloth を用いた AMD MI300X の最適化

元の HuggingFace transformers + PEFT パイプラインは、2 つの独立した問題により MI300X で失敗しました:

trl v0.24.0 の厳格な EOS 検証と Qwen3VLProcessor ラッパーとの間のトークン化競合

標準精度における目標の実効バッチサイズに対する VRAM (ビデオメモリ) の余裕不足

Unsloth's FastLanguageModel への移行により、両方の問題が同時に解決されました:

VRAM 削減: ピーク使用量が約 60% 減少（OOM から安定した約 64 GB に、192 GB デバイス上で）

訓練速度: 実効バッチサイズ 16 で約 2 倍の向上、ステップあたり約 16 秒に

AMD ROCm 固有の適応策として必要なのは:

1. Qwen3VLProcessor ラッパーではなく内部トークナイザーを渡す

trainer = SFTTrainer(tokenizer=model.get_tokenizer(), ...)

2. 互換性のない EOS の注入を防ぐ

training_args = SFTConfig(eos_token=None, ...)

3. ROCm 6.2/gfx942 用の AMD 固有の bitsandbytes

pip install bitsandbytes --find-links <amd-continuous-release-wheel>

4. BF16 のワークアラウンド（ハードウェアはサポートしているにもかかわらず、ROCm では is_bf16_supported() が False を返す）

training_args = TrainingArguments(fp16=True, ...)

最終的なデプロイではネイティブの BF16 を使用:

model = AutoModelForCausalLM.from_pretrained(..., torch_dtype=torch.bfloat16)

5.4 シーケンスパッキングとスループットの飛躍

SFTConfig の packing=True を介したシーケンスパッキングは、複数の短い臨床記録を単一の 2048 トークンシーケンスに結合し、パディングのオーバーヘッドを排除して順伝播回数を劇的に削減します。

Unsloth カーネルとシーケンスパッキングが MI300X でもたらす相乗効果により、266,854 サンプルのコパス全体でのファインチューニングが約 50 分で完了しました（当初の見積もりは 5 時間）。これはおよそ 6 倍のトレーニング時間の短縮を意味します。GPU の利用率は最大で約 70% に達し、一貫して約 11.3 秒/イテレーションのスループットを維持しました。

チェックポイント-1000 の結果: Tier 1 アダプターは 1,339 ステップ訓練され、トレーニング損失は約 0.05、アダプターのサイズは 187 MB です。adapter_model.safetensors、adapter_config.json、tokenizer.json を含む 11 ファイルのマニフェストに対して検証済みです。

本システムは適応型推論ルーティングをサポートしています：ROCm が利用可能な場合は LocalModelManager シングルトンによるローカル BF16 推論を行い、高可用性を確保するために Featherless.ai API へシームレスにフォールバックします。

6. セーフティとプライバシーフレームワーク

6.1 ゼロ PHI ポリシー

専用ゼロ PHI 削除ノードは、Ingestion ノード内の最初の処理ステップとして実行され、テキストが LLM に到達する前に動作します。これは、保護対象医療情報（患者名、生年月日、MRN 番号、住所、施設識別子）を特定し、臨床的に中立なプレースホルダーに置換します。削除された表現は AgentState に保存され、元のテキストは破棄されます。

これにより、ローカルまたはリモートに関わらず、いかなる下流の LLM 呼び出しにも PHI が到達せず、ポリシーではなく設計によって HIPAA の非識別化要件を満たします。

6.2 レイヤー別安全アーキテクチャ

システムの安全性保証は、4 つの独立したレイヤーで強制されます。単一のレイヤーでの失敗が全体のセキュリティ姿勢を損なうことはありません。

レイヤー	メカニズム	対応する課題
L1: 検索ゲート	ディスタンスゲート（コサイン閾値 0.10）	ドメイン外のハルシネーション
L2: 信頼度ゲート	RAG 信頼度スコア < 0.3 → ブロック	低品質な検索根拠
L3: リフレクション批評家	フォーマット + 安全性 + LLM 含意（最大 2 回の再試行）	サポートされていない、または不安全な専門家の出力
L4: HITL ゲート	Tier 2 / 標的ケースにおける必須の臨床医による中断	専門家の判断を要する高複雑度ケース

レイヤー 1 と 2 は検索レイヤーで動作し、レイヤー 3 は生成レイヤー、レイヤー 4 はデプロイメントレイヤーで動作します。すべてのレイヤー 3 のチェックは、LLM で制御されるロジックではなく決定論的なコードとして実行され、敵対的プロンプティングによる安全性の回避を防ぎます。

7. クリニカルインターフェース

必ず JSON 形式で返してください。translation フィールドのみ。他のフィールド (technical_terms 等) は一切追加しないこと — 余計なフィールドを書こうとして本文翻訳がトークン上限で打ち切られる事故を防ぐため:

{"translation": "専用ゼロ PHI 削除ノードは、Ingestion ノード内の最初の処理ステップとして実行され、テキストが LLM に到達する前に動作します。これは、保護対象医療情報（患者名、生年月日、MRN 番号、住所、施設識別子）を特定し、臨床的に中立なプレースホルダーに置換します。削除された表現は AgentState に保存され、元のテキストは破棄されます。

6.2 レイヤー別安全アーキテクチャ

システムの安全性保証は、4 つの独立したレイヤーで強制されます。単一のレイヤーでの失敗が全体のセキュリティ姿勢を損なうことはありません。

レイヤー	メカニズム	対応する課題
L1: 検索ゲート	ディスタンスゲート（コサイン閾値 0.10）	ドメイン外のハルシネーション
L2: 信頼度ゲート	RAG 信頼度スコア < 0.3 → ブロック	低品質な検索根拠
L3: リフレクション批評家	フォーマット + 安全性 + LLM 含意（最大 2 回の再試行）	サポートされていない、または不安全な専門家の出力
L4: HITL ゲート	Tier 2 / 標的ケースにおける必須の臨床医による中断	専門家の判断を要する高複雑度ケース

7. クリニカルインターフェース"}

OncoAgent の UI は、ChatGPT スタイルの対話レイアウトを持つリアルタイムストリーミング Gradio アプリケーションとして実装されています。主な機能は以下の通りです。

左サイドバー：セッション制御、KPI タイル、証拠ソースタブ
メインチャットエリア：各ノードが完了するたびに、ライブでエージェントによる推論の更新が表示されます

リアルタイムでの透明性は、LangGraph の .stream(stream_mode="updates") API を通じて実現されており、各ノードが完了するたびに {node_name: node_output} 形式の辞書を出力します。UI は各ノードを人間が読みやすい臨床ラベル（例：corrective_rag → "NCCN/ESMO ガイドラインの取得中"）にマッピングし、医療従事者にパイプライン全体を可視化できるようにしています。

rag_confidence スコアと取得されたソース数は目立つように表示されており、各推奨事項の背後にあるガイドラインの根拠の質について、医療従事者が即座に確認できるようになっています。

このインターフェースは WCAG 2.1 AA 基準に合わせて設計されています。Lucide スタイルのインライン SVG アイコン、slate-900/sky-500 のダークテーマ、Figtree/Inter タイポグラフィ、prefers-reduced-motion メディアクエリへの対応、すべての遷移を 200 ms に制限しています。

8. 結果

コンポーネント	メトリック	値
知識ベース	取り込んだガイドライン数	70+
	パースされた PDF 数	< 60 秒で 138 件
	インデックスパースエラー	0
CRAG パイプライン	ドキュメント評価成功率（修正後）	100%
	RAG 信頼度スコア（子宮がんテスト）	2.3+（修正前は 0.0）
	並列評価レイテンシ（3〜5 ドキュメント）	< 5 秒
複雑性ルーター	IV 期膵臓癌 + KRAS + BRCA2	スコア = 0.80 → Tier 2 ✅
トレーニング（Tier 1, 9B）	完全な 266k サンプルの学習時間	~50 分（推定 5 時間と比較）

定常スループット

~11.3–16 秒/ステップ

GPU 利用率 (MI300X)

~ピーク時 70%

VRAM 利用率 (Unsloth)

~64 GB / 192 GB

チェックポイント-1000 時点のトレーニング損失

~0.05

合成データのスループット (MI300X vs. API)

6,800 vs. 120 事例/時 (56 倍 ↑)

合成コーパス拒否率

0.65%

グラフトポロジー

コンパイル済みノードの検証結果

8 / 8

モジュールテストスイート合格数

6 / 6

トリアージ中のブラウザタイムアウト

UI レンダリングレイテンシ

< 200 ms

9. 考察

9.1 クリニカル要件としてのハードウェア主権

トレーニング、推論、RAG (Retrieval-Augmented Generation)、および UI を含む OncoAgent の完全なスタックを、クラウド API への依存なしに単一の AMD MI300X インスタンス上で実行できる能力は、単なるエンジニアリング上の利便性ではありません。HIPAA(米国)、GDPR(EU)、および同等の国内枠組みによって規律される病院環境において、データを制御されたインフラ内に維持するという法的かつ倫理的義務は絶対的なものです。OncoAgent は、この制約内でも最先端 (SOTA) のマルチエージェント型臨床 AI を実現可能であることを示しています。

9.2 スループットの画期的進展

56 倍の合成データ生成加速（約 120 件/時間から約 6,800 件/時間へ）と、約 6 倍のトレーニング時間短縮は、時間制約のある環境におけるドメイン固有のファインチューニングの実現可能性に対して、実用的に極めて重要な貢献を果たしています。これらの結果は、AMD の CDNA3 アーキテクチャが Unsloth の Triton カーネル最適化と SFT シーケンスパッキングと組み合わされた場合、標準的な HuggingFace トレーニングパイプラインによって大幅に過小評価されている可能性を示唆しており、基盤となるモデルアーキテクチャを変更することなく、性能格差を埋めることが可能であることを示しています。

9.3 限界

いくつかの限界を認識する必要があります：

トレーニングコーパスは、約 36% が合成データによって生成されたケースに依存しています。臨床的精度の検証については、認定腫瘍医の判断との比較がまだ大規模には実施されていません。

現在の知識ベースは主に英語で記述された NCCN ガイドラインをカバーしており、ESMO や非英語圏の臨床コーパスは今後の課題として残されています。

Tier 1 アダプターは、より長い軌跡におけるチェックポイント 1000 に相当します。完全な収束と、下流の臨床ベンチマーク評価（MedQA、USMLE スタイルの腫瘍学サブセット）は、今後のリリースで予定されています。

10. 結論

OncoAgent は、最先端のマルチエージェント設計パターン、ドメイン固有のファインチューニング、および 4 つの段階からなるグラウンデッド検索パイプラインを統合した、腫瘍学における完全かつオープンソースでプライバシーを保護する臨床意思決定支援アーキテクチャを確立しました。

本システムは、生産環境レベルの臨床用 AI に専用インフラが必要ないことを示しています。26 万 6000 サンプルによる QLoRA（Quantized Low-Rank Adaptation）ファインチューニング、70 以上のガイドラインに基づく RAG（Retrieval-Augmented Generation）、8 ノード構成の LangGraph オーケストレーション、3 層構造のリフレクション安全性検証、リアルタイム臨床ストリーミング UI を含むフルスタックが、ROCm 環境下の単一の AMD Instinct MI300X インスタンス上で動作します。

アーキテクチャ上の貢献、特に Corrective RAG（修正型検索拡張生成）、Reflexion（自己反省メカニズム）、HITL（Human-in-the-Loop）ゲート機能を統合した一貫性のある安全性スタックの構築は、ハルシネーション（幻覚）の結果が生命に関わるドメイン固有の臨床 AI 展開において、再現可能な青写真となります。

すべてのコード、アダプター重み、および OncoCoT 合成コーパスは、Hugging Face Spaces および GitHub で公開されます。

References

Singhal, K. et al. (2023). Large language models encode clinical knowledge. Nature, 620, 172–180.

Nori, H. et al. (2023). Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. arXiv:2311.16452.

Wang, L. et al. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6), 186345.

Shi, W. et al. (2024). Corrective Retrieval Augmented Generation. arXiv:2401.15884.

Shinn, N. et al. (2023). Reflexion: Language agents with verbal reinforcement learning. NeurIPS 2023.

Nogueira, R. and Cho, K. (2019). Passage Re-ranking with BERT. arXiv:1901.04085.

Gao, L. 他 (2022). 関連性ラベルなしの精密なゼロショット密検索。arXiv:2212.10496.

Hu, E.J. 他 (2021). LoRA: 大規模言語モデルの低ランク適応。arXiv:2106.09685.

Dettmers, T. 他 (2023). QLoRA: 量子化された大規模言語モデルの効率的なファインチューニング。NeurIPS 2023.

Han, S. 他 (2024). LangGraph: LLM を用いた状態保持型マルチアクターアプリケーションの構築。LangChain Technical Report.

*OncoAgent は臨床意思決定支援ツールとして意図されています。いかなる臨床応用を行う前に、すべての出力は資格を有する医療専門家によるレビューが必要です。*

原文を表示

Back to Articles

thumbnail: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/oncoagent-thumbnail.png

authors:

user: oncoagent-research

tags:

oncology

multi-agent

LangGraph

QLoRA

open-source

clinical-ai

healthcare

OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support

*Technical preprint · May 2026 · OncoAgent Research Group*

Abstract

We present OncoAgent, an open-source, privacy-preserving clinical decision support system for oncology. OncoAgent combines a dual-tier fine-tuned LLM architecture with a state-of-the-art multi-agent LangGraph topology, a four-stage Corrective RAG pipeline over 70+ physician-grade NCCN and ESMO guidelines, and a three-layer reflexion safety validator enforcing a strict Zero-PHI policy.

The system routes clinical queries through an additive complexity scorer to either a 9B parameter speed-optimised model (Tier 1) or a 27B deep-reasoning model (Tier 2), both fine-tuned via QLoRA on a corpus of 266,854 real and synthetically generated oncological cases using the Unsloth framework on AMD Instinct MI300X hardware (192 GB HBM3).

Sequence packing on MI300X enabled full-dataset fine-tuning in approximately 50 minutes — a 56× throughput acceleration over API-based generation. Post-fix, CRAG document grading achieved a 100% success rate with a mean RAG confidence score of 2.3+. The complete system is 100% open source and deployable on-premises, eliminating proprietary cloud API dependency and preserving patient data sovereignty.

Keywords: clinical decision support, oncology AI, multi-agent systems, retrieval-augmented generation, QLoRA, AMD ROCm, open-source healthcare AI, HITL safety, LangGraph, Corrective RAG

1. Introduction

Oncology is one of the most information-dense and cognitively demanding domains in clinical medicine. The volume, heterogeneity, and rapid evolution of evidence-based guidelines — from the National Comprehensive Cancer Network (NCCN) to the European Society for Medical Oncology (ESMO) — create a persistent knowledge gap between published evidence and bedside practice.

AI-assisted clinical decision support systems hold transformative potential for closing this gap, yet most commercially available systems fail in three critical ways:

Hallucinated recommendations not grounded in validated guidelines

Cloud API dependency that precludes on-premises deployment in privacy-sensitive hospital environments

Monolithic LLM architectures prone to context saturation under complex multi-comorbidity presentations

OncoAgent is designed around three core principles:

Architectural decomposition: Clinical reasoning is decomposed across eight specialised LangGraph nodes, each with a bounded, auditable function.

Grounded generation: All model outputs are anchored to a curated vector knowledge base through a four-stage retrieval pipeline with explicit relevance gating.

Hardware sovereignty: The full inference and training stack runs natively on AMD Instinct MI300X using ROCm and open-source frameworks — enabling hospital deployment without data exfiltration.

2. Related Work

2.1 Clinical LLMs and Decision Support

Large language models have demonstrated significant promise in clinical NLP tasks including diagnostic coding, literature summarisation, and patient communication. Domain-specific fine-tuning approaches — exemplified by BioMedLM, Med-PaLM 2, and ClinicalBERT — consistently improve performance on medical benchmarks over general-purpose models. OncoAgent extends this line of work by targeting the specific subdomain of oncological triage and treatment pathway recommendation, where hallucination consequences are most severe.

2.2 Multi-Agent Architectures

Decomposed multi-agent systems have emerged as a principled approach to complex reasoning tasks. OncoAgent synthesises four canonical SOTA patterns:

Claude Code pattern — deterministic safety harnesses separated from LLM reasoning

Hermes Agent pattern — structured tool-calling with per-session memory isolation

Corrective RAG (Shi et al., 2024) — document relevance grading and query reformulation

Reflexion (Shinn et al., 2023) — self-correcting generation via feedback-augmented retry loops

2.3 Retrieval-Augmented Generation in Medicine

Standard bi-encoder retrieval is ill-suited for clinical domains where terminological precision is critical (e.g., "tyrosine kinase inhibitor" vs. "TKI"). OncoAgent implements a multi-stage pipeline with cross-encoder re-ranking, and integrates Hypothetical Document Embeddings (HyDE; Gao et al., 2022) to resolve medical synonym mismatches by projecting natural language queries into the guideline embedding space.

3. System Architecture

3.1 Overview

OncoAgent is implemented as a stateful directed graph using LangGraph. The system state is represented as an immutable AgentState TypedDict containing 11 logical sections and approximately 30 typed keys. Each node appends to specific keys without mutating upstream data, preserving a complete audit trail.

The 8-node topology is:

code

Router → Ingestion → Corrective RAG → Specialist ↔ Critic → HITL Gate → Formatter → END
                                                                   ↓
                                                               Fallback → END

Key properties:

5 conditional edges

1 reflexion retry loop (max 2 iterations)

1 mandatory HITL interrupt for high-complexity or low-confidence outputs

3.2 Complexity Router and Model Tiering

Case complexity is quantified using a weighted additive model prior to specialist invocation:

code

S = w_cancer + w_stage + w_mutations + w_treatment

Where:

Factor

Condition

Weight

Cancer type

Rare

+0.40

Cancer type

Unknown primary

+0.30

Stage

Stage IV

+0.25

Stage

Stage III

+0.15

Mutations

≥2 identified

+0.30

Mutations

Single

+0.15

Prior treatment

Any keyword match

+0.10

Decision boundary: S ≥ 0.5 → Tier 2 (Qwen 3.6-27B deep reasoning) · S < 0.5 → Tier 1 (Qwen 3.5-9B speed triage)

Validation: A Stage IV pancreatic carcinoma case with KRAS + BRCA2 mutations correctly produced S = 0.80, routing to Tier 2. ✅

Clinicians may also manually override the tier selection through the UI.

3.3 Corrective RAG with Document Grading

The CRAG node grades each retrieved document for clinical relevance before forwarding to the Specialist. Documents that fail binary relevance classification trigger automatic query reformulation (max 1 retry). This eliminates the primary hallucination source in RAG pipelines — retrieval of plausibly titled but semantically irrelevant content.

After migrating from Qwen 3.5 to Qwen 2.5 Instruct for the grading step, success rate improved from 0% → 100%, with RAG confidence score reaching 2.3+ on uterine cancer triage tests.

3.4 Reflexion Safety Loop (Critic Node)

The Critic node runs a three-layer validation cascade before any output reaches the HITL gate:

Formatting check — validates structural compliance with the OncoCoT output schema

Safety check — deterministic rule-based scan for prohibited output patterns (absolute dosing without guideline citation, drug interaction omissions, etc.)

LLM entailment check — verifies that the Specialist's recommendation is fully supported by the retrieved RAG context

On FAIL, the Critic's specific feedback is injected back into the Specialist context for a retry (max 2 iterations). Crucially, the Critic runs as deterministic code, not LLM-controlled logic — ensuring safety enforcement cannot be bypassed by adversarial prompting.

3.5 Human-in-the-Loop Gate and Fallback

The HITL gate provides a mandatory clinician interrupt for all Tier 2 cases and any output where rag_confidence < 0.3. A dedicated Fallback node catches unrecoverable failures and returns a clinically safe refusal — "Información no concluyente en las guías provistas" — avoiding hallucinated alternatives under any failure mode.

3.6 Per-Patient Memory Isolation

The PatientMemoryStore module assigns each patient session a unique thread_id (format PT-XXXX), passed as a configurable parameter to LangGraph's native checkpointing system. This enforces strict per-patient memory isolation while enabling iterative multi-turn consultations within a session.

4. Knowledge Base Construction and RAG Pipeline

4.1 Guideline Ingestion and Sanitisation

The knowledge base was constructed from 77 direct physician guideline PDFs identified by a concurrent web scraper that processed 138 NCCN detail pages in under 60 seconds. Text extraction used PyMuPDF (fitz) for block-level structural parsing, preserving the semantic reading order of multi-column clinical layouts.

A regex-based sanitisation step strips institutional branding prior to ingestion. Patient-facing materials are excluded via heuristic filtering. The resulting corpus covers 70+ professional oncological guidelines across all major cancer types including HCC, NSCLC, breast, colorectal, and neuroendocrine tumours.

4.2 Medical Embeddings and Vector Store

Standard general-purpose embedding models (e.g., all-MiniLM-L6-v2) were rejected due to poor clinical terminology semantics. OncoAgent uses:

Embeddings: pritamdeka/S-PubMedBert-MS-MARCO — fine-tuned on PubMed and MS-MARCO for asymmetric medical semantic search

Vector store: Local ChromaDB persistent index — zero-cloud, Zero-PHI compliant

4.3 Four-Stage Retrieval Pipeline

Stage

Component

Function

Configuration

1. Recall

PubMedBERT Bi-Encoder

Wide-net retrieval

top-15 candidates

2. Distance Gate

Cosine Distance Filter

Anti-hallucination floor

threshold = 0.10

3. Re-Ranking

Cross-Encoder (MS-MARCO MiniLM)

Joint query-document relevance

top-5 returned

4. Context Trimming

Character-Budget Limiter

Fit within LLM context window

max 6,000 chars

Anti-Hallucination Policy: Any query failing Stage 2 returns "Información no concluyente en las guías provistas" without invoking the Specialist. This guarantees zero hallucinated recommendations for out-of-domain clinical inputs.

Distance threshold calibration against the NCCN corpus established:

Medical-query distances: ~0.06–0.09

Out-of-domain distances: ~0.11–0.15

Hard threshold: 0.10

An optional HyDE module generates a hypothetical guideline paragraph and uses it as the embedding anchor for Stage 1 retrieval, resolving synonym mismatches (e.g., "neoplasia pulmonar" vs. "lung carcinoma").

5. Dual-Tier QLoRA Fine-Tuning

5.1 Training Corpus: OncoCoT (266,854 Samples)

Source

Type

Samples

Notes

PMC-Patients

Real clinical cases

~85,000

PubMed Central patient reports

Asclepius

Real clinical data

~85,000

Curated medical QA corpus

OncoCoT Synthetic

Synthetic (Qwen 3.6-27B)

96,941

Generated on MI300X at ~6,800 cases/hr · rejection rate 0.65%

Total

—

266,854

90/10 train/eval split · SHA-256 hashed · deduplicated

All cases use ChatML template for Qwen compatibility. Thinking tokens were disabled (chat_template_kwargs: {enable_thinking: False}) to prevent JSON parse corruption.

5.2 QLoRA Configuration

Both tiers use 4-bit NormalFloat4 (NF4) quantisation via BitsAndBytes, with LoRA adapters targeting all major projection modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj.

Parameter

Tier 1 (Qwen 3.5-9B)

Tier 2 (Qwen 3.6-27B)

Per-device batch size

Gradient accumulation

Effective batch size

Learning rate

2×10⁻⁴

1×10⁻⁴

LoRA rank (*r*)

Sequence packing

True, 2048 tokens

Early stopping

Patience = 3

Quantisation

NF4 4-bit

5.3 AMD MI300X Optimisation with Unsloth

The original HuggingFace transformers + PEFT pipeline failed on the MI300X due to two independent issues:

Tokenisation conflicts between trl v0.24.0 strict EOS validation and the Qwen3VLProcessor wrapper

Insufficient VRAM headroom for target effective batch sizes under standard precision

Migration to Unsloth's FastLanguageModel resolved both simultaneously:

VRAM reduction: ~60% drop in peak usage (from OOM to stable ~64 GB on 192 GB device)

Training speed: ~2× improvement to ~16 s/step at effective batch 16

AMD ROCm-specific adaptations required:

code

# 1. Pass inner tokenizer, not the Qwen3VLProcessor wrapper
trainer = SFTTrainer(tokenizer=model.get_tokenizer(), ...)

# 2. Prevent incompatible EOS injection
training_args = SFTConfig(eos_token=None, ...)

# 3. AMD-specific bitsandbytes for ROCm 6.2/gfx942
# pip install bitsandbytes --find-links <amd-continuous-release-wheel>

# 4. BF16 workaround (is_bf16_supported() returns False on ROCm despite hardware support)
training_args = TrainingArguments(fp16=True, ...)
# Final deployment uses native BF16:
model = AutoModelForCausalLM.from_pretrained(..., torch_dtype=torch.bfloat16)

5.4 Sequence Packing and Throughput Breakthrough

Sequence packing via packing=True in SFTConfig concatenates multiple short clinical records into single 2048-token sequences, eliminating padding overhead and drastically reducing forward pass count.

The combined effect of Unsloth kernels and sequence packing on the MI300X enabled full-dataset fine-tuning of the 266,854-sample corpus in approximately 50 minutes — against an initial 5-hour estimate — representing roughly a 6× training time compression. GPU utilisation peaked at ~70%, with consistent throughput at ~11.3 s/iteration.

Checkpoint-1000 results: Tier 1 adapter trained for 1,339 steps · training loss ≈ 0.05 · adapter size 187 MB · verified against 11-file manifest including adapter_model.safetensors, adapter_config.json, and tokenizer.json.

The system supports adaptive inference routing: local BF16 inference via the LocalModelManager singleton when ROCm is available, with graceful fallback to the Featherless.ai API for high availability.

6. Safety and Privacy Framework

6.1 Zero-PHI Policy

A dedicated Zero-PHI redaction node runs as the first processing step in the Ingestion node, before any text reaches an LLM. It identifies and replaces Protected Health Information (patient names, dates of birth, MRN numbers, addresses, facility identifiers) with clinically neutral placeholders. The redacted representation is stored in AgentState; the original text is discarded.

This ensures that no PHI reaches any downstream LLM call — local or remote — and satisfies HIPAA de-identification requirements by design rather than policy.

6.2 Layered Safety Architecture

The system's safety guarantees are enforced at four independent layers. A failure at any single layer does not compromise the overall posture.

Layer

Mechanism

Addresses

L1: Retrieval Gate

Distance Gate (cosine threshold 0.10)

Out-of-domain hallucinations

L2: Confidence Gate

RAG confidence score < 0.3 → block

Low-quality retrieval grounding

L3: Reflexion Critic

Formatting + safety + LLM entailment (max 2 retries)

Unsupported or unsafe Specialist outputs

L4: HITL Gate

Mandatory clinician interrupt for Tier 2 / flagged cases

High-complexity cases requiring expert judgment

Layers 1 and 2 operate at the retrieval layer. Layer 3 at the generation layer. Layer 4 at the deployment layer. All Layer 3 checks run as deterministic code — not LLM-controlled logic — preventing safety bypass via adversarial prompting.

7. Clinical Interface

The OncoAgent UI is implemented as a real-time streaming Gradio application in a ChatGPT-style conversational layout. It features:

Left sidebar: Session controls, KPI tiles, evidence source tabs

Main chat area: Live agentic reasoning updates as each node completes

Real-time transparency is achieved via LangGraph's .stream(stream_mode="updates") API, which emits {node_name: node_output} dictionaries as each node completes. The UI maps each node to a human-readable clinical label (e.g., corrective_rag → *"Retrieving NCCN/ESMO guidelines"*), providing clinicians with full pipeline visibility.

The rag_confidence score and retrieved source count are prominently surfaced, giving clinicians immediate visibility into the quality of guideline grounding behind each recommendation.

The interface was designed to WCAG 2.1 AA standards — Lucide-style inline SVG icons, slate-900/sky-500 dark theme, Figtree/Inter typography, prefers-reduced-motion media query, all transitions capped at 200 ms.

8. Results

Component

Metric

Value

Knowledge Base

Guidelines ingested

70+

PDFs parsed

138 in < 60 s

Index parsing errors

CRAG Pipeline

Document grading success rate (post-fix)

100%

RAG confidence score (uterine cancer test)

2.3+ (was 0.0 pre-fix)

Parallel grading latency (3–5 docs)

< 5 s

Complexity Router

Stage IV pancreatic + KRAS + BRCA2

Score = 0.80 → Tier 2 ✅

Training (Tier 1, 9B)

Full 266k-sample training time

~50 min (vs. 5 hr estimate)

Steady-state throughput

~11.3–16 s/step

GPU utilisation (MI300X)

~70% peak

VRAM utilisation (Unsloth)

~64 GB / 192 GB

Training loss at checkpoint-1000

~0.05

Synthetic data throughput (MI300X vs. API)

6,800 vs. 120 cases/hr (56× ↑)

Synthetic corpus rejection rate

0.65%

Graph Topology

Compiled nodes verified

8 / 8

Module test suites passed

6 / 6

Browser timeouts during triage

UI rendering latency

< 200 ms

9. Discussion

9.1 Hardware Sovereignty as a Clinical Requirement

The ability to run the complete OncoAgent stack — training, inference, RAG, and UI — on a single AMD MI300X instance without cloud API dependencies is not merely an engineering convenience. In hospital environments governed by HIPAA (US), GDPR (EU), and equivalent national frameworks, the legal and ethical obligation to maintain data within controlled infrastructure is absolute. OncoAgent demonstrates that SOTA multi-agent clinical AI is achievable within this constraint.

9.2 The Throughput Breakthrough

The 56× synthetic data generation acceleration (from ~120 to ~6,800 cases/hr) and the ~6× training time compression together represent a significant practical contribution to the feasibility of domain-specific fine-tuning in time-constrained settings. These results suggest that AMD's CDNA3 architecture, when paired with Unsloth's Triton kernel optimisations and SFT sequence packing, may be substantially underutilised by standard HuggingFace training pipelines — and that the performance gap can be closed without changes to the underlying model architecture.

9.3 Limitations

Several limitations warrant acknowledgement:

The training corpus relies on approximately 36% synthetically generated cases. Clinical accuracy validation against board-certified oncologist judgments has not yet been performed at scale.

The current knowledge base covers NCCN guidelines primarily in English; ESMO and non-English clinical corpora remain for future work.

The Tier 1 adapter represents checkpoint-1000 of a potentially longer trajectory; full convergence and downstream clinical benchmark evaluation (MedQA, USMLE-style oncology subsets) are planned for subsequent releases.

10. Conclusion

OncoAgent establishes a complete, open-source, privacy-preserving clinical decision support architecture for oncology that integrates SOTA multi-agent design patterns, domain-specific fine-tuning, and a four-stage grounded retrieval pipeline.

The system demonstrates that production-grade clinical AI does not require proprietary infrastructure: the full stack — including 266k-sample QLoRA fine-tuning, 70+ guideline RAG, eight-node LangGraph orchestration, three-layer reflexion safety validation, and real-time clinical streaming UI — runs on a single AMD Instinct MI300X instance under ROCm.

The architectural contributions — particularly the synthesis of Corrective RAG, Reflexion, and HITL gating into a single coherent safety stack — represent a replicable blueprint for domain-specific clinical AI deployments where hallucination consequences are life-critical.

All code, adapter weights, and the OncoCoT synthetic corpus will be released publicly on Hugging Face Spaces and GitHub.

References

Singhal, K. et al. (2023). Large language models encode clinical knowledge. Nature, 620, 172–180.

Nori, H. et al. (2023). Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. arXiv:2311.16452.

Wang, L. et al. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6), 186345.

Shi, W. et al. (2024). Corrective Retrieval Augmented Generation. arXiv:2401.15884.

Shinn, N. et al. (2023). Reflexion: Language agents with verbal reinforcement learning. NeurIPS 2023.

Nogueira, R. and Cho, K. (2019). Passage Re-ranking with BERT. arXiv:1901.04085.

Gao, L. et al. (2022). Precise Zero-Shot Dense Retrieval without Relevance Labels. arXiv:2212.10496.

Hu, E.J. et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.

Dettmers, T. et al. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. NeurIPS 2023.

Han, S. et al. (2024). LangGraph: Building stateful multi-actor applications with LLMs. LangChain Technical Report.

*OncoAgent is intended as a clinical decision support tool. All outputs require review by licensed medical professionals prior to any clinical application.*

この記事をシェア

MarkTechPost2026年7月4日 06:25

lift-pdf を用いたスキーマ指向の請求書インテリジェンスパイプライン設計：経理処理・検証・帳簿生成のための抽出手法

TechCrunch AI2026年7月4日 03:43

ブラウザ戦争はもはや検索が主役ではない — Chrome や Safari に代わる最良の代替案

The Verge AI2026年7月3日 20:49

ミッドジャーニーの医療用スキャナー、多くの疑問を残した裏側レポート

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

Hugging Face Blog·2026年5月10日 03:09·約26分

プライバシーを保護する腫瘍学臨床意思決定支援のための二層マルチエージェントフレームワーク「OncoAgent」の提案

#RAG #LangGraph #QLoRA #医療 AI #プライバシー保護 #マルチエージェント

TL;DR

AI深層分析2026年7月5日 05:13

重要/ 5段階

深度40%

キーポイント

二層型マルチエージェントアーキテクチャ

LLM の微調整と LangGraph を基盤とした最先端のトポロジーを組み合わせた、プライバシー保護に特化した臨床意思決定支援システムを構築している。

4 段階の補正型 RAG パイプライン

プライバシー保護とオープンソース化

患者データの機密性を厳格に維持しつつ、QLoRA などの技術を用いた軽量な微調整モデルを公開しており、医療現場での導入可能性を示唆している。

Corrective RAG with Document Grading

Retrieved documents are graded for clinical relevance before processing, automatically reformulating queries if irrelevant to eliminate hallucinations caused by semantically mismatched content.

Reflexion Safety Loop (Critic Node)

Specialized Medical Embeddings

The system rejects general-purpose models in favor of PubMedBert for embeddings to accurately capture clinical terminology semantics within a zero-cloud ChromaDB vector store.

反ハルシネーション閾値の確立

影響分析・編集コメントを表示

影響分析

編集コメント

記事に戻る

thumbnail: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/oncoagent-thumbnail.png

authors:

user: oncoagent-research

tags:

oncology

multi-agent

LangGraph

QLoRA

open-source

clinical-ai

healthcare

OncoAgent: プライバシーを保護する腫瘍学臨床意思決定支援のための二層型マルチエージェントフレームワーク

*技術的プレプリント · 2026 年 5 月 · OncoAgent リサーチグループ*

抄録

1. はじめに

検証済みのガイドラインに基づかない幻覚（ハルシネーション）に基づく推奨事項
プライバシーが敏感な病院環境におけるオンプレミスデプロイを妨げるクラウド API への依存
複雑な多疾患併存の症例において文脈飽和を起こしやすいモノリス型 LLM アーキテクチャ

OncoAgent は、以下の 3 つの中核原則に基づいて設計されています:

アーキテクチャの分解：臨床推論は、それぞれに制限された監査可能な機能を備えた 8 つの専門化された LangGraph ノードに分解されます。

根拠に基づく生成：すべてのモデル出力は、明示的な関連性ゲートを持つ 4 段階の検索パイプラインを通じて、キュレーションされたベクトル知識ベースにアンカーされています。

ハードウェア主権：完全な推論およびトレーニングスタックは、ROCm およびオープンソースフレームワークを使用して AMD Instinct MI300X でネイティブに実行され、データ流出なしでの病院展開を可能にします。

2. 関連研究

2.1 臨床用大規模言語モデルと意思決定支援

2.2 マルチエージェントアーキテクチャ

Claude Code パターン — 大規模言語モデルの推論から分離された決定論的安全装置

Hermes Agent パターン — セッションごとのメモリ隔離を備えた構造化ツール呼び出し

修正型 RAG (Shi et al., 2024) — ドキュメントの関連性評価とクエリ再構築

Reflexion (Shinn et al., 2023) — フィードバック強化による試行ループを通じた自己修正生成

2.3 医療分野における検索拡張生成 (Retrieval-Augmented Generation)

3. システムアーキテクチャ

3.1 概要

8 ノードのトポロジは以下の通りです：

Router → Ingestion → Corrective RAG → Specialist ↔ Critic → HITL Gate → Formatter → END

↓

Fallback → END

主要な特性：

5 つの条件付きエッジ

1 つのリフレクション再試行ループ（最大 2 回）

高複雑度または低信頼度の出力に対する必須のヒューマン・イン・ザ・ループ (HITL) 割り込み

症例の複雑さは、専門家の招致前に加重加算モデルを用いて定量化されます：

S = w_cancer + w_stage + w_mutations + w_treatment

ここで：

Factor

Condition

Weight

Cancer type

Rare

+0.40

Cancer type

Unknown primary

+0.30

Stage

Stage IV

+0.25

Stage

Stage III

+0.15

Mutations

≥2 identified

+0.30

Mutations

Single

+0.15

Prior treatment

Any keyword match

+0.10

決定境界： S ≥ 0.5 → Tier 2（Qwen 3.6-27B による深層推論） · S < 0.5 → Tier 1（Qwen 3.5-9B による高速トリアージ）

検証結果： KRAS および BRCA2 変異を有するステージ IV の膵癌症例において、正しく S = 0.80 が算出され、Tier 2 にルーティングされました。✅

臨床医はまた、ユーザーインターフェース（UI）を通じて手動で階層選択を上書きすることも可能です。

3.3 ドキュメント評価を伴う修正型 RAG

3.4 リフレクション安全ループ（クリティックノード）

クリティックノードは、あらゆる出力が HITL ゲートに到達する前に、3 層の検証カスケードを実行します：

フォーマットチェック — OncoCoT 出力スキーマへの構造的適合性を検証
セーフティチェック — 禁止された出力パターン（ガイドライン引用のない絶対用量、薬物相互作用の省略など）に対する決定論的ルールベースのスキャン
LLM 推論チェック — スペシャリストの推奨が取得された RAG コンテキストによって完全に支持されていることを確認

3.5 ヒューマン・イン・ザ・ループゲートとフォールバック

3.6 患者ごとのメモリ分離

4. 知識ベースの構築と RAG パイプライン

4.1 ガイドラインの取り込みとサニタイゼーション

4.2 医療埋め込みとベクトルストア

埋め込み: pritamdeka/S-PubMedBert-MS-MARCO — PubMed および MS-MARCO で非対称医療意味検索用にファインチューニング済み

ベクトルストア: ローカル ChromaDB 永続インデックス — クラウド不要、Zero-PHI (患者情報なし) に準拠

4.3 4 段階検索パイプライン

ステージ	コンポーネント	機能	設定
1. リコール	PubMedBERT Bi-Encoder	ワイドネット検索	トップ 15 候補
2. 距離ゲート	コサイン距離フィルタ	幻覚防止の下限	しきい値 = 0.10
3. リランキング	クロスエンコーダ (MS-MARCO MiniLM)	照合クエリとドキュメントの関連性	トップ 5 を返却
4. コンテキストトリミング	文字予算制限器	LLM コンテキストウィンドウ内に収める	最大 6,000 文字

NCCN コーパスに対する距離閾値の較正結果は以下の通りです：

医療クエリの距離：約 0.06–0.09

ドメイン外データの距離：約 0.11–0.15

ハード閾値：0.10

5. デュアルティア QLoRA 微調整

5.1 訓練コーパス：OncoCoT（266,854 サンプル）

ソース

タイプ

サンプル数

備考

PMC-Patients

実際の臨床症例

約 85,000

PubMed Central の患者報告書

Asclepius

実際の臨床データ

約 85,000

キュレーションされた医療 QA コーパス

OncoCoT Synthetic

合成データ（Qwen 3.6-27B）

96,941

MI300X で生成、時速約 6,800 サンプル・拒否率 0.65%

合計

—

266,854

90/10 の訓練/評価分割・SHA-256 ハッシュ化・重複除去済み

5.2 QLoRA 設定

パラメータ

ティア 1 (Qwen 3.5-9B)

ティア 2 (Qwen 3.6-27B)

デバイスごとのバッチサイズ

勾配累積

実効バッチサイズ

学習率

2×10⁻⁴

1×10⁻⁴

LoRA 階層 (*r*)

シーケンスパッキング

True, 2048 トークン

早期打ち切り

忍耐値 = 3

量子化

NF4 4-bit (NF4 4 ビット)

5.3 Unsloth を用いた AMD MI300X の最適化

元の HuggingFace transformers + PEFT パイプラインは、2 つの独立した問題により MI300X で失敗しました:

trl v0.24.0 の厳格な EOS 検証と Qwen3VLProcessor ラッパーとの間のトークン化競合

標準精度における目標の実効バッチサイズに対する VRAM (ビデオメモリ) の余裕不足

Unsloth's FastLanguageModel への移行により、両方の問題が同時に解決されました:

VRAM 削減: ピーク使用量が約 60% 減少（OOM から安定した約 64 GB に、192 GB デバイス上で）

訓練速度: 実効バッチサイズ 16 で約 2 倍の向上、ステップあたり約 16 秒に

AMD ROCm 固有の適応策として必要なのは:

1. Qwen3VLProcessor ラッパーではなく内部トークナイザーを渡す

trainer = SFTTrainer(tokenizer=model.get_tokenizer(), ...)

2. 互換性のない EOS の注入を防ぐ

training_args = SFTConfig(eos_token=None, ...)

3. ROCm 6.2/gfx942 用の AMD 固有の bitsandbytes

pip install bitsandbytes --find-links <amd-continuous-release-wheel>

4. BF16 のワークアラウンド（ハードウェアはサポートしているにもかかわらず、ROCm では is_bf16_supported() が False を返す）

training_args = TrainingArguments(fp16=True, ...)

最終的なデプロイではネイティブの BF16 を使用:

model = AutoModelForCausalLM.from_pretrained(..., torch_dtype=torch.bfloat16)

5.4 シーケンスパッキングとスループットの飛躍

6. セーフティとプライバシーフレームワーク

6.1 ゼロ PHI ポリシー

6.2 レイヤー別安全アーキテクチャ

システムの安全性保証は、4 つの独立したレイヤーで強制されます。単一のレイヤーでの失敗が全体のセキュリティ姿勢を損なうことはありません。

レイヤー	メカニズム	対応する課題
L1: 検索ゲート	ディスタンスゲート（コサイン閾値 0.10）	ドメイン外のハルシネーション
L2: 信頼度ゲート	RAG 信頼度スコア < 0.3 → ブロック	低品質な検索根拠
L3: リフレクション批評家	フォーマット + 安全性 + LLM 含意（最大 2 回の再試行）	サポートされていない、または不安全な専門家の出力
L4: HITL ゲート	Tier 2 / 標的ケースにおける必須の臨床医による中断	専門家の判断を要する高複雑度ケース

7. クリニカルインターフェース

6.2 レイヤー別安全アーキテクチャ

システムの安全性保証は、4 つの独立したレイヤーで強制されます。単一のレイヤーでの失敗が全体のセキュリティ姿勢を損なうことはありません。

レイヤー	メカニズム	対応する課題
L1: 検索ゲート	ディスタンスゲート（コサイン閾値 0.10）	ドメイン外のハルシネーション
L2: 信頼度ゲート	RAG 信頼度スコア < 0.3 → ブロック	低品質な検索根拠
L3: リフレクション批評家	フォーマット + 安全性 + LLM 含意（最大 2 回の再試行）	サポートされていない、または不安全な専門家の出力
L4: HITL ゲート	Tier 2 / 標的ケースにおける必須の臨床医による中断	専門家の判断を要する高複雑度ケース

7. クリニカルインターフェース"}

左サイドバー：セッション制御、KPI タイル、証拠ソースタブ
メインチャットエリア：各ノードが完了するたびに、ライブでエージェントによる推論の更新が表示されます

8. 結果

コンポーネント	メトリック	値
知識ベース	取り込んだガイドライン数	70+
	パースされた PDF 数	< 60 秒で 138 件
	インデックスパースエラー	0
CRAG パイプライン	ドキュメント評価成功率（修正後）	100%
	RAG 信頼度スコア（子宮がんテスト）	2.3+（修正前は 0.0）
	並列評価レイテンシ（3〜5 ドキュメント）	< 5 秒
複雑性ルーター	IV 期膵臓癌 + KRAS + BRCA2	スコア = 0.80 → Tier 2 ✅
トレーニング（Tier 1, 9B）	完全な 266k サンプルの学習時間	~50 分（推定 5 時間と比較）

定常スループット

~11.3–16 秒/ステップ

GPU 利用率 (MI300X)

~ピーク時 70%

VRAM 利用率 (Unsloth)

~64 GB / 192 GB

チェックポイント-1000 時点のトレーニング損失

~0.05

合成データのスループット (MI300X vs. API)

6,800 vs. 120 事例/時 (56 倍 ↑)

合成コーパス拒否率

0.65%

グラフトポロジー

コンパイル済みノードの検証結果

8 / 8

モジュールテストスイート合格数

6 / 6

トリアージ中のブラウザタイムアウト

UI レンダリングレイテンシ

< 200 ms

9. 考察

9.1 クリニカル要件としてのハードウェア主権

9.2 スループットの画期的進展

9.3 限界

いくつかの限界を認識する必要があります：

トレーニングコーパスは、約 36% が合成データによって生成されたケースに依存しています。臨床的精度の検証については、認定腫瘍医の判断との比較がまだ大規模には実施されていません。

現在の知識ベースは主に英語で記述された NCCN ガイドラインをカバーしており、ESMO や非英語圏の臨床コーパスは今後の課題として残されています。

Tier 1 アダプターは、より長い軌跡におけるチェックポイント 1000 に相当します。完全な収束と、下流の臨床ベンチマーク評価（MedQA、USMLE スタイルの腫瘍学サブセット）は、今後のリリースで予定されています。

10. 結論

すべてのコード、アダプター重み、および OncoCoT 合成コーパスは、Hugging Face Spaces および GitHub で公開されます。

References

Singhal, K. et al. (2023). Large language models encode clinical knowledge. Nature, 620, 172–180.

Nori, H. et al. (2023). Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. arXiv:2311.16452.

Wang, L. et al. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6), 186345.

Shi, W. et al. (2024). Corrective Retrieval Augmented Generation. arXiv:2401.15884.

Shinn, N. et al. (2023). Reflexion: Language agents with verbal reinforcement learning. NeurIPS 2023.

Nogueira, R. and Cho, K. (2019). Passage Re-ranking with BERT. arXiv:1901.04085.

Gao, L. 他 (2022). 関連性ラベルなしの精密なゼロショット密検索。arXiv:2212.10496.

Hu, E.J. 他 (2021). LoRA: 大規模言語モデルの低ランク適応。arXiv:2106.09685.

Dettmers, T. 他 (2023). QLoRA: 量子化された大規模言語モデルの効率的なファインチューニング。NeurIPS 2023.

Han, S. 他 (2024). LangGraph: LLM を用いた状態保持型マルチアクターアプリケーションの構築。LangChain Technical Report.

原文を表示

Back to Articles

thumbnail: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/oncoagent-thumbnail.png

authors:

user: oncoagent-research

tags:

oncology

multi-agent

LangGraph

QLoRA

open-source

clinical-ai

healthcare

OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support

*Technical preprint · May 2026 · OncoAgent Research Group*

Abstract

Keywords: clinical decision support, oncology AI, multi-agent systems, retrieval-augmented generation, QLoRA, AMD ROCm, open-source healthcare AI, HITL safety, LangGraph, Corrective RAG

1. Introduction

AI-assisted clinical decision support systems hold transformative potential for closing this gap, yet most commercially available systems fail in three critical ways:

Hallucinated recommendations not grounded in validated guidelines

Cloud API dependency that precludes on-premises deployment in privacy-sensitive hospital environments

Monolithic LLM architectures prone to context saturation under complex multi-comorbidity presentations

OncoAgent is designed around three core principles:

Architectural decomposition: Clinical reasoning is decomposed across eight specialised LangGraph nodes, each with a bounded, auditable function.

Grounded generation: All model outputs are anchored to a curated vector knowledge base through a four-stage retrieval pipeline with explicit relevance gating.

Hardware sovereignty: The full inference and training stack runs natively on AMD Instinct MI300X using ROCm and open-source frameworks — enabling hospital deployment without data exfiltration.

2. Related Work

2.1 Clinical LLMs and Decision Support

2.2 Multi-Agent Architectures

Decomposed multi-agent systems have emerged as a principled approach to complex reasoning tasks. OncoAgent synthesises four canonical SOTA patterns:

Claude Code pattern — deterministic safety harnesses separated from LLM reasoning

Hermes Agent pattern — structured tool-calling with per-session memory isolation

Corrective RAG (Shi et al., 2024) — document relevance grading and query reformulation

Reflexion (Shinn et al., 2023) — self-correcting generation via feedback-augmented retry loops

2.3 Retrieval-Augmented Generation in Medicine

3. System Architecture

3.1 Overview

The 8-node topology is:

code

Router → Ingestion → Corrective RAG → Specialist ↔ Critic → HITL Gate → Formatter → END
                                                                   ↓
                                                               Fallback → END

Key properties:

5 conditional edges

1 reflexion retry loop (max 2 iterations)

1 mandatory HITL interrupt for high-complexity or low-confidence outputs

3.2 Complexity Router and Model Tiering

Case complexity is quantified using a weighted additive model prior to specialist invocation:

code

S = w_cancer + w_stage + w_mutations + w_treatment

Where:

Factor

Condition

Weight

Cancer type

Rare

+0.40

Cancer type

Unknown primary

+0.30

Stage

Stage IV

+0.25

Stage

Stage III

+0.15

Mutations

≥2 identified

+0.30

Mutations

Single

+0.15

Prior treatment

Any keyword match

+0.10

Decision boundary: S ≥ 0.5 → Tier 2 (Qwen 3.6-27B deep reasoning) · S < 0.5 → Tier 1 (Qwen 3.5-9B speed triage)

Validation: A Stage IV pancreatic carcinoma case with KRAS + BRCA2 mutations correctly produced S = 0.80, routing to Tier 2. ✅

Clinicians may also manually override the tier selection through the UI.

3.3 Corrective RAG with Document Grading

After migrating from Qwen 3.5 to Qwen 2.5 Instruct for the grading step, success rate improved from 0% → 100%, with RAG confidence score reaching 2.3+ on uterine cancer triage tests.

3.4 Reflexion Safety Loop (Critic Node)

The Critic node runs a three-layer validation cascade before any output reaches the HITL gate:

Formatting check — validates structural compliance with the OncoCoT output schema

Safety check — deterministic rule-based scan for prohibited output patterns (absolute dosing without guideline citation, drug interaction omissions, etc.)

LLM entailment check — verifies that the Specialist's recommendation is fully supported by the retrieved RAG context

3.5 Human-in-the-Loop Gate and Fallback

3.6 Per-Patient Memory Isolation

4. Knowledge Base Construction and RAG Pipeline

4.1 Guideline Ingestion and Sanitisation

4.2 Medical Embeddings and Vector Store

Standard general-purpose embedding models (e.g., all-MiniLM-L6-v2) were rejected due to poor clinical terminology semantics. OncoAgent uses:

Embeddings: pritamdeka/S-PubMedBert-MS-MARCO — fine-tuned on PubMed and MS-MARCO for asymmetric medical semantic search

Vector store: Local ChromaDB persistent index — zero-cloud, Zero-PHI compliant

4.3 Four-Stage Retrieval Pipeline

Stage

Component

Function

Configuration

1. Recall

PubMedBERT Bi-Encoder

Wide-net retrieval

top-15 candidates

2. Distance Gate

Cosine Distance Filter

Anti-hallucination floor

threshold = 0.10

3. Re-Ranking

Cross-Encoder (MS-MARCO MiniLM)

Joint query-document relevance

top-5 returned

4. Context Trimming

Character-Budget Limiter

Fit within LLM context window

max 6,000 chars

Distance threshold calibration against the NCCN corpus established:

Medical-query distances: ~0.06–0.09

Out-of-domain distances: ~0.11–0.15

Hard threshold: 0.10

5. Dual-Tier QLoRA Fine-Tuning

5.1 Training Corpus: OncoCoT (266,854 Samples)

Source

Type

Samples

Notes

PMC-Patients

Real clinical cases

~85,000

PubMed Central patient reports

Asclepius

Real clinical data

~85,000

Curated medical QA corpus

OncoCoT Synthetic

Synthetic (Qwen 3.6-27B)

96,941

Generated on MI300X at ~6,800 cases/hr · rejection rate 0.65%

Total

—

266,854

90/10 train/eval split · SHA-256 hashed · deduplicated

All cases use ChatML template for Qwen compatibility. Thinking tokens were disabled (chat_template_kwargs: {enable_thinking: False}) to prevent JSON parse corruption.

5.2 QLoRA Configuration

Parameter

Tier 1 (Qwen 3.5-9B)

Tier 2 (Qwen 3.6-27B)

Per-device batch size

Gradient accumulation

Effective batch size

Learning rate

2×10⁻⁴

1×10⁻⁴

LoRA rank (*r*)

Sequence packing

True, 2048 tokens

Early stopping

Patience = 3

Quantisation

NF4 4-bit

5.3 AMD MI300X Optimisation with Unsloth

The original HuggingFace transformers + PEFT pipeline failed on the MI300X due to two independent issues:

Tokenisation conflicts between trl v0.24.0 strict EOS validation and the Qwen3VLProcessor wrapper

Insufficient VRAM headroom for target effective batch sizes under standard precision

Migration to Unsloth's FastLanguageModel resolved both simultaneously:

VRAM reduction: ~60% drop in peak usage (from OOM to stable ~64 GB on 192 GB device)

Training speed: ~2× improvement to ~16 s/step at effective batch 16

AMD ROCm-specific adaptations required:

code

# 1. Pass inner tokenizer, not the Qwen3VLProcessor wrapper
trainer = SFTTrainer(tokenizer=model.get_tokenizer(), ...)

# 2. Prevent incompatible EOS injection
training_args = SFTConfig(eos_token=None, ...)

# 3. AMD-specific bitsandbytes for ROCm 6.2/gfx942
# pip install bitsandbytes --find-links <amd-continuous-release-wheel>

# 4. BF16 workaround (is_bf16_supported() returns False on ROCm despite hardware support)
training_args = TrainingArguments(fp16=True, ...)
# Final deployment uses native BF16:
model = AutoModelForCausalLM.from_pretrained(..., torch_dtype=torch.bfloat16)

5.4 Sequence Packing and Throughput Breakthrough

Checkpoint-1000 results: Tier 1 adapter trained for 1,339 steps · training loss ≈ 0.05 · adapter size 187 MB · verified against 11-file manifest including adapter_model.safetensors, adapter_config.json, and tokenizer.json.

6. Safety and Privacy Framework

6.1 Zero-PHI Policy

This ensures that no PHI reaches any downstream LLM call — local or remote — and satisfies HIPAA de-identification requirements by design rather than policy.

6.2 Layered Safety Architecture

The system's safety guarantees are enforced at four independent layers. A failure at any single layer does not compromise the overall posture.

Layer

Mechanism

Addresses

L1: Retrieval Gate

Distance Gate (cosine threshold 0.10)

Out-of-domain hallucinations

L2: Confidence Gate

RAG confidence score < 0.3 → block

Low-quality retrieval grounding

L3: Reflexion Critic

Formatting + safety + LLM entailment (max 2 retries)

Unsupported or unsafe Specialist outputs

L4: HITL Gate

Mandatory clinician interrupt for Tier 2 / flagged cases

High-complexity cases requiring expert judgment

7. Clinical Interface

The OncoAgent UI is implemented as a real-time streaming Gradio application in a ChatGPT-style conversational layout. It features:

Left sidebar: Session controls, KPI tiles, evidence source tabs

Main chat area: Live agentic reasoning updates as each node completes

The rag_confidence score and retrieved source count are prominently surfaced, giving clinicians immediate visibility into the quality of guideline grounding behind each recommendation.

8. Results

Component

Metric

Value

Knowledge Base

Guidelines ingested

70+

PDFs parsed

138 in < 60 s

Index parsing errors

CRAG Pipeline

Document grading success rate (post-fix)

100%

RAG confidence score (uterine cancer test)

2.3+ (was 0.0 pre-fix)

Parallel grading latency (3–5 docs)

< 5 s

Complexity Router

Stage IV pancreatic + KRAS + BRCA2

Score = 0.80 → Tier 2 ✅

Training (Tier 1, 9B)

Full 266k-sample training time

~50 min (vs. 5 hr estimate)

Steady-state throughput

~11.3–16 s/step

GPU utilisation (MI300X)

~70% peak

VRAM utilisation (Unsloth)

~64 GB / 192 GB

Training loss at checkpoint-1000

~0.05

Synthetic data throughput (MI300X vs. API)

6,800 vs. 120 cases/hr (56× ↑)

Synthetic corpus rejection rate

0.65%

Graph Topology

Compiled nodes verified

8 / 8

Module test suites passed

6 / 6

Browser timeouts during triage

UI rendering latency

< 200 ms

9. Discussion

9.1 Hardware Sovereignty as a Clinical Requirement

9.2 The Throughput Breakthrough

9.3 Limitations

Several limitations warrant acknowledgement:

The training corpus relies on approximately 36% synthetically generated cases. Clinical accuracy validation against board-certified oncologist judgments has not yet been performed at scale.

The current knowledge base covers NCCN guidelines primarily in English; ESMO and non-English clinical corpora remain for future work.

The Tier 1 adapter represents checkpoint-1000 of a potentially longer trajectory; full convergence and downstream clinical benchmark evaluation (MedQA, USMLE-style oncology subsets) are planned for subsequent releases.

10. Conclusion

All code, adapter weights, and the OncoCoT synthetic corpus will be released publicly on Hugging Face Spaces and GitHub.

References

Singhal, K. et al. (2023). Large language models encode clinical knowledge. Nature, 620, 172–180.

Nori, H. et al. (2023). Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. arXiv:2311.16452.

Wang, L. et al. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6), 186345.

Shi, W. et al. (2024). Corrective Retrieval Augmented Generation. arXiv:2401.15884.

Shinn, N. et al. (2023). Reflexion: Language agents with verbal reinforcement learning. NeurIPS 2023.

Nogueira, R. and Cho, K. (2019). Passage Re-ranking with BERT. arXiv:1901.04085.

Gao, L. et al. (2022). Precise Zero-Shot Dense Retrieval without Relevance Labels. arXiv:2212.10496.

Hu, E.J. et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.

Dettmers, T. et al. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. NeurIPS 2023.

Han, S. et al. (2024). LangGraph: Building stateful multi-actor applications with LLMs. LangChain Technical Report.

*OncoAgent is intended as a clinical decision support tool. All outputs require review by licensed medical professionals prior to any clinical application.*

この記事をシェア

MarkTechPost2026年7月4日 06:25

lift-pdf を用いたスキーマ指向の請求書インテリジェンスパイプライン設計：経理処理・検証・帳簿生成のための抽出手法

TechCrunch AI2026年7月4日 03:43

ブラウザ戦争はもはや検索が主役ではない — Chrome や Safari に代わる最良の代替案

The Verge AI2026年7月3日 20:49

ミッドジャーニーの医療用スキャナー、多くの疑問を残した裏側レポート

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

キーポイント

影響分析

編集コメント

OncoAgent: プライバシーを保護する腫瘍学臨床意思決定支援のための二層型マルチエージェントフレームワーク

抄録

1. はじめに

2. 関連研究

2.1 臨床用大規模言語モデルと意思決定支援

2.2 マルチエージェントアーキテクチャ

2.3 医療分野における検索拡張生成 (Retrieval-Augmented Generation)

3. システムアーキテクチャ

3.1 概要

3.3 ドキュメント評価を伴う修正型 RAG

3.4 リフレクション安全ループ（クリティックノード）

3.5 ヒューマン・イン・ザ・ループゲートとフォールバック

3.6 患者ごとのメモリ分離

4. 知識ベースの構築と RAG パイプライン

4.1 ガイドラインの取り込みとサニタイゼーション

4.2 医療埋め込みとベクトルストア

4.3 4 段階検索パイプライン

5. デュアルティア QLoRA 微調整

5.1 訓練コーパス：OncoCoT（266,854 サンプル）

5.2 QLoRA 設定

5.3 Unsloth を用いた AMD MI300X の最適化

1. Qwen3VLProcessor ラッパーではなく内部トークナイザーを渡す

2. 互換性のない EOS の注入を防ぐ

3. ROCm 6.2/gfx942 用の AMD 固有の bitsandbytes

pip install bitsandbytes --find-links <amd-continuous-release-wheel>

4. BF16 のワークアラウンド（ハードウェアはサポートしているにもかかわらず、ROCm では is_bf16_supported() が False を返す）

最終的なデプロイではネイティブの BF16 を使用:

5.4 シーケンスパッキングとスループットの飛躍

6. セーフティとプライバシーフレームワーク

6.1 ゼロ PHI ポリシー

6.2 レイヤー別安全アーキテクチャ

7. クリニカルインターフェース

6.2 レイヤー別安全アーキテクチャ

7. クリニカルインターフェース"}

8. 結果

9. 考察

9.1 クリニカル要件としてのハードウェア主権

9.2 スループットの画期的進展

9.3 限界

10. 結論

References

OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support

Abstract

1. Introduction

2. Related Work

2.1 Clinical LLMs and Decision Support

2.2 Multi-Agent Architectures

2.3 Retrieval-Augmented Generation in Medicine

3. System Architecture

3.1 Overview

3.2 Complexity Router and Model Tiering

3.3 Corrective RAG with Document Grading

3.4 Reflexion Safety Loop (Critic Node)

3.5 Human-in-the-Loop Gate and Fallback

3.6 Per-Patient Memory Isolation

4. Knowledge Base Construction and RAG Pipeline

4.1 Guideline Ingestion and Sanitisation

4.2 Medical Embeddings and Vector Store

4.3 Four-Stage Retrieval Pipeline

5. Dual-Tier QLoRA Fine-Tuning

5.1 Training Corpus: OncoCoT (266,854 Samples)

5.2 QLoRA Configuration

5.3 AMD MI300X Optimisation with Unsloth

5.4 Sequence Packing and Throughput Breakthrough

6. Safety and Privacy Framework

6.1 Zero-PHI Policy

6.2 Layered Safety Architecture

7. Clinical Interface

8. Results

9. Discussion

9.1 Hardware Sovereignty as a Clinical Requirement

9.2 The Throughput Breakthrough

9.3 Limitations

10. Conclusion

References

関連記事

キーポイント