Hugging Face Blog · June 23, 2026, 09:00 · about 19 min read

Experimenting with the proposed Cross-Origin Storage API in Transformers.js

#Transformers.js #Edge AI #Web browser #Data persistence #Hugging Face

TL;DR

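Hugging Face is experimenting with supporting the proposed Cross-Origin Storage API in Transformers.js. The goal is to technically validate a way to persist large model files in the browser once and reuse them across origins.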

AI in-depth analysis · June 24, 2026, 03:02


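Key points

Cross-Origin Storage API experiment

Reports indicate that Hugging Face is testing, in Transformers.js, whether a Cross-Origin Storage API that is still at the proposal stage can be implemented.

Solving data persistence in the browser

The experiment explores foundational technology that would let AI running in a web browser store models and related data safely beyond a single session. It would also let browsers share these files across origins instead of downloading them again on every site.

Extending Transformers.js

Stronger data management in Transformers.js, a browser-native machine learning library, could broaden the range of edge AI applications.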


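Impact analysis

This effort could give large language models (LLMs) and image generation models running in the browser a way to store large resources, such as model weights and Wasm runtimes, securely and persistently. It would also let them share these resources across sites, which helps address data management challenges in edge AI application development. If a standardized storage API is implemented, building more complex web-based AI tools should become easier.

Editor's comment

Broad adoption of browser-native AI depends on persisting models and other large resources safely. Support for a standard API to do this is an important step.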


(This is a guest post by Developer Relations Engineer Thomas Steiner from the Chrome team at Google.)

Transformers.js provides Web developers with a simple way to use the power of transformers in their Web apps through task-specific pipelines. To run inference in the browser, developers create an instance of pipeline() and specify a task they want to use the pipeline for. As a concrete example, the following snippet shows how to set up an automatic speech recognition (ASR) pipeline.


import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@4.2.0';

const asr = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-tiny.en',
  { device: 'webgpu' },
);
const result = await asr('jfk.wav');
console.log(result);

The cache challenge

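You will notice in the source code that I specified Xenova/whisper-tiny.en as the model, which is a very decent choice for common English automatic speech recognition tasks. In fact, it's even *the* default model according to the Transformers.js default model resolution, as per the linked excerpt (https://github.com/huggingface/transformers.js/blob/bc9cf7400f4f2c8695016699f879e31026ff0313/packages/transformers/src/pipelines/index.js#L151-L158).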

Model resources

When you run this example (https://googlechrome.github.io/samples/transformersjs-automatic-speech-recognition/index.html) in the browser, Transformers.js automatically takes care of downloading and caching the relevant model resources and Wasm files. The following screenshot shows the Chrome DevTools Cache storage section after visiting the app. When you reload the page, the resources are served from the Cache API, and the model returns results almost instantly.

However, since Xenova/whisper-tiny.en is a popular model (and, as mentioned before, even *the* ASR default model in Transformers.js), you can well imagine that more than one app you visit would use it. To simulate this situation, here's the same example app from before, but served from a different origin. When you visit this different-origin app, it isn't usable almost instantly. Instead, the browser has to download and cache all the model resources again, even though they're byte-for-byte identical to the ones it already has. Even in this toy example, this adds up to 177 MB of duplicate download and storage, as you can examine in the Storage section of the Chrome DevTools Application panel. You can imagine how quickly this adds up.

Wasm runtime resources

But it gets worse. Let's add a second pipeline to the toy example: sentiment analysis. Sentiment analysis by default uses the Xenova/distilbert-base-uncased-finetuned-sst-2-english model. By not specifying the model, Transformers.js' default model resolution automatically picks it for you.


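// Continues the ASR snippet above: `pipeline` is already imported and `result` holds the transcription.
// `pre` is assumed to be a <pre> element on the page that displays the output.
const classifier = await pipeline('sentiment-analysis');
const sentiment = await classifier(result.text);
pre.append('\n\n' + JSON.stringify(sentiment, null, 2));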

These are two entirely different AI models, yet they depend on the same 4,733 kB ort-wasm-simd-threaded.asyncify.wasm WebAssembly (Wasm) runtime file from the underlying ONNX Runtime library that Transformers.js is built on top of. Open the extended demo on a different origin, and you will notice in the Network tab that the Wasm runtime also gets downloaded and cached again.

So even if you run apps that don't share the same AI models, your browser still makes redundant requests for shared Wasm resources you already have, and on top of that also caches them again, which consumes space on your hard disk.

Cache isolation

AI model resources serving

By default, AI model resources come from the Hugging Face Hub, and ultimately the Hugging Face CDN. The browser makes a request for a resource like https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/config.json which then gets redirected to the final CDN URL like https://huggingface.co/api/resolve-cache/models/Xenova/distilbert-base-uncased-finetuned-sst-2-english/0b6928efcb76139cae2c6881d49cda67fe119f42/config.json?%2FXenova%2Fdistilbert-base-uncased-finetuned-sst-2-english%2Fresolve%2Fmain%2Fconfig.json=&etag=%223c36342ef1f74de2797d667c68c6b7b988d0b87c%22 in this case.

Wasm runtime resources serving

The Wasm runtime resources are served from the jsDelivr CDN by default. For example, ort-wasm-simd-threaded.asyncify.wasm comes from https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm at the time of this writing.

Now you may say that if different apps, even though running on different origins, in the end serve their resources from the same CDN URLs, caching shouldn't be a problem, as long as the final URLs are the same. Unfortunately, this has not been how caching works in browsers for a long time. The article Gaining security and privacy by partitioning the cache goes into all the details. In short, caches are partitioned so that one site cannot benefit from resources cached while visiting another, which prevents timing attacks: how long a request takes can reveal whether the browser has fetched the same resource before, and that exposes users to security and privacy leaks.

Chrome's implementation

The concrete implementation may vary by browser, but in Chrome, cached resources are keyed using a Network Isolation Key in addition to the resource URL. The Network Isolation Key is composed of the top-level site and the current-frame site. Take the previous toy examples hosted on the origins https://googlechrome.github.io and https://rawcdn.rawgit.net. If they both use the Wasm runtime from https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm, their cache keys are as shown in the following table.

| Network Isolation Key: top-level site | Network Isolation Key: current-frame site | Resource URL |
| --- | --- | --- |
| https://googlechrome.github.io | https://googlechrome.github.io | https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm |
| https://rawcdn.rawgit.net | https://rawcdn.rawgit.net | https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm |

So even if the resource URLs are exactly the same, since the Network Isolation Keys don't match, there's no cache hit, which means duplicate download and duplicate storage. This is the challenge that the Cross-Origin Storage proposal aims to tackle.
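To make the keying concrete, here is a toy in-memory model of a partitioned HTTP cache. It assumes, as described above for Chrome, that the key combines the top-level site, the current-frame site, and the resource URL. `PartitionedCache` is a hypothetical name, and this illustrates the idea only. It is not Chrome's actual implementation.

```javascript
// Toy model of a partitioned HTTP cache (hypothetical; not Chrome's real code).
// Each entry is keyed by the Network Isolation Key plus the resource URL.
class PartitionedCache {
  constructor() {
    this.entries = new Map();
  }

  // Combine top-level site, current-frame site, and URL into one string key.
  static key(topLevelSite, frameSite, url) {
    return JSON.stringify([topLevelSite, frameSite, url]);
  }

  get(topLevelSite, frameSite, url) {
    return this.entries.get(PartitionedCache.key(topLevelSite, frameSite, url));
  }

  put(topLevelSite, frameSite, url, body) {
    this.entries.set(PartitionedCache.key(topLevelSite, frameSite, url), body);
  }
}

const WASM_URL =
  'https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm';

const cache = new PartitionedCache();
// The first site downloads and caches the runtime.
cache.put('https://googlechrome.github.io', 'https://googlechrome.github.io', WASM_URL, 'wasm bytes');

// Same site again: cache hit.
const hitSameSite = cache.get('https://googlechrome.github.io', 'https://googlechrome.github.io', WASM_URL);
// Different site, identical URL: cache miss, so the runtime is downloaded again.
const hitOtherSite = cache.get('https://rawcdn.rawgit.net', 'https://rawcdn.rawgit.net', WASM_URL);

console.log(hitSameSite !== undefined); // true
console.log(hitOtherSite !== undefined); // false
```

An identical URL only produces a hit inside the same partition. That is exactly the duplicate download and storage described above.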

Enter the Cross-Origin Storage API

💡 Note: The Cross-Origin Storage API is an early-stage proposal that isn't final. While the proposed API is not yet natively implemented in any browser, you don't have to wait to experiment with it. Install the Cross-Origin Storage extension to inject the navigator.crossOriginStorage polyfill on all pages and test the complete flow.

The proposed Cross-Origin Storage (COS) API introduces a dedicated navigator.crossOriginStorage interface through which web apps can store and retrieve large files across origin boundaries, identified not by a URL, but by a cryptographic hash.

The Cross-Origin Storage API logo: a stylized walking person, as typically encountered on crosswalk signs.

That last point about cryptographic hashes is key. Because COS identifies files by their hash rather than by their URL or origin, the same ort-wasm-simd-threaded.asyncify.wasm Wasm runtime you downloaded while visiting https://googlechrome.github.io is recognized as identical to the one https://rawcdn.rawgit.net is about to request, no matter where either of the two origins fetched it from. See the following code snippet that illustrates the basic flow.


const hash = {
  algorithm: 'SHA-256',
  value: '8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4',
};

// Declared outside try/catch so the bytes are usable after either path.
let fileBlob;
try {
  // Rejects if the file isn't in COS (or the browser won't confirm it is).
  const handle = await navigator.crossOriginStorage.requestFileHandle(hash);
  // Cache hit! Get the file as a Blob and use it directly.
  fileBlob = await handle.getFile();
} catch (err) {
  // Cache miss. Download from network, then store for next time.
  fileBlob = await fetch('https://cdn.jsdelivr.net/.../ort-wasm-simd-threaded.asyncify.wasm')
    .then(r => r.blob());
  const handle = await navigator.crossOriginStorage.requestFileHandle(
    hash,
    { create: true, origins: '*' },
  );
  const writableStream = await handle.createWritable();
  await writableStream.write(fileBlob);
  await writableStream.close();
}

If the resource is in COS, you get back a FileSystemFileHandle from which you can read the blob directly via getFile() (the resulting File inherits from Blob). If the resource is not in COS, you fall back to the network, and write the resource into COS for the next app that needs it, which could be your app, or another unrelated app, potentially on a completely different origin.
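The hit/miss flow can be wrapped in a reusable helper. This is a minimal sketch, not Transformers.js code. `getFromCosOrNetwork` is a hypothetical name. The storage object defaults to the proposed `navigator.crossOriginStorage`, which the extension's polyfill provides today. It is a parameter only so the helper can be exercised with a fake. Treating a failed COS write as non-fatal is also an assumption.

```javascript
// Hypothetical helper around the proposed COS API: try COS first, else the network.
async function getFromCosOrNetwork(
  hash,
  url,
  { storage = globalThis.navigator?.crossOriginStorage, fetchFn = globalThis.fetch } = {},
) {
  // No COS support at all (e.g. no extension installed): just use the network.
  if (!storage) {
    return (await fetchFn(url)).blob();
  }
  try {
    // Hit: the browser confirms the file and hands out a handle.
    const handle = await storage.requestFileHandle(hash);
    return await handle.getFile();
  } catch {
    // Miss, or the browser declines to confirm presence: both are handled the same way.
    const blob = await (await fetchFn(url)).blob();
    try {
      const handle = await storage.requestFileHandle(hash, { create: true, origins: '*' });
      const writable = await handle.createWritable();
      await writable.write(blob);
      await writable.close();
    } catch {
      // Assumption: storing is best-effort; the downloaded bytes are still usable.
    }
    return blob;
  }
}
```

Catching the write separately keeps a failed store, such as a hash mismatch, from discarding bytes that were already downloaded.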

The API is deliberately shaped after the File System Standard's FileSystemDirectoryHandle.getFileHandle(), which you are likely familiar with from the Origin Private File System (OPFS) API. The hash parameter plays the same role as the name parameter in OPFS: uniquely identifying a resource. The options.create flag works the same way: absent or false for read-only access, true when you intend to write.

Control who can read what

Not every resource should be globally shared. COS gives developers precise control over visibility through the origins option when storing a file.

Setting origins: '*' makes a file globally available. Any origin can find it by hash. This is the right choice for AI model resources or the Wasm runtime in the Transformers.js example: the whole point is that every app on the Web benefits from a single cached copy.

Passing a specific list of origins, like origins: ['https://write.example.com', 'https://calculate.example.com'], restricts access to those sites. This works well for proprietary resources shared across a company's own properties that shouldn't be discoverable by anyone else, like a proprietary proofreading AI model used in a commercial office suite.

Omitting origins entirely makes the file available only to same-site origins. This is a sensible default for resources shared across all of an organization's subdomains, but not intended to cross organizational boundaries.
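The three `origins` settings can be summarized as a small visibility check. This is a hypothetical model for illustration, not the proposal's algorithm. In particular, it approximates "same site" by comparing the last two hostname labels, whereas browsers determine the registrable domain using the Public Suffix List.

```javascript
// Simplification: "site" = scheme + last two hostname labels. Real browsers
// consult the Public Suffix List (e.g. for hosts under example.co.uk).
function approximateSite(origin) {
  const { protocol, hostname } = new URL(origin);
  return protocol + '//' + hostname.split('.').slice(-2).join('.');
}

// Hypothetical model of who can find a stored file by hash.
// storedOrigins: '*', an array of origins, or undefined (option omitted).
function canSee(storedOrigins, storerOrigin, requestingOrigin) {
  if (storedOrigins === '*') return true; // globally available
  if (Array.isArray(storedOrigins)) return storedOrigins.includes(requestingOrigin);
  // Omitted: only origins on the same site as the storer.
  return approximateSite(storerOrigin) === approximateSite(requestingOrigin);
}
```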

One important rule: visibility can be upgraded but never downgraded. If a file is already globally available, a later attempt to store it with a restricted origins list is silently ignored. This prevents a malicious actor from re-storing a public resource and narrowing its availability. The reverse is possible: a file initially stored with a restricted origins list can later be made more permissive. Any site, not just the original storer, can call requestFileHandle() for the same hash (hashes are not secret) with create: true and a broader origins value. Provided the browser verifies that the hash matches, the resource becomes available to the wider audience from that point on. Note that the upgrading site must still write the full file through the returned handle. This requirement prevents sites from abusing the upgrade path as a side channel to detect whether a particular file is already stored in COS.
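The upgrade-only rule can be modeled as a merge of the stored and requested `origins` values. In this hypothetical sketch, `'*'` is the widest setting. As an assumption the text above does not spell out, two origin lists are merged by union, so a request that would narrow visibility leaves every previously allowed origin in place. The requirement to rewrite the full file is not modeled here.

```javascript
// Hypothetical model: visibility may widen but never narrow.
// Values are '*' (global) or an array of origins.
function mergeVisibility(current, requested) {
  // Once global, always global: a narrower request is silently ignored.
  if (current === '*' || requested === '*') return '*';
  // Assumption: two origin lists combine into their union, so no origin loses access.
  return [...new Set([...current, ...requested])];
}
```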

Integrity by design

A subtle but important property of COS is that the browser verifies the hash when you write a file. If the data you write doesn't match the declared hash, the write fails with an error. This makes integrity checking automatic: an app reading a file from COS can be confident it's getting exactly the bytes it expected. This is the same guarantee it would have had if it had computed the hash itself after a network download.

This turns out to be doubly useful in the Transformers.js scenario. Today, after downloading model weights, most apps have no practical way to verify that the CDN served the right bytes. With COS, every file in the store is implicitly verified on write, no matter where it came from, the official Hugging Face CDN or a random site's self-hosted mirror.

Privacy without sacrificing utility

Of course a cross-origin shared cache raises the same question as the partitioned HTTP cache in reverse: if any site can probe for the presence of a file by hash, couldn't an attacker learn something about the user's browsing history by checking whether, say, a game engine Wasm module is cached?

COS addresses this through two complementary mechanisms:

First, the origins field: proprietary resources that shouldn't be globally probeable simply shouldn't be stored with origins: '*'. Through developer education, developers are encouraged to weigh this whenever they store a file.

Second, availability gating: even for globally declared files, the browser may suppress confirmation of a file's presence if it hasn't been encountered across a sufficient number of distinct origins. A file that only appears on one or two sites could still serve as a cross-site identifier, so the browser may return an error as if the file weren't there at all, regardless of what's physically on disk. On the Chrome team, we are conscious of the privacy leaks that uncommon resources could cause, and we generally plan to mitigate them by restricting which exact resources can be cached. The concrete mitigations are still being fleshed out.

Crucially, this means an error is not a definitive answer. It might mean "not stored", or it might mean "stored, but the browser isn't telling you". Apps should always handle it the same way: fall back to the network.
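Availability gating, and the rule that an error is not a definitive answer, can be illustrated with a toy model. The threshold of three distinct origins and every name here are hypothetical, as is the choice to count both storing and requesting origins. As noted above, the concrete mitigations are still being fleshed out.

```javascript
// Toy model of availability gating (hypothetical threshold and names).
const MIN_DISTINCT_ORIGINS = 3; // assumption for illustration only

class GatedStore {
  constructor() {
    this.files = new Map(); // hash -> bytes
    this.seenOn = new Map(); // hash -> Set of origins that stored or requested it
  }

  noteOrigin(hash, origin) {
    if (!this.seenOn.has(hash)) this.seenOn.set(hash, new Set());
    this.seenOn.get(hash).add(origin);
  }

  store(hash, bytes, origin) {
    this.files.set(hash, bytes);
    this.noteOrigin(hash, origin);
  }

  // Throws the same error whether the file is absent or merely too rare to reveal.
  lookup(hash, origin) {
    this.noteOrigin(hash, origin);
    const common = this.seenOn.get(hash).size >= MIN_DISTINCT_ORIGINS;
    if (!this.files.has(hash) || !common) throw new Error('NotFoundError');
    return this.files.get(hash);
  }
}

// App-side handling: any error means "fall back to the network".
function loadResource(store, hash, origin, downloadFn) {
  try {
    return store.lookup(hash, origin);
  } catch {
    return downloadFn();
  }
}
```

The app code never tries to tell "absent" from "hidden". It takes the same network path in both cases.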

What this means for the Transformers.js example

Going back to the toy examples from before: the ort-wasm-simd-threaded.asyncify.wasm runtime weighs in at 4,733 kB and is shared by every Transformers.js-powered app regardless of which AI model it uses. With COS, the first app to load it downloads it once and stores it under its SHA-256 hash with origins: '*'. Every subsequent app, whether on https://googlechrome.github.io, on https://rawcdn.rawgit.net, or any other origin, finds it in COS immediately. The 177 MB of duplicate Whisper model weights? Same story: Xenova/whisper-tiny.en gets downloaded once, recognized by hash the second time around, and served from COS in milliseconds. And of course, the same also happens for Xenova/distilbert-base-uncased-finetuned-sst-2-english.
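The savings are simple arithmetic. Without COS, each origin pays for its own copy. With COS, the bytes cross the network once. This tiny sketch uses the 4,733 kB runtime figure from above, takes the number of origins as a parameter, and describes the ideal case, ignoring availability gating and eviction. `totalKB` is a hypothetical name.

```javascript
// Total kB downloaded (and stored) for one resource requested by `origins` distinct sites.
// Ideal case: ignores availability gating and cache eviction.
function totalKB(sizeKB, origins, withCos) {
  if (origins === 0) return 0;
  // Without COS each partitioned cache needs its own copy; with COS one shared copy suffices.
  return withCos ? sizeKB : sizeKB * origins;
}

const RUNTIME_KB = 4733; // ort-wasm-simd-threaded.asyncify.wasm
console.log(totalKB(RUNTIME_KB, 2, false)); // 9466
console.log(totalKB(RUNTIME_KB, 2, true)); // 4733
```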

Transformers.js itself is already piloting the COS API at the library level. Pull request #1549 introduced an experimental COS cache backend behind an opt-in flag. Enabling it takes a single line before you set up your pipeline:


import { env, pipeline } from "https://cdn.jsdelivr.net/npm/@huggingface/transformers@4.2.0";

// 👇 Opt in to the experimental Cross-Origin Storage cache backend.
env.experimental_useCrossOriginStorage = true;

const asr = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', { device: 'webgpu' });
const result = await asr('jfk.wav');
console.log(result);

With that flag set, Transformers.js resolves the SHA-256 hash for each Xet-tracked model file (the large ONNX weight files) by fetching the raw Xet pointer (example raw pointer file) and extracting its oid sha256: field. It then uses that hash as the key for navigator.crossOriginStorage. If the model is already in COS (because another site stored it there first), it's served instantly without a network round-trip. If not, it falls back to a regular download and stores the result in COS for the next caller. With the toy example, the advantage in practice is that Xenova/whisper-tiny.en and Xenova/distilbert-base-uncased-finetuned-sst-2-english (and of course ort-wasm-simd-threaded.asyncify.wasm) only ever need to cross the ether once, regardless of how many different origins ask for them.
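Extracting the hash from a pointer file is a small text-parsing step. This minimal sketch assumes the pointer uses a Git LFS-style `key value` line format with an `oid sha256:<hex>` line. The sample text is illustrative: it reuses the decoder_model_merged.onnx hash quoted later in this post. `parsePointerSha256` is a hypothetical name, not Transformers.js's actual code.

```javascript
// Extract the SHA-256 from a pointer file's `oid sha256:<hex>` line (hypothetical helper).
function parsePointerSha256(pointerText) {
  // Multiline mode: ^ and $ match at line boundaries; \s* tolerates trailing \r.
  const match = pointerText.match(/^oid sha256:([0-9a-f]{64})\s*$/m);
  if (!match) throw new Error('No "oid sha256:" line found in pointer file');
  // Shape the result like the COS hash descriptor used earlier.
  return { algorithm: 'SHA-256', value: match[1] };
}

// Illustrative pointer text (format assumed, not fetched from Hugging Face).
const samplePointer = [
  'version https://git-lfs.github.com/spec/v1',
  'oid sha256:950978b1dbcbf250335358c1236053ba19a7f7849b33dc777f4421b72b7626fa',
].join('\n');

console.log(parsePointerSha256(samplePointer).value);
```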

Note the experimental_ prefix on the flag. It's intentional and signals that the underlying browser API has not yet been standardized and may change without a major version bump.

Try it today

The COS API is not yet natively implemented in any browser, but you don't have to wait to experiment with it. Install the Cross-Origin Storage extension to inject the navigator.crossOriginStorage polyfill on all pages and test the complete flow. You can inspect the source code of the extension and follow the usage instructions to get started.

With the extension installed, you can try the full end-to-end experience right now: open the first toy example with COS enabled, let it load Xenova/whisper-tiny.en, then open the toy example with COS enabled from the second origin. Instead of the 177 MB re-download you saw before, the model is served from COS in milliseconds.

When you open the extension's popup window, you can see COS in action. If you select View by Resource, you can see the resource with the SHA-256 hash 950978b1dbcbf250335358c1236053ba19a7f7849b33dc777f4421b72b7626fa shared across https://googlechrome.github.io and https://rawcdn.rawgit.net. It may not be obvious, but as you can verify by comparing the SHA-256 hash on Hugging Face, you're looking at https://huggingface.co/Xenova/whisper-tiny.en/blob/main/onnx/decoder_model_merged.onnx. The screenshot below shows the extension's popup window with the View by Resource tab active, where you can see the shared resource with its hash and the two origins that have it in their COS cache.

For now, the extension is mostly aimed at power users like you. Once the API is implemented in the browser, there will be a friendlier integration in the browser's Settings page.

Call to action

If you're building your own Transformers.js app, the call to action is simple: add env.experimental_useCrossOriginStorage = true before your first pipeline() call, install the extension, and watch the duplicate downloads disappear from your Network tab. Every site that opts in makes the experience faster and cheaper for every other site's users. Opting in is completely risk-free: if the COS API isn't supported because the user doesn't have the COS extension installed, the code just falls back to the default path (the Web Cache API).
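The fallback works because the library can feature-detect the API before using it. This minimal sketch of that decision assumes detection is a simple presence check on `navigator.crossOriginStorage`. `chooseCacheBackend` and its return values are hypothetical, and this is not the actual Transformers.js logic.

```javascript
// Decide which cache backend to use (hypothetical helper, illustration only).
function chooseCacheBackend(optedIn, nav = globalThis.navigator) {
  // True when the proposed API (or the extension's polyfill) is present.
  const cosAvailable = Boolean(nav && 'crossOriginStorage' in nav);
  // Use COS only when the app opted in AND the API exists; otherwise the Cache API.
  return optedIn && cosAvailable ? 'cross-origin-storage' : 'cache-api';
}
```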

Transformers.js is not alone in experimenting with COS. WebLLM (opt-in, see documentation) and wllama (automatic, see PR) are likewise excited about this proposed API.

On the Chrome team, we're considering implementing the COS API natively in the browser. Since this is an early-stage proposal, we welcome feedback on the API and on the shape of the proposal itself. The Cross-Origin Storage repository is the place to file issues, express support, or open PRs.
