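Vercel Blog · May 29, 2026, 13:00 · about 9 min read

Protecting against inference theft

#LLM Security #Inference Theft #API Governance #Vercel

TL;DR

Vercel describes how "inference theft" attacks work in practice, why conventional rate limits and authentication are not enough against them, and a defensive approach that verifies every request with BotID.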

AI deep analysis · May 30, 2026, 07:03

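Key points

The economic incentive behind inference theft

On frontier models, where a single prompt can cost as much as $2, attackers steal inference at no cost to themselves and resell it at a discount for a high margin.

The limits of existing defenses

Per-session authentication and IP rate limits are easily circumvented because attackers spread their costs across thousands of proxies and throwaway accounts.

The danger of resellable architectures

AI playgrounds and support bots are high-risk: attackers talk models around their system prompts, and OpenAI- or Anthropic-compatible adapters drop the stolen calls straight into standard clients.

A per-request verification approach

Vercel shows an implementation that uses BotID to analyze and verify each AI request individually, stopping theft that session-based measures cannot.

The weak session boundary and per-request verification

Because the attacker's downstream users authenticate to the adapter rather than to your endpoint, the check has to run on every inference request, not at session start.

A defense built on cost asymmetry

Inference is expensive while verification is very cheap, so gating every request makes the attacker's bypass cost not worth paying.

The limits of traditional CAPTCHAs and the case for BotID

Modern AI models solve image CAPTCHAs easily, whereas Vercel BotID, an invisible check based on client-side machine learning, can run on every request.

Key quotes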

Inference theft is the unauthorized use of someone else's paid AI inference, either for free consumption or downstream resale.

Rate limits and auth walls aren't sufficient on their own because checks that run once per session get amortized away across thousands of stolen calls.

Sophisticated attackers wrap your custom AI endpoint in an OpenAI- or Anthropic-compatible adapter and fan calls out through residential proxies.

The check has to run on the call the adapter is proxying, not the session it sits behind.

Per-request gates force that ratio down to one, and even at high inference prices, defeating a check on every call isn't worth the cost.

Inference will stay orders of magnitude more expensive than the requests carrying it, so resale stays profitable and attackers will keep iterating.

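Impact analysis

The article lays out the economics and mechanics of "inference theft," a threat that has emerged with the spread of generative AI, and makes a strong case that developers need to move beyond conventional security assumptions and adopt strict per-request verification. For companies, it signals that rethinking AI endpoint design and adopting defenses such as BotID is an urgent risk-management matter.

Editor's comment

An important warning for every developer who exposes an AI endpoint: it is time to understand the limits of conventional authentication and rate limiting and to introduce deep analysis on every request.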

HTTP requests are cheap. Vercel charges about $2 per million, a fraction of a cent per call. But a single prompt to an agent on a frontier model can cost $2, making AI a million times more expensive, and inference theft one of the highest-margin businesses an attacker can run. We have seen this type of attack on our own APIs.
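
The "million times" figure follows directly from the two prices quoted above. A minimal sketch of that arithmetic, using only those two numbers (the constant and function names are illustrative):

```typescript
// Figures from the paragraph above; all names are illustrative.
const REQUEST_PRICE_PER_MILLION_USD = 2; // about $2 per million HTTP requests
const PROMPT_COST_USD = 2;               // one frontier-model agent prompt can cost $2

// Cost of a single HTTP request, in dollars.
function costPerRequestUsd(pricePerMillionUsd: number): number {
  return pricePerMillionUsd / 1e6;
}

// How many times more expensive one prompt is than the request that carries it.
function inferenceToRequestRatio(promptCostUsd: number, pricePerMillionUsd: number): number {
  return promptCostUsd / costPerRequestUsd(pricePerMillionUsd);
}

const perRequestUsd = costPerRequestUsd(REQUEST_PRICE_PER_MILLION_USD);
const costRatio = inferenceToRequestRatio(PROMPT_COST_USD, REQUEST_PRICE_PER_MILLION_USD);

// Roughly $0.000002 per request, and a ratio of about one million.
console.log(perRequestUsd, costRatio);
```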

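If you have AI endpoints exposed to the internet, the risk of abuse is high and can easily run up bills in the tens of thousands of dollars or more.

Protecting those endpoints requires verification to run on every AI request, not on the session or at signup. Rate limits and auth walls aren't sufficient on their own because checks that run once per session get amortized away across thousands of stolen calls.

At Vercel, we gate every AI request through BotID deep analysis, and you can do the same on your own endpoints with a few lines of code.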

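What inference theft is

Inference theft is the unauthorized use of someone else's paid AI inference, either for free consumption or downstream resale. The operator pays per AI call; the attacker pays nothing for the inference, then resells the tokens at a discount. This goes beyond rate-limit abuse to actual resale of a stolen resource in a market.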

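Which AI endpoints are at risk?

Any internet-facing endpoint that gives a caller meaningful control over an LLM prompt is a target. The more general the endpoint, the higher the payout per stolen call.

AI playgrounds, like the AI SDK Playground, are the most dangerous shape because the caller has maximum control over the prompt, the model, and often the parameters. Stolen calls land cleanly into any standard client.

Support bots and documentation assistants are less exposed when system prompts are fixed server-side, but attackers have learned how to talk the models around system prompts cheaply enough to make resale viable.

Resale value tracks how easily the stolen calls can be dropped into a provider-compatible client.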

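Why web defenses don't mitigate inference theft

IP rate limits and auth walls were built for attacks with dramatically lower per-call economics, where gaming IPs and accounts wasn't worth the cost.

The payoff from stolen inference is high enough that attackers will procure residential proxy IPs by the thousands and register throwaway accounts at whatever scale defeats your gate. Rate limits get diluted across the fleet of IP addresses, and real accounts pass authentication.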
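
To see why a per-IP limit gets diluted, it helps to multiply it out. The sketch below is illustrative only: the 10 requests per minute limit and the 5,000-proxy fleet are hypothetical numbers, and 1,300 requests per minute is the peak rate from the incident described later in this article.

```typescript
// Total request budget an attacker gets when a per-IP limit is spread over many proxies.
function aggregateAllowancePerMinute(perIpLimitPerMinute: number, proxyCount: number): number {
  return perIpLimitPerMinute * proxyCount;
}

// Requests each proxy has to carry to sustain a given overall attack rate.
function requestsPerProxyPerMinute(targetRatePerMinute: number, proxyCount: number): number {
  return targetRatePerMinute / proxyCount;
}

// Hypothetical limit and fleet size.
const perIpLimit = 10;
const fleetSize = 5000;

console.log(aggregateAllowancePerMinute(perIpLimit, fleetSize)); // 50000
console.log(requestsPerProxyPerMinute(1300, fleetSize));
```

Under these assumptions each proxy sends well under one request per minute, so no single IP ever comes near the limit.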

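The architecture of abuse

Sophisticated attackers wrap your custom AI endpoint in an OpenAI- or Anthropic-compatible adapter and fan calls out through residential proxies.

The adapter is the key component. It is a one-time engineering cost that presents the victim's idiosyncratic API as OpenAI- or Anthropic-compatible, so the stolen inference drops into any standard coding agent or SDK. Resale at even five to ten percent of list price against zero marginal inference cost can make for a generous-margin business.

A recent example is Chipotlai Max, a forked coding agent that ships with a proxy turning Chipotle's customer-support chatbot into an OpenAI-compatible endpoint. The project openly solicits help porting the same inference theft approach to Home Depot, Lowe's, Target, and Starbucks.

The adapter is also the session boundary for the attacker's downstream users. They authenticate to the adapter, not to your endpoint. By the time a call hits your API, it has already crossed the boundary you were planning to defend. The check has to run on the call the adapter is proxying, not the session it sits behind.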

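The shape of a real attack on our own endpoint

On April 12, 2026, traffic to the Vercel docs AI chat endpoint spiked to roughly ten times normal volume on Anthropic's Claude Haiku 4.5 model. Traffic rose to 1,300 requests per minute at peak, which would have translated to an inference cost run rate of over ten thousand dollars per day.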
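
The run rate can be unpacked with simple arithmetic. The sketch below uses only the two figures quoted above; the per-request cost it derives is an implied lower bound, not a number stated in the article, and the names are illustrative.

```typescript
const PEAK_REQUESTS_PER_MINUTE = 1300; // peak rate from the incident
const DAILY_COST_FLOOR_USD = 10000;    // "over ten thousand dollars per day"

// Requests per day if the peak rate were sustained.
function requestsPerDay(perMinute: number): number {
  return perMinute * 60 * 24;
}

// Average inference cost per request implied by a daily cost and a request rate.
function impliedCostPerRequestUsd(dailyCostUsd: number, perMinute: number): number {
  return dailyCostUsd / requestsPerDay(perMinute);
}

const dailyRequests = requestsPerDay(PEAK_REQUESTS_PER_MINUTE); // 1872000
const impliedCost = impliedCostPerRequestUsd(DAILY_COST_FLOOR_USD, PEAK_REQUESTS_PER_MINUTE);

// About half a cent per request at minimum under these figures.
console.log(dailyRequests, impliedCost.toFixed(4));
```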

The attack came in through residential proxies that obscured the real client IPs. Across hundreds of thousands of bot requests over two days, standard per-IP rate limits had nothing useful to act on.

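How to defend against inference theft

Protecting AI endpoints against inference theft requires verification of every request. We use Vercel's BotID with deep analysis, called inside the route handler before the request reaches the model.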

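Verification has to run on every AI request

If our gate had run at session start instead of per request, the attacker would have paid the bypass cost once and walked away with hundreds of thousands of stolen calls. Any check that runs per session amortizes the attacker's bypass cost across every subsequent inference call. Per-request gates force that ratio down to one, and even at high inference prices, defeating a check on every call isn't worth the cost.

This is where the cost asymmetry works in the defender's favor. Inference is the most expensive resource per call the attacker is stealing, but verification is one of the cheapest costs per call for protection.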
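
The amortization argument can be made concrete with a small model. This is a sketch under stated assumptions: the $0.50 bypass cost is hypothetical, and the $0.10 resale price assumes 5% of a $2 prompt, combining two figures quoted earlier in the article.

```typescript
// Attacker's bypass cost per stolen call.
// callsPerBypass is how many inference calls one successful bypass unlocks.
function bypassCostPerCall(bypassCostUsd: number, callsPerBypass: number): number {
  if (callsPerBypass < 1) throw new RangeError("callsPerBypass must be at least 1");
  return bypassCostUsd / callsPerBypass;
}

// Resale pays off only if revenue per call exceeds the bypass cost per call.
function isResaleProfitable(
  resalePricePerCallUsd: number,
  bypassCostUsd: number,
  callsPerBypass: number
): boolean {
  return resalePricePerCallUsd > bypassCostPerCall(bypassCostUsd, callsPerBypass);
}

const assumedBypassCostUsd = 0.5;   // hypothetical cost of defeating the check once
const assumedResalePriceUsd = 0.1;  // assumed: 5% of a $2 prompt

// Session gate: one bypass unlocks 100,000 calls, so the cost per call is negligible.
console.log(isResaleProfitable(assumedResalePriceUsd, assumedBypassCostUsd, 100000)); // true
// Per-request gate: one bypass per call, so the full cost lands on every call.
console.log(isResaleProfitable(assumedResalePriceUsd, assumedBypassCostUsd, 1)); // false
```

The exact numbers matter less than the shape: the session gate divides the attacker's cost by the number of calls, while the per-request gate does not.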

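Implementing request verification with BotID deep analysis

Traditional image CAPTCHAs no longer hold up against modern attackers because the same AI models that make inference worth stealing can easily bypass them.

We deploy Vercel BotID on our AI endpoints, gating every request. BotID is an invisible CAPTCHA with deep analysis powered by Kasada that uses client-side machine learning to distinguish humans from bots without showing a visible challenge, which means it can run on every request rather than only at session start.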

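BotID deep analysis detected and blocked more than ten thousand bot requests in the first minutes of the spike. Within twenty-four hours, request volume on the endpoint was flat at normal levels.

Server-side, checkBotId() runs inside the route handler and returns a classification for the request currently being served.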
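
The code that accompanied this step is not reproduced here, so the following is only a sketch of the gating pattern. The verifier is injected so the example is self-contained. In a Next.js route on Vercel it would presumably be checkBotId() imported from "botid/server", and the isBot field on its result is likewise assumed from the BotID documentation rather than from this article. gatedChat and runInference are hypothetical names.

```typescript
// Minimal shape of the classification this sketch relies on (assumed, see above).
interface BotVerdict {
  isBot: boolean;
}

interface GateResult {
  httpStatus: number;
  body: string;
}

// verify: stands in for BotID's checkBotId() (assumption).
// runInference: hypothetical stand-in for the paid model call.
async function gatedChat(
  verify: () => Promise<BotVerdict>,
  runInference: (prompt: string) => Promise<string>,
  prompt: string
): Promise<GateResult> {
  // The check runs on every request, not once per session.
  const verdict = await verify();
  if (verdict.isBot) {
    // Reject before any tokens are spent.
    return { httpStatus: 403, body: "Access denied" };
  }
  return { httpStatus: 200, body: await runInference(prompt) };
}
```

Putting the check first in the handler means a blocked request costs one cheap verification and no inference.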

The route also has to be declared on the client. Without this, checkBotId() fails because BotID doesn't attach the challenge headers to the request.
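
As a rough illustration of the client-side declaration, the sketch below keeps the protected routes in one list and checks that a gated server route appears in it. The path "/api/chat" is hypothetical, and the initBotId({ protect }) call mentioned in the comment is an assumption based on the BotID documentation, not something shown in this article.

```typescript
// One entry per route whose handler calls checkBotId(). Shape assumed from the BotID docs.
interface ProtectedRoute {
  path: string;
  method: string;
}

const protectedRoutes: ProtectedRoute[] = [
  { path: "/api/chat", method: "POST" }, // hypothetical path for the AI endpoint
];

// In a Next.js app this list would be handed to BotID's client setup, assumed to be
//   initBotId({ protect: protectedRoutes })   // from "botid/client/core"
// so that the client attaches challenge headers to requests for these routes.

// Sanity check: is a gated server route also declared on the client?
function isDeclared(routes: ProtectedRoute[], path: string, method: string): boolean {
  return routes.some((r) => r.path === path && r.method === method.toUpperCase());
}

console.log(isDeclared(protectedRoutes, "/api/chat", "post")); // true
```

Sharing one list between the client declaration and a server-side check like this is one way to avoid gating a route that the client never declared.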

See the BotID docs for the next.config.ts wrapper and the full setup.

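Protect inference, not just access

Inference will stay orders of magnitude more expensive than the requests carrying it, so resale stays profitable and attackers will keep iterating.

To protect your AI endpoints:

Audit which of your AI endpoints are exposed

Prioritize by attack likelihood: more caller prompt control means an easier target

Gate every endpoint on every request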
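
One way to approach the audit and prioritization steps is to score each exposed endpoint by how much the caller controls. The scoring below is a hypothetical heuristic, not a method from the article. It counts the three kinds of control named earlier (prompt, model, parameters), and the routes are made up.

```typescript
// Hypothetical inventory record for one exposed AI endpoint.
interface AiEndpoint {
  route: string;
  callerSetsPrompt: boolean; // caller supplies free-form prompt text
  callerSetsModel: boolean;  // caller can choose the model
  callerSetsParams: boolean; // caller can set sampling or other parameters
}

// More caller control means a more attractive target.
function controlScore(e: AiEndpoint): number {
  return Number(e.callerSetsPrompt) + Number(e.callerSetsModel) + Number(e.callerSetsParams);
}

// Highest-risk endpoints first; the input array is left untouched.
function prioritize(endpoints: AiEndpoint[]): AiEndpoint[] {
  return endpoints.slice().sort((a, b) => controlScore(b) - controlScore(a));
}

const inventory: AiEndpoint[] = [
  { route: "/api/support-bot", callerSetsPrompt: true, callerSetsModel: false, callerSetsParams: false },
  { route: "/api/playground", callerSetsPrompt: true, callerSetsModel: true, callerSetsParams: true },
];

console.log(prioritize(inventory).map((e) => e.route));
```

The ordering only decides where to look first. The last item in the list above still applies to every endpoint in the inventory.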

Get started with our Knowledge Base guide on AI endpoint protection.
