
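TLDR AI · June 3, 2026 09:00 · approx. 9 min

Preventing AI inference abuse at scale (5 min read)

#LLM #InferenceTheft #Security #BotID #APIGovernance

TL;DR

Vercel warns that abuse of AI endpoints (inference theft) can run up enormous bills, and argues that verification must run on every request through BotID deep analysis rather than once per session.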

AI deep analysis · June 4, 2026 22:04

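Key points

The economic incentive and risk of inference theft

HTTP requests are cheap, but prompts to frontier models are expensive. Attackers consume that inference for free and resell the tokens at a large profit.

The limits of conventional defenses

Per-session authentication and rate limits are not enough on their own: a check that runs once is amortized across thousands of stolen calls and loses its effect, so verification has to run on every request.

Comprehensive protection with BotID

Vercel runs every AI request through BotID deep analysis to detect and block malicious bots and scripts, and developers can protect their own endpoints with similar code.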

Key quotes

a single prompt to an agent on a frontier model can cost $2, making AI a million times more expensive

Rate limits and auth walls aren't sufficient on their own because checks that run once per session get amortized away across thousands of stolen calls.

At Vercel, we gate every AI request through BotID deep analysis


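Impact analysis

The article highlights a newly visible threat that comes with the spread of generative AI, the abuse of infrastructure costs, and pushes developers to revisit the conventional session-based security model. Vercel's approach of verifying every request with BotID could become a baseline defense that the wider industry standardizes on.

Editor's comment

As the cost of using generative AI climbs sharply, this is a thought-provoking article arguing that security also has to evolve from authentication toward per-request behavioral analysis.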

5 min read

May 29, 2026

HTTP requests are inexpensive. Vercel charges about $2 per million, a fraction of a cent per call. But a single prompt to an agent on a frontier model can cost $2, making AI a million times more expensive and inference theft one of the highest-margin businesses an attacker can run. We have seen this type of attack on our own APIs.
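The "million times" ratio can be checked with a few lines of arithmetic. This is only a sketch built on the two round figures quoted in the text ($2 per million requests and $2 per prompt); the variable names are illustrative and not part of any Vercel API.

```typescript
// Round figures quoted in the text; real prices vary by plan and model.
const requestPricePerMillionUsd = 2; // about $2 per million HTTP requests
const promptCostUsd = 2;             // a single agent prompt "can cost $2"

// Cost of one HTTP request in dollars: a fraction of a cent.
const costPerRequestUsd = requestPricePerMillionUsd / 1_000_000;

// How many HTTP requests one such prompt is worth.
// Multiplying first keeps the calculation in whole numbers.
const costRatio = (promptCostUsd * 1_000_000) / requestPricePerMillionUsd;

console.log(`one request costs $${costPerRequestUsd}`);
console.log(`one prompt costs as much as ${costRatio} requests`);
```

With these inputs the ratio comes out at one million, which is the gap an attacker is exploiting: the stolen resource costs the operator far more than the request that carries it.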

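If you have AI endpoints exposed to the internet, the risk of abuse is high and can easily run up bills in the tens of thousands of dollars or more.

Protecting those endpoints requires verification to run on every AI request, not on the session or at signup. Rate limits and auth walls aren't sufficient on their own, because checks that run once per session get amortized away across thousands of stolen calls.

At Vercel, we gate every AI request through BotID deep analysis, and you can do the same on your own endpoints with a few lines of code.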

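What inference theft is

Inference theft is the unauthorized use of someone else's paid AI inference, either for free consumption or for downstream resale. The operator pays per AI call; the attacker pays nothing for inference and then resells the tokens at a discount. This goes beyond rate-limit abuse to actual resale of a stolen resource in a market.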

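Which AI endpoints are at risk?

Any internet-facing endpoint that gives a caller meaningful control over an LLM prompt is a target. The more general the endpoint, the higher the payout per stolen call.

AI playgrounds, like the AI SDK Playground, are the most dangerous shape because the caller has maximum control over the prompt, the model, and often the parameters. Stolen calls land cleanly into any standard client.

Support bots and documentation assistants are less exposed when system prompts are fixed server-side, but attackers have learned how to talk the models around system prompts cheaply enough to make resale viable.

Resale value tracks how easily the stolen calls can be dropped into a provider-compatible client.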

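Why web defenses don't mitigate inference theft

IP rate limits and auth walls were built to defend against attacks with dramatically lower per-call economics, where gaming IPs and accounts wasn't worth the cost.

The payoff from stolen inference is high enough that attackers will procure residential proxy IPs by the thousands and register throwaway accounts at whatever scale it takes to defeat your gate. Rate limits get diluted across the fleet of IP addresses, and real accounts pass authentication.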

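The architecture of abuse

Sophisticated attackers wrap your custom AI endpoint in an OpenAI- or Anthropic-compatible adapter and fan calls out through residential proxies.

The adapter is the key component. It is a one-time engineering cost that presents the victim's idiosyncratic API as OpenAI- or Anthropic-compatible, so stolen inference can drop into any standard coding agent or SDK. Reselling at even five to ten percent of the list price, with zero marginal inference cost, can make for a generous-margin business.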
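The resale economics can be sketched as simple arithmetic. The function below is hypothetical and the $2 list price is borrowed from the per-prompt figure quoted earlier purely for illustration; it only shows why any discount still leaves the reseller with the full sale price as margin when the inference itself is stolen.

```typescript
// Hypothetical helper: revenue and margin for reselling one call at a discount.
function resaleEconomics(listPriceUsd: number, fractionOfList: number, marginalCostUsd: number) {
  const revenueUsd = listPriceUsd * fractionOfList;
  // Share of revenue kept after paying for the inference.
  const marginFraction = (revenueUsd - marginalCostUsd) / revenueUsd;
  return { revenueUsd, marginFraction };
}

// Assumed inputs: a $2 call resold at 5% of list, with nothing paid for inference.
const stolen = resaleEconomics(2, 0.05, 0);

// The same sale by someone who actually pays list price loses money on every call.
const paid = resaleEconomics(2, 0.05, 2);

console.log(stolen, paid);
```

Under these assumptions the thief keeps the entire sale price as margin, while an honest reseller at the same price would be deeply negative, which is why the discount can be so steep.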

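A recent example is Chipotlai Max, a forked coding agent that ships with a proxy turning Chipotle's customer-support chatbot into an OpenAI-compatible endpoint. The project openly solicits help in porting the same inference-theft approach to Home Depot, Lowe's, Target, and Starbucks.

The adapter also serves as the session boundary for the attacker's downstream users. They authenticate to the adapter, not to your endpoint. By the time a call hits your API, it has already crossed the boundary you were planning to defend. The check has to run on the call the adapter proxies, not on the session it sits behind.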

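The shape of a real attack on our own endpoint

On April 12, 2026, traffic to the Vercel docs AI chat endpoint spiked to roughly ten times normal volume on Anthropic's Claude Haiku 4.5 model. Traffic rose to 1,300 requests per minute at peak, which would have translated to an inference cost run rate of over ten thousand dollars per day.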
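As a rough cross-check of those two figures, the sketch below derives the per-request cost they imply. It assumes the peak rate is sustained for a full day (which is what a run rate means) and treats $10,000 as a lower bound, so the result is an estimate and not a number stated in the article.

```typescript
// Figures quoted in the text.
const peakRequestsPerMinute = 1_300;
const dailyRunRateUsd = 10_000; // "over ten thousand dollars per day", used as a lower bound

// Requests per day if the peak rate were sustained for 24 hours.
const requestsPerDayAtPeak = peakRequestsPerMinute * 60 * 24;

// Implied lower bound on inference cost per request.
const impliedCostPerRequestUsd = dailyRunRateUsd / requestsPerDayAtPeak;

console.log(requestsPerDayAtPeak);
console.log(impliedCostPerRequestUsd.toFixed(4));
```

That works out to about 1.87 million requests per day and a little over half a cent of inference per request, still thousands of times the cost of the HTTP request itself.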

The attack came in through residential proxies that obscured the real client IPs. Across hundreds of thousands of bot requests over two days, standard per-IP rate limits had nothing useful to act on.
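To see why a per-IP limit has nothing to act on, here is a small simulation. The limiter, the limit of 60 requests per window, and the fleet size of 1,000 proxies are all assumptions chosen for illustration; only the 1,300 requests figure comes from the text.

```typescript
// Hypothetical fixed-window, per-IP rate limiter (illustration only).
class PerIpRateLimiter {
  private counts = new Map<string, number>();
  private maxPerWindow: number;

  constructor(maxPerWindow: number) {
    this.maxPerWindow = maxPerWindow;
  }

  // Returns true if this IP is still within its allowance for the window.
  allow(ip: string): boolean {
    const seen = (this.counts.get(ip) ?? 0) + 1;
    this.counts.set(ip, seen);
    return seen <= this.maxPerWindow;
  }
}

// Count blocked requests when traffic is spread round-robin over `ipCount` addresses.
function countBlocked(totalRequests: number, ipCount: number, maxPerWindow: number): number {
  const limiter = new PerIpRateLimiter(maxPerWindow);
  let blocked = 0;
  for (let i = 0; i < totalRequests; i++) {
    if (!limiter.allow(`proxy-${i % ipCount}`)) blocked++;
  }
  return blocked;
}

const blockedFromOneIp = countBlocked(1_300, 1, 60);         // 1240
const blockedFromProxyFleet = countBlocked(1_300, 1_000, 60); // 0

console.log({ blockedFromOneIp, blockedFromProxyFleet });
```

The same minute of traffic that a single IP could not get through passes untouched once it is spread across the fleet, because no individual address ever approaches the limit.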

How to defend against inference theft

Protecting AI endpoints against inference theft requires verification of every request. We use Vercel's BotID with deep analysis, called inside the route handler before the AI request lands.

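Verification has to run on every AI request

If our gate had run at session start instead of per request, the attacker would have paid the bypass cost once and walked away with hundreds of thousands of stolen calls. Any check that runs per session amortizes the attacker's bypass cost across every subsequent inference call. Per-request gates force that ratio down to one, and even at high inference prices, defeating a check on every call isn't worth the cost.

This is where the cost asymmetry works in the defender's favor. Inference is the most expensive resource per call that the attacker steals, but verification is one of the cheapest protection costs per call.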
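The amortization argument can be made concrete with a short sketch. The bypass cost of $0.50 is a made-up placeholder (the article gives no figure), and the function name is illustrative; the point is only how the attacker's cost per stolen call changes with the number of calls a single passed check unlocks.

```typescript
// Attacker's bypass cost attributed to each stolen call,
// given how many calls one passed check unlocks.
function bypassCostPerCall(bypassCostUsd: number, callsPerCheck: number): number {
  if (callsPerCheck < 1) throw new RangeError('callsPerCheck must be at least 1');
  return bypassCostUsd / callsPerCheck;
}

const assumedBypassCostUsd = 0.5; // hypothetical, not from the article

// Per-session gate: one bypass unlocks, say, 100,000 calls.
const perSessionCost = bypassCostPerCall(assumedBypassCostUsd, 100_000);

// Per-request gate: every call needs its own bypass, so the ratio is one.
const perRequestCost = bypassCostPerCall(assumedBypassCostUsd, 1);

console.log({ perSessionCost, perRequestCost });
```

With a per-session gate the bypass cost per call shrinks toward zero as the session gets longer; with a per-request gate it stays at the full bypass cost no matter how many calls the attacker makes.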

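Implementing request verification with BotID deep analysis

Traditional image CAPTCHAs no longer hold up against modern attackers because the same AI models that make inference worth stealing can easily bypass them.

We deploy Vercel BotID on our AI endpoints, gating every request. BotID is an invisible CAPTCHA with deep analysis powered by Kasada that uses client-side machine learning to distinguish humans from bots without a visible challenge, so it can run on every request rather than only at session start.

BotID deep analysis detected and blocked more than ten thousand bot requests in the first minutes of the spike. Within twenty-four hours, request volume on the endpoint was flat at normal levels.

Server-side, checkBotId() runs inside the route handler and returns a classification for the request currently being served.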

// app/api/ai-chat/route.ts
import { checkBotId } from 'botid/server';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(request: NextRequest) {
  // Classify the request currently being served before any inference runs.
  const verification = await checkBotId();

  if (verification.isBot) {
    // Reject bot traffic before it reaches the model.
    return NextResponse.json({ error: 'Access denied' }, { status: 403 });
  }

  // Your existing AI SDK call path
}

The route also has to be declared on the client. Without this, checkBotId() fails because BotID doesn't attach the challenge headers to the request:

// instrumentation-client.ts
import { initBotId } from 'botid/client/core';

initBotId({
  protect: [{ path: '/api/ai-chat', method: 'POST' }],
});

See the BotID docs for the next.config.ts wrapper and the full setup.
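If several routes need the same gate, the per-request check can be factored into a small wrapper. The sketch below is not part of BotID: gateEveryRequest, Verification, and GateResult are hypothetical names, and the verifier is passed in as a function so the example runs without the library. In a real Next.js route the verifier would presumably be a call to checkBotId().

```typescript
// Minimal shapes for this sketch; a real handler would use the framework's types.
type Verification = { isBot: boolean };
type GateResult = { status: number; body: string };

// Wrap a handler so the verifier runs on every call, before the expensive work.
function gateEveryRequest(
  verify: () => Promise<Verification>,
  handler: () => Promise<GateResult>,
): () => Promise<GateResult> {
  return async () => {
    const verification = await verify();
    if (verification.isBot) {
      // Blocked before any inference is paid for.
      return { status: 403, body: 'Access denied' };
    }
    return handler();
  };
}

// Stand-in for the costly model call; counts how often it actually runs.
let inferenceCalls = 0;
const runInference = async (): Promise<GateResult> => {
  inferenceCalls++;
  return { status: 200, body: 'model output' };
};

// Stand-in verifiers for the two possible classifications.
const botRoute = gateEveryRequest(async () => ({ isBot: true }), runInference);
const humanRoute = gateEveryRequest(async () => ({ isBot: false }), runInference);
```

Because the verifier is invoked inside the returned function, there is no way to reach the handler through the wrapper without a fresh classification for that call.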

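Protect inference, not just access

Inference will remain orders of magnitude more expensive than the requests it carries, so resale will remain profitable, and attackers will keep iterating.

To protect your AI endpoints:

Audit which of your AI endpoints are exposed
Prioritize by attack likelihood: more caller prompt control means an easier target
Gate every endpoint on every request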
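The audit and prioritization steps can be sketched as a simple ranking. Everything here is an assumption made for illustration: the AiEndpoint shape, the one-point-per-dimension score, and the two example paths are hypothetical, and a real audit would weigh more factors than these three flags.

```typescript
// Hypothetical inventory entry for one public AI endpoint.
interface AiEndpoint {
  path: string;
  callerControlsPrompt: boolean;
  callerControlsModel: boolean;
  callerControlsParameters: boolean;
}

// Crude exposure score: one point per dimension the caller controls.
function exposureScore(e: AiEndpoint): number {
  return [e.callerControlsPrompt, e.callerControlsModel, e.callerControlsParameters]
    .filter(Boolean).length;
}

// Most exposed endpoints first; the input array is left untouched.
function prioritize(endpoints: AiEndpoint[]): AiEndpoint[] {
  return [...endpoints].sort((a, b) => exposureScore(b) - exposureScore(a));
}

// Example inventory with made-up paths.
const inventory: AiEndpoint[] = [
  { path: '/api/docs-chat', callerControlsPrompt: true, callerControlsModel: false, callerControlsParameters: false },
  { path: '/api/playground', callerControlsPrompt: true, callerControlsModel: true, callerControlsParameters: true },
];

const rankedPaths = prioritize(inventory).map((e) => e.path);
console.log(rankedPaths);
```

The playground-style endpoint ranks first because the caller controls the prompt, the model, and the parameters, matching the article's point that more general endpoints pay out more per stolen call. The ranking only decides where to start; the last item in the list still applies to every endpoint.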


