AIニュース最前線
TLDR AI · June 3, 2026 09:00 · about a 6-minute read

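Preventing abuse of AI inference at scale (5 min read)

#LLM #InferenceTheft #Security #BotID #APIGovernance
TL;DR

Vercel warns that abuse of AI endpoints (inference theft) can run up enormous bills, and argues that verification must run on every request through BotID deep analysis rather than once per session.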

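AI deep analysis (June 4, 2026 22:04)

Overall: 4 out of 5 (Important)

  • Depth (weight 40%): 4
  • Relevance (weight 30%): 5
  • Practicality (weight 20%): 5
  • Novelty (weight 10%): 3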

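Key points

1. The economic incentives and risks of inference theft

HTTP requests are cheap, but prompts to frontier models are expensive. Attackers consume that inference for free and resell the tokens for enormous profit.

2. The limits of traditional defenses

Per-session authentication and rate limiting are not enough on their own. A check that runs once has its cost spread across thousands of fraudulent calls and loses its effect, so every request has to be verified.

3. Comprehensive protection with BotID

Vercel runs BotID deep analysis on every AI request to detect and block malicious bots and scripts, and developers can protect their own endpoints with similar code.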


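Impact analysis

This article highlights a new threat that has surfaced with the spread of generative AI, the abuse of infrastructure costs, and presses developers to revisit the traditional session-based security model. Vercel's approach of verifying every request with BotID could become a defensive baseline that the industry as a whole standardizes on.

Editorial comment

As the cost of using generative AI grows explosively, this article makes a thought-provoking case that security measures need to evolve from authentication toward per-request behavioral analysis.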

5 min read

May 29, 2026

HTTP requests are inexpensive. Vercel charges ~$2/million, a fraction of a cent per call. But a single prompt to an agent on a frontier model can cost $2, making AI a million times more expensive, and inference theft one of the highest-margin businesses an attacker can run. We have seen this type of attack on our own APIs.
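The "million times" figure follows directly from the two prices quoted above. A minimal sketch of that arithmetic; the constant and function names are illustrative and not part of any Vercel API.

```typescript
// Prices quoted in the article; the names here are illustrative only.
const REQUEST_PRICE_PER_MILLION_USD = 2; // ~$2 per million HTTP requests
const AGENT_PROMPT_COST_USD = 2; // one prompt to an agent on a frontier model

// How many times more expensive one prompt is than the HTTP request carrying it.
// Multiplying before dividing keeps the arithmetic exact for these inputs.
function costRatio(promptCostUsd: number, pricePerMillionUsd: number): number {
  if (pricePerMillionUsd <= 0) {
    throw new RangeError("pricePerMillionUsd must be positive");
  }
  return (promptCostUsd * 1_000_000) / pricePerMillionUsd;
}

const ratio = costRatio(AGENT_PROMPT_COST_USD, REQUEST_PRICE_PER_MILLION_USD);
console.log(ratio); // 1000000
```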

If you have AI endpoints exposed to the internet, the risk of abuse is high and can easily run up bills in the tens of thousands of dollars or more.

Protecting those endpoints requires verification to run on every AI request, not on the session or signup. Rate limits and auth walls aren't sufficient on their own because checks that run once per session get amortized away across thousands of stolen calls.

At Vercel, we gate every AI request through BotID deep analysis, and you can do the same on your own endpoints with a few lines of code.

What inference theft is

Inference theft is the unauthorized use of someone else's paid AI inference, either for free consumption or downstream resale. The operator pays per AI call; the attacker pays nothing for inference and then resells the tokens at a discount. This goes beyond rate-limit abuse to actual resale of a stolen resource in a market.

Which AI endpoints are at risk?

Any internet-facing endpoint that gives a caller meaningful control over an LLM prompt is a target. The more general the endpoint, the higher the payout per stolen call.

AI playgrounds, like the AI SDK Playground, are the most dangerous shape because the caller has maximum control over the prompt, the model, and often the parameters. Stolen calls land cleanly into any standard client.

Support bots and documentation assistants are less exposed when system prompts are fixed server-side, but attackers have learned how to talk the models around system prompts cheaply enough to make resale viable.

Resale value tracks how easily the stolen calls can be dropped into a provider-compatible client.

Why web defenses don't mitigate inference theft

IP rate limits and auth walls were built to defend against attacks with dramatically lower per-call economics, where gaming IPs and accounts weren't worth the cost.

The payoff from stolen inference is high enough that attackers will procure residential proxy IPs by the thousands and register throwaway accounts at whatever scale it takes to defeat your gate. Rate limits get diluted across the fleet of IP addresses, and real accounts pass authentication.
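The dilution effect is easy to see with numbers. The sketch below assumes traffic is spread evenly across the proxy fleet; the fleet sizes, the total request rate, and the per-IP limit are all hypothetical values chosen for illustration.

```typescript
// Per-IP request rate when total traffic is spread evenly over a proxy fleet.
function perIpRate(totalRequestsPerMinute: number, proxyCount: number): number {
  if (proxyCount <= 0) {
    throw new RangeError("proxyCount must be positive");
  }
  return totalRequestsPerMinute / proxyCount;
}

// True when every IP in an evenly loaded fleet stays at or under the per-IP limit.
function evadesPerIpLimit(
  totalRequestsPerMinute: number,
  proxyCount: number,
  limitPerIpPerMinute: number,
): boolean {
  return perIpRate(totalRequestsPerMinute, proxyCount) <= limitPerIpPerMinute;
}

// Hypothetical scenario: 1,000 requests/minute in total, limit of 10/minute per IP.
console.log(evadesPerIpLimit(1_000, 50, 10)); // false: 20 requests/minute per IP
console.log(evadesPerIpLimit(1_000, 5_000, 10)); // true: 0.2 requests/minute per IP
```

The total volume is identical in both cases; only the number of addresses it is spread across changes.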

The architecture of abuse

Sophisticated attackers wrap your custom AI endpoint in an OpenAI- or Anthropic-compatible adapter and fan calls out through residential proxies.

The adapter is the key component. It is a one-time engineering cost that presents the victim's idiosyncratic API as OpenAI- or Anthropic-compatible, so stolen inference can drop into any standard coding agent or SDK. Reselling at even five to ten percent of the list price, with zero marginal inference cost, can make for a generous-margin business.

A recent example is Chipotlai Max, a forked coding agent that ships with a proxy turning Chipotle's customer-support chatbot into an OpenAI-compatible endpoint. The project openly solicits help in porting the same inference-theft approach to Home Depot, Lowe's, Target, and Starbucks.

The adapter also serves as the session boundary for the attacker's downstream users. They authenticate to the adapter, not to your endpoint. By the time a call hits your API, it has already crossed the boundary you were planning to defend. The check has to run on the call the adapter proxies, not on the session it sits behind.

The shape of a real attack on our own endpoint

On April 12, 2026, traffic to the Vercel docs AI chat endpoint spiked to roughly ten times normal volume on Anthropic's Claude Haiku 4.5 model. Traffic rose to 1,300 requests per minute at peak, which would have translated to an inference cost run rate of over ten thousand dollars per day.
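As a rough check on those figures, the run rate implies an average cost per request. This sketch assumes the peak rate is sustained for a full day, which is what a run rate means but is not a number the article states directly; the variable names are illustrative.

```typescript
// Figures from the paragraph above.
const PEAK_REQUESTS_PER_MINUTE = 1_300;
const DAILY_COST_FLOOR_USD = 10_000; // "over ten thousand dollars per day"

// Assumption: the peak rate holds for all 24 hours.
const requestsPerDay = PEAK_REQUESTS_PER_MINUTE * 60 * 24;

// Lower bound on the average inference cost per request implied by the run rate.
const impliedCostPerRequestUsd = DAILY_COST_FLOOR_USD / requestsPerDay;

console.log(requestsPerDay); // 1872000
console.log(impliedCostPerRequestUsd.toFixed(4)); // "0.0053"
```

At roughly half a cent or more per request, each stolen call costs the operator thousands of times the price of the HTTP request that carried it.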

The attack came in through residential proxies that obscured the real client IPs. Across hundreds of thousands of bot requests over two days, standard per-IP rate limits had nothing useful to act on.

How to defend against inference theft

Protecting AI endpoints against inference theft requires verification of every request. We use Vercel's BotID with deep analysis, called inside the route handler before the AI request lands.

Verification has to run on every AI request

If our gate had run at session start instead of per request, the attacker would have paid the bypass cost once and walked away with hundreds of thousands of stolen calls. Any check that runs per session amortizes the attacker's bypass cost across every subsequent inference call. Per-request gates force that ratio down to one, and even at high inference prices, defeating a check on every call isn't worth the cost.
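The amortization argument can be made concrete. In the sketch below, the bypass cost, the resale price, and the number of calls per session are hypothetical values, and the profitability test is a deliberate simplification that ignores the attacker's other costs.

```typescript
type GateMode = "per-session" | "per-request";

// Attacker's bypass cost attributed to each stolen call.
// A per-session gate is defeated once and amortized over every call in the session;
// a per-request gate has to be defeated again on every call.
function bypassCostPerCall(
  bypassCostUsd: number,
  callsPerSession: number,
  mode: GateMode,
): number {
  if (callsPerSession < 1) {
    throw new RangeError("callsPerSession must be at least 1");
  }
  return mode === "per-request" ? bypassCostUsd : bypassCostUsd / callsPerSession;
}

// Simplified test: resale pays off only if each call sells for more than it costs to unlock.
function theftIsProfitable(resalePricePerCallUsd: number, costPerCallUsd: number): boolean {
  return resalePricePerCallUsd > costPerCallUsd;
}

// Hypothetical numbers: $0.05 to defeat one check, $0.0005 resale price per call,
// 100,000 calls made behind a single session.
const perSession = bypassCostPerCall(0.05, 100_000, "per-session");
const perRequest = bypassCostPerCall(0.05, 100_000, "per-request");

console.log(theftIsProfitable(0.0005, perSession)); // true
console.log(theftIsProfitable(0.0005, perRequest)); // false
```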

This is where the cost asymmetry works in the defender's favor. Inference is the most expensive resource per call that the attacker steals, but verification is one of the cheapest protection costs per call.

Implementing request verification with BotID deep analysis

Traditional image CAPTCHAs no longer hold up against modern attackers because the same AI models that make inference worth stealing can easily bypass them.

We deploy Vercel BotID on our AI endpoints, gating every request. BotID is an invisible CAPTCHA with deep analysis powered by Kasada that uses client-side machine learning to distinguish humans from bots without a visible challenge, so it can run on every request rather than only at session start.

BotID deep analysis detected and blocked more than ten thousand bot requests in the first minutes of the spike. Within twenty-four hours, request volume on the endpoint was flat at normal levels.

Server-side, checkBotId() runs inside the route handler and returns a classification for the request currently being served.

code
// app/api/ai-chat/route.ts
import { checkBotId } from 'botid/server';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(request: NextRequest) {
  // Classify the request currently being served before any inference work starts.
  const verification = await checkBotId();

  if (verification.isBot) {
    return NextResponse.json({ error: 'Access denied' }, { status: 403 });
  }

  // Your existing AI SDK call path
}

The route also has to be declared on the client. Without this, checkBotId() fails because BotID doesn't attach the challenge headers to the request:

code
// instrumentation-client.ts
import { initBotId } from 'botid/client/core';

// Declare the protected route so the client attaches the challenge headers.
initBotId({
  protect: [{ path: '/api/ai-chat', method: 'POST' }],
});

See the BotID docs for the next.config.ts wrapper and the full setup.
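When several AI routes need the same gate, the check can be factored into a small wrapper so that no handler runs without it. The following is a sketch, not part of the BotID package: `withBotGate`, `Verification`, and `Handler` are hypothetical names, and the classifier is injected so the example runs without Next.js or `botid` installed. In a real route the injected check would presumably be `checkBotId` from the snippet above.

```typescript
// Minimal shapes so the sketch is self-contained; these are not botid types.
type Verification = { isBot: boolean };
type Handler<Req, Res> = (request: Req) => Promise<Res>;

// Wraps a handler so the bot check runs on every call, before any inference work.
function withBotGate<Req, Res>(
  check: () => Promise<Verification>,
  onDenied: () => Res,
  handler: Handler<Req, Res>,
): Handler<Req, Res> {
  return async (request) => {
    const verification = await check();
    if (verification.isBot) {
      return onDenied();
    }
    return handler(request);
  };
}

// Usage with a stub classifier standing in for the real check.
let flaggedAsBot = true;
let inferenceCalls = 0;

const gatedChat = withBotGate(
  async () => ({ isBot: flaggedAsBot }),
  () => ({ status: 403 }),
  async (_prompt: string) => {
    inferenceCalls += 1; // stands in for the expensive model call
    return { status: 200 };
  },
);
```

Because the check sits inside the returned handler, a request classified as a bot is rejected before the inference path is reached.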

Protect inference, not just access

Inference will remain orders of magnitude more expensive than the requests it carries, so resale will remain profitable, and attackers will keep iterating.

To protect your AI endpoints:

  • Audit which of your AI endpoints are exposed
  • Prioritize by attack likelihood: more caller prompt control means an easier target
  • Gate every endpoint on every request

