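The Register AI/ML · April 26, 2026, 23:48 · about 10 min read

Token optimization is not an AI strategy

#LLM #CostOptimization #BusinessStrategy #AIEthics

TL;DR

An analysis piece from The Register AI/ML criticizes "tokenmaxxing," the unthinking habit of judging AI adoption by cost alone, and argues that fit with business requirements should come first.

AI deep analysis · April 27, 2026, 21:47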

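Key points

Criticism of tokenmaxxing

"Tokenmaxxing," the strategy of maximizing AI's value purely in terms of tokens processed or cost, is a dangerous lapse in thinking that loses sight of real business value.

Balancing cost and fit

Before checking an AI product's price tag, evaluate whether the technology actually fits your specific use cases and business requirements.

Social and corporate impact of AI adoption

AI's cost structure and scope of application can decide whether companies survive and, in turn, shape society, so hasty adoption should be avoided.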

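Impact analysis

The article points out the risks of the "technology-first" and "cost-first" thinking common in the current AI bubble, and gives practitioners a reason to revisit the criteria they use for AI adoption decisions. Rather than simply using the cheapest model, companies need to find uses of AI that align with their business processes.

Editorial comment

Asking about "fit" in a business context, and not only about a technology's potential, is extremely important for avoiding a mistake that many companies are prone to make.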

Original text

What does AI cost? It's a simple question and an important one – the answer will determine the fate of companies and shape society. But it's also a question that can't be answered in a meaningful way without additional context.

One possible response is "too much." US private AI investment reached $285.9 billion in 2025, according to Stanford HAI's 2026 Artificial Intelligence Index Report. That money has economic benefits but also adds stress to environmental resources, utilities, and communities (https://www.lincolninst.edu/publications/land-lines-magazine/articles/land-water-impacts-data-centers/).

As the report states, "AI data center power capacity rose to 29.6 GW, comparable to New York state at peak demand, and annual GPT-4o inference water use alone may exceed the drinking water needs of 12 million people."

Then there's the cost to human competency, when skills atrophy or never develop due to overreliance on prompt slot machines.

But that's difficult to measure over a short period of time. And given the current US administration's disinterest in regulatory restraint or public concern, it's perhaps easier to focus on the financial minutiae until government and industry can be forced to reckon with civic unease (https://sfstandard.com/2026/04/12/sam-altman-s-home-targeted-second-attack/).

You could start with the token, the basic unit for selling the input and output of AI models at the moment. The price of tokens has been much on the mind of developers using AI subscription plans because plan providers like Anthropic and GitHub have been pushing customers away from token-subsidized subscriptions toward pay-as-you-go consumption.

Devansh, a machine learning researcher, head of AI at legal startup Iqidis, and founder of an AI community group called the Chocolate Milk Cult, did the math in a post published earlier this year. The answer is about $0.0038 per million tokens – in a very specific context.

That's the base cost for inference on an Nvidia H100 GPU, rented at a cost of $2.50/hour and generating 185 tokens/second at 100 percent utilization.

But as Devansh observes, no one runs at 100 percent utilization. At 30 percent utilization, the price would be ~$0.013/M tokens; at 10 percent, it would be ~$0.038/M tokens.
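The arithmetic behind these figures can be sketched as follows. This is a minimal sketch using only the inputs quoted above (rental price, tokens per second, utilization); the function name is hypothetical and not taken from Devansh's post. Taken at face value, these inputs give roughly $3.75 per million tokens, which is about $0.0038 per thousand. The per-million figures quoted above would therefore need on the order of 1,000 times the throughput, for example from batching many requests on one GPU. That is an assumption, since the article does not state it.

```python
def cost_per_million_tokens(gpu_dollars_per_hour, tokens_per_second, utilization):
    """Return the GPU rental cost, in dollars, of generating one million tokens."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    # Tokens actually produced in one rented hour at the given utilization.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000


# Inputs quoted in the article: a $2.50/hour H100 producing 185 tokens/second.
for utilization in (1.0, 0.3, 0.1):
    per_million = cost_per_million_tokens(2.50, 185, utilization)
    per_thousand = per_million / 1000
    print(f"{utilization:.0%}: ${per_million:.2f}/M tokens, ${per_thousand:.4f}/K tokens")
```

Whatever the unit, the shape of the result holds: cost per token scales inversely with utilization, so a GPU that is busy 10 percent of the time costs ten times as much per token as one that is fully loaded.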

Anthropic currently charges $5/M tokens (input) and $25/M tokens (output) for its latest model, Opus 4.7. For Google's Gemma 4 26B A4B, the weighted average input price at the time of writing is $0.096/M tokens, per OpenRouter.
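To make the list prices concrete, here is a rough sketch of what a single API call would cost at the Opus 4.7 rates quoted above. The request size (20,000 input tokens, 2,000 output tokens) and the `TokenPrice` class are illustrative assumptions, not figures or names from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """List price in dollars per million tokens."""
    input_per_million: float
    output_per_million: float

    def request_cost(self, input_tokens, output_tokens):
        """Dollar cost of one request with the given token counts."""
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


# Prices quoted in the article for Opus 4.7.
opus = TokenPrice(input_per_million=5.0, output_per_million=25.0)

# Hypothetical request: a long prompt with a short answer.
cost = opus.request_cost(input_tokens=20_000, output_tokens=2_000)
print(f"cost per request: ${cost:.2f}")

# Input-only comparison with the Gemma 4 26B A4B price quoted above.
input_price_ratio = opus.input_per_million / 0.096
print(f"Opus input tokens cost about {input_price_ratio:.0f}x the Gemma input price")
```

Output tokens are priced at five times the input rate here, so workloads that generate long responses are dominated by the output side even when the prompt is much larger.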

If you run the numbers on different hardware, priced at a different time, with different energy costs, on different models, with different utilization, you'll get different results.

"If you were to just look at what the labs provide as the cost per API, it's a very good signal for what the token costs them, for the Western labs," Devansh told *The Register* in a phone interview.

"Some people say that Anthropic's trying to get about a fifty percent gross margin. But in reality what a token cost is actually many variables rolled into one. You have the model, you have the research behind the model, constant updates in the models that people don't see. So you have to factor all of those in. It's not just the cost of inference at one call, which is actually not a very good way to look at the system."

Devansh said organizations tend not to focus on the specific cost of tokens because they're focused on delivering a service that customers value.

"In a lot of legal work, you can actually pass costs along to your customer and the customers will not complain because they want to see transparency into what was done and how it was done," he said. "So from that perspective, there's less of a worry about how much this will cost as long as you can justify your costs. … As long as you're consistently able to deliver the value, I think forecasting costs are a little bit less worrisome."

Companies like Meta and Shopify have made headlines by treating token usage as a key performance indicator, and employees have answered the call by trying to signal their value through heavy use of AI tools. That can get expensive quickly and may not do much for more meaningful business metrics.

"Is token spend directly correlated with productivity?" said Devansh. "Absolutely not. I've done this research very extensively. … Before you used to have lines of code and other kinds of stupid productivity metrics, like how many words you typed. So this is just the latest in that era of stupidity. I think middle managers will always try to justify themselves and find a way they can rank people without having to apply their brains."

But one of the issues with LLMs, said Devansh, is we don't know how best to apply them. So there's potential value in just encouraging people to spend tokens in case they come up with new kinds of workflows that provide signals about what works and what doesn't.

Bob Venero, CEO of IT consultancy Future Tech Enterprise, told *The Register* that his company tends to work with Fortune 100 clients, and that many of them have stood up AI projects that involve throwing around a lot of money without thinking through what they want to accomplish.

Venero said when his company engages with clients, the goal is to figure out the desired business outcome, which may or may not involve AI.

Future Tech's recent work with Northrop Grumman did involve AI – the IT biz helped implement an Nvidia Enterprise AI Factory to help the defense firm run AI workloads relevant to its projects.

Venero said that companies are struggling to assess the impact of AI in their environment, to measure ROI, and to discover how the technology may be useful.

"So there's a lot of pre-work that needs to be done to identify where they want to spend their money and what the outcome is going to be, especially when costs are 3x of what they were six months ago," he said, citing "Ramageddon" – the shortage of RAM due to the AI compute boom.

Venero points to OpenAI's commitment to purchase memory chips from Samsung and SK Hynix, and the shift of OEMs like Micron toward high-bandwidth memory, as catalysts for the current RAM crisis. That complicates ROI calculations for AI deployments, he said, because everything has become more expensive.

Cloud providers can help by offering consumption-based pricing, he said, but he has some reservations about that.

"I'm not a huge fan of off-prem AI," he said. "It's a little bit scary from our perspective."

Setting aside the security concerns, Venero said the productivity risk of cloud dependency is substantial for large organizations. He pointed to Microsoft Office 365. "Has Office 365 ever gone down?" he said. "Multiple times. And there are so many of those outages that happen."

If a cloud outage costs a company a thousand dollars per minute of downtime, he said, maybe that's acceptable. "If it's a million dollars a minute, you probably want to think about the controls that need to be in place, and that's probably an on-prem solution," he said.
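Venero's threshold can be turned into a back-of-the-envelope estimate. The sketch below multiplies a per-minute outage cost by the downtime implied by an availability figure. The 99.9 percent availability is an assumption chosen for illustration, not a number from Venero or any provider, and the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def annual_downtime_cost(cost_per_minute, availability):
    """Expected yearly outage cost for a service with the given availability."""
    if not 0 <= availability <= 1:
        raise ValueError("availability must be between 0 and 1")
    downtime_minutes = (1 - availability) * MINUTES_PER_YEAR
    return cost_per_minute * downtime_minutes


# Assumed availability of 99.9 percent, about 526 minutes of downtime a year.
for cost_per_minute in (1_000, 1_000_000):
    yearly = annual_downtime_cost(cost_per_minute, availability=0.999)
    print(f"${cost_per_minute:,}/minute -> ${yearly:,.0f}/year")
```

Under that assumption the yearly exposure differs by the same factor of 1,000 as the per-minute cost, which is the gap Venero describes between an outage a company can absorb and one that justifies on-prem controls.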

AI may be making cloud stability worse, through the deployment of under-reviewed code and the infrastructure stress that has followed from heavy AI use. Customers, Venero said, "are absolutely seeing that. And when they're not, we're educating them."

In light of the capacity challenges created by the sudden popularity of OpenClaw, Venero said, "People threw this into their environment and it did crazy stuff. So there's definitely an ecosystem conversation that needs to happen about risk and the three different pillars of risk that are tied to it."

And, he said, the hyperscalers have contributed to the problem by focusing on speed at the expense of quality. "Right now it's a race. Who's going to win? Who's going to take the most? And everybody's throwing everything at it. And it's just causing this incredible turmoil."

"What we want our customers to do is step back," he said. "Take a look at what you want to accomplish and why. Look at the associated investments and the right timeline to do it and then measure those outcomes."

Approaching AI thoughtfully and deliberately makes it more likely for AI projects to make it into production.

Venero said that among the companies he's seen, prior to being educated about AI, maybe 15 percent of their prototypes would actually be deployed. With guidance, he said, that figure is more like 45 or 50 percent.

"It's very use-case-specific," he said. "And when you have the outcomes that you're trying to drive to and then you measure those outcomes, you will be successful. If you're not, if you're doing AI for the sake of AI, it's gonna be five percent."
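One way to read these deployment rates is as a cost multiplier: the lower the share of prototypes that reach production, the more prototypes each deployed project has to pay for. The sketch below assumes a flat, hypothetical cost of $100,000 per prototype. The rates are the ones Venero cites, and the function names are illustrative.

```python
def prototypes_per_deployment(deploy_rate):
    """Average number of prototypes built for each one that reaches production."""
    if not 0 < deploy_rate <= 1:
        raise ValueError("deploy_rate must be in the range (0, 1]")
    return 1 / deploy_rate


def cost_per_deployment(prototype_cost, deploy_rate):
    """Average prototype spend behind each deployed project."""
    return prototype_cost * prototypes_per_deployment(deploy_rate)


PROTOTYPE_COST = 100_000  # hypothetical flat cost per prototype, in dollars

scenarios = {
    "AI for the sake of AI": 0.05,
    "before guidance": 0.15,
    "with guidance": 0.45,
}
for label, rate in scenarios.items():
    spend = cost_per_deployment(PROTOTYPE_COST, rate)
    print(f"{label}: {prototypes_per_deployment(rate):.1f} prototypes, "
          f"${spend:,.0f} per deployed project")
```

On these assumptions, moving from a 5 percent to a 45 percent deployment rate cuts the prototype spend per deployed project by a factor of nine, before any change in token prices.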

Maybe asking what AI costs should not be the first question. Citing the pressure some employees feel to show their value by expending tokens, Venero said, the question should be, "Why? And what are you using them for?" ®
