TLDR AI · June 4, 2026, 09:00 · about 3 min

Intelligence Per Dollar (2 minute read)

#LLM #Cost Efficiency #Benchmarking #Microsoft #Claude

TL;DR

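Microsoft has added "average token usage" to its model evaluation. Cost efficiency (intelligence per dollar), not just raw performance, is becoming a new standard metric for the industry, a sign that the era of excessive benchmark optimization and subsidy dependence is ending.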

AI Deep Analysis · June 5, 2026, 19:12

Importance: / 5

Depth: 40%

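Key Points

New evaluation metric

Microsoft added "average token usage" to a model release card, so the model is reported on two axes: performance and cost.

Cost efficiency made visible

On SWE-Bench Verified, the Microsoft model scores 71.6 while using about a third of the tokens that Claude Haiku 4.5 consumes.

An industry turning point

The move suggests that the era of relying on subsidies and token maximization (tokenmaxxing) is ending, and that practical cost-effectiveness (Intelligence Per Dollar) is becoming the top priority.

Corporate budget constraints

Large companies such as Uber and Salesforce are capping AI spending or freezing hiring because the cost of running state-of-the-art models for every use case has hit its limit.

Two-axis evaluation of performance and cost

With the "average token usage" metric that Microsoft introduced, benchmarks now measure not only performance but also the cost of achieving that intelligence.

Budget constraints become real for AI adoption

As the Uber and Salesforce cases show, even large companies lack the budget to use state-of-the-art models for every use case, which makes a shift toward cost efficiency unavoidable.

An industry-wide shift to "price per result"

Competition extends from the model layer to the application layer, and pricing moves from token counts to cost per concrete outcome, such as a closed ticket or a shipped PR.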

Key Quotes

Benchmarks are now measured on two different dimensions, the overall performance & the cost to achieve that intelligence.

This is yet another sign that the era of subsidies, tokenmaxxing, & all-out performance for many use cases is over.

This new dual benchmark answers the buyer's only question : what is my intelligence per dollar?

Every layer in the stack now has to price the same way the customer thinks : per result, not per token.

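Impact Analysis

This article marks an important turning point: the AI industry is moving from a race on raw technical performance toward a more mature phase that prioritizes economic sustainability and cost efficiency. For companies, choosing a high-performing model is no longer enough; optimizing operating cost (token usage) becomes a source of competitive advantage.

Editorial Comment

In AI adoption, the next major challenge is not only improving performance but also delivering high performance cheaply. If this metric becomes standard, it will give developers a new guide for model selection.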

Yesterday Microsoft added a new metric to a model release card, one that will likely become a standard.[1]

Average token usage.

In the first row, the Microsoft model hits 71.6 on SWE-Bench Verified using about a third of the tokens Claude Haiku 4.5 burns.

Benchmarks are now measured on two different dimensions, the overall performance & the cost to achieve that intelligence.

This is yet another sign that the era of subsidies[2], tokenmaxxing[3], & all-out performance for many use cases is over.

Even the most valuable companies in the world cannot afford state-of-the-art intelligence for every conceivable use case.[4] Uber capped employee AI spending after blowing through its budget in four months.[5] Salesforce is spending $300M on Anthropic tokens & has frozen engineering hires.[6]

This new dual benchmark answers the buyer’s only question : what is my intelligence per dollar?

Artificial Analysis already benchmarks this.[7] GPT 5.5 & Claude Opus 4.8 land within a point of each other on the Intelligence Index, around 60. Running the index costs $3,357 on GPT 5.5 & $4,685 on Opus 4.8. Same answer, 40% more expensive.
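The 40% figure can be checked with a few lines of Python. The sketch below is an illustration, not Artificial Analysis's methodology: the function names are made up here, and the shared score of 60 is an assumption standing in for "around 60, within a point of each other".

```python
def cost_per_point(run_cost_usd: float, index_score: float) -> float:
    """Dollars spent per Intelligence Index point."""
    if index_score <= 0:
        raise ValueError("index_score must be positive")
    return run_cost_usd / index_score


def relative_premium(baseline_cost: float, other_cost: float) -> float:
    """Fraction by which other_cost exceeds baseline_cost."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return other_cost / baseline_cost - 1.0


# Run costs are the figures quoted in the article.
RUN_COST_USD = {"GPT 5.5": 3357.0, "Claude Opus 4.8": 4685.0}

# Assumption: both models are treated as scoring exactly 60.
APPROX_INDEX_SCORE = 60.0

for model, cost in RUN_COST_USD.items():
    per_point = cost_per_point(cost, APPROX_INDEX_SCORE)
    print(f"{model}: ${per_point:.2f} per index point")

premium = relative_premium(RUN_COST_USD["GPT 5.5"], RUN_COST_USD["Claude Opus 4.8"])
print(f"Opus 4.8 premium over GPT 5.5: {premium:.1%}")
```

With these inputs the premium comes out at roughly 39.6%, which the article rounds to 40%.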

Model companies must now compete on both dimensions. The application layer will compete one level up, on dollars per outcome, what a closed ticket, a shipped PR, or a resolved support case actually costs.
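One way to make "dollars per outcome" concrete is to divide the cost of an attempt by the fraction of attempts that succeed. The following is a minimal sketch under stated assumptions: `ModelProfile`, its fields, and every number in it are hypothetical, and it assumes a failed attempt is simply retried at the same cost until one succeeds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model figures; none of these come from the article."""
    name: str
    avg_tokens_per_attempt: float
    usd_per_million_tokens: float
    success_rate: float  # fraction of attempts that yield an accepted result


def cost_per_attempt(model: ModelProfile) -> float:
    """Token spend for a single attempt, in dollars."""
    return model.avg_tokens_per_attempt / 1_000_000 * model.usd_per_million_tokens


def cost_per_outcome(model: ModelProfile) -> float:
    """Expected dollars per successful result, assuming retries until success."""
    if not 0 < model.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt(model) / model.success_rate


# Illustrative values only: same token price, different token appetite.
candidates = [
    ModelProfile("frugal-model", 200_000, 1.00, 0.70),
    ModelProfile("verbose-model", 600_000, 1.00, 0.72),
]

for model in sorted(candidates, key=cost_per_outcome):
    print(f"{model.name}: ${cost_per_outcome(model):.2f} per shipped result")
```

Under these made-up numbers, the model that burns three times the tokens costs nearly three times as much per shipped result, despite its slightly higher success rate.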

Every layer in the stack now has to price the same way the customer thinks : per result, not per token.

[1] Introducing MAI-Code-1-Flash: Microsoft announces a new coding model with average token usage on the release card.

[2] The Unsustainable Subsidy: The era of AI subsidies is ending.

[3] Tokenmaxxing: Models that game benchmarks with extra tokens are losing their edge.

[4] Microsoft cancels Claude Code licenses, shifting developers to GitHub Copilot CLI: Microsoft cancelled Claude Code licenses across its Experiences and Devices division (Windows, Microsoft 365, Outlook, Teams, Surface) after engineering usage outran budgets.

[5] Uber caps employee AI spending after blowing through budget in 4 months.

[6] Salesforce Spends $300M on AI, Freezes Engineering Hires.

[7] AI Model & API Providers Analysis: Independent analysis of AI model costs.
