読み込み中…

The Verge AI·2026年6月8日 23:00·約54分

マイクロソフト AI 最高責任者、超知能は近いが職を奪わないと発言

#Superintelligence #LLM #Frontier Models #OpenAI #Microsoft

TL;DR

Microsoft AI の CEO Mustafa Suleyman は、OpenAI との新たな契約により独立して超知能（Superintelligence）の開発を進める体制を整備中であり、これは業界における技術覇権争いの決定的な転換点となる。

AI深層分析2026年6月9日 00:02

重要/ 5段階

深度40%

キーポイント

OpenAI との戦略的関係再定義

過去15〜18ヶ月かけて OpenAI との関係を再構築し、昨年の契約更新により、同社モデルのライセンス購入を継続しつつも、Microsoft 独自で超知能を開発する権利と体制を確立した。

独自超知能チームの編成

10月以降、先端的なモデル（frontier models）を訓練するための十分な規模を持つ計算クラスターを構築し、超知能開発に特化したチームを組成している。

市場と社会への姿勢

AI に対する世論の悪化や政治的反発に対し、Microsoft は消費者製品の質の高さでこれを克服する方針を示しており、Claude の意識に関する議論などにも言及している。

重要な引用

culminated in a new contract that we got done in October of last year

freeing us up to be able to pursue superintelligence independently as well as keep buying and licensing their models

I've been assembling the Superintelligence team, building clusters of sufficient scale to train frontier models

影響分析・編集コメントを表示

影響分析

このニュースは、AI 覇権争いにおいて Microsoft が単なる OpenAI の依存先から、同等かそれ以上の独自技術を持つ競合・パートナーへと進化することを示しています。特に「超知能」の独立開発へのコミットメントは、今後数年間の AI インフラ競争とモデル性能の行方を左右する重要な転換点となります。

編集コメント

Microsoft が OpenAI との関係を「依存」から「独立した共闘・競合」へとシフトさせた点は、業界構造を根本から変える可能性を秘めています。特に独自で超知能を開発する意欲は、今後の技術競争の行方を占う上で極めて重要な指標です。

今日は Microsoft AI の CEO、ムスタファ・スレイマン氏とお話しします。実は今日のイントロは短くしようと思っています——動画でお分かりいただける通り、今週は妻の実家の農場から仕事をしているのですが、それにしても今回は非常に刺激的なエピソードです。

ムスタファ氏の新しいモデルのトレーニングへのアプローチから、Anthropic 社が Claude を意識があるかのように語る点に対する批判まで、あらゆる話題をカバーしました。もちろん、Microsoft と OpenAI の関係性についてや、現在の AI に関する否定的な世論調査や政治的な反発に対してムスタファ氏がどう考えているか、またそれを打ち破るのに十分な消費者向け製品が存在するかどうかについても議論しました。

先ほども言った通り、これは非常に刺激的な内容です。

それでは、Microsoft AI の CEO、ムスタファ・スレイマン氏。始めましょう。

*本インタビューは、長さの調整と明瞭さのために軽微に編集されています。

ムスタファ・スレイマン氏、あなたは Microsoft AI の CEO です。*Decoder*へようこそ。

またお話しできて光栄です。

あなたとお話しできることを大変楽しみにしています。私たちのこれまでの対話の中で、AI について、それが私たちにどのような感情を抱かせるべきか、そしてその目的は何かというテーマでの会話は、私がこれまで行ったすべての会話の中でも特に好きなものの一つでした。

マイクロソフトではいくつかの大きな変化があり、特に AI に対する人々の感じ方についての非常に重要な再文脈化についてお話ししたいと考えています。そして、マイクロソフトの主要な開発者カンファレンスである「Microsoft Build」では、多くの新発表が行われました、またコンピュータの本来あるべき姿や、おそらくどこにあるべきかについての大きなアイデアも数多く示されました。これらについても詳しく掘り下げていきます。

まず最初から始めましょう。これはその後のすべての内容を理解する上で重要な、深い「デコーダー（Decoder）」に関する話です。マイクロソフトに入社されて以来、AI の仕組みを再構築されました。あなたの役割も変化しました。前回お話ししたときは、一連の消費者向け製品の責任者でした。しかしその後、その担当は一旦脇に置かれました。現在は新しいモデルの訓練を行っており、最前線に立っています。

現在のマイクロソフト AI の構造と、マイクロソフト内部での構造について説明してください。

おそらく過去 15〜18 ヶ月ほど、私たちは OpenAI との関係再構築という旅を続けてきました。その過程には少し時間がかかりました。しかし、昨年の 10 月に締結した新しい契約によって、この旅は集約されたと考えています。その契約には多くの条項が含まれており、パートナーシップの確立と拡大も含まれていましたが、最も重要なのは、スーパーインテリジェンス（超知能）を独立して追求できる道を開くと同時に、同社のモデルを購入・ライセンスし続ける権利も維持した点です。**

10 月以来、私はスーパーインテリジェンスチームを編成し、フロンティアモデルの訓練に十分な規模のクラスターを構築し、スーパーインテリジェンスに焦点を当てたチームを採用してきました。これは私たちにとって非常に大きな転換点でした。なぜなら、これにより私はスーパーインテリジェンスのミッションに集中できるようになったからです。その結果、今週 Build で発表したいくつかの成果につながりました。私たちはすべてのモダリティ（※多様なデータ形式や処理モード）を含む 7 つの新モデルを発表しました。これはかなり大きな転換であり、長期間にわたる計画の結果であり、今後数年間で絶対的なフロンティアを追求する立場になったことは私たちにとって大きな安堵です。

Microsoft に採用された時点でこれが計画でしたか？

過去 18 ヶ月間、確かにそれが計画でした。OpenAI との関係は多くの浮き沈みを経験してきました。多くの点で、これは歴史上最も成功したパートナーシップの一つとして記録されることになるでしょう。OpenAI にとっても Microsoft にとっても素晴らしい関係であり、すべての良い関係は進化します。これは私たちの進化における次の段階だと考えています。

その進化について具体的に伺います。私たちは皆、Elon Musk と OpenAI、そして Sam Altman の間の裁判を直近で見ました。Microsoft はその裁判に関与していました。具体的には、時々 Microsoft の弁護士が立ち上がり、「私たちは当時そこにいませんでした」と言うのです。すると誰かが「はい」と答え、それで一件落着となります。

しかし明らかに、その裁判で明らかになったこと、そしてこの間ずっと明確だったことは、当初の考えは OpenAI が研究機関としてモデルを提供し、Microsoft が製品を構築するというものだったということです。Microsoft には市場への展開に関する専門知識があり、エンタープライズ分野での専門知識もあり、消費者向け市場でもさまざまな方法で地位を取り戻そうとしていました。これはプラットフォームの転換点であり、研究業務は OpenAI で完了し、製品開発は Microsoft の内部で行われる予定でした。

これが変化した点です：OpenAI はより多くの消費者向け製品を作りたいと考えていました。明らかに、あなたの新しい役割と新たな焦点を踏まえると、Microsoft も自社のモデルをますます作りたがっています。なぜ分かれたのでしょうか？その関係で何が機能しなかったのでしょうか？

つまり、OpenAI は非常に野心に満ちた創設チームとサム・アルトマン氏によって率いられていると思います。そのため、彼らがより多くの支持を得て莫大な収益を上げるにつれて、フルスタックで事業を展開する機会を見出したのは自然な成り行きでした。単に消費者向け製品の開発を始めたというだけでなく、もちろん ChatGPT は非常に成功しました。また、自社データセンターの構築にも着手し、独自のチップの開発も始めました。彼ら自身の消費者向けハードウェアデバイスに関する噂が飛び交っています。ChatGPT Enterprise を通じてモデルを直接市場へ展開し始めています。つまり、過去 2〜4 年にかけて、研究の枠を超えてスタック全体にわたって事業範囲を大幅に広げてきたのです。同様に、マイクロソフトについても同じことが言えます。このパートナーシップはすでに 5〜6 年が経過しており、今後さらに 4〜6 年は継続する見込みです。

同様に、私たちは世界有数のテクノロジー企業の一つです。世界の上位 500 社中 493 社が、データの保存と処理の大部分を当社のシステムに依存し、Azure や M365、Teams を利用しています。人々は往々にして、私たちの規模がいかに巨大で、エンタープライズ市場における展開力がどれほど大きいかを過小評価しているように思えます。したがって、長期的には、つまり 5〜10 年というスパンで考えれば、持続可能性を完全に確保し、単に他社の知的財産を受け取ってわずかな修正を加え、自社の製品に組み込んで生産化するだけの存在ではなく、自力で立ち上がり、世界クラスのモデルを創出できる存在である必要があります。

つまり、スーパーインテリジェンスはもうすぐ到来します。私はそれが目前にあると考えています。そして、それは歴史上最も価値のある技術になるでしょう。長期的に、第三者に知的財産を提供し続けることに構造的に依存する道はないはずです。

したがって、これは明らかに OpenAI などが取締役会の問題を抱えた際に引き起こされた転換点です。しかし、私が参入し、私のチームが加わったことで、私たちはその構築を開始し、現在はその転換期にあります。私は私たちが非常に良い立場にあると考えています。なぜなら、OpenAI（この件で驚くほど素晴らしい成果を上げてきたと私は思います）のためにも、私たち自身のためにも、比較的安定した、慎重な、長期的に最適なポジションを取ることができるからです。

スーパーインテリジェンスについてもう少し時間を割きたいです。今は一旦その話題に留めておき、ここでさらに一巡して転換の仕組みを理解したいだけです。

裁判の過程で、マイクロソフト CEO のサティア・ナデラ氏から非常に面白いメッセージがありました。彼は「私はインテルになりたくなく、OpenAI がマイクロソフトになるような状況にはしたくない」と述べています。これは、マイクロソフト CEO 自身が「私は提供者になりたくなく、彼らが価値を提供し、すべての価値を収集するプラットフォームになることを望まない。そして私たちは置き換えられるかもしれない」と言っている文脈では非常に皮肉です。「ChatGPT が Azure で動作し、OpenAI がすべての価値を得て、その後私たちが置き換えられるかもしれない」という状況は、過去に Windows とインテルの間で起こったことと同じです。

それは気づきですか？ナデラ氏からあなたに話がありましたか？「ああ、OpenAI は取締役会の問題を抱えている。私たちはフロンティアに戻り、自力で立つ必要がある」と言ったあの会議はどのようなものでしたか？その会話はどのように行われ、その決定はどのように下されたのでしょうか？

もちろん、それはサティアの決断であり、エイミー、ブラッド、そして会社内の多くの他の人々の決断でもあります。しかし、私は何でもそうですが、これはゆっくりと進行する変化だと考えています。会社が私たちが進んでいる方向に少しの微調整や修正が必要であることを認識し始めたからです。そのため、11 月の取締役会での出来事よりもずっと以前からこの動きはあり、直接競合しているさまざまな戦線の状況や、そこから生じる緊張関係を見つめるにつれて、時間とともに蓄積されていったのだと思います。また、そのようなパートナーシップが永遠に続くものではないことも理解しています。

OpenAI は 1 兆ドル規模の公開企業になりたいと考えており、驚異的な収益を上げて急成長しています。彼らは運営の自由を持ち、あらゆる場所からコンピューティングリソースを購入し、自社のコンピューティングインフラを構築し、誰とでもパートナーシップを結ぶことを望んでいます。したがって、この契約は、企業の規模やスケール、ニーズのバランスなどが非常に異なっていた時期に結ばれたものです。その時点では合理的でしたが、その後、これは私たちが自ら所有し、制御し、自社の顧客に対して正しく対応できるものでなければならないことがはっきりしてきました。

先ほど申し上げた通り、私たちは企業向けにおいて信じられないほどの展開力を有しており、これは世界に比類なきものだと考えています。したがって、顧客のために最良のものを作り続ける必要があります。これは、ChatGPTを通じて消費者向けと企業向けの両方を同時に最適化してきた企業とは少し異なるアプローチになります。さらに、スーパーインテリジェンスという根本的な科学ミッションにも取り組んでおり、これには多数の異なる方向性が含まれています。これらの方向性は互いに重複する部分もありますが、消費者向けや企業向けの方向性に対しては直交していると言えるかもしれません。当然ながら、パートナーシップはこのようにして進化し、定期的に再設定されていくものだと考えています。

しかし、フロンティアモデルを構築するのは非常に高額なコストがかかるそうです。信頼できる情報によると、これは非常に高価なプロジェクトです。いつかマイクロソフトの CFO であるアミー・ハッドが「予算は確保された」と言わなければなりません。それはいつのことだったのでしょうか？単なるテキストメッセージでしたか？それとも会議が行われたのでしょうか？その具体的な状況について教えてください。

私としては、昨年の前半に我々は概ね意思決定を行ったと考えています。これは明らかにすべての契約交渉に影響を与え、その後すべてが解決され、10 月に署名されました。これは重要な投資ですが、私たちはそれを達成するための十分な時間を有しています。つまり、すでに自給自足ミッションに対しては相当な投資を行ってきているのです。

当社の Maia 200 チップは、例えば非常に優れたチップです。現在、自社クラスター内で GB200 よりも 30% 安価なチップを製造・出荷できるようになりました。さらに、このチップと自社のモデルを共同設計できるようになったことで、先ほど発表した MAI-Thinking-1 モデルは、Maia 200 で実行することで得られる 30% の性能向上に加え、タスク用にモデルを共最適化した場合に、ワットあたりのパフォーマンスが 1.4 倍改善されることを実現しています。

つまり、最も重要なユースケース（明らかにエージェント型コーディング、開発者、企業です）において、自社のスタックを所有・制御し、エンドツーエンドで全体の共同設計努力を主導することの価値は、今後数年間で私たちが投資しなければならないコストに見合う利益をもたらすことが明白です。

「自立したミッション」とおっしゃいましたが、それは非常に丁寧な表現で、ご自身で立ち上がりたい、自分自身の道を歩みたいという意味です。マイクロソフト内部には、ある発言について論争があるとの話を伺っています。私の同僚ヘイデン・フィールドが Build に関する記事で記した一節**。私はそれを読み上げます。これはヘイデンの言葉です。素晴らしい一節です。彼女はこう言いました。「今年の Microsoft Build は、Instagram で『渇望トラップ（注：注目を集めるための魅力的な投稿）』を投稿するばかりの、新しく離婚した女性のような雰囲気だった」と。」

解散は完了し、今こそ力を示す時です。これが私たちの新しいモデルです。私たちは自力で立ち上がります。あなたは「最先端のモデルを構築し、主要な研究機関と競争する」とおっしゃっていますが、マイクロソフト社内では「自分たちで自由に活動できる」という感覚があるのでしょうか？

決してありません。全くそうではありません。もちろん、それは魅力的な見出しであり面白い表現ですが、現実は私たちが OpenAI と今後何年も続くパートナーシップを結んでいることです。実際、2030 年以降も継続する計画です。彼らは依然として世界で最も優れたモデルを生産しています。GPT-5.5 は傑出したモデルです。Codex やこれから登場するサイバーセキュリティ向けモデル（cybersecurity models）は素晴らしく、私たちが行う業務の大部分を動かす原動力となっています。

したがって、当然のことながらこの関係は継続していきます。私はこれがそのようなパートナーシップにおける自然な流れだと考えています。何か不審なことや驚くべきことではありません。OpenAI もこれを非常に理解し、支持しています。彼らは明らかに急速に成長した企業であり、私たちも自らのアジェンダを追求しなければならないことを理解しているからです。したがって、これは極めて通常の事態です。

では、もう一つの「Decoder」の質問をさせていただきます。その後、Build での発表事項、特にスーパーインテリジェンス（superintelligence）についてお話ししたいと考えています。

前回お話しした際、AI の進展速度を考慮し、意思決定のフレームワークは 6 ヶ月サイクルで運用していると仰っていました。当時はその考えが妥当でした。状況も落ち着き、あるいは焦点がより明確になったのかもしれません。現在の意思決定のフレームワークはどうなっていますか？

我们还是沿用同样的周期节奏。每个周期结束时，我们会进行一次为期一周的线下聚会。我坚信这一点，尽管我们目前仍保持着每周四天在办公室办公的文化。事实上，下下周，我的整个超级智能团队将在波士顿进行为期四天的线下集会。这次会议将全面回顾 Build 项目的进展、我们的收获、哪些地方做得不够好、需要改进之处，以及下一个周期的规划——这一次周期将延长至八周，随后再进行一周的线下聚会，全年计划已就此全部制定完毕。因此，整个组织都清楚这是我们运作的节奏。

我认为强调这一时间框架实际上非常重要，因为季度规划往往会变得有些模糊和抽象。我认为六到八周（具体取决于在日历中的位置）实际上是制定非常清晰、可巩固的任务的最佳时长。

したがって、私たちはこの 6〜8 ヶ月のサイクルのリズムに加えて、スクワッド（小隊）という形態でも運営しています。スクワッドとは、特定のミッションに焦点を当てた多様な専門分野が混在するサブグループであり、必ずしもマネージャーの階層構造に属しているわけではありません。実際には、DRI（直接責任者）によって運営され、この DRI は多くの場合 IC（個人貢献者）が務め、その役割は—

これは「直接責任者」と「個人貢献者」を指します。

はい、まさにその通りです。ありがとうございます。私たちは、マネージャーの役割と、特定のミッションを実行する DRI の役割を分離するというアプローチを採用したと考えています。それは、優れた DRI であることが非常に過酷だからです。24 時間年中無休で全精力を注ぎ込み、可能な限り最大限のパフォーマンスを発揮し続ける必要があります。一方、マネージャーの役割は、多くの場合コーチとして振る舞い、サポートを提供し、ガイダンスやフィードバックを与え、あらゆる障害を取り除き、人々のキャリア成長を支援することに重点が置かれます。そのため、これらの役割を分離することで、DRIs を 2〜3 のサイクルごとにローテーションさせることが可能となり、一部のメンバーは異なるポジションに挑戦したり、ローテーションを経験したりできます。これは非常に柔軟な構造であり、私たちが素早く対応できることを可能にする素晴らしい仕組みだと考えています。

Build について話しましょう。まずはスーパーインテリジェンスから始めたいと思います。あなたはこれまで何度もこの言葉に触れてきました。私はちょうど Google IO に出席したばかりですが、かつてあなたの同僚だったデミス・ハサビスは、基調講演の締めくくりで「私たちは特異点のふもとにあり、Google の全パワーを備えた AGI（汎用人工知能）が近づいている」と述べています。**

あなたはスーパーインテリジェンスはすでに到来したと言っています。これらはすべて同じものを指しているのでしょうか？AGI を説明するために異なる用語を使っているのでしょうか？違いはあるのでしょうか？あなたの文脈における「スーパーインテリジェンス」の定義と、デミスの言う「特異点」との違いをどのようにお考えですか？

もちろん、私が「すでに到来した」と言ったわけではありません。「近づいている」と言いました。これらの表現には多くの流動性があると思います。しかし、私たちが明確に観察できるのは、現在すべてのモダリティ（多様なデータ形式）において対数線形なヒルクライミングが進行しているということです。つまり、適用する計算能力の各桁ごとの増加、データの各増分、そしてベンチマークでの向上の間には非常に直接的な関係があります。ここでいうベンチマークとは、公開されたものも内部のものも含まれ、強化学習環境において私たちが焦点を当てる目標です。これは非常に重要な観察結果です。

私が皆さんもしているだろうと考える予測について、一部の人が懐疑的だったり疑問を投げかけたりする理由も理解できますが、それらはこれらのモデルの性能向上に関する 10 年以上にわたる実証的な観察に基づいた非常に根拠のあるものです。つまり、本質的に同じ汎用アーキテクチャにおいて、計算量は 12 オーダーの桁分増加し、15 年間で FLOPS（浮動小数点演算回数）が 1 兆倍に増大しました。そして実際には、音声、画像、テキスト、コード、および多くの他の時系列予測タスクにおいて機能してきました。したがって、計算量のさらなるオーダーの桁数の増加によって、他の環境内でもこの対数線形的な方法で継続的に上昇し続けることができるだろうと、私たちは基本的に外挿しているのです。

そしてそれは、既存のデータから単に外挿するだけでなく、私たちが知らないことを教示し、新たな発見をもたらすようなモデルを訓練できるのかという問いを生み出します。次に、それらが自己改善する能力を持ち、どの仮説を設定すべきか、どの仮説を追及すべきかを決定するプロセスを加速し、それぞれの仮説に対するトレーニングデータをどのように生成するか、それらを新しい実行にどう組み込むか、あるいは実際のアーキテクチャ自体においてさえも革新を起こすことができるのかという問いが生じます。

したがって、この複合的な進歩を認識するためには、両方の条件が満たされる必要があると考えますが、計算リソースの次の数桁分の拡大を適用するだけで、私たちは引き続き劇的な成果を得ていくでしょう。おそらく多くのタスクにおいて人間の性能と同等に達することになるはずです。これは過去 6 ヶ月間にコーディング分野で実際に確認された現象と同じです。

コーディングは非常に興味深い分野です。なぜなら、検証が容易だからです。コードを書き、コンピュータに実行させれば、実行されるか失敗するかのどちらかになります。もちろんセキュリティに関するいくつかの欠点も見てきました。これらの欠点は明白であり、コーディングセキュリティに対する規制アプローチがさまざまな形で展開されているのを私たちは目撃しています。おそらく私の携帯電話やパソコンでも、セキュリティ上の大惨事につながるようなコードを「雰囲気」で書いてしまったことがあるかもしれません。しかし、それは私が引き受けるリスクなのかもしれません。

他のすべての機能はそれほど簡単には見えません。私はいつも法律分野を取り上げますが、それは私の専門背景があるからです。しかし、裁判官はコンピュータがコードを検証する方式のように法的な文章を検証しません。もし間違えた場合、裁判官はあなたを投獄することさえあります。おそらく遭遇しうる最悪の出力検証エラーと言えるでしょう。

コーディング分野で効果性を測定できるほど容易に、他の分野全体での効果性をどのように測定できるでしょうか？なぜなら、この点こそが、コーディングから他の分野への比喩や類推が非常に早く崩壊する場所だからです。

私はそうは思いません。コーディングについては、コードの正しい実行を検証できます。実行されるか、クラッシュするかのどちらかです。しかし、そこには非常に多くのニュアンスがあります。書かれたコードの質が本当に重要なのです：その拡張性、再構成可能性、実用性がどうであるか。単にコードの一部が動くというだけでなく、DevOps や SRE として本番環境で実際にそのモデルがどのようにして自分が書いたコードに戻り、それを実用的かつ有用な方法で使用するかという点も重要です。

そして当然のことながら、生成された出力の質を評価する必要があります。高品質で機能するコードであっても、それが本当にあなたが望んでいたアプリやウェブサイトなのかどうかです。そこには美的判断や商業的判断が伴います。検証不可能な報酬を内部化することの課題は、コードにも存在します。コードはまだ主に検証可能な報酬信号であるにもかかわらずです。もう一つ観察すべき点は、チャットもまた非検証可能な空間であるということです。しかし私たちは、非常に強力な実世界の利用による相互作用を通じて、ほぼ人間レベルのパフォーマンスまで到達することに成功しました。

待ってください。とても興味深いです。どうやってチャットの人間レベルのパフォーマンスを測定するのですか？

はい、私は多くの人が人間レベルのパフォーマンスを持つ AI と長く意味のある対話をしていると考えています。その品質は非常に優れており、高い感情知能を備えています。全体的に非常に正確で、ハルシネーション（幻覚）も最小限に抑えられています。バイアスについては以前ほど頻繁には議論されていません。これは現実世界の観察に基づいています。多くの人の評価基準によれば、すでに幅広いタスクにおいて対話における人間レベルのパフォーマンスに達していると考えられます。

あなたの評価基準は何ですか？もちろん、多くの人々の評価基準もそうですか？私はこれのほとんどに同意しませんが、それは私の基準です。では、あなたの基準は何ですか？

私の基準は、アシスタントに声をかけて、Teams やメールで起こったすべての会話の要約、ドキュメントへの更新情報を毎日まとめて提供するよう依頼したときのことです。すると、次に取るべきアクションの一覧を含む統合された要約が得られます。これは私のチーフ・オブ・スタッフが生み出すものよりも基本的に優れています。つまり、要約、分析、提案されるアクション、そしてチャットにおいて人間レベルのパフォーマンスに達していると言えます。

毎日何百万人もの人が、感情的なサポート、カウンセリング、療法、コーチング、アドバイスのためにこれを利用しています。これはすべてのチャットボットの中で最も人気のあるユースケースの一つであり、この主張を裏付ける非常に堅牢な指標だと言えるでしょう。

私は、あなたがこの問題について多くの時間を費やし、特にこれらのチャットボットとの感情的なつながりについて深く考えてきたことを知っています。これらはあなたが構築し、展開した製品です。

私は、メールやタスクリストを要約し、何を優先すべきかについての概要を提供する点で非常に優れたものと、危機にある人に対する感情的なコーチとして機能するものの間に、かなり大きな区別を設けます。

これらは類似したタスクではありません。人間においても、必ずしも同種の知能とは限りません。リスト作成が得意で、感情的なサポートが苦手な人を知っています。それらすべてをどのように脳内で統合し、「さて、これはチャットにおいて広範に人間レベルのパフォーマンスだ」と言うのでしょうか？

チャットを、AI がその一方の当事者として関与する双方向のやり取りと定義し、それが何らかの目標を広く満たすものだとした場合、スポーツの結果を知りたい場合や、どのレストランに行くかについてのアドバイス、執筆したエッセイに対するコーチングとフィードバック、次にどの仕事を受けるべきかの提案、あるいは上司との難しい会話への準備などについて学びたい場合に該当します。回答を受け取り、行き来し、5 回から 6 回のやり取りを経て、専門家に頼ったり、友人に相談したり、場合によってはコーチに支払ったりする必要があるような有用な出力を得ることができます。

客観的・経験的に言えば、毎日何億人もの人々がこれらのチャットボットからそのような体験をしています。それが技術的に人間レベルの性能を代表するかどうかについて議論の余地があるかもしれません。しかし、それを主張することはかなり妥当なことだと私は考えます。

これが上昇し続ける理由は何もないでしょう？過去3年間の上昇率が、私が最も驚くべき点だと考えています。そして今から先は、この上昇の根本的な駆動力——計算資源（compute）、データ、実世界のユーザーからのインタラクション——を抽出しようとしています。これらの要素は継続する見込みです。

これらはチャット、感情的なサポート、生産性向上といった分野だけでなく、医療、教育現場でのライブプロダクション展開、家庭をますます管理するアシスタントなど、多くの他のドメインにも適用されると考えています。要するに、あなたの日常生活にあるあらゆるものを対象とし、あなたをもっと生産的にするための方向性は、おそらく継続していくでしょう。

あなたは今、依然として同じ基本アーキテクチャであるトランスフォーマー（transformers）とアテンション（attention）について言及されました。私たちは15年間このアーキテクチャに計算資源を適用し続けており、大きな進歩を遂げています。あなたは非常にユニークな立場にあります。

Build で、最初のフラグシップ推論モデルである MAI-Thinking-1 を発表されました。ゼロから始められたわけですが、このモデルをアーキテクチャ設計しトレーニングする 15 年の経験を経て、何かこれまでと異なるアプローチを取られたのでしょうか？それとも単に「データを集めて、以前と同じようにトレーニングを実行すればよい」という考えで、計算リソースが増えた分だけ性能が向上するというものなのでしょうか？

いいえ、実際にはかなり多くの違いがあります。まず言えるのは、データのキュレーションの方法です。私たちはスタックの最上位から始め、基本的に高品質で非常に保守的なデータセットを購入・取得し、そのデータに関連するノイズや気晴らし要素、低品質な部分、潜在的なセキュリティリスクを多く排除しました。このための手法は、実はかなり独自のものであると考えています。私たちは先ほど 109 ページにわたる非常に詳細な技術報告書を共有しましたが、これは Twitter で非常に好意的に受け止められ、どのようにこれを行っているかという多くの詳細を共有しています。二つ目の点は、アーキテクチャの選択については慎重であることが重要だと考えており、私たちもそのようにしてきた一方で、トレーニングの実行方法を組み立てる際に、いくつかのかなり大きな変化を加えた点があると考えています。

私たちのトレーニング実行は非常に安定しており、クラッシュや再起動の回数は極めて少ないです。インフラの安定性と MFU（モデルフロップス利用率）効率を示すために、多くのグラフを共有しました。これは基本的に、トレーニング実行の各ステップで、各チップに最新の FLOPS 数を投入できることを示しています。私はこれが非常に誤解されやすいと考えており、さまざまなラボから物事がどのように失敗するかという話をよく耳にします。

実際、高品質なモデルを生産するために、慎重かつ意図的な選択を行い、適切なアプローチを採用することは非常に困難です。なぜなら、私たちの仕事と野心は、この「ヒルクライミングマシン」を構築しようとするものだからです。つまり、シリコンとモデルの統合、超高品質なデータ、そして RLE（Reinforcement Learning Environments：強化学習環境）のスタックとの統合であり、これにより私たちは、選択したあらゆる目的に対して体系的にヒルクライミングを行うことが可能になります。

それが MAI-Thinking-1 です。これは汎用性があり、比較的中立な思考モデルで、コーディングにも非常に優れています。現在、ベンチマークでは Opus 4.6 とほぼ同等の性能を有しています。まだ大規模な生産環境への展開は行っていないため、そこにはさらに多くの作業が残されています。しかし、これは極めて強力な推論能力を持つモデルであり、推論性能の主要指標である AIME（American Invitational Mathematics Examination：アメリカ数学オリンピック）ではベンチマーク上で 97 パーセントというスコアを記録しています。

指示の追従においては非常に優れており、その目標は基本的に、この能力を多くの開発者や企業に利用可能にし、彼らが自社のユースケースのためにこれを利用できるようにすることです。各社には、自社のユースケースをサポートするエージェントなどを構築しようとする、それぞれがわずかに異なる目的があります。

MAI-Thinking-1 について語る際に指摘されたことのひとつは、既存のモデルから知識蒸留（distillation）を行わなかった点ですが、これは私にとって驚きでした。これは実現可能なことです。あなたは OpenAI の知的財産権にアクセスできます。誰もがあらゆるものを蒸化しています。この試行で判明したばかりのことですが、Grok は複数のモデルから蒸化されたものです。なぜここで蒸化を行わないのでしょうか？なぜ先へ進まないのでしょうか？

フロンティアへの近道は確かに多く存在します。もし超高品質なモデルを持ち、それを高品質な指示や回答、あるいは上位モデルからの出力でベースモデルを磨き上げれば、そのモデルがその分布に素早く適合することは事実です。しかし、その後、その教師モデルを上回る能力を持つことができるのかについては、非常に不透明です。

したがって、私たちは2つの理由から非常に意図的に行動してきました。第一の理由は、今後数年間でフロンティアを自ら設定するために、必ずや教師モデルを超えることを確実にしたいからです。第二の理由は、偉大な研究機関の一つを構築したいという強い願いであり、それにはおそらく今後2〜3年かかるでしょう。

しかし、それを実現するためには、すべてのコンポーネントを自前で構築できることを示さなければなりません。私たちは世界最高の人材を採用し、他社からの再実装やコピー、蒸留に頼るのではなく、実際の研究を通じてフロンティアを押し広げることができます。

私たちは、フロンティアを超える領域では Anthropic のモデルを購入するリソースがあり、Foundry には最大 11,000 種類の異なるモデルを搭載できるリソースがあるため、非常に優れた立場にあります。これにより、すべての開発者が純粋な選択肢を得られるのです。もちろん、現在もフロンティアに位置し、明らかに卓越した OpenAI のモデルを継続して展開するリソースも備えています。

これは自律化ミッションの自然な一部であり、真に絶対的なフロンティアに至るまでには時間がかかりますが、私たちは非常に優れた立場にあります。多くの進展を遂げました。これは非常に強力なモデルですが、リリースしたのはこのモデルだけではありません。同時に 7 つの新モデルを公開しました。

例えば、私たちがトランスクリプトしたモデルである MAI-Transcribe-1.5 は、文字通り世界でナンバーワンです。あらゆるハイパースケラーの中で最もコストパフォーマンスに優れ、精度も最高レベルにあります。画像モデルでは現在 2 位、画像編集モデルでは Google や OpenAI の直後に続く 3 位です。画像や音声の分野では、私たちは十分に上位に位置していると考えています。コードモデルである CodeFlash は非常に強力であり、VS Code に最適化されており、Sonnet 4.6 と同等の優れたモデルです。つまり、現時点で非常に素晴らしいポジションにあります。

知識蒸留（distillation）において法的または知的財産権に関する懸念はありましたか？これは世界中で現在進行中の問題です：Anthropic は他社が自社のモデルを蒸留していることを不満に思っています。中国企業がモデルを蒸留することへの懸念や、既存の知的財産権契約がそれをカバーできるかどうかについても議論されています。そのような懸念が、あなた方をこの行為から遠ざける要因となりましたか？

いいえ、私たちはそのような懸念はありませんでしたが、多くの人がなぜイライラするのかは理解できます。Anthropic は非常に不満を抱いており、xAI や Meta に関する噂、そして明らかにオープンソースモデルなどについても同様の不満があります。本質的には、これは別のチームが構築した知的財産や知識を奪い取り、それを自社のモデルに強制的に注入する行為だからです。短期的な勝利にはなるかもしれませんが、先ほど述べたように、私たちは研究所内で次なる大きな思考のブレークスルー、あるいは次なる大きなコーディングの突破、さらには次なる大きなアーキテクチャへの推進を実現できる文化を築きたいと考えています。

現在、私たちはループドトランスフォーマー（looped transformer）を実験中であり、これは現在のトランスフォーマーのわずかに異なるバリアントです。この分野の多くの人々もこれに注目しています。まだ誰も本格的な生産環境への導入には至っていないようですが、フロンティアを本当に押し広げる文化とチームを構築するためには、必要な時にフルスタックを理解し、所有し、創造できると同時に、必要に応じてサードパーティの製品も活用できることが不可欠です。例えば、私たちの論文は他の文献に基づいた数百件の引用を含んでおり、これは長年にわたり優れた出版物から学んだすべてのことに対する還元として、分野への大きな貢献となっています。

お尋ねしてもよろしいでしょうか — あなたが AI 業界の同業者やアントロピック（Anthropic）からの蒸留（distillation）に関する不満を理解されているなら、AI 企業がこれらのモデルを作るために彼らの作品を集団的にスクレイピングすることに対するクリエイター、出版社、YouTuber の不満も理解されていますか？なぜなら、その不満はますます大きくなっているからです。

はい。いいえ、その不満は理解できます。オープンウェブの課題は以前にも議論したものであり、私もそれを理解しており、人々が不満を抱いているのも見ています。当然ながら、この問題は法廷での議論にも影響を与えています。人々はオンライン上にコンテンツを投稿しますが、それらを公開することに対する契約内容について異なる期待を持っていたようです。これは非常に難しい問題です。

あなたはすべてのデータが慎重にキュレーションされているとおっしゃいましたが、新しいモデルのトレーニングに使用しているデータのすべてにお支払いされたのですか？

私たちのデータの多くは、当然ながら通常の形でオープンウェブから取得しています。慎重にキュレーションするとは、セキュリティ、品質、一部のオープンソースデータセットからのサードパーティ依存関係について極めて注意深くフィルタリングすることを意味し、中国の系統（ラインジ）を避けることです。私はこれらが非常に異なると考えています。私たちの企業顧客は、何かを生産環境に投入する際に、私たちが彼らのニーズを本当に念頭に置いて構築したと信頼できることを確認したいと考えています。そして、これは非常に意図的であり、忍耐強く、すべての詳細に注意を払うことの利点の一つだと私は思います。

あなたは企業について言及されましたが、これは非常に興味深いと思います。マイクロソフトは実際、大規模な形でエンタープライズ AI に全神経を注いでいます。私は、新しい Xbox 社の頭であるアシャ・シャルマ氏にまで直接線を引くこともできます。彼女は多くの場所で AI を排除する方針を打ち出しており、ゲーマーたちは喜んでおられますよね？消費者空間における AI への反応は一つありますが、企業空間では別の反応があります。AI は、AI のように急速に変化するものの中で、製品と市場の適合性（product-market fit）が最も近い状態にあるのは企業領域だと私は考えます。企業が管理しているデータベースがいくつかあり、それらにアクセスできるからです。なぜなら、それらを管理しているのは彼らだからです。それが彼らのデータなのです。

反復可能なプロセスやタスク、そしてモデルがより効率的に処理できるかもしれない古いシステムが存在します。企業にとって非常に重要なことが起こっています。同時に、AI に対する消費者の反感は増大する一方です。私の主張は、私たちは優れた消費者向け AI プロダクトを構築していないということです。この業界はそのようなものを生み出しておらず、普及させてもいません。これがすべて価値あるものであることを明確に示すことができていないのです。オープンウェブからのすべてのデータを使用し、大衆への出版契約を変更して、現在では企業に数兆ドルの価値をもたらすモデルのトレーニングに利用されているという事実に対して、これが価値あるものであると示すプロダクトが存在しません。

再び、サティア・ナデラ氏は最近 Axios とのインタビューで、「これには社会的な許可が必要です。そして、それが得られるまで、その価値を提供するまでは、人々はこのような感情を抱き続けるでしょう」と述べています。私たちは、大学での講演者がブーイングされたのを見てきました。データセンターが禁止される事態も目撃しています。トレーニングに関する不安や、データセンターに関する不安に値する消費者向け製品があると思いますか？

それがあなたの焦点でした; 今やその焦点は企業です。表面的に見る限り、マイクロソフトがもはや消費者向け製品に関心を持っているようには見えません。しかし、価値のあるもの、あるいは構築可能なものはあるとお考えですか？

消費者にとって何の価値もないというご意見には賛成できません。すべてのチャットボットを通じて、毎月数十億の人々が莫大な価値を得ています。

今少しの間、小規模な事業主や、子供の宿題を手伝っているお母さんの立場に立って共感してみてください。彼らは会話型 AI に頼るだけで、フィードバックを受けたり、指示を得たり、エッセイの課題を設定したりできます。「どうやって収益を上げるのか？」「キャッシュフロー予測はどう立てるか？」「どの大学に応募すべきか？」といった質問ができるようになるのです。

つまり、これらは高品質な事実に基づくアドバイスや情報をもたらす日常的なタスクです。だから、人々がこれらのものから利益を得ていないという主張には私はあまり賛成できません。彼らは確かに利益を得ていると思います。

私は、彼らが十分な利益を得られていないと明確に主張できると思いますよね？

わかりました。

「データセンターをこれ以上作るべきではない」と言っているのは彼らです。 卒業式で AI をブーイングするのも彼らです。世論調査は明確です、特に若者において：AI を使うほど、それに対する反感が強くなるという傾向がすべての調査で明らかになっています。私が主張しているのは、価値がないということではなく、価値の交換関係が十分に明確ではないということです。

なるほど、その通りですね。

特に Microsoft が、Google を踊らせるような Bing の再発明といった大規模な検索製品から離れ、エンタープライズ（企業向け）へと舵を切っているように見えます。それはもう終わりで、私たちは今、価値があるエンタープライズ分野に集中しています。ただ、消費者にとってこれらすべてが worthwhile となるだけの十分な価値があるのか疑問です。

理解できる理由から、多くの不安があると思います。今後5年から10年の間に何が起きるかについて、膨大な量の憶測が飛び交っています。それが特異点として語られようとも、あるいは職の終焉として語られようとも、これらの枠組みは有益ではありません。人々が恐れているのは、これが曖昧に定義されており、しばしば人々の頭上を覆う避けられない脅威の灰色の雲として描かれるからです。

重要なのは、私たちが技術とどう向き合うかです。私は長年、人間を最優先すべきだと主張してきました。分野内の一部の人々は、科学発見を最優先したり、銀河系などを探索できる知能の加速化を最優先したりし、私たち全員を合わせたものよりも強力なAIが必ず誕生すると述べています。つまり、それは人々にとって自然に怖いことなのです。

そして、私たちは基本的にその考え方を逆転させる必要があります。科学と技術の目的は、私たちがより健康で、より賢く、より幸せになるためのものであるべきです。これは人類が何千年もの間、発明を通じて追求してきた探求であり、超知能に対しても再び課すべき試練です。もしそれがこの試練を達成できないのであれば、人々はそれを拒絶するでしょうし、その拒絶は正当なものだと私は考えます。

私は、今後5年間で人々の関心がすべて、「これがどのようにして私の健康を改善し、より幸せにし、賢くし、能力を高め、生産性を向上させるか」という点に集まるようになると思います。もしそれが実現しないなら、当然ながら人々は怒り、抵抗し、反応するでしょう。そのことに驚きや問題があるとは思いませんが、それは避けられないことだと考えています。

だからこそ、私が長年情熱を注いできた分野の一つが医療です。そして数日前に、メイヨークリニックとの新たなパートナーシップを発表しました。これは世界で常にトップと報告されている病院です。あらゆるモダリティにおいて、最も質の高い縦断的な患者記録データセットを持っています。また、最高の臨床実践を行っています。

彼らは非営利団体であり、その患者人口の65％がメディケイド（米国政府による低所得者向け医療保険）を利用していることを、多くの人が知らないと思います。人々はしばしば、世界中から国際的な超エリート層が最良の治療を受けるために飛んでくる場所だと連想しますが、実際には患者の過半数がメディケイドを利用しています。彼らはどこでも最良の医療を提供するという信じられないほどの使命を持つ素晴らしい機関です。そして今、私たちは彼らのデータと私たちのモデルを用いて、ゼロから新しいヘルスケアのための基盤モデルを共同で訓練し、その病院に展開し、世界中に広げて、可能な限り多くの人々に最良の臨床ケアと医療を提供できるようになることを願っています。

だからこそ、私はこの分野に入ったのです。それが私がもともと動機づけられた理由であり、私が情熱を注いでいることです。そして、私が影響を与え、人々を助け、皆のために良い遺産を残すと考えられることだけに集中できるのです。それが私たちが取り組もうとしていることです。

ありがとうございます。医療という枠組みに感謝していますし、なぜこれが誰もが最初に思い浮かべる選択肢なのか理解できます。特にアメリカの医療において、もしそれをわずか 10% でも改善できれば、多くの人々の人生を非常に深い形で変えることになるでしょう。

実は、あなたとは全く異なり、はるかに攻撃的なアプローチをとっている非常に賢い人物を知っています。その人物こそが、4 ヶ月前のあなた自身です。ムスタファ・スレイマン Financial Times に対してこのように述べています：「ホワイトカラーの仕事、つまりコンピューターの前で座って弁護士や会計士、プロジェクトマネージャー、あるいはマーケティング担当として働く場合、これらのタスクの多くは、今後 12 か月から 18 ヶ月のうちに AI によって完全に自動化されるでしょう。」

それは 4 ヶ月前の話です。つまり、1 年後には弁護士、会計士、プロジェクトマネージャー、そしてマーケティング担当者の仕事はなくなるということです。彼らの職務は自動化されるのです。それでもなお、そのタイムラインは変わっていないのでしょうか？

いいえ、いいえ、いいえ。ちょっと待ってください。先ほどの引用の中で私が言ったのは「タスク」です。「仕事」とは言っていません。これは非常に重要な区別です。労働経済学では、組織内の役割や機能のサブコンポーネントに関する完全な分類体系が存在します。メールを送る、同僚と会話する、パワーポイント資料を作成するといったサブタスクは、ますますデジタル化・自動化され、私たちは基本的にそれらをより多く生成できるようになります。

それが必ずしもその役割がなくなることを意味するわけではありません。それは単に、作業がより速く、より効率的に行えるようになるということです。今日では多くの場合、こうした作業は非常に反復的で、手作業が多く、労働集約的であり、時間がかかるものです。そして技術の自然な進歩は、私たちの生活をより楽にし、より速くし、摩擦を減らし、よりシームレスにする方向に進むべきです。誰もがよく不満を言うように、それはあなたや私、そして他の人々をずっと忙しくさせてしまいました。

実際には、私たちはより利用可能になり、よりストレスを抱え、より多くの情報を得ることになりました。したがって、効率化に対する逆効果（リベンジ・エフェクト）は常に存在します。これは人々が忘れがちですが、私たちは狭い範囲の管理的な単純作業に費やす時間を減らすことで、はるかに生産性が高まる可能性が非常に高いです。その代わり、創造性や判断力を要する活動により多くの時間を割く必要があり、それが最終的により大きな価値を生み出すことになります。

私たちはもはるはるかに迅速に実験を行うことができます。そのため、実行コストが低下するにつれて、並行して多くの試みを試すことが可能になります。私の考えでは、これは全体的なものの質を高める可能性が高いです。なぜなら、ジャーナリズムでもビジネスでも、私たちが行うあらゆる分野において、より多くの仮説を試すことになるからです。

私はその発言が、仕事とタスクの間の自然な誤解によって文脈から少し切り離されたように感じられますが、それでもあなたは私に反論して、「では、5 年、10 年、あるいは 15 年後の状況はどうなるのか」と言うことができるでしょう。そしてそこで私は、私たちは再び戻らなければならないと考えています。

実際には、そのようにあなたに反論するつもりはありません。私は非常に具体的な方法で反論します。そして、これはあなたの引用であり、誤解されたと言っていることを理解しています。私はこの文字通りの文句だけを眺めているだけで、タスクとサブタスクの区別はどこにもありません。それは「ホワイトカラーの仕事」です。

例として、弁護士、会計士、プロジェクトマネージャー、マーケティング担当者が挙げられ、その後あなたは「これらのタスクの多くは、今後 12 か月から 18 ヶ月のうちに AI によって完全に自動化される」と言いました。そこにはサブタスクの区別はありません。あなたは、弁護士のほとんどがその仕事を完全に自動化され、法律の実践も 1 年以内に全く異なるものになるだろうと言っています。それは、その引用自体の言葉によってもそうです。

そして私はただ言っているのです、そのタイムラインは依然として有効ですか？エージェントが以前私たちが行っていたすべてのことを実行するようになることで、弁護士という職業の姿が全く異なるものになるというあの見解です。

さて、ほとんどのタスクとは、全体の職務を完了するためにあなたが行う作業を指します。私はこれが、より人間らしい側面や、判断力を要する業務の部分に集中できる余地を生むと考えています。ここには非常に重要な区別があります。……「職業・役割」はより広範なカテゴリであり、「タスク」とはその構成要素です。これは労働市場経済学の文献において、数十年にわたって確立された定義です。

おそらくそれは『フィナンシャル・タイムズ』にとってさえも微妙すぎたかもしれませんが、 nevertheless、それが意図したところです。さて、私は重要な問いがあると思います：では、長期的にはこれが私たちにどのような位置づけをもたらすのでしょうか？そしてこれは、ますます多くのこの種の事柄において困難を伴うことになるでしょう……数年か、10 年か、あるいは 20 年かというタイムラインについて細かく議論することはできますが、現実は私たちが行うすべてのこと、つまりこの仕事の多く、タスク、職業、役割、活動などを、ますます自動化していくことになります。

したがって、重要になるのはこれらの技術の周りに設けるガバナンスです。誰に対して責任を負うのか？誰が所有するのか？人々にとって実際に役立つようにし、調整と摩擦を導入して規制するフィードバックループとは何か？私は数ヶ月前に「ヒューマニスト型スーパーインテリジェンス」に関する論文を書きましたが、そこでは基本的に北極星となるべきもの、あるいは完全なフレームワークというよりは原則のセットを明確に示しました。それは「技術は私たちを支えるためにある」というものです。これが私たちが技術に課すべきテストです。人々が技術に課してきたテストでもあります。マイクロソフトが最も重視するテストでもあります。

私は、ますます多くの人がこの問いに真摯に取り組まざるを得なくなると思います。なぜなら、これらは莫大な善をもたらすからです。私たちはそれが引き続きそうあり続けることを望みますが、移行期間中に理不尽なほどの不安定さを引き起こさないような方法で実現することを望んでいます。

あなたの言うことは信じます。あなたがこの問題について長く考えてきたことも知っています。しかし、私の聴衆が私に答えてほしいと常に求めている通りに回答します。その姿とは、業界全体——あなたを含めて全員——が「すべての仕事を奪う」という方針に全振りし、大規模なデータセンターの建設を劇的に加速させ、大きな約束に対して多くの資源を要求しているという点です。

政治的な反発があり、現在はすべての立場が軟化しています。そして、「すべての仕事がなくなるわけではない」「仕事を再考する必要がある」とおっしゃることは、業界の他の CEO たちも同様のことを語り、医療について言及していること（これも今や毎回必ず話題に上ります）と一貫したものです。この政治的な反発が、実際にお話し方を変えたのかどうか気になります。

AI には単なるマーケティング上の問題があるだけだと考える同業者が多くいます。つまり、効果的に伝えきれておらず、AI の恩恵をより効果的に伝えるために podcasts に数億ドルを投じるべきだという意見です。これは実際にこの業界で起こっていることです

原文を表示

Today I’m talking with Mustafa Suleyman, the CEO of Microsoft AI. And I’m actually going to keep today’s intro short — I’m working from my wife’s family farm this week, as you’ll see in the video, but also this is a real burner of an episode.

We covered everything from Mustafa’s approach to training new models to his criticisms of Anthropic talking about Claude as though it is conscious. Of course, we also talked about Microsoft’s relationship with OpenAI, how Mustafa is thinking about all the negative polling and political pushback around AI right now, and whether any of the consumer products are good enough to overcome it.

Like I said, it’s a burner.

Okay: Mustafa Suleyman, CEO of Microsoft AI. Here we go.

*This interview has been lightly edited for length and clarity.*

Mustafa Suleyman, you are the CEO of Microsoft AI. Welcome back to *Decoder*.

Great to be with you again.

I’m very excited to talk to you. Our previous conversation was one of my favorite conversations — about AI, how it should make us feel, and what it’s for — that I’ve had in all the conversations we’ve had.

There are some big changes at Microsoft, maybe some very important recontextualization about how people feel about AI that I want to talk to you about in particular. And then there’s Microsoft Build, the big Microsoft developer conference, which featured lots of new announcements and lots of big ideas about what computers are for and maybe where they should be that I want to get into.

Let’s start at the very start. This is some deep *Decoder *stuff that is important to understand before all the rest of it. Since you joined Microsoft, you have restructured how AI works there. Your role has changed. The last time I talked to you, you were in charge of a bunch of consumer products. That has since been set aside. You’re now training new models; you’re on the frontier.

Explain how Microsoft AI is structured now and how it’s structured inside Microsoft.

I guess the last 15 to 18 months or so we’ve been on this journey to reestablish our relationship with OpenAI, and it’s taken a minute. I think it culminated in a new contract that we got done in October of last year. And there were lots and lots of different provisions in that, including cementing and extending the partnership, but crucially freeing us up to be able to pursue superintelligence independently as well as keep buying and licensing their models.

So since October, I’ve been assembling the Superintelligence team, building clusters of sufficient scale to train frontier models, and hiring a team focused on superintelligence. And so that was quite a big shift for us because it sort of enabled me to focus just on the superintelligence mission, and that has then culminated in a few things that we announced this week at Build. We have seven new models across all the modalities and so on. So it’s been a pretty big shift, and I think a long time in the planning, and a great relief for us to now be in the game and pursuing the absolute frontier over the next few years.

Was this the plan when you were hired at Microsoft?

It’s certainly been the plan for the last 18 months. I mean, I think the relationship with OpenAI has gone through lots of ups and downs. And in many ways, I think it is going to go down as one of the most successful partnerships in history. It’s been great for OpenAI, and it’s been great for Microsoft, and all good relationships evolve, and I think this is just the next stage in our evolution.

Let me ask you about that evolution specifically. We all just saw the trial between Elon Musk and OpenAI and Sam Altman. Microsoft was involved in that trial in the sense that every so often a lawyer from Microsoft would stand up and say, “And we weren’t around.” And someone would say yes, and that was that.

But obviously, what came out during that trial, what has been clear during this entire time, is that the original notion was that OpenAI would be a research lab and provide models, while Microsoft would build the products. Microsoft had expertise in going to market; it had expertise in enterprise, it was trying to regain a foothold in consumer in a variety of ways. This would be a platform shift, and the research work would be over at OpenAI, and the product work would be inside of Microsoft.

That’s the thing that changed: OpenAI wanted to make more and more consumer products. Obviously, given your new role and your new focus, Microsoft more and more wants to make its own models. Why the split? What didn’t work in that relationship?

I mean, I think OpenAI is led by an incredibly ambitious founding team, and Sam himself. And so naturally, as they started to get more traction and generate a ton of revenue, they saw opportunities to go full stack. So it wasn’t just that they started working on consumer products. Obviously, ChatGPT was incredibly successful. They also started working on their own data centers. They started creating their own chip. There are lots of rumors flying around about their own consumer hardware devices. They started taking models direct to market through ChatGPT Enterprise. So across the stack, they were kind of broadening way beyond research over the last two, three, four years. And naturally, the same is also true for Microsoft. I mean, I think the partnership’s now five or six years old, and still has another four, five, six years to run.

Likewise, we’re one of the largest technology companies in the world. We have 493 of the 500 largest companies that store and process most of their data on our systems, use Azure, use M365 and Teams. I think people often underappreciate how enormous we are and how big our distribution is in enterprise. And so, long term, and I do mean over five, six, seven, 10 years, we have to make sure that we’re completely sustainable, and we’re not just a recipient of somebody else’s IP that we then slightly modify and adapt and put into production for our products, but we actually can stand on our own two feet and create world-class models.

I mean, superintelligence is coming. I think it’s just around the corner. And so I think it’s going to be basically the most valuable technology of all time. There’s sort of no way that, long-term, we could be structurally dependent on a third party for providing that IP for all eternity.

So that’s been the transition that obviously was triggered when OpenAI and so on had their board issue. But then as I came in and my team came in, we started building that out, we’re on that transition. And I think we’re in a great spot because we can take a fairly steady, careful, long-term optimal position, both for OpenAI, which I think has done incredibly well out of this, and for us.

I want to spend some time on superintelligence. I just want to put a pin in it now because I just want to kind of understand the transition for one more turn here.

There’s a moment in the trial, sort of very funny message from Microsoft CEO, Satya Nadella, he says, “I don’t want to be Intel and have OpenAI be Microsoft,” which is very funny in the context of Microsoft CEO himself saying, “I don’t want to be the provider, and have them be the platform that provides all the value and collects all the value and maybe we’ll be swapped out. I don’t want ChatGPT to run on Azure, and then OpenAI will get all the value, and then maybe they can swap us out,” just as what happened with Windows and Intel over time.

Is that a realization? Did Nadella come to you? What was that meeting like where you said, “Okay, OpenAI had its board issues. We need to get back on the frontier and stand on our own two feet.” What did that conversation look like, and how was that decision made?

I mean, obviously that’s Satya’s decision as well as Amy, Brad, and many other people in the company. But I think it’s as with anything: these are slow-moving changes in the company, as it comes to realize that the direction that we’re taking needs a little bit of tweaking and adjustment. And so that was happening way before the November board incident, and I think it just builds up over time as you look at the kind of constellation of different fronts around which we’re competing directly, increasingly, and all the tension that comes from that. But also just knowing that partnerships like that don’t last forever.

I mean, OpenAI wants to be a trillion-dollar public company, has incredible revenues, and is growing like crazy. They want to have the freedom to operate and be able to buy compute from all sorts of other places, build their own compute, and partner with whoever they want. So the contract was formed at a time when the companies were very different in terms of size and scale and balance of needs and stuff. I think it made sense for that moment, but then it became pretty clear that this is something that we have to be able to own and control ourselves and do right by our own customers.

As I said, we have an incredible distribution on enterprise, which I think is just completely unrivaled in the world. And so we have to make sure we’re building the best things for our customers. That looks slightly different to a company that has been jointly optimizing both for the consumer, with ChatGPT, and for the enterprise, and also for the fundamental science mission of superintelligence, which includes a whole bunch of different directions which are overlapping but could arguably be said to be orthogonal to the consumer and the enterprise directions too. Naturally, I think that’s how partnerships evolve, and they get reset periodically.

Yeah, but building a frontier model is very expensive, I’m told. Reliably told, this is a very expensive project. At some point, Amy Hood, the CFO of Microsoft, has to say, “Yep, you’ve got the budget.” When did that happen? Was that just a text message? Was there a meeting? Tell me about the specifics there.

I think, look, we sort of made the decision in the early part of last year, which obviously informed all the contract negotiations, which then all got resolved and signed in October. And it is a significant investment, but we have a long time to make it. I mean, we’ve already made significant investments in our own self-sufficiency mission.

Our Maia 200 chip is actually an outstanding chip, as one example, right? We are now able to manufacture and ship a chip that is 30 percent cheaper than a GB200 inside of our own clusters. And now that we can co-design our own models with it, the MAI-Thinking-1 model that we’ve just released actually delivers 1.4x performance per watt improvement on top of the 30 percent improvement that you get from running on a Maia 200 once we co-optimize the models for our tasks.

So the value of making sure that you own and control your own stack and direct the entire co-design effort end-to-end for the use cases that are most important to us — which is obviously agentic coding, our developers, our enterprises — that clearly pays the dividends that justify the investment that we have to make over the next few years.

You said self-sufficiency mission, which is a very polite way of saying you want to stand on your own two feet; you want to do your own thing. I’m told there’s some controversy inside of Microsoft about a line my colleague Hayden Field wrote in a piece describing Build. I’m just going to read this. This is from Hayden. It’s a great line. She said, “This year’s Microsoft Build had the vibe of a freshly single divorcée posting a thirst trap on Instagram.”

The breakup is completed, and it’s time to flex. Here’s our new model. We’re going to stand on our two feet. You’re out there saying you’re going to build models at the frontier and compete with the leading labs. Is that the feeling inside of Microsoft that you’re free to be on your own?

Definitely not. No, not at all. Look, I mean, obviously that’s a cool headline and a fun phrase. But the reality is that we are in partnership with OpenAI for years and years to come. I mean, we’re running way north of 2030. They still produce the best models in the world. GPT-5.5 is an outstanding model. The Codex, the cybersecurity models that are coming through, are amazing, and they’re powering the majority of what we do.

So naturally, that’s going to continue. And so I think that’s just a natural course of these sorts of partnerships. I don’t think it’s anything untoward or surprising. I think OpenAI is very understanding and supportive of that. I mean, they’ve obviously been an incredibly fast-growing company, and they understand that we have to pursue our own agenda as well. So it’s very normal.

Let me ask you the other *Decoder *question, and then I want to get into the announcements at Build, and certainly superintelligence.

The last time we spoke, you said your framework for making decisions operated on a six-week cycle, given how fast AI was moving. That made sense then. Things have settled, maybe. Maybe some things are more in focus. What is your decision-making framework now?

We still operate by the same cycle rhythm. At the end of each cycle, we have a one-week meetup in person. I’m a real believer in this, even though we’re still an in-office culture, four days a week. In fact, the week after next, my entire Superintelligence team comes together in person in Boston for four days. That is for all of our retrospectives on how Build went, what we learned, what we didn’t get right, what we need to improve, our planning for the next cycle, which is going to run for eight weeks this time with a one-week meetup afterwards, and that’s all laid out for the entire year. So the whole organization knows that that’s the rhythm by which we operate.

And I think it’s actually really important to emphasize that timeframe, because quarterly planning gets a little bit blurry and a bit abstract. I think six to eight weeks, depending on where it falls in the calendar, is actually the optimal time for making very clear, fortifiable missions.

So we also, in addition to the rhythm of these six-to-eight-week cycles, operate by squads. The squads are mixed interdisciplinary subgroups that are focused on a specific mission, and they don’t necessarily ladder up to the manager. They actually are run by a DRI, and the DRI is often an IC, and their job is–

That’s “directly responsible individual” and “individual contributor.”

Yeah, exactly. Thank you. And I think we’ve taken the approach of separating the role of the manager from the role of the DRI that executes on a specific mission. I think that’s because being a great DRI is exhausting. You’re literally all-in 24 hours a day, and you’re pushing as hard as you possibly can. Being a manager is often about being a coach, offering support, giving guidance, feedback, unblocking all sorts of things, helping with people’s career growth. And so I think keeping those separate allows us to rotate DRIs every two or three cycles so that some people can try sort of different positions and have rotation. It’s a great, very flexible structure that allows us to be pretty nimble, I think.

Let’s talk about Build. I wanted to start with superintelligence. You’ve mentioned it several times now. I was just at Google IO. Demis Hassabis, who used to be your colleague when you were at Google, ended that keynote by saying that we were in “the foothills of the singularity, and that AGI was coming with all the power of Google.“

You’re saying superintelligence is here. Are these all the same things? Are we using different language to describe AGI? Are there differences? How would you define superintelligence in your context versus the singularity in Demis’s?

I mean, obviously I didn’t say it was here. I said it’s coming. And I think there’s a lot of fluidity around these phrases. But I think what we can clearly see that what’s happening right now is that there is log-linear hill climbing across all modalities, and that means that there is a very direct relationship between each order of magnitude of compute that we apply, each incremental increase in data, and climbing on benchmarks, whether they’re public benchmarks, internal benchmarks, they’re targets that we focus on with reinforcement learning environments. And that is a very important observation.

Those predictions that I think we’re all making — I understand why some people are sort of skeptical of them or raise questions, but they’re very grounded in the sort of empirical observations of over a decade of increase in performance of these models. I mean, essentially the same general-purpose architecture has seen 12 orders of magnitude more computation applied, a trillion-fold increase in FLOPS over 15 years, and basically has worked in audio, in image, in text, in code, and in many other time series prediction tasks. And so we’re basically extrapolating out that more orders of magnitude of compute will enable us to continue to climb in this log-linear way inside of other environments.

And then it raises the question of, are we going to be able to train models that can invent new knowledge, not just sort of extrapolate from existing data that we have, but actually teach us things that we don’t know, and make new discoveries? Then the second thing is, do they have the capacity to self-improve and accelerate the process of deciding which hypotheses should be set, which ones should be pursued, how to generate training data for each of those, how to factor those into new runs, or even innovate on the actual architecture itself?

So, I think both of those things need to be true to be able to see this compounding progress, but I think we’re going to continue to get massive gains just from applying the next few orders of magnitude of compute. That probably does achieve parity with human performance on many, many more tasks, just as we’ve seen that happen in the last six months on coding.

Coding is really interesting, because it’s easily validated, right? You write the code, you ask the computer to run it, it runs or fails. We’ve seen some of the downsides, certainly around security, right? The downsides are obvious, and we’re seeing that this sort of regulatory approach to coding security play out in lots of ways. I’ve probably vibe coded some security disasters on my own phone and computer, and maybe that’s a risk I’m willing to take.

Every other function doesn’t seem that easy. I always pick on law, because that’s my background. But a judge doesn’t validate legal writing the way a computer validates code. If you get it wrong, the judge can send you to jail, right? That is maybe the worst output validation error that you can probably run into.

How do you measure the effectiveness across domains as easily as you can measure the effectiveness in coding? Because this seems to me where the metaphor or the analogy from coding to other domains falls apart very quickly.

I’m not so sure. Coding, obviously, you can verify the correct execution of code. It runs, or it crashes. But there’s a ton of nuance in that. The quality of the code that gets written really matters: its extensibility, how reconfigurable it is, how useful it is in practice. It’s not just that a piece of code runs, but it’s also how a model actually uses it as a DevOps or an SRE in production to return to that piece of code that it’s written, and then use it in a practical and useful way.

And then, of course, you have to grade the quality of the output that has been produced. It may be high-quality, functioning code, but is it actually the app or the website that you wanted? And there are aesthetic judgments in that; there are commercial judgments in that. The challenge of internalizing non-verifiable rewards is present in code, even though code is still primarily a verifiable reward signal. I think the other thing to observe is that, like chat is also a non-verifiable space, and yet, we’ve managed to climb that to basically human-level performance through interaction with real-world usage that provides a very strong-

Wait. I’m very curious. How do you measure chat at human-level performance?

Well, I think many people are having long, meaningful conversations with AIs at human-level performance. The quality is exceptionally good. It has very good emotional intelligence. It’s broadly very accurate. We’ve minimized the hallucinations. We don’t talk so much about bias anymore. It’s grounded in real-world observations. I think by most people’s measures, we’ve reached human-level performance in conversation for quite a wide range of tasks now.

What are your measures, and actually, sure, most people’s measures? I would disagree with almost all of this, but those are my measures. What are your measures?

My measure is like when I turn to my assistant and ask it to provide me with a daily briefing summarizing all the conversations that have happened on Teams and on email, the updates that have happened to documents, and I get basically a synthesized summary with a set of actions that I should take next. That is basically better than what my chief of staff can produce. I would say that’s human-level performance in synthesis, analysis, proposed actions, and chat.

There are many, many millions of people every day that are using it for emotional support, for counseling, for therapy, for coaching, for advice. I think it’s one of the most popular use cases inside all of the chatbots. That’s a pretty robust measure, I would say, to make the claim.

I know you’ve spent a lot of time thinking about this, particularly the emotional connection to some of these chatbots. These are products that you have built and deployed. I would draw a pretty big distinction between this thing is really, really good at summarizing my email, task list, and providing me a brief about what things to prioritize, and this thing is an emotional coach for somebody undergoing some kind of crisis.

Those are not similar tasks. Those are not necessarily similar kinds of intelligence, even in people. I know some people who are very good at making lists, and are very bad at emotional support. How do you put that all together in your brain and say, “Okay, this is broadly human-level performance in chat?”

I think if you define chat as an interactive exchange between two parties, one of which in this case is an AI, that broadly satisfies some goal, you’re looking to learn the sports score, for advice on which restaurant to go to, for coaching and feedback on an essay that you’ve written, for suggestions about which job to take next, or some tough conversation you’re about to have with your manager. You get a response, you go back and forth, you have five or six exchanges, and you find that a useful output, which you might otherwise have to rely on an expert, friend, or even pay a coach.

There are, just objectively, empirically speaking, hundreds of millions of people that get that experience every day from these chatbots. Maybe we could quibble over whether that technically represents human-level performance. I think it’s a fairly reasonable thing to claim.

There’s no reason why that isn’t going to continue climbing, right? The rate of climbing in the last three years is the thing that I think is most staggering. And so, what we’re trying to do from this point is extrapolate: okay, what are the fundamental drivers of that climb — compute, data, interaction from real-world users — and those things look set to continue.

I think that they apply to many other domains too, not just chat, emotional support, and productivity and that kind of thing, but also many other domains beyond that/ Healthcare, live production deployments inside of education, assistants that are increasingly managing your home, looking at just everything that is in your everyday life basically to make you more productive. That is, I think, a trajectory that’s likely to continue.

You’ve mentioned now that it’s still the same fundamental architecture, transformers, and attention. We’ve been applying compute to that for 15 years, and we’re getting these big increases. You are in a fairly unique spot.

At Build, you announced your first flagship reasoning model, MAI-Thinking-1. You got to start from scratch. Is there anything you’ve done differently now after 15 years of architecting and training this model, or is it just, yep, we’re going to collect all the data and run the training just as we did, and we have more compute now, so it’s going to be better?

No, actually, I think there are quite a lot of differences. The first thing to say is that the way that you curate the data… We start right from the top of the stack; we have basically paid for and acquired an extremely high-quality, very conservative set of data, and extracted a lot of the noisy, distracting, low-quality, potentially security-risk issues to do with that data. And the methods that you do for that, I think, are actually quite proprietary. We just shared a 109-page, very detailed, technical report, which was very well received on Twitter, and shares a lot of the details on how we do this. I think the second thing is, whilst I think it’s important to be quite cautious with architectural choices, and we have been, there are also a number of pretty significant shifts that I think we’ve made in how we put together our training runs.

Our training runs have been incredibly stable, with very few crashes, and very few restarts. We shared a lot of those graphs to show infrastructure stability, and also MFU efficiency, so model FLOPS utilization, which basically shows that we can put a state-of-the-art number of FLOPS through each chip for every step in our training run. I think that this is extremely easy to get wrong, and we all hear lots of stories from different labs about how things do go wrong.

It is actually pretty hard to make the very careful and deliberate choices to get things right, and take the right approach to make sure we produce high-quality models, because our job and our ambition is to try and build this hill-climbing machine. That means the integration of the silicon with the models, with the super high-quality data, with a stack of RLEs, reinforcement learning environments, that allow us to basically, systematically hill climb against any objective that we choose.

And that’s what MAI-Thinking-1 is. It’s a general-purpose, fairly neutral, thinking model that is pretty good at coding. It’s now roughly on par with Opus 4.6, at least on the benchmarks. We haven’t deployed it at scale into production, so there’s still lots more work to do there. But it’s an extremely strong reasoner and scored 97 percent on AIME, which is the primary measure for its reasoning performance, at least on the benchmarks.

It’s very good at instruction following, and then the goal is basically to make that available to many, many developers and enterprises and allow them to climb on it for their use cases. Everybody has a sort of slightly different objective that they have in their company to try and build agents and so on that support their use case.

One of the things that you’ve noted in talking about MAI-Thinking-1 is that you didn’t distill any existing models, which actually struck me as surprising, right? This is a thing you could do. You have access to OpenAI’s IP. Everyone’s distilling everything. We just found out in this trial that Grok was distilled from a number of models. Why not do distillation here? Why not jump ahead?

There’s definitely lots of shortcuts to the frontier, and if you take a super high-quality model, and you polish your base model with high-quality instructions, or answers, or outputs from a superior model, then it’s true that the model might quickly fit to that distribution. But it’s very unclear that they would then be able to surpass that teacher.

So, we’ve been very deliberate for two reasons. The first is that we want to make sure that we can exceed the teacher in order to set the frontier ourselves over the next few years. And the second is that we really want to build one of the great labs, and it’s going to take us many years to come, probably the next two or three years.

But, in order to do that, we have to be able to show that we can actually build every component ourselves. We can hire the very best talent in the world. We can push the frontier with actual research, rather than just re-implementation, copying, or distillation from any other third party.

We’re in a great position where we’re able to really carefully and meticulously pursue that objective, knowing that we have the resources to buy Anthropic models where they exceed the frontier. We have the resources to put 11,000 different models inside of Foundry, so every one of our developers gets pure optionality. And of course, we have the resources to continue to deploy OpenAI models, which are obviously outstanding and are at the frontier today.

That’s just a natural part of the self-sufficiency mission, and it’ll take time for us to truly get to the absolute frontier on that. But I think we’re in a great spot. We made a ton of progress. This is a very, very strong model, and it wasn’t just that model that we released. We’ve released seven new models simultaneously.

Our transcribed model, for example, MAI-Transcribe-1.5 is literally the number one in the world. It’s the most cost-effective of any of the hyperscalers. It’s the highest on accuracy. Our image model is now number two. Our image editing model is number three right behind Google’s and OpenAI’s. I think we’re well up there with our image and audio. Our code model, CodeFlash, is incredibly strong, optimized for VS Code. and is a really, really a great model that’s on par with Sonnet 4.6. So it’s really in a great spot this minute.

Were there any legal or IP concerns with distillation? I know this is a live issue out in the world: Anthropic complains of other people distilling their models. There are concerns about Chinese companies distilling models, and whether our existing IP agreements can cover that. Did you have any of those concerns to keep you away from it?

Oh, we didn’t, but I think I understand why a lot of people get frustrated. Anthropic has been very frustrated, and some of the rumors around xAI, and Meta, and obviously, the open source models, and so on, because essentially, that’s basically taking the IP, and the knowledge that another team has put together, and then, literally force-feeding it into your own model. I think it’s a bit of a short-term win, and like I said, really, we want to create a culture in the lab where we can come up with the next big thinking breakthrough, or the next big coding breakthrough, or the next big architectural push.

Right now, we’re experimenting with the looped transformer, which is a slightly different variant on the current transformer. Lots of people in the field are looking at it too. No one seems to have quite got into production yet. But, in order to create a culture and a team that can really push the frontier, they have to understand, own, and create the full stack as and when they need to, and also use things from third parties whenever we need to too. And like our paper, for example, has hundreds of citations grounded in the rest of the literature, so it’s very much a contribution back to the field in return for everything that we’ve learned over the years from all the great publications that have been out there.

Can I ask you — if you understand that frustration from Anthropic and your peers in AI about distillation, do you also understand the frustration from creatives, publishers, and YouTubers about all the AI companies scraping their work as a collective to make these models? Because that frustration is only getting louder.

Yeah. No, I understand the frustration. The open web challenge is one we’ve talked about before, and I get it, and I see that people are frustrated, and obviously, that’s working its way through the conversation in the courts. And I see that people put things online, and they had different expectations about what the contract was with that being placed online, and it’s a tricky one.

You mentioned all your data was carefully curated. Did you pay for all the data that you’re using to train the new models?

A lot of our data we obviously take from the open web in the normal way. Carefully curated means that it’s extremely carefully filtered for security, for quality, for third-party dependencies from some of the open-source datasets, and keeping it away from a lot of the Chinese lineages, which I think are very different. Our enterprises want to make sure that when they put something into production, they can trust us that we’ve really built it with their needs in mind. And I think this is one of the benefits of being very, very deliberate, patient, and being attentive to all the details.

You mentioned enterprise. I think this is very interesting. Microsoft is all in on enterprise AI, in big ways, actually. I would even draw the line straight to Asha Sharma, the new head of Xbox, who is getting rid of AI in a bunch of places, and the gamers are happy, right? There’s one reaction to AI in consumer space, but there’s another in enterprise. I think AI has as close to product-market fit in enterprise as you can get with something changing as fast as AI. There are a bunch of databases that corporations control, and you can just go access them, because they control them. That’s their data.

There’s a bunch of repeatable processes and tasks, and old systems that maybe the models can just do more efficiently. There’s something very important happening to enterprise. At the same time, the consumer antipathy towards AI is just increasing. And my argument is we have not built great consumer AI products. This industry has not produced them. It has not shifted them. It has not made it obvious that all of this is worth it, that using all the data from the open web, and changing the contract of publishing to a mass audience of people, so now, it’s being used for training models that will deliver trillions of dollars of value to corporations. There isn’t a product that says this is worth it.

Again, Satya Nadella recently gave an interview with Axios, and he said, “We need social permission for this. And until we have it, until we deliver that value, people are going to feel this way.” We’ve seen college speakers get booed. We’ve seen data centers get banned. Do you think that there’s a consumer product that’s worth it, that’s worth the angst about training, that’s worth the angst about data centers?

That was your focus; now your focus is enterprise. I would say that just on the face of it, it doesn’t seem like Microsoft has interest in the consumer product anymore. But, do you see one that’s worth it, or that could be built?

I’m not sure I agree with you that there hasn’t been any value for the consumer out of this. Across all of the chatbots, there are billions of people a month that are getting immense value out of it.

Now, just for a moment, empathize a little bit with the small-scale business owner, or the kind of mom that’s helping her kid with the homework, and can now just turn to a conversational AI, and get feedback, get instructions, get essay questions set. Just being able to ask questions like how do I generate revenue? How do I put together a cash flow forecast? Which college should I apply to?

I mean, these are everyday tasks that are coming with some pretty high-quality factual advice and information. So I don’t really buy that people are not getting benefit out of these things. I think they are.

I think I can very clearly make the argument that they’re not getting enough benefit, right?

Okay.

They’re the ones saying that we should not have more data centers. They are the ones booing AI at the graduation speeches. The polling is clear, particularly young people: the more they use AI, the more antipathy they have towards it. That’s clear in every single poll. That’s the argument I’m making — not that there’s no value, but the value exchange is not clear enough.

Yeah. Fair enough.

I’m seeing Microsoft in particular pivot to enterprise, away from the big search product, the reinvention of Bing that would make Google dance. That’s over, and we’re all focused on enterprise, where the value is. I’m just wondering if there’s enough value for the consumer to make all of this worth it.

I think there’s understandably a lot of anxiety. There’s an enormous amount of speculation about what’s going to happen in the next five to 10 years. Whether it’s framed as the singularity or whether it’s framed as the job apocalypse, these are not helpful framings. I think that people are scared because it’s poorly defined and it’s often framed as an inevitable, threatening gray cloud over people’s heads.

I think that what matters is what we do with technology.I think that I’ve for a long time argued that we have to place the human first. Some people in the field have placed scientific discovery first or placed accelerating intelligences that can explore the galaxies and so on, and said that it’s inevitable that we’re going to have these AIs that are going to be more powerful than all of us combined. I mean, that’s naturally scary to people.

And I think that we have to basically flip it the other way around and say the purpose of science and technology is to make us all healthier and smarter and happier. That’s been the quest that we’ve been on as a species for thousands of years of invention, and it’s the test that we should put superintelligence to again. And if it doesn’t achieve that test, then I think people will reject it, and they’ll be right to reject it.

I think that everybody’s focus is now going to turn in the next five years to, how is this making me healthier and happier, smarter, more capable, more productive? And if it’s not doing that, then naturally people are going to be angry and resist and react. I don’t think there is anything unexpected about that or anything wrong about that — I think that’s inevitable.

So that’s why one of the things I’ve been passionate about for many, many years is healthcare. And just a couple of days ago we announced a new partnership with Mayo Clinic. This is the number one hospital in the world, consistently reported. They have the highest quality longitudinal patient record dataset across all the modalities. They have the best clinical practice.

They’re also a nonprofit, which I think a lot of people don’t realize, with 65 percent of their patient population on Medicaid. People often associate them with the international super elites flying in to get the best care in the world, but they actually have the majority on Medicaid. They’re an amazing institution with an incredible mission to deliver the best healthcare everywhere. And we now have a very long-term partnership to co-train from scratch with their data, with our models, a brand new foundation model for health, deploy it in their hospitals, and hopefully take it around the world to deliver the best clinical care and healthcare that we possibly can to as many people as possible.

That’s why I got into the field. That’s what I was originally motivated by, and it’s what I’m passionate about. And I can only focus on the things that I think are going to make a difference and that will help people and leave a good legacy for everybody, and that’s what we’re trying to do.

I appreciate that. I appreciate the healthcare framing, and I understand why that’s everyone’s go-to, right? Healthcare in America in particular, if you could make it even 10 percent better, you will have affected a lot of people’s lives in a particularly profound way.

The thing is, I know a very smart guy who has a very different and vastly more aggressive approach to all of this than you. That person is you, four months ago. This is what Mustafa Suleyman said to the Financial Times four months ago: “White-collar work when you’re sitting down at a computer, either being a lawyer or an accountant or a project manager, or a marketing person, most of those tasks will be fully automated by an AI within the next 12 to 18 months.”

That’s four months ago. That implies that a year from now, lawyers, accountants, project managers, and marketing people will not have jobs. Their jobs will be automated. Is that still your timeline?

No, no, no. Hold on a sec. So I said “tasks” in the quote that you’ve just said. I said tasks. So that does not mean jobs. It’s a very important distinction. In labor economics, there is an entire taxonomy of sub-components of a role or a function in an organization. Sending an email, having a conversation with a colleague, putting together a PowerPoint — sub-tasks will increasingly become digitized, automated, and we can basically generate more and more of them.

That does not necessarily mean that the role goes away at all. It just means that the work can be done faster and more efficiently, which is today often work that is quite rote, is quite manual, is quite labor-intensive, and is time-consuming. And so the natural progression of technology is to make your life easier, faster, less friction for more seamlessness. As everyone often complains, that has made you and me and everybody else much busier.

It’s actually made us more available, more stressed, and it’s given us more information. So there are always these revenge effects of efficiency, which I think people forget. It’s quite likely that we are going to get much, much more productive because we spend less time doing the kind of narrow administrative menial tasks, and we’ll have to spend more time doing creative, judgment- focused things, which ultimately create a lot more value.

We can also experiment much more quickly. So we’re able to try lots of things out in parallel because the cost of execution is going to get lower. In my mind, that’s likely to increase the overall quality of things, because we’re going to try out more hypotheses, whether in journalism or in business or in anything that we do.

I think that’s sort of slightly taken out of context because of a natural misunderstanding between jobs and tasks, but nevertheless, you could push back at me and say, “Okay, well then what does the landscape look like in five or 10 or 15 years’ time?” And that’s where I think we have to return–

Actually, I’m not going to push back on you in that way. I’m going to push back in a very specific way. And I realize this is your quote and you’re saying it was misinterpreted. I’m just looking at this literal sentence, and there is no distinction between tasks and sub-tasks. It is, “white-collar work.”

The examples are lawyer, accountant, project manager, marketing person, and then you said, “Most of these tasks will be fully automated by an AI within the next 12 to 18 months.” There’s no distinction of sub-tasks there. You’re saying most lawyers will have their jobs fully automated and the practice of law will look totally different within a year, even by the words of that quote.

And I’m just saying, are you still on that timeline, that being a lawyer will look totally different because agents will be running around doing everything that we were doing before?

Well, most of the tasks mean work that you do in order to get your overall job done, and that I think is going to free you up to do the more human-like and the more judgment parts of your work. There’s a very important distinction in... Jobs and roles are the broader category, and tasks are the components of that. And it’s an established definition in the literature, in labor market economics, for many, many decades.

It was maybe too nuanced even for the Financial Times, but nevertheless, that was the intent. Now I do think there’s an important question: where does that leave us in the longer term? And it is going to be challenging, like more and more of this stuff... We can quibble over the timelines of whether it’s a few years or whether it’s a decade, or whether it’s 20 years, but the reality is we are going to be automating more and more of this work, tasks, jobs, roles, activity, and everything that we do.

And so what’s going to matter more is the governance that we put around these technologies. Who are they accountable to? Who owns them? What are the feedback loops that regulate and introduce friction to make sure that they actually serve people? I mean, I wrote an essay on humanist superintelligence outlining quite directly, four or five months ago, what I think of as basically a north star, maybe not quite a framework, but a set of principles that basically says technology is here to serve us. That’s the test that we should put it to. It’s the test that people have put it to. It’s the test that we care about at Microsoft.

I think that more and more everyone’s going to have to really focus on that question, because it is going to deliver a tremendous amount of good, and we want it to continue doing that, but we want it to do it in a way that doesn’t sort of cause ridiculous amounts of instability during the transitional period.

I believe you. I know you’ve been thinking about this stuff for a long time, but I’m going to respond in the way that I know my audience wants me to respond, because I hear it from them all the time. And what it looks like is this whole industry — you, everybody included — went all in on “we’re going to replace all the jobs” and really accelerated building out data centers at massive capacity, and asking for a lot of resources against big promises.

There was political pushback, and now all of the stances have softened. And you saying it’s not all jobs are going away, we have to rethink jobs, is of a piece with all the other CEOs in this industry saying similar things, and talking about healthcare, that comes up every single time now. I’m wondering if that political pushback has actually changed how you are talking about this.

There are a lot of your peers who think AI simply has a marketing problem, that it hasn’t been communicated effectively enough, and they should spend hundreds of millions of dollars on podcasts to communicate the benefits of AI more effectively. This is a real thing that is happening in this i

この記事をシェア

Simon Willison Blog重要度42026年7月24日 07:53

AI エージェントの暴走か、悪質マーケティングか

The Register AI/ML2026年7月24日 05:48

ChatGPT 出力制限に代わるツール登場

One Useful Thing重要度42026年7月24日 03:05

AI活用ガイド：何に使うべきか

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む