
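Smol AI News · June 2, 2026, 14:44 · approx. 14 min read

Microsoft Build: MAI-Thinking-1 and the MAI model family, Surface RTX Spark Dev Box, and OpenClaw for Windows announced

#Reasoning Models #MAI-Thinking-1 #Local AI #Model Transparency #Microsoft

TL;DR

At Microsoft Build 2026, Microsoft announced a new set of models, including MAI-Thinking-1, a reasoning model billed as trained on Microsoft's own data with zero distillation. It also published a technical report and strengthened its local AI platform, moves that drew wide attention across the industry.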

AI in-depth analysis · June 3, 2026, 18:05


Depth: 40%

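Key points

MAI model family announcement and flagship reasoning model

Led by MAI-Thinking-1, Microsoft announced seven new models spanning code, image, voice, and more, and positioned MAI-Thinking-1 as its first reasoning model, trained on its own data.

Publication of a transparency-focused technical report

Microsoft published a detailed 109-page technical report on MAI-Thinking-1, emphasizing clean data lineage and the absence of distillation from other companies' models, a stance that was well received by technical readers.

Stronger local AI and Windows agent foundations

Microsoft announced local AI that draws on the installed base of GPUs on Windows, a secure execution layer for running agents, and new hardware such as the Surface RTX Spark Dev Box.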

Key quotes

Microsoft used Build to position itself as both an AI platform company and a frontier-model lab

built with clean data lineage and zero distillation from third-party models

Microsoft released a 109-page technical report for MAI-Thinking-1


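Impact analysis

By presenting itself not only as a platform company but also as a lab that builds frontier models, Microsoft strengthened its claim to leadership in the AI ecosystem. The "zero distillation" approach and the detailed report could raise the industry's bar for transparency and put pressure on other companies to make similar disclosures. By building out the foundation for running AI locally on Windows devices, Microsoft has also improved its ability to serve use cases that reduce cloud dependence or prioritize data privacy.

Editorial comment

Announcing a reasoning model trained on its own data without relying on other companies' models, and publishing a technical report alongside it, is a notable step toward higher transparency standards in the industry. By strengthening local AI and its agent execution foundation, Microsoft is aiming to lead in both cloud and edge.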


a quiet day.

AI News for 06/1/2026-6/2/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Top Story: Microsoft Build recap, and new MAI model technical details

What happened

Microsoft used Build to position itself as both an AI platform company and a frontier-model lab, pairing broad product launches with unusually detailed disclosures about its new MAI model family.

Microsoft AI announced seven new MAI models spanning reasoning, code, image, speech transcription, and voice, led by MAI-Thinking-1, MAI-Code-1-Flash, MAI-Image-2.5, MAI-Transcribe-1.5, and MAI-Voice-2 according to @MicrosoftAI and @mustafasuleyman

The flagship reasoning model MAI-Thinking-1 was presented as Microsoft’s first reasoning model, built with clean data lineage and zero distillation from third-party models in posts from @mustafasuleyman, @baseten, @tuhinone, and @HannaHajishirzi

Microsoft released a 109-page technical report for MAI-Thinking-1, which drew strong positive reactions from technically oriented readers for its level of transparency, including @eliebakouch, @ethanCaballero, @nrehiew_, @yacinelearning, and @stochasticchasm

Microsoft also emphasized local AI and agent-native Windows: Build messaging highlighted secure execution layers for agents, a new Surface RTX Spark Dev Box, Windows AI access to the broader Windows GPU install base, and concept hardware such as Project Solara/Scout, summarized by @yusuf_i_mehdi, @TheTuringPost, and @kimmonismus

Build also included a major GitHub Copilot app push as the “desktop home for agent-native software development,” with canvases, cross-device continuity, and tighter GitHub agent workflows, from @pierceboggan, @lukehoban, and reactions from @techgirl1908

Microsoft introduced Web IQ, a new grounding/search API stack for AI agents, claiming the APIs already power “nearly all AI agents and chatbots in the industry today, including Copilot and ChatGPT,” via @JordiRib1

Satya Nadella framed Build as an ecosystem moment rather than a single-product launch, while Mustafa Suleyman framed it as the output of Microsoft’s internal “hill-climbing machine,” in @satyanadella, @mustafasuleyman, and reaction from @nrehiew_

MAI model family: disclosed facts and technical details

MAI-Thinking-1

Microsoft described MAI-Thinking-1 as a 35B active parameter MoE with a 256K context window in @mustafasuleyman

A separate summary from @scaling01 says the model is a 1T@35B parameter model, pre-trained on 30T tokens, and trained using 8192 GB200 GPUs; this appears to be a reading of the technical report rather than Microsoft marketing copy
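As a rough sanity check on those figures, the sketch below applies the common 6 * N * D rule of thumb for training FLOPs, using active parameters for N. The rule is an approximation I am assuming here, not something the tweets or the report are quoted as using, and it ignores attention and routing overhead.

```python
def training_flops(active_params: float, tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb (an assumption)."""
    return 6.0 * active_params * tokens


# Figures as summarized by @scaling01: 1T total, 35B active, 30T tokens.
ACTIVE_PARAMS = 35e9
TOTAL_PARAMS = 1e12
TOKENS = 30e12

flops = training_flops(ACTIVE_PARAMS, TOKENS)
tokens_per_active_param = TOKENS / ACTIVE_PARAMS
tokens_per_total_param = TOKENS / TOTAL_PARAMS

print(f"approx. training FLOPs: {flops:.2e}")
print(f"tokens per active parameter: {tokens_per_active_param:.0f}")
print(f"tokens per total parameter: {tokens_per_total_param:.0f}")
```

Under these assumptions the run lands at roughly 6.3e24 FLOPs, with about 857 tokens per active parameter and 30 tokens per total parameter.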

@kimmonismus similarly summarized it as a mid-size MoE with 45B active params, but this conflicts with Mustafa’s own 35B active figure; the more authoritative figure in the tweet set is the official 35B active number

Microsoft claims 97% on AIME 2025 and 53% on SWE-Bench Pro, with blind human raters on Surge preferring it overall to Sonnet 4.6, from @mustafasuleyman and @asadovsky

Microsoft says the model is optimized on MAIA 200, with 30% better performance per dollar and 1.4x performance-per-watt gain versus GB200 when running MAI models end-to-end, per @mustafasuleyman
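To make the two ratios concrete, here is a minimal sketch that converts a "performance per X" gain into the relative amount of X needed for the same work. It assumes both figures are simple multiplicative ratios against GB200 (1.3x per dollar, 1.4x per watt), which the tweet does not spell out.

```python
def relative_resource_for_same_work(perf_per_resource_gain: float) -> float:
    """If performance per unit of a resource improves by this factor,
    the same work needs 1 / factor of that resource."""
    if perf_per_resource_gain <= 0:
        raise ValueError("gain must be positive")
    return 1.0 / perf_per_resource_gain


# Assumed reading of the claim: 30% better perf/$ means 1.3x, and 1.4x perf/W.
relative_cost = relative_resource_for_same_work(1.3)
relative_energy = relative_resource_for_same_work(1.4)

print(f"cost for the same work: {relative_cost:.1%} of GB200")
print(f"energy for the same work: {relative_energy:.1%} of GB200")
```

Read this way, the claim corresponds to roughly 23% lower cost and 29% lower energy for the same workload.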

Microsoft and partners repeatedly stressed no third-party distillation, “clean data lineage,” and enterprise-controlled fine-tuning with “100% eyes-off” post-training data through Baseten, in @baseten, @tuhinone, and @MicrosoftAI

MAI-Code-1-Flash

Microsoft introduced MAI-Code-1-Flash as a fast coding model for VS Code and GitHub Copilot CLI, first announced by @pierceboggan and later highlighted by @mariorod1

Official Microsoft messaging via @mustafasuleyman says Code-1-Flash achieves 51% on SWE-Bench Pro despite having just 5B parameters, positioning it near Haiku-class size/cost

A competing summary from @scaling01 describes it as a 137B parameter MoE, 256K context, trained on 10T+ tokens, and “stronger and more efficient than Claude 4.5 Haiku.” That likely means the 5B figure refers to active parameters rather than total parameters; the tweets do not fully reconcile this distinction, but together they imply a small active footprint within a much larger MoE
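The active-versus-total distinction is easier to see as a ratio. The sketch below computes the fraction of parameters active per token for both models, taking the 5B figure for Code-1-Flash as active parameters, which is this recap's inference rather than a confirmed number.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of a MoE model's parameters used for each token."""
    return active_params / total_params


# (active, total) parameter counts as reported in the tweets above.
# The Code-1-Flash pairing of 5B active with 137B total is an inference.
models = {
    "MAI-Thinking-1": (35e9, 1e12),
    "MAI-Code-1-Flash": (5e9, 137e9),
}

fractions = {name: active_fraction(a, t) for name, (a, t) in models.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

If the inference holds, both models activate about 3.5% to 3.6% of their parameters per token.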

Availability at launch was highlighted as GitHub Copilot / VS Code-first, per @scaling01 and @mariorod1

MAI-Image-2.5

Microsoft launched MAI-Image-2.5 and a Flash variant, claiming both reached #2 on leaderboards, with @mustafasuleyman saying they surpass Nano Banana 2 on image editing

Independent leaderboard accounts supported the high ranking: @arena reported #2 in Image Edit Arena with score 1401, +10 points over Nano Banana 2, Grok Imagine, and ChatGPT Image Latest HF
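For a sense of what a 10-point gap means, the sketch below converts a score difference into an expected head-to-head win rate. It assumes the arena score sits on an Elo-like scale where a 400-point gap corresponds to 10-to-1 odds; the tweet does not state the scale, so treat the constant as an assumption.

```python
def expected_win_rate(score_gap: float, scale: float = 400.0) -> float:
    """Expected win probability of the higher-rated model on an Elo-like scale.

    The 400-point scale is an assumption about how the leaderboard is calibrated.
    """
    return 1.0 / (1.0 + 10.0 ** (-score_gap / scale))


win_rate = expected_win_rate(10.0)
print(f"expected win rate at +10 points: {win_rate:.1%}")
```

Under that assumption, +10 points is a narrow edge of roughly 51.4% versus 48.6% in pairwise votes.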

@arena further said MAI-Image-2.5 “advances the Pareto frontier,” meaning no model at its price tier scores higher on that benchmark
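The Pareto-frontier idea can be sketched in a few lines: a model is on the frontier if no other model is at least as cheap and at least as strong, and strictly better on one of the two. The model names, prices, and most scores below are hypothetical placeholders, not leaderboard data.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    price: float  # cost per image, hypothetical units
    score: float  # arena score, higher is better


def pareto_frontier(models: List[Model]) -> List[Model]:
    """Return the models not dominated on (lower price, higher score)."""
    frontier = []
    for m in models:
        dominated = any(
            o.price <= m.price
            and o.score >= m.score
            and (o.price < m.price or o.score > m.score)
            for o in models
        )
        if not dominated:
            frontier.append(m)
    return sorted(frontier, key=lambda m: m.price)


# Hypothetical entries for illustration only.
models = [
    Model("A", 1.0, 1300),
    Model("B", 2.0, 1401),
    Model("C", 2.5, 1391),
    Model("D", 4.0, 1420),
]

frontier_names = [m.name for m in pareto_frontier(models)]
print(frontier_names)
```

Here "C" drops off the frontier because "B" is both cheaper and higher scoring, which is the sense in which a new model can advance the frontier.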

Distribution partners quickly followed, including @OpenRouter and @fal

MAI-Transcribe-1.5

@ArtificialAnlys reported MAI-Transcribe-1.5 as an unusually strong speed/accuracy point on the STT frontier: ~276x realtime, 2.4% AA-WER, #3 overall on its leaderboard
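For readers less familiar with these metrics, here is a minimal sketch of a plain word error rate (word-level edit distance divided by reference length) and of what a realtime speed factor implies for processing time. How AA-WER aggregates across datasets is not described in the tweet, so this shows only the basic WER building block; the sample sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Wall-clock time to transcribe audio at a given multiple of realtime."""
    return audio_seconds / speed_factor


wer = word_error_rate(
    "the patient was given ten milligrams of atorvastatin",
    "the patient was given ten milligrams of a statin",
)
one_hour = processing_seconds(3600, 276)

print(f"WER: {wer:.2%}")
print(f"one hour of audio at 276x realtime: {one_hour:.1f} s")
```

The made-up example also shows why keyword biasing matters: a single rare drug name accounts for both errors. At 276x realtime, an hour of audio takes about 13 seconds.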

The model supports 43 languages, including English, French, Arabic, Japanese, and Chinese, and supports keyword biasing for rarer terms such as names and medical terminology, per @ArtificialAnlys

Pricing was reported as $6 per 1,000 minutes of audio via Microsoft Foundry in @ArtificialAnlys
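As a quick conversion of that rate, the sketch below computes cost for a given amount of audio. It assumes flat per-minute pricing with no tiers or minimums, which the tweet does not confirm.

```python
PRICE_PER_1000_MINUTES_USD = 6.0  # as reported by @ArtificialAnlys


def transcription_cost_usd(audio_minutes: float) -> float:
    """Cost at a flat per-minute rate (flat pricing is an assumption)."""
    return audio_minutes / 1000.0 * PRICE_PER_1000_MINUTES_USD


cost_per_hour = transcription_cost_usd(60)
cost_10k_hours = transcription_cost_usd(10_000 * 60)

print(f"one hour of audio: ${cost_per_hour:.2f}")
print(f"10,000 hours of audio: ${cost_10k_hours:,.0f}")
```

That works out to about $0.36 per hour of audio, or $3,600 for 10,000 hours.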

OpenRouter also listed the model among the three MAI launches it brought live the same day in @OpenRouter

MAI-Voice-2

MAI-Voice-2 appears in Microsoft’s “seven models” umbrella and in OpenRouter’s availability post at @OpenRouter

The tweet set contains little technical detail on Voice-2 itself beyond launch/availability

Technical-report details that mattered to researchers

Why the report stood out

The dominant technical reaction was that Microsoft released an unusually detailed frontier-model report: @eliebakouch called it “one of the most transparent for a model at this scale,” @nrehiew_ said it “could really serve as an updated textbook for LLM training today,” and @stochasticchasm called it a “gold mine”

Multiple readers highlighted that the report disclosed pipeline details, scaling ladder methodology, data curation, infra metrics, and MFU numbers; this level of specificity is what drew praise from @ethanCaballero, @eliebakouch, and @nrehiew_

Pretraining and data

A major technical claim repeated across commentary is that MAI-Thinking-1 used no synthetic data and no distillation, not only in post-training but throughout the disclosed pipeline, from @eliebakouch, @stochasticchasm, and @HannaHajishirzi

@eliebakouch says the report explicitly notes data from Common Crawl plus private sources, with targeted sub-pipelines for different domains, heavy extraction/dedup work, and an intentional choice of no synthetic data

The report’s internal private NLL set used for scaling decisions was summarized by @eliebakouch as:

50% code

17.5% STEM

17.5% math

10% general knowledge

5% multilingual
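One way to read that mixture is as weights on per-domain negative log-likelihood. The sketch below computes such a weighted aggregate. Whether the report combines domains this way, or simply samples the set in these proportions, is not stated in the thread, and the per-domain NLL values are hypothetical.

```python
# Mixture as summarized by @eliebakouch.
WEIGHTS = {
    "code": 0.50,
    "stem": 0.175,
    "math": 0.175,
    "general_knowledge": 0.10,
    "multilingual": 0.05,
}


def weighted_nll(per_domain_nll: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-domain NLL (the aggregation rule is an assumption)."""
    if set(per_domain_nll) != set(weights):
        raise ValueError("domains must match the weight table")
    return sum(weights[d] * per_domain_nll[d] for d in weights)


# Hypothetical per-domain NLL values, for illustration only.
example_nll = {
    "code": 0.80,
    "stem": 1.60,
    "math": 1.20,
    "general_knowledge": 2.00,
    "multilingual": 2.40,
}

aggregate = weighted_nll(example_nll)
print(f"weighted NLL: {aggregate:.3f}")
```

With half the weight on code, a change in code NLL moves the aggregate ten times as much as the same change in multilingual NLL.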

@eliebakouch says architecture promotion in the scaling ladder was based on an Efficiency Gain (EG) metric: how much extra compute the baseline would need to match the candidate’s loss
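A minimal sketch of such a metric, assuming the baseline's loss follows a power law in compute and expressing the gain as a ratio of compute budgets. Both the functional form and the ratio definition are my assumptions; the thread only gives the verbal description, and the data points are synthetic.

```python
import math


def fit_power_law(computes, losses):
    """Least-squares fit of loss = a * compute ** (-b) in log-log space."""
    xs = [math.log(c) for c in computes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def efficiency_gain(baseline_computes, baseline_losses, candidate_compute, candidate_loss):
    """Compute the baseline would need to reach the candidate's loss,
    divided by the compute the candidate actually used."""
    a, b = fit_power_law(baseline_computes, baseline_losses)
    baseline_compute_needed = (a / candidate_loss) ** (1.0 / b)
    return baseline_compute_needed / candidate_compute


# Synthetic baseline runs that follow loss = 20 * C ** -0.05 exactly.
baseline_computes = [1e18, 1e19, 1e20]
baseline_losses = [20.0 * c ** -0.05 for c in baseline_computes]

# Synthetic candidate: at 1e19 FLOPs it reaches the loss the baseline gets at 2e19.
candidate_loss = 20.0 * (2e19) ** -0.05
eg = efficiency_gain(baseline_computes, baseline_losses, 1e19, candidate_loss)
print(f"efficiency gain: {eg:.2f}x")
```

An EG of 2.0 here means the baseline architecture would need twice the compute to match the candidate, which makes small loss differences easier to compare across rungs of a scaling ladder.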

The same thread notes ablations at roughly 100/200 tokens per parameter, described as around “Chinchilla optimal” for the setup, while also remarking this differs from dense-model heuristics due to MoE structure in @eliebakouch

Post-training / RL

The most discussed technical choice was that Microsoft appears to have started RL from a checkpoint with no prior reasoning exposure, which several readers found notable. @stochasticchasm called this a “very interesting decision” and, in a separate post, reacted to graphs suggesting a jump from <20% AIME25 to >95%

@HannaHajishirzi described the “climbing from scratch” recipe as simple recipes, rigorous science, self-distillation, patience, and great infra

@soldni characterized the process as “climbing with no distillation, like the big boys do”

Some independent readers inferred from the report that synth data remains very valuable for agentic performance in the broader field, even if Microsoft deliberately avoided it here; see @stochasticchasm

Data curation / judges / DSPy GEPA

A detail that got substantial attention from the DSPy/late-interaction crowd: Microsoft reportedly used GEPA / DSPy-optimized LLM judges in pretraining data curation and quality scoring

This was highlighted by @bj2rn, @LakshyAAAgrawal, and @lateinteraction

Infra / utilization / hardware co-design

Microsoft reportedly disclosed exact MFU across iterations, which multiple readers said is rarely shared at this scale, per @eliebakouch
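For context, MFU (model FLOPs utilization) is the ratio of the FLOPs a training run actually achieves to the hardware's theoretical peak. The sketch below uses the 6 * N FLOPs-per-token approximation. The throughput and per-GPU peak values are hypothetical placeholders, not numbers from the report or from any spec sheet.

```python
def model_flops_utilization(
    active_params: float,
    tokens_per_second: float,
    num_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Achieved training FLOPs/s over aggregate peak FLOPs/s.

    Uses ~6 FLOPs per active parameter per token, which ignores attention cost.
    """
    achieved = 6.0 * active_params * tokens_per_second
    return achieved / (num_gpus * peak_flops_per_gpu)


mfu = model_flops_utilization(
    active_params=35e9,        # 35B active, from the tweets above
    tokens_per_second=1.56e7,  # hypothetical cluster-wide throughput
    num_gpus=8192,             # from @scaling01's summary
    peak_flops_per_gpu=1e15,   # hypothetical placeholder peak
)
print(f"MFU: {mfu:.1%}")
```

Because the result depends directly on the assumed peak and on how FLOPs per token are counted, MFU figures are only comparable when those conventions are stated, which is part of why readers valued the disclosure.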

@scaling01 summarized the run as using 8192 GB200 GPUs

@eliebakouch singled out a reported figure of roughly 40% higher throughput per watt as “pretty impressive and bullish on microsoft chips,” though this may refer to rack-level budget or serving configuration and was not fully unpacked in-tweet

Microsoft’s official framing connected model design to MAIA 200 custom silicon and emphasized better performance-per-dollar and performance-per-watt vs NVIDIA GB200 in @mustafasuleyman

Build’s broader Windows/local-AI narrative also centered on hardware specifics such as:

1 trillion parameters running locally on DGX Station

128GB unified memory

110 TOPS AI performance

20 CPU cores

70+ PowerToys utilities

from @TheTuringPost

Reactions also pointed to local runs of large models, e.g. @kimmonismus on RTX Spark running a 120B parameter model locally
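A back-of-the-envelope check shows why precision matters for local runs of this size. The sketch estimates weight memory for a 120B parameter model at several bit widths and compares it with 128GB of unified memory. Pairing the 128GB figure with this device is an assumption based on the spec list above, and the estimate ignores KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory needed for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


PARAMS = 120e9             # 120B parameters, from the tweet above
UNIFIED_MEMORY_GB = 128.0  # assumed to apply to the same device

footprints = {bits: weight_memory_gb(PARAMS, bits) for bits in (16, 8, 4)}
fits = {bits: gb <= UNIFIED_MEMORY_GB for bits, gb in footprints.items()}

for bits, gb in footprints.items():
    print(f"{bits}-bit weights: {gb:.0f} GB, fits in 128 GB: {fits[bits]}")
```

On these assumptions, 16-bit weights (240 GB) would not fit, 8-bit weights (120 GB) would fit with little headroom, and 4-bit weights (60 GB) would leave room for context.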

Build product/platform recap beyond the models

GitHub Copilot app and agent-native development

GitHub unveiled the GitHub Copilot app, pitched as a desktop surface for agent-native software development by @pierceboggan

Key themes included:

canvases for bidirectional work between users and agents, per @Techmeme

continuity across CLI, mobile, web, local, and cloud, per @lukehoban

a growing role for GitHub as the center of agent workflows, reflected in @techgirl1908 and @OrenMe

Copilot CLI also got an experimental terminal UI with tabs, built-in feedback/rubber duck, prompt scheduling, and voice input, per @GHchangelog

Windows as an agent runtime

Microsoft’s Windows org framed Build around “faster developer execution, a secure execution layer for agents, and unmetered intelligence that runs locally on device,” per @yusuf_i_mehdi

Several posts stressed that Microsoft wants Windows to be the trusted execution platform for agents, not just Azure

@TheTuringPost described Project Solara as a platform for agent-first devices, with concepts including:

a desktop AI companion

a wearable badge with cameras, microphones, sensors, and secure authentication

@kimmonismus saw these as handheld/desktop devices for controlling agents and compared them to expectations people had for standalone OpenAI hardware

@kimmonismus separately highlighted Microsoft Scout as an “always-on personal agent for work”

Web IQ and search for agents

@JordiRib1 announced Microsoft Web IQ as a suite of AI-native grounding APIs for web pages, news, images, and videos

この記事をシェア

Sebastian Raschka重要度42026年7月18日 20:16

OpenAI、GPT-5.6 で推論コストを制御可能に

Smol AI News重要度42026年7月17日 14:44

AI ニュース：今日も静かな日

Smol AI News重要度52026年7月16日 14:44

AI ニュース：今日も静かな一日

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

Smol AI News·2026年6月2日 14:44·約14分

Microsoft Build：MAI-Thinking-1 と MAI ファミリーモデル、Surface RTX Spark Dev Box、Windows の OpenClaw を発表

#Reasoning Models #MAI-Thinking-1 #Local AI #Model Transparency #Microsoft

TL;DR

AI深層分析2026年6月3日 18:05

重要/ 5段階

深度40%

キーポイント

MAI モデルファミリーの発表とフラグシップ推論モデル

透明性を重視した技術報告書の公開

ローカル AI と Windows エージェント基盤の強化

重要な引用

Microsoft used Build to position itself as both an AI platform company and a frontier-model lab

built with clean data lineage and zero distillation from third-party models

Microsoft released a 109-page technical report for MAI-Thinking-1

影響分析・編集コメントを表示

影響分析

編集コメント

静かな一日。

AI Twitter リキャップ

トップストーリー：Microsoft Build の振り返りと、新しい MAI モデルの技術詳細**

何が起きたか

Microsoft AI は、推論、コード生成、画像処理、音声文字起こし、音声合成の各分野をカバーする 7 つの新規 MAI モデルを発表しました。これらは @MicrosoftAI と @mustafasuleyman によると、MAI-Thinking-1、MAI-Code-1-Flash、MAI-Image-2.5、MAI-Transcribe-1.5、MAI-Voice-2 を中心に構成されています。

フラグシップとなる推論モデル「MAI-Thinking-1」は、Microsoft の初の推論モデルとして紹介されました。@mustafasuleyman、@baseten、@tuhinone、@HannaHajishirzi による投稿によると、このモデルはクリーンなデータ系譜（data lineage）に基づいて構築され、第三者のモデルからの蒸留（distillation）を一切行っていないことが強調されています。

Microsoft は MAI-Thinking-1 に関する 109 ページにわたる技術報告書を公開し、その透明性の高さから @eliebakouch, @ethanCaballero, @nrehiew_, @yacinelearning, @stochasticchasm といった技術志向の読者たちから強い肯定的な反応を引き出しました。

Microsoft はまた、ローカル AI とエージェントネイティブな Windows を強調しました。Build のメッセージでは、エージェント向けの安全な実行レイヤー、新しい Surface RTX Spark Dev Box、Windows GPU インストールベース全体への Windows AI アクセス、そして Project Solara/Scout といったコンセプトハードウェアが取り上げられ、これらは @yusuf_i_mehdi, @TheTuringPost, @kimmonismus によって要約されました。

Build ではまた、キャンバス機能、デバイス間での継続性、より緊密な GitHub エージェントワークフローを特徴とする「エージェントネイティブなソフトウェア開発のためのデスクトップホーム」として、GitHub Copilot アプリの主要な推進も含まれており、これは @pierceboggan, @lukehoban によるもので、@techgirl1908 からの反応もありました。

Microsoft は Web IQ を導入しました。これは AI エージェント向けの新しいグラウンディング/検索 API スタックであり、同社は「業界のほぼすべての AI エージェントやチャットボット（Copilot や ChatGPT など）を既に支えている」と主張しています。この情報は @JordiRib1 経由で伝えられました。

Satya Nadella は Build を単一の製品発表ではなくエコシステム全体の転換点として位置づけ、Mustafa Suleyman はそれを Microsoft の内部における「ヒルクライミングマシン」の出力であると捉えました。これらは @satyanadella, @mustafasuleyman によるもので、@nrehiew_ からの反応もありました。

MAI モデルファミリー：開示された事実と技術詳細

MAI-Thinking-1

Microsoft は、@mustafasuleyman の投稿において、MAI-Thinking-1 を 350 億のアクティブパラメータを持つ MoE（Mixture of Experts）モデルであり、コンテキストウィンドウは 256K と説明しました。

@scaling01 による別の要約では、このモデルは 1T@35B パラメータ構成で、30T トークンで事前学習され、8192 台の GB200 GPU を使用してトレーニングされたとしています。これは Microsoft のマーケティングコピーではなく、技術レポートの解釈であるようです。

@kimmonismus も同様に、アクティブパラメータが 45B の中規模 MoE と要約していますが、これは Mustafa 自身が示した 35B という数値と矛盾します。この投稿セットにおいてより権威ある数値は、公式の 35B アクティブという数字です。

Microsoft は、AIME 2025 で 97%、SWE-Bench Pro で 53% のスコアを達成したと主張しており、Surge での盲検ヒューマン評価では Sonnet 4.6 を全体的に上回っていると @mustafasuleyman と @asadovsky が伝えています。

Microsoft は、このモデルが MAIA 200 で最適化されており、MAI モデルをエンドツーエンドで実行する際、GB200 に比べてドルあたりのパフォーマンスが 30% 向上し、ワットあたりのパフォーマンスが 1.4 倍になると述べています。これは @mustafasuleyman の情報です。

Microsoft とパートナー企業は繰り返し、第三者による蒸留を行っていないこと、「クリーンなデータ系譜」の維持、および Baseten を通じたトレーニング後のデータで「100% 目を通さない（非人間介入）」状態でのエンタープライズ制御型ファインチューニングを強調しました。これは @baseten、@tuhinone、@MicrosoftAI の投稿における内容です。

MAI-Code-1-Flash

Microsoft は、VS Code や GitHub Copilot CLI 向けの高速コーディングモデルとして MAI-Code-1-Flash を紹介しました。これは最初に @pierceboggan によって発表され、その後 @mariorod1 によって注目されました。

@mustafasuleyman による公式 Microsoft のメッセージによると、Code-1-Flash はわずか 5B パラメータでありながら SWE-Bench Pro で 51% を達成し、Haiku クラスのサイズ/コストに近い位置づけとなっています

@scaling01 による競合する要約では、これは 137B パラメータの MoE（Mixture of Experts：専門家混合モデル）であり、256K のコンテキスト長を持ち、10T トークン以上でトレーニングされたものであり、「Claude 4.5 Haiku よりも強力かつ効率的である」とされています。これはおそらく総パラメータ数ではなく、アクティブなパラメータ数が 5B であることを示唆しており、ツイートはこの区別を完全に整合させるものではありませんが、はるかに大きな MoE の内部に小さなアクティブ・フットプリントがあることを合わせて示唆しています

@scaling01 と @mariorod1 によると、ローンチ時の利用可能性は GitHub Copilot および VS Code を第一とするものとして強調されました

MAI-Image-2.5

Microsoft は MAI-Image-2.5 とその Flash バリアントを発表し、両方ともリーダーボードで 2 位を獲得したと主張しています。@mustafasuleyman はこれらが画像編集において Nano Banana 2 を上回ると述べています

独立したリーダーボードアカウントがその高い順位を支持しました：@arena は Image Edit Arena で 1401 のスコアで 2 位となり、Nano Banana 2、Grok Imagine、ChatGPT Image Latest HF よりも 10 ポイント上回ったと報告しています

@arena はさらに、MAI-Image-2.5 が「パレートフロンティアを前進させた」と述べており、これはその価格帯のモデルの中で、そのベンチマークでより高いスコアを出すものがないことを意味します

配布パートナーはすぐに続き、@OpenRouter や @fal などを含んでいます

MAI-Transcribe-1.5

@ArtificialAnlys は、MAI-Transcribe-1.5 が STT（Speech-to-Text：音声認識）の最前線において、異常に強力な速度と精度のポイントであると報告しました：リアルタイムの約 276 倍、AA-WER（Average Word Error Rate：平均単語誤り率）は 2.4%、リーダーボード全体で 3 位です

このモデルは英語、フランス語、アラビア語、日本語、中国語を含む 43 か国語に対応しており、@ArtificialAnlys によると、名前や医療用語などの稀な用語に対するキーワードバイアス機能もサポートしています。

@ArtificialAnlys によると、Microsoft Foundry を通じた料金は、音声 1,000 分あたり 6 ドルと報告されています。

OpenRouter も、@OpenRouter で同日に稼働させた 3 つの MAI 関連モデルの一つとしてこのモデルをリストアップしています。

MAI-Voice-2

MAI-Voice-2 は、Microsoft の「7 つのモデル」の傘下および、@OpenRouter の利用可能状況投稿に含まれています。

このツイートセットには、ローンチや可用性に関する情報以外に、Voice-2 自体の詳細な技術情報はほとんど含まれていません。

研究者にとって重要な技術レポートの詳細

なぜこの報告書が注目されたのか

主要な技術的な反応は、Microsoft が通常よりも詳細なフロンティアモデルの報告書を公開したという点でした。@eliebakouch はこれを「この規模のモデルとしては最も透明性の高いものの一つ」と呼び、@nrehiew_ は「今日の LLM（大規模言語モデル）トレーニングのための更新された教科書として本当に役立つだろう」と述べ、@stochasticchasm はこれを「金鉱」と表現しました。

複数の読者が、この報告書がパイプラインの詳細、スケーリングラダーの手法、データキュレーション、インフラ指標、MFU（モデルフロップル利用率）の数値を明らかにした点を強調しました。@ethanCaballero、@eliebakouch、@nrehiew_ からの称賛を引き出したのは、このレベルの具体性です。

プリートレーニングとデータ

コメント全体を通じて繰り返される主要な技術的主張は、MAI-Thinking-1 がポストトレーニングだけでなく、開示されたパイプライン全体において合成データも蒸留も使用していないという点であり、これは @eliebakouch、@stochasticchasm、@HannaHajishirzi によって指摘されています。

@eliebakouch は、レポートが Common Crawl および非公開ソースからのデータを明記しており、異なるドメイン向けのターゲットサブパイプライン、大規模な抽出・重複除去作業、そして合成データを使用しないという意図的な選択を記載していると述べています。

スケーリング判断に使用されたレポート内の非公開 NLL 集合は、@eliebakouch によって以下のように要約されました:

50% コード

17.5% STEM（科学・技術・工学・数学）

17.5% 数学

10% 一般知識

5% 多言語

@eliebakouch によると、スケーリングラダーにおけるアーキテクチャの昇格は、効率性向上（Efficiency Gain: EG）指標に基づいて行われました。これは、ベースラインが候補モデルの損失に追いつくために必要な追加計算リソースの量を指します。

同じスレッドでは、約 100/200 トークン per パラメータにおけるアブレーション（除去実験）が言及されており、この設定においては「Chinchilla optimal」と概ね一致すると説明されています。ただし、@eliebakouch は MoE（Mixture of Experts：専門家混合）構造のため、これは密結合モデルのヒューリスティックとは異なるとも指摘しています。

ポストトレーニング / RL

最も議論された技術的選択は、Microsoft が事前の推論経験のないチェックポイントから強化学習（RL: Reinforcement Learning）を開始した点であり、これを読者の多くが注目すべき点として挙げています。@stochasticchasm はこれを「非常に興味深い決定」と呼び、また @stochasticchasm はグラフに基づき AIME25 のスコアが 20% から 95% 以上に急上昇しているという示唆に対して反応しました。

@HannaHajishirzi は、「ゼロから積み上げる」レシピを、シンプルなレシピ、厳密な科学、自己蒸留、忍耐、そして優れたインフラと説明しました

@soldni はこのプロセスを「蒸留なしで登る、ビッグボーイズのように」と特徴づけました

一部の独立した読者は、マイクロソフトがここではあえてこれを避けたとしても、合成データは広範な分野におけるエージェント性能において依然として非常に価値があると報告から推測しました。詳細は @stochasticchasm を参照してください

データキュレーション / ジャッジ / DSPy GEPA

DSPy や後期相互作用のコミュニティから大きな注目を集めた詳細の一つ：マイクロソフトは、事前学習データのキュレーションと品質スコアリングにおいて、GEPA/DSPy 最適化された LLM（大規模言語モデル）ジャッジを使用したと報じられています

これは @bj2rn、@LakshyAAAgrawal、@lateinteraction によって強調されました

インフラ / 利用率 / ハードウェア共設計

マイクロソフトは反復ごとの MFU（モデルフロップス利用率）の正確な数値を明らかにしたと報じられており、複数の読者はこれがこの規模ではめったに共有されないものであると述べています。@eliebakouch によるものです

@scaling01 は今回の実行が 8192 GB200 GPU を使用したと要約しました

@eliebakouch は、ワットあたりのスループットが約 40% 高いという報告された数値を「マイクロソフトのチップに対して非常に印象的で楽観的である」と指摘しましたが、これはラックレベルの予算またはサービング構成を指している可能性があり、ツイート内で完全に解明されていませんでした

マイクロソフトの公式な見解では、モデル設計を MAIA 200 カスタムシリコンに結びつけ、@mustafasuleyman の発言において NVIDIA GB200 と比較してドルあたりのパフォーマンスおよびワットあたりのパフォーマンスが向上したことを強調しました

Build のより広範な Windows/ローカル AI の物語は、ハードウェアの詳細にも焦点を当てていました。例として：

DGX Station でローカルに実行される 1 兆パラメータのモデル

128GB の統合メモリ
110 TOPS の AI パフォーマンス（AI performance）
20 コアの CPU
70 以上の PowerToys ユーティリティ

@TheTuringPost より

反応はまた、RTX Spark 上でローカルに 120B パラメータのモデルを実行する @kimmonismus のような、大規模モデルのローカル実行にも言及していました。

モデルを超えた Build の製品/プラットフォーム要約

GitHub Copilot アプリとエージェントネイティブ開発

GitHub は、@pierceboggan によってエージェントネイティブなソフトウェア開発のためのデスクトップサーフェスとして紹介された「GitHub Copilot app」を公開しました。

主要なテーマには以下が含まれます：

ユーザーとエージェント間の双方向作業のためのキャンバス（canvases）、@Techmeme より

CLI、モバイル、Web、ローカル、クラウドにわたる継続性、@lukehoban より
@techgirl1908 や @OrenMe に反映されるように、GitHub がエージェントワークフローの中心として果たす役割の拡大

エージェントランタイムとしての Windows

Microsoft の Windows 組織は、Build を「より高速な開発者実行、エージェントのための安全な実行レイヤー、デバイス上でローカルで動作する無制限のインテリジェンス」という枠組みで位置づけました。@yusuf_i_mehdi より。

@TheTuringPost は、Project Solara を「エージェントファースト」デバイスのためのプラットフォームとして説明し、以下のような概念を含んでいます：

デスクトップ AI コンパニオン

カメラ、マイク、センサー、そして安全な認証機能を備えたウェアラブルバッジ

@kimmonismus はこれらをエージェントを制御するためのハンドヘルド型またはデスクトップ型デバイスと捉え、スタンドアロンの OpenAI ハードウェアに対する人々の期待と比較しました

また、@kimmonismus は Microsoft Scout を「仕事のための常時稼働型のパーソナルエージェント」として強調しました

エージェント向けの Web IQ と検索

原文を表示

a quiet day.

AI News for 06/1/2026-6/2/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Top Story: Microsoft Build recap, and new MAI model technical details

What happened

Microsoft used Build to position itself as both an AI platform company and a frontier-model lab, pairing broad product launches with unusually detailed disclosures about its new MAI model family.

Microsoft AI announced seven new MAI models spanning reasoning, code, image, speech transcription, and voice, led by MAI-Thinking-1, MAI-Code-1-Flash, MAI-Image-2.5, MAI-Transcribe-1.5, and MAI-Voice-2 according to @MicrosoftAI and @mustafasuleyman

The flagship reasoning model MAI-Thinking-1 was presented as Microsoft’s first reasoning model, built with clean data lineage and zero distillation from third-party models in posts from @mustafasuleyman, @baseten, @tuhinone, and @HannaHajishirzi

Microsoft released a 109-page technical report for MAI-Thinking-1, which drew strong positive reactions from technically oriented readers for its level of transparency, including @eliebakouch, @ethanCaballero, @nrehiew_, @yacinelearning, and @stochasticchasm

Microsoft also emphasized local AI and agent-native Windows: Build messaging highlighted secure execution layers for agents, a new Surface RTX Spark Dev Box, Windows AI access to the broader Windows GPU install base, and concept hardware such as Project Solara/Scout, summarized by @yusuf_i_mehdi, @TheTuringPost, @kimmonismus, and @kimmonismus

Build also included a major GitHub Copilot app push as the “desktop home for agent-native software development,” with canvases, cross-device continuity, and tighter GitHub agent workflows, from @pierceboggan, @lukehoban, and reactions from @techgirl1908

Microsoft introduced Web IQ, a new grounding/search API stack for AI agents, claiming the APIs already power “nearly all AI agents and chatbots in the industry today, including Copilot and ChatGPT,” via @JordiRib1

Satya Nadella framed Build as an ecosystem moment rather than a single-product launch, while Mustafa Suleyman framed it as the output of Microsoft’s internal “hill-climbing machine,” in @satyanadella, @mustafasuleyman, and reaction from @nrehiew_

MAI model family: disclosed facts and technical details

MAI-Thinking-1

Microsoft described MAI-Thinking-1 as a 35B active parameter MoE with a 256K context window in @mustafasuleyman

A separate summary from @scaling01 says the model is a 1T@35B parameter model, pre-trained on 30T tokens, and trained using 8192 GB200 GPUs; this appears to be a reading of the technical report rather than Microsoft marketing copy

@kimmonismus similarly summarized it as a mid-size MoE with 45B active params, but this conflicts with Mustafa’s own 35B active figure; the more authoritative figure in the tweet set is the official 35B active number

Microsoft claims 97% on AIME 2025 and 53% on SWE-Bench Pro, with blind human raters on Surge preferring it overall to Sonnet 4.6, from @mustafasuleyman and @asadovsky

Microsoft says the model is optimized on MAIA 200, with 30% better performance per dollar and 1.4x performance-per-watt gain versus GB200 when running MAI models end-to-end, per @mustafasuleyman
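
As a rough aid to reading these two ratios, the sketch below converts them into relative cost and energy for a fixed workload. It assumes "30% better performance per dollar" means 1.3x work per dollar and that the 1.4x figure is work per watt; both function names are illustrative and not from Microsoft.

```python
def relative_cost(perf_per_dollar_gain: float) -> float:
    """Cost of the same workload relative to the baseline.

    perf_per_dollar_gain is fractional, e.g. 0.30 for "30% better".
    """
    return 1.0 / (1.0 + perf_per_dollar_gain)


def relative_energy(perf_per_watt_ratio: float) -> float:
    """Energy for the same workload relative to the baseline."""
    return 1.0 / perf_per_watt_ratio


cost_ratio = relative_cost(0.30)     # about 0.77 of baseline cost
energy_ratio = relative_energy(1.4)  # about 0.71 of baseline energy
```

Under those assumptions, a 30% performance-per-dollar gain corresponds to roughly 23% lower cost for the same work, not 30%.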

Microsoft and partners repeatedly stressed no third-party distillation, “clean data lineage,” and enterprise-controlled fine-tuning with “100% eyes-off” post-training data through Baseten, in @baseten, @tuhinone, and @MicrosoftAI

MAI-Code-1-Flash

Microsoft introduced MAI-Code-1-Flash as a fast coding model for VS Code and GitHub Copilot CLI, first announced by @pierceboggan and later highlighted by @mariorod1

Official Microsoft messaging via @mustafasuleyman says Code-1-Flash achieves 51% on SWE-Bench Pro despite having just 5B parameters, positioning it near Haiku-class size/cost

A competing summary from @scaling01 describes it as a 137B parameter MoE, 256K context, trained on 10T+ tokens, and “stronger and more efficient than Claude 4.5 Haiku.” That likely indicates 5B active parameters rather than total parameters; the tweets do not fully reconcile this distinction, but together imply small active footprint within a much larger MoE
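
The active-versus-total distinction can be made concrete with a small calculation. This sketch takes the figures quoted in the tweets at face value and assumes, as the reading above does, that 5B is the active count for MAI-Code-1-Flash; that assumption is not confirmed by the tweets.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of an MoE model's parameters used per token (both in billions)."""
    return active_params_b / total_params_b


# MAI-Thinking-1: 35B active of 1T total, per the @scaling01 summary.
thinking_fraction = active_fraction(35, 1000)

# MAI-Code-1-Flash: 5B (assumed active) of 137B total.
code_flash_fraction = active_fraction(5, 137)
```

Both come out to roughly 3.5% of parameters active per token, which would make the two models similarly sparse if the 5B reading is right.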

Availability at launch was highlighted as GitHub Copilot / VS Code-first, per @scaling01 and @mariorod1

MAI-Image-2.5

Microsoft launched MAI-Image-2.5 and a Flash variant, claiming both reached #2 on leaderboards, with @mustafasuleyman saying they surpass Nano Banana 2 on image editing

Independent leaderboard accounts supported the high ranking: @arena reported #2 in Image Edit Arena with score 1401, +10 points over Nano Banana 2, Grok Imagine, and ChatGPT Image Latest HF

@arena further said MAI-Image-2.5 “advances the Pareto frontier,” meaning no model at its price tier scores higher on that benchmark
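
To illustrate what being on a price/score Pareto frontier means, here is a minimal sketch. The model names, prices, and scores are hypothetical placeholders, not leaderboard data, and the dominance rule shown is the standard one rather than a description of how @arena computes its chart.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    price: float  # hypothetical cost per image; lower is better
    score: float  # hypothetical arena score; higher is better


def pareto_frontier(entries: list[Entry]) -> list[Entry]:
    """Return the entries that no other entry dominates on (price, score)."""
    frontier = []
    for e in entries:
        dominated = any(
            o.price <= e.price
            and o.score >= e.score
            and (o.price < e.price or o.score > e.score)
            for o in entries
        )
        if not dominated:
            frontier.append(e)
    return sorted(frontier, key=lambda e: e.price)


# Placeholder values for illustration only.
models = [
    Entry("model-a", 0.02, 1350.0),
    Entry("model-b", 0.04, 1401.0),
    Entry("model-c", 0.05, 1390.0),  # pricier and weaker than model-b
    Entry("model-d", 0.08, 1410.0),
]
frontier_names = [e.name for e in pareto_frontier(models)]
```

Here model-c drops out because model-b is both cheaper and higher scoring; the other three each offer a trade-off that nothing else beats.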

Distribution partners quickly followed, including @OpenRouter and @fal

MAI-Transcribe-1.5

@ArtificialAnlys reported MAI-Transcribe-1.5 as an unusually strong speed/accuracy point on the STT frontier: ~276x realtime, 2.4% AA-WER, #3 overall on its leaderboard

The model supports 43 languages, including English, French, Arabic, Japanese, and Chinese, and supports keyword biasing for rarer terms such as names and medical terminology, per @ArtificialAnlys

Pricing was reported as $6 per 1,000 minutes of audio via Microsoft Foundry in @ArtificialAnlys
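
The reported speed and price translate into per-job numbers as follows. This is a back-of-the-envelope sketch that assumes the ~276x realtime factor and the $6 per 1,000 minutes rate apply uniformly; actual throughput and billing granularity may differ.

```python
REALTIME_FACTOR = 276.0        # reported ~276x realtime
PRICE_PER_1000_MIN_USD = 6.0   # reported $6 per 1,000 audio minutes


def processing_seconds(audio_minutes: float) -> float:
    """Approximate wall-clock seconds to transcribe the given audio."""
    return audio_minutes * 60.0 / REALTIME_FACTOR


def cost_usd(audio_minutes: float) -> float:
    """Approximate cost in USD at the reported rate."""
    return audio_minutes * PRICE_PER_1000_MIN_USD / 1000.0


one_hour_seconds = processing_seconds(60)  # about 13 seconds
one_hour_cost = cost_usd(60)               # 0.36 USD
```

On those figures, an hour of audio would take about 13 seconds to transcribe and cost about $0.36.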

OpenRouter also listed the model among the three MAI launches it brought live the same day in @OpenRouter

MAI-Voice-2

MAI-Voice-2 appears in Microsoft’s “seven models” umbrella and in OpenRouter’s availability post at @OpenRouter

The tweet set contains little technical detail on Voice-2 itself beyond launch/availability

Technical-report details that mattered to researchers

Why the report stood out

The dominant technical reaction was that Microsoft released an unusually detailed frontier-model report: @eliebakouch called it “one of the most transparent for a model at this scale,” @nrehiew_ said it “could really serve as an updated textbook for LLM training today,” and @stochasticchasm called it a “gold mine”

Multiple readers highlighted that the report disclosed pipeline details, scaling ladder methodology, data curation, infra metrics, and MFU numbers; this level of specificity is what drew praise from @ethanCaballero, @eliebakouch, and @nrehiew_

Pretraining and data

A major technical claim repeated across commentary is that MAI-Thinking-1 used no synthetic data and no distillation, not only in post-training but throughout the disclosed pipeline, from @eliebakouch, @stochasticchasm, and @HannaHajishirzi

@eliebakouch says the report explicitly notes data from Common Crawl plus private sources, with targeted sub-pipelines for different domains, heavy extraction/dedup work, and an intentional choice of no synthetic data

The report’s internal private NLL set used for scaling decisions was summarized by @eliebakouch as:

50% code

17.5% STEM

17.5% math

10% general knowledge

5% multilingual
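
If these percentages are read as mixture weights, one plausible way to collapse per-domain losses into a single number is a weighted average. That reading is an assumption; the tweets do not say whether the percentages are weights or the token composition of the set, and the domain keys below are illustrative.

```python
# Mix as summarized by @eliebakouch; the use as weights is an assumption.
NLL_MIX = {
    "code": 0.50,
    "stem": 0.175,
    "math": 0.175,
    "general_knowledge": 0.10,
    "multilingual": 0.05,
}


def weighted_nll(per_domain_nll: dict[str, float]) -> float:
    """Weighted average of per-domain negative log-likelihoods."""
    if abs(sum(NLL_MIX.values()) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(weight * per_domain_nll[domain] for domain, weight in NLL_MIX.items())


# Hypothetical per-domain losses, for illustration only.
example_nll = weighted_nll(
    {"code": 1.0, "stem": 2.0, "math": 2.0, "general_knowledge": 3.0, "multilingual": 4.0}
)
```

With half the weight on code, a change in code loss moves this score ten times as much as the same change in multilingual loss.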

@eliebakouch says architecture promotion in the scaling ladder was based on an Efficiency Gain (EG) metric: how much extra compute the baseline would need to match the candidate’s loss
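
A metric of this kind can be sketched as follows. This is not the report's formula, which the tweets do not reproduce; it assumes the baseline's loss follows a simple power law in compute, fits that curve to the baseline's ladder points, and asks how much compute the baseline would need to reach the candidate's loss. All numbers are hypothetical.

```python
import math


def fit_power_law(compute: list[float], loss: list[float]) -> tuple[float, float]:
    """Least-squares fit of loss = a * compute ** (-b) in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope


def efficiency_gain(a: float, b: float, candidate_compute: float, candidate_loss: float) -> float:
    """Multiple of the candidate's compute the baseline needs to match its loss."""
    baseline_compute_needed = (a / candidate_loss) ** (1.0 / b)
    return baseline_compute_needed / candidate_compute


# Hypothetical baseline ladder: loss = 10 * C ** -0.1 at three compute budgets.
baseline_compute = [1e18, 1e19, 1e20]
baseline_loss = [10.0 * c ** -0.1 for c in baseline_compute]
a, b = fit_power_law(baseline_compute, baseline_loss)

# Hypothetical candidate: at 1e19 FLOPs it reaches the loss the baseline hits at 2e19.
eg = efficiency_gain(a, b, candidate_compute=1e19, candidate_loss=10.0 * 2e19 ** -0.1)
```

In this toy setup the candidate's gain is 2, meaning the baseline would need twice the compute to match it.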

The same thread notes ablations at roughly 100/200 tokens per parameter, described as around “Chinchilla optimal” for the setup, while also remarking this differs from dense-model heuristics due to MoE structure in @eliebakouch

Post-training / RL

The most discussed technical choice was that Microsoft appears to have started RL from a checkpoint with no prior reasoning exposure, which several readers found notable. @stochasticchasm called this a “very interesting decision” and, in a separate post, reacted to graphs suggesting a jump from <20% AIME25 to >95%

@HannaHajishirzi described the ingredients of “climbing from scratch” as simple recipes, rigorous science, self-distillation, patience, and great infra

@soldni characterized the process as “climbing with no distillation, like the big boys do”

Some independent readers inferred from the report that synthetic data remains very valuable for agentic performance in the broader field, even if Microsoft deliberately avoided it here; see @stochasticchasm

Data curation / judges / DSPy GEPA

A detail that got substantial attention from the DSPy/late-interaction crowd: Microsoft reportedly used GEPA / DSPy-optimized LLM judges in pretraining data curation and quality scoring

This was highlighted by @bj2rn, @LakshyAAAgrawal, and @lateinteraction

Infra / utilization / hardware co-design

Microsoft reportedly disclosed exact MFU across iterations, which multiple readers said is rarely shared at this scale, per @eliebakouch

@scaling01 summarized the run as using 8192 GB200 GPUs
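
For readers unfamiliar with MFU (model FLOPs utilization), the sketch below uses the common approximation of 6 FLOPs per active parameter per training token, ignoring attention FLOPs. The throughput and per-GPU peak values are hypothetical placeholders, not figures from the report or hardware specifications; only the 35B active-parameter count and the 8192 GPU count come from the tweets.

```python
def mfu(
    active_params: float,
    tokens_per_second: float,
    num_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Model FLOPs utilization: achieved training FLOPs/s over peak FLOPs/s.

    Uses the 6 * N * tokens approximation for forward plus backward passes.
    """
    achieved_flops_per_second = 6.0 * active_params * tokens_per_second
    return achieved_flops_per_second / (num_gpus * peak_flops_per_gpu)


example_mfu = mfu(
    active_params=35e9,        # 35B active parameters (from the tweets)
    tokens_per_second=1.0e7,   # hypothetical cluster-wide throughput
    num_gpus=8192,             # GPU count from the @scaling01 summary
    peak_flops_per_gpu=1.0e15, # hypothetical peak, not a GB200 spec
)
```

With these placeholder inputs the result is about 0.26; the point is the shape of the calculation, not the value.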

@eliebakouch singled out a reported figure of roughly 40% higher throughput per watt as “pretty impressive and bullish on microsoft chips,” though this may refer to rack-level budget or serving configuration and was not fully unpacked in-tweet

Microsoft’s official framing connected model design to MAIA 200 custom silicon and emphasized better performance-per-dollar and performance-per-watt vs NVIDIA GB200 in @mustafasuleyman

Build’s broader Windows/local-AI narrative also centered on specifics such as the following, from @TheTuringPost:

1 trillion parameters running locally on DGX Station

128GB unified memory

110 TOPS AI performance

20 CPU cores

70+ PowerToys utilities

Reactions also pointed to local runs of large models, e.g. @kimmonismus on RTX Spark running a 120B parameter model locally

Build product/platform recap beyond the models

GitHub Copilot app and agent-native development

GitHub unveiled the GitHub Copilot app, pitched as a desktop surface for agent-native software development by @pierceboggan

Key themes included:

canvases for bidirectional work between users and agents, per @Techmeme

continuity across CLI, mobile, web, local, and cloud, per @lukehoban

a growing role for GitHub as the center of agent workflows, reflected in @techgirl1908 and @OrenMe

Copilot CLI also got an experimental terminal UI with tabs, built-in feedback/rubber duck, prompt scheduling, and voice input, per @GHchangelog

Windows as an agent runtime

Microsoft’s Windows org framed Build around “faster developer execution, a secure execution layer for agents, and unmetered intelligence that runs locally on device,” per @yusuf_i_mehdi

Several posts stressed that Microsoft wants Windows to be the trusted execution platform for agents, not just Azure

@TheTuringPost described Project Solara as a platform for agent-first devices, with concepts including:

a desktop AI companion

a wearable badge with cameras, microphones, sensors, and secure authentication

@kimmonismus saw these as handheld/desktop devices for controlling agents and compared them to expectations people had for standalone OpenAI hardware

@kimmonismus separately highlighted Microsoft Scout as an “always-on personal agent for work”

Web IQ and search for agents

@JordiRib1 announced Microsoft Web IQ as a suite of AI-native grounding APIs for web pages, news, images, and videos
