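Smol AI News · June 1, 2026, 14:44 · about a 16-minute read

Nothing much happened today

#Physical AI #World models #Mixture-of-Transformers #Open-source LLM #NVIDIA Cosmos

TL;DR

NVIDIA announced Cosmos 3, an open model for physical AI, and the large language model Nemotron 3 Ultra, accelerating the build-out of an industry-wide open ecosystem.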

AI deep analysis · June 2, 2026, 14:48

Importance: / 5

Depth: 40%

Key points

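NVIDIA's open-model strategy and new releases

NVIDIA released Cosmos 3, which unifies language, image, video, audio, and action for physical AI such as robotics, and Nemotron 3 Ultra (550B), which several posters called the strongest U.S. open model so far. Cosmos 3 shipped as a full-stack release: weights, code, datasets, and fine-tuning recipes.

Technical innovation: Mixture-of-Transformers and a unified architecture

Cosmos 3 pairs an autoregressive reasoner with a diffusion generator in a single Mixture-of-Transformers design. Artificial Analysis reported that it ranked first among open-weight models on both its Text-to-Image and Image-to-Video leaderboards.

Ecosystem building: launch of the Cosmos Coalition

Together with partners including Runway, NVIDIA launched the Cosmos Coalition and is laying the groundwork for an open physical-AI ecosystem through adoption of the OpenMDW framework and integrations with platforms such as fal.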

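Impact analysis

This issue shows NVIDIA resetting the bar for open releases in physical AI (robotics and world models). In particular, unifying many modalities in a single model while publishing the full set of development resources gives both the research community and industry something they can build on immediately. That makes it a turning point likely to shape where AI development goes next.

Editor's comment

Despite the "quiet day" title, this was highly significant news, with the potential to shift the industry in both physical AI and open source. The point not to miss is that NVIDIA has clearly changed its role from a pure hardware vendor to a builder of an open world-model ecosystem.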

a quiet day.

AI News for 5/30/2026-6/1/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

NVIDIA’s Cosmos 3, Nemotron 3 Ultra, and the Push for Open Physical AI

NVIDIA’s open-source week: NVIDIA dominated the open-model conversation with Cosmos 3, an open family of omnimodal world models for physical AI, plus the announcement of Nemotron 3 Ultra, a 550B open-weight model that several posters called the strongest U.S. open model so far. Cosmos 3 was framed as a full-stack release—weights, code, datasets, and fine-tuning recipes—with NVIDIA also launching the Cosmos Coalition alongside partners including Runway to build an open ecosystem for world models @NVIDIAAI ecosystem context, @runwayml coalition announcement, @kimmonismus Cosmos thread, @ClementDelangue on NVIDIA’s HF footprint.

Why Cosmos 3 mattered technically: Beyond robotics rhetoric, the more concrete details were that Cosmos 3 unifies language, image, video, audio, and action in a single Mixture-of-Transformers design pairing an autoregressive reasoner with a diffusion generator. Artificial Analysis said Cosmos 3 reached #1 among open-weight models on both their Text-to-Image and Image-to-Video leaderboards, noting the generator uses structured JSON prompts and can be driven either by an external prompt-upsampling harness or its own reasoner branch. Separately, NVIDIA’s hardware + software push extended to adoption of the OpenMDW framework and partner ecosystem integrations on platforms like fal @ArtificialAnlys, @fal.

Nemotron 3 Ultra reception: Community reaction to Nemotron 3 Ultra was unusually strong for a fresh open release. Posters highlighted both capability and serving characteristics, including claims that it is already topping some open evals and may be serving at 300+ tok/s in some setups, far faster than large DeepSeek/Kimi-class models @scaling01, @ctnzr, @caspar_br. There was also some technical discussion that Nemotron appears less sparse than peers like Kimi K2 / DeepSeek V4 (roughly 10% of parameters active vs ~3%), which could affect both economics and behavior @eliebakouch.
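The sparsity comparison is simple arithmetic on active versus total parameters. A minimal sketch in Python, assuming the "550B-A55" reading of Nemotron 3 Ultra that Reddit commenters give later in this issue (the 55B active count is theirs, not a figure confirmed here by NVIDIA) and the 12B / 2.5B figures quoted for Mellum2. The function name is illustrative only.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per token in an MoE model (both in billions)."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


# Figures as quoted in this issue; the 55B active count comes from commenters.
models = {
    "Nemotron 3 Ultra (550B-A55, per comments)": (550.0, 55.0),
    "JetBrains Mellum2 (12B total, 2.5B active)": (12.0, 2.5),
}

for name, (total, active) in models.items():
    print(f"{name}: {active_fraction(total, active):.1%} active")
```

A lower active fraction means fewer FLOPs per token for a given total size, which is why the ratio matters for serving economics as well as behavior.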

MiniMax M3, Qwen3.7-Plus, and JetBrains Mellum2 Expand the Open Agent Model Field

MiniMax M3’s launch was the day’s biggest model release: M3 was presented as an open-weight multimodal agent/coding model with 1M context, native multimodality, and competitive agent benchmarks. The headline figures repeated across launch partners were 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, and 74.2% MCP Atlas @MiniMax_AI, @PBDTokenRouter, @kimmonismus. Multiple infra vendors shipped day-0 support—Novita, Vercel AI Gateway, Cloudflare AI Gateway, OpenClaude, Flowith, and others—suggesting unusually fast ecosystem adoption @MiniMax_AI on Novita, @rauchg, @gitlawb.

Benchmarks vs practical experience were mixed: M3 earned praise for frontend generation, visual/game tasks, and price-performance, with side-by-side demos showing strong one-shot UI/game outputs and notable benchmark placement for Next.js agent evals @notjazii, @lostinlatencyX, @rauchg. But several evaluators also reported high token consumption, verbose self-check loops, and occasional requirement drift on long tasks, making M3 look more like a “quality first, efficiency later” model @ZhihuFrontier review, @teortaxesTex skepticism.

Qwen3.7-Plus: Alibaba launched Qwen3.7-Plus as a multimodal interactive hybrid agent that unifies GUI and CLI operation, visual reasoning, coding, and search-augmented QA. It is API-available via Alibaba Cloud Model Studio and was quickly added to tools like Cline @Alibaba_Qwen launch, @cline. The launch reinforces the trend that open-ish Asian labs are no longer releasing “just chat models,” but full agent-capable multimodal systems.

JetBrains Mellum2: JetBrains released Mellum2, a 12B MoE model with 2.5B active parameters, trained on roughly 11T tokens and post-trained with RLVR, shipping base / SFT / RL checkpoints and a technical report @nv_pavlichenko, @jetbrains. The intended niche is especially interesting: ultra-low-latency inference for routing, RAG, sub-agents, and IDE use, and it landed in vLLM immediately @vllm_project. This looks like a serious “small fast open model for developer workflows” play rather than a benchmark-chasing frontier release.

Agents, Sandboxes, Memory, and Search Are Becoming the Real Product Surface

The stack is shifting from model calls to agent runtimes: Several launches converged on the idea that the main engineering leverage is now in the harness rather than the model. Perplexity’s “Search as Code” is the clearest example: instead of iterative search tool calls, the model writes Python against a search SDK, enabling custom ranking pipelines, map-reduce over indexes, batching, aggregation, and lower token overhead. Perplexity reports a jump on its internal WANDR benchmark from 0.152 to 0.386 with this architecture @perplexity_ai, @AravSrinivas.
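To make the pattern concrete, here is a toy sketch in Python of what "writing code against a search SDK" could look like: one program fans a batch of queries out over several indexes (map), aggregates scores per document (reduce), and applies its own ranking. Everything here is hypothetical: the `search` function, the `Hit` type, and the in-memory `INDEXES` are stand-ins invented for illustration and do not reflect Perplexity's actual SDK.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    text: str
    score: float


# Hypothetical stand-in for a search backend: a tiny in-memory corpus per index.
INDEXES = {
    "news": {
        "n1": "nvidia releases cosmos 3 world model",
        "n2": "minimax m3 agent model launch",
    },
    "papers": {
        "p1": "world model training with diffusion",
        "p2": "sparse attention for long context",
    },
}


def search(index: str, query: str) -> list[Hit]:
    """Toy term-overlap search; a real SDK call would replace this."""
    terms = Counter(query.lower().split())
    hits = []
    for doc_id, text in INDEXES[index].items():
        overlap = sum(min(terms[w], c) for w, c in Counter(text.split()).items())
        if overlap:
            hits.append(Hit(doc_id, text, float(overlap)))
    return hits


def search_as_code(queries: list[str], top_k: int = 3) -> list[Hit]:
    # Map: run every query against every index inside one program (batching).
    raw = [hit for q in queries for idx in INDEXES for hit in search(idx, q)]
    # Reduce: sum scores per document, then apply a custom ranking.
    totals: dict[str, Hit] = {}
    for hit in raw:
        prev = totals.get(hit.doc_id)
        base = prev.score if prev else 0.0
        totals[hit.doc_id] = Hit(hit.doc_id, hit.text, base + hit.score)
    return sorted(totals.values(), key=lambda h: (-h.score, h.doc_id))[:top_k]


results = search_as_code(["world model", "cosmos world model"])
for hit in results:
    print(hit.doc_id, hit.score)
```

The design point is that only the final ranked list needs to return to the model's context, rather than the raw output of each individual tool call, which is one plausible source of the lower token overhead described above.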

Managed agents + sandboxes are becoming standard: Google detailed Managed Agents in the Gemini API, where a single API call can spin up an agent that reasons, writes/runs code, manages files, and operates inside a hosted Linux sandbox @_philschmid, @GoogleAIStudio. LangChain pushed similar ideas around Deep Agents, Context Hub, and LangSmith Sandboxes/Engine, emphasizing persistent context, agent lifecycle tooling, and automated failure triage @LangChain, @hwchase17.

Memory remains a missing primitive: One recurring complaint was that enormous context windows still don’t solve cross-session memory. A thread on HydraDB argued that “RAG + manual context injection” has been misnamed as memory, while actual persistent session knowledge remains underserved @kimmonismus. Related research threads pointed to reusable context management policies like AdaCoM, which trains a separate LLM via RL to prune/preserve context for frozen agents @dair_ai.

Security remains the gating issue for enterprise agents: There was a notable warning from Microsoft Security Intelligence about a major npm supply chain compromise affecting 90+ redhat-cloud-services packages, including a self-propagating worm stealing npm/GitHub/AWS/SSH credentials @MsftSecIntel. At the same time, enterprise agent vendors highlighted sandboxing, runtime isolation, and security stack integration as prerequisites for deployment, including discussion of NVIDIA OpenShell and LangChain’s sandbox keynote @shannholmberg, @LangChain.

Codex, Claude Code, and the Competitive Coding-Agent Race

OpenAI extended Codex into more places: OpenAI announced that frontier models and Codex are now generally available on AWS / Amazon Bedrock, aimed squarely at enterprises that want OpenAI capabilities inside existing AWS security/compliance workflows @OpenAI, @OpenAIDevs. OpenAI also shipped a Codex Python SDK supporting threads, turns, streaming, resume, images, and sandbox control @reach_vb, plus support for Bedrock-backed Codex workflows @reach_vb on Bedrock config.

Claude Code had a real ops incident: Anthropic reset 5-hour and weekly rate limits for Pro and Max users after fixing a bug where some Opus 4.8 sessions spawned too many parallel subagents/tool calls, burning usage unexpectedly @ClaudeDevs, follow-up. That’s a notable reminder that coding-agent product quality is increasingly determined by orchestration behavior, not just raw model IQ.
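As an illustration of the kind of orchestration guard this incident points at, the sketch below caps how many subagents may run at once using `asyncio.Semaphore`. It is a generic pattern under stated assumptions, not Anthropic's fix: `MAX_PARALLEL_SUBAGENTS`, `run_subagent`, and `orchestrate` are hypothetical names, and the sleep stands in for a model or tool call.

```python
import asyncio

MAX_PARALLEL_SUBAGENTS = 4  # hypothetical budget, not a documented product setting


async def run_subagent(task_id: int, gate: asyncio.Semaphore, stats: dict) -> int:
    async with gate:  # at most MAX_PARALLEL_SUBAGENTS bodies run concurrently
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # stand-in for a model or tool call
        stats["running"] -= 1
        return task_id


async def orchestrate(n_tasks: int) -> dict:
    gate = asyncio.Semaphore(MAX_PARALLEL_SUBAGENTS)
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(
        *(run_subagent(i, gate, stats) for i in range(n_tasks))
    )
    stats["completed"] = len(results)
    return stats


stats = asyncio.run(orchestrate(20))
print(stats)
```

All 20 tasks still complete, but the peak concurrency never exceeds the cap, so a planner that requests too much fan-out cannot burn usage faster than the budget allows.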

Behavioral differences across coding models remain material: Developers highlighted large qualitative differences between GPT, Claude, and other models on benchmarks like ProgramBench and WeirdML, with Opus sometimes preferring exploration over score-maximization or showing benchmark-specific quirks @OfirPress, @htihle. A separate long thread argued newer Claude Opus 4.6–4.8 variants can fabricate plausible but fictional concepts in non-coding domains, suggesting possible truthfulness/alignment regressions rather than ordinary hallucinations @distributionat.

Infra, Hardware, and Local AI Systems

NVIDIA is coming for the PC: The most-discussed hardware launch was RTX Spark, an NVIDIA/Microsoft “personal AI computer” built around Grace + Blackwell, with up to 128GB unified memory and claimed 1 PFLOP FP4. The key strategic read: NVIDIA is no longer just selling accelerators, but an end-to-end local AI system that competes with Apple Silicon, x86 PCs, and Qualcomm simultaneously @kimmonismus, @swyx.
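A rough way to read the 128GB figure is to ask which model sizes could hold their weights in memory at 4 bits per parameter. The sketch below is back-of-the-envelope arithmetic only: it counts weights alone and ignores the KV cache, activations, any scale metadata used by block-scaled 4-bit formats, and memory used by the operating system, so real headroom is smaller. The example model sizes are taken from figures quoted elsewhere in this issue (12B and 550B), plus an arbitrary 120B midpoint.

```python
def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Memory for weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9


UNIFIED_MEMORY_GB = 128.0  # upper configuration quoted for RTX Spark

for params_b in (12.0, 120.0, 550.0):
    need = weight_footprint_gb(params_b, bits_per_param=4)
    verdict = "fits" if need <= UNIFIED_MEMORY_GB else "does not fit"
    print(f"{params_b:>5.0f}B params at 4 bits: {need:6.1f} GB ({verdict})")
```

At 4 bits each parameter takes half a byte, so 128GB bounds weights-only capacity at about 256B parameters. A 550B model would need roughly 275GB for weights alone and would not fit on a single such machine under these assumptions.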

Cluster/networking updates: On the datacenter side, Lambda said it is first to adopt NVIDIA Quantum-X InfiniBand Photonics Q3450-LD switches, pushing co-packaged optics to reduce network power and failures in large AI clusters @LambdaAPI. OpenAI also announced Stargate Michigan, a planned 1GW data center using closed-loop cooling and paired with workforce/education commitments @OpenAINewsroom.

Local open-model tooling is improving fast: The MLX-VLM v0.6.0 release was one of the more substantive local inference/tooling updates, adding speculative decoding, Anthropic-style and responses-style APIs, tool calls, support for many new multimodal models, and image/audio features with the explicit pitch of turning Apple devices into “real local agent machines” @Prince_Canuma. That pairs well with growing DGX Spark + vLLM experimentation for local NVFP4 MoE serving @vllm_project.

Top Tweets (by engagement, filtered for technical relevance)

Anthropic’s IPO path: Anthropic said it has confidentially submitted a draft S-1 to the SEC, opening the door to an IPO pending review @AnthropicAI.

Claude Code usage incident: Anthropic reset user rate limits after an Opus 4.8 parallel subagent/tool-call bug caused excessive quota burn @ClaudeDevs.

Qwen3.7-Plus: Alibaba launched a multimodal agent model spanning GUI/CLI operation, coding, and visual tasks @Alibaba_Qwen.

OpenAI on Bedrock: OpenAI models and Codex are now available through Amazon Bedrock for enterprise workflows @OpenAI.

ARC-AGI-3 movement: Claude Opus 4.8 posted a new SOTA on ARC-AGI-3 at 1.5%, still tiny in absolute terms but a meaningful jump on that benchmark @arcprize.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. New Frontier Model Releases and Early Tests

MiniMax M3 - Coding & Agentic Frontier, 1M Context, Multimodal (Activity: 1090): MiniMax M3 is announced as an open-weight frontier model with coding/agentic focus, native multimodality/vision, and MiniMax Sparse Attention for up to 1M tokens of context with a guaranteed 512K minimum (MiniMax M3). Claimed long-horizon agentic results include 12-hour ICLR paper reproduction, Hopper FP8 GEMM CUDA/Triton optimization reaching 9.4× speedup after 147 iterations, and PostTrainBench ranking third behind Opus 4.7 and GPT-5.5; access is currently via API/MiniMax Code, with HuggingFace/GitHub weights/local deployment planned. Commenters are cautiously interested in the combination of cheap/efficient vision plus long-context agentic coding, but skeptical because the announcement calls it “open-weight” while not yet exposing weights or even parameter count. One technical debate is whether the results imply a much larger-than-~250B model, extreme benchmark optimization, or a genuine open-weight breakthrough.

Commenters focused on the missing release details: despite the claim of being “the first open-weight model with three frontier capabilities”, users could not find actual weights, parameter count, or sizing information for MiniMax M3. One commenter linked a preview image from the announcement (Reddit image), but the thread still lacked confirmation of model scale or downloadable artifacts.

A technically substantive concern was that the advertised capability level implies one of three possibilities: a much larger-than-expected model, unusually strong benchmark optimization, or a major open-weights breakthrough. The speculation centered on whether MiniMax M3 is actually around 250B parameters or significantly larger, and whether its coding/agentic/multimodal claims will hold once weights and independent benchmarks are available.

NVIDIA announces Nemotron 3 Ultra (Activity: 621): The image is a technical announcement slide for NVIDIA Nemotron 3 Ultra, described in comments as a MoE 550B-A55 model. The slide positions Nemotron 3 Ultra against open/open-weight competitors including GLM 5.1, Kimi K2.6, and Qwen3.5 across “Frontier Smart” benchmark categories such as agent productivity, coding

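TLDR AI · ★5 · June 2, 2026, 09:00

NVIDIA announces Cosmos 3, an open frontier model for physical AI

NVIDIA announced Cosmos 3, a fully open multimodal model that handles text, images, video, audio, and action. The new model combines reasoning and generation, giving developers a foundation for building physical AI systems with less data.

NVIDIA Developer Blog · ★4 · March 14, 2026, 01:00

Scaling synthetic data and physical AI reasoning with NVIDIA Cosmos World Foundation Models

NVIDIA announced Cosmos World Foundation Models, which generate high-fidelity, physics-aware synthetic data for next-generation AI-driven robots such as humanoid robots and autonomous vehicles.

The Decoder · ★3 · April 12, 2026, 21:09

Researchers sharpen the definition of world models, excluding text-to-video generators

An international research team aims to unify world-model research with OpenWorldLib and excludes text-to-video models such as Sora from the definition.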

ニュース一覧に戻る元記事を読む

Smol AI News·2026年6月1日 14:44·約16分で読める

今日は何も大きな出来事はありませんでした

#物理 AI #世界モデル #Mixture-of-Transformers #オープンソース LLM #NVIDIA Cosmos

TL;DR

AI深層分析2026年6月2日 14:48

重要/ 5段階

深度40%

キーポイント

NVIDIA のオープンモデル戦略と新製品発表

技術的革新：Mixture-of-Transformers と統合アーキテクチャ

エコシステム構築：Cosmos Coalition の発足

影響分析・編集コメントを表示

影響分析

編集コメント

静かな一日。

AI Twitterリキャップ

NVIDIAのCosmos 3、Nemotron 3 Ultra、そしてオープン物理AIへの推進力

NVIDIAのオープンソースウィーク：NVIDIAは、物理AI向けのオムニモーダルな世界モデルのオープンファミリーであるCosmos 3と、数人の投稿者が「これまでの米国製オープンモデルの中で最強」と評した550BパラメータのオープンウェイトモデルであるNemotron 3 Ultraの発表により、オープンモデルに関する議論を主導しました。Cosmos 3は、重み（weights）、コード、データセット、ファインチューニングレシピを含むフルスタックリリースとして位置づけられ、NVIDIAはまたRunwayなどのパートナーと共に「Cosmos Coalition」を立ち上げ、世界モデルのためのオープンエコシステムを構築しています @NVIDIAAI エコシステムコンテキスト、@runwayml コーリション発表、@kimmonismus Cosmosスレッド、@ClementDelangueによるNVIDIAのHugging Face（HF）フットプリントについて。

なぜ Cosmos 3 が技術的に重要だったのか：ロボティクスに関する修辞を超えて、より具体的な詳細は、Cosmos 3 が言語、画像、動画、音声、そしてアクションを単一の Mixture-of-Transformers（混合トランスフォーマー）設計で統合し、自己回帰型推論器と拡散生成器を組み合わせた点にあります。Artificial Analysis は、Cosmos 3 が Text-to-Image（テキストから画像へ）および Image-to-Video（画像から動画へ）のリーダーボードにおいてオープンウェイトモデルの中で第 1 位に到達したと報告し、その生成器が構造化された JSON プロンプトを使用しており、外部のプロンプアップサンプリングハーンチスによって駆動されるか、あるいは自身の推論器ブランチによって駆動され得ると指摘しています。一方、NVIDIA のハードウェアおよびソフトウェアによる推進は、OpenMDW フレームワークの採用や、fal @ArtificialAnlys、@fal などのプラットフォームにおけるパートナーエコシステムとの統合へと拡大しました。

Nemotron 3 Ultra の受容：Nemotron 3 Ultra に対するコミュニティの反応は、新しいオープンリリースとしては異例に強固なものでした。投稿者たちは、その能力とサービング特性の両方を強調し、すでにいくつかのオープン評価で首位を占めているという主張や、一部のセットアップでは 300 トークン/秒以上で動作している可能性を示唆するものを含んでいました。これは、大規模な DeepSeek/Kimi クラスのモデル @scaling01、@ctnzr、@caspar_br と比較してはるかに高速です。また、Nemotron は Kimi K2 や DeepSeek V4 などの競合他社と比較してスパース性が低い（約 10% がアクティブであるのに対し、競合は約 3%）という技術的な議論もあり、これは経済性や動作特性の両方に影響を与える可能性があります @eliebakouch。

MiniMax M3、Qwen3.7-Plus、JetBrains Mellum2 がオープンエージェントモデル分野を拡大

MiniMax M3 の発表は、本日の最大のモデルリリースでした。M3 は、100 万トークンのコンテキスト長とネイティブなマルチモーダル性を備え、競争力のあるエージェントベンチマークを達成するオープンウェイトのマルチモーダルエージェント/コーディングモデルとして紹介されました。発表パートナー間で繰り返し強調された主要数値は、SWE-Bench Pro で 59.0%、Terminal Bench 2.1 で 66.0%、MCP Atlas で 74.2% です（@MiniMax_AI, @PBDTokenRouter, @kimmonismus）。Novita、Vercel AI Gateway、Cloudflare AI Gateway、OpenClaude、Flowith など複数のインフラベンダーが当日サポートを提供し、@MiniMax_AI の Novita 上での採用や @rauchg、@gitlawb による動向から、極めて迅速なエコシステムへの浸透が示唆されています。

ベンチマークと実務経験の比較は賛否両論でした：M3 はフロントエンド生成、視覚・ゲーム関連タスク、価格対性能において称賛を集め、並列デモではワンショットでの UI/ゲーム出力が強く、Next.js エージェント評価（Next.js agent evals）におけるベンチマーク順位も目立っていました @notjazii, @lostinlatencyX, @rauchg。しかし複数の評価者からは、トークン消費量が非常に多いこと、自己検証ループが冗長であること、長時間のタスクでは要件が時折ずれることが報告され、M3 は「品質優先で効率はその次」というモデルのように見えるとの指摘もありました @ZhihuFrontier によるレビュー、@teortaxesTex の懐疑論。

Qwen3.7-Plus：アリババは、GUI と CLI の操作、視覚的推論、コーディング、検索拡張 QA を統合したマルチモーダル・インタラクティブ・ハイブリッドエージェントとして「Qwen3.7-Plus」をリリースしました。これはアリババクラウドの Model Studio を経由して API 利用可能となり、Cline @Alibaba_Qwen のようなツールにも迅速に追加されました（@cline）。今回の発表は、オープン志向のアジア系ラボがもはや「チャットモデルだけ」をリリースするのではなく、エージェント機能を備えたフルマルチモーダルシステムをリリースするというトレンドを強化するものです。

JetBrains Mellum2：JetBrains は、12B の MoE モデルで 2.5B のアクティブパラメータを持ち、約 11T トークンでトレーニングされ、RLVR（Reinforcement Learning from Verifiable Rewards）によるポストトレーニングを経て、ベースモデル/SFT/RL チェックポイントおよび技術レポートをリリースしました（@nv_pavlichenko, @jetbrains）。特に興味深いのはその狙い市場です：ルーティング、RAG（Retrieval-Augmented Generation）、サブエージェント、IDE 利用における超低遅延推論であり、vLLM に即座に実装されました（@vllm_project）。これはベンチマーク追従の最先端リリースというよりは、「開発者ワークフロー向けの小規模で高速なオープンモデル」という真剣な取り組みに見えます。

エージェント、サンドボックス、メモリ、検索が真のプロダクト表面となりつつある

スタックはモデル呼び出しからエージェントランタイムへとシフトしています：複数のローンチで、主要なエンジニアリングのレバレッジはモデルではなくハネスにあるという考えに収束しました。Perplexity の「Search as Code」が最も明確な例です：反復的な検索ツール呼び出しの代わりに、モデルが検索 SDK に対して Python を記述し、カスタムランキングパイプライン、インデックス上のマップ・リデュース、バッチ処理、集約、およびトークンオーバーヘッドの削減を可能にします。Perplexity はこのアーキテクチャにより、内部 WANDR ベンチマークで 0.152 から 0.386 への飛躍的な向上を報告しています @perplexity_ai, @AravSrinivas。

マネージドエージェントとサンドボックスが標準化されつつあります：Google は Gemini API で Managed Agents を詳細に説明し、単一の API 呼び出しで推論を行い、コードの記述・実行、ファイル管理を行い、ホストされた Linux サンドボックス内で動作するエージェントを起動できる機能を提示しました @_philschmid, @GoogleAIStudio。LangChain も Deep Agents、Context Hub、LangSmith Sandboxes/Engine 周辺で同様のアイデアを推進し、永続的なコンテキスト、エージェントライフサイクルツール、自動化された障害トリアージを強調しています @LangChain, @hwchase17。

メモリは依然として欠落したプリミティブである：繰り返される不満の一つに、巨大なコンテキストウィンドウでもセッション間メモリを解決できないという点がある。HydraDB のスレッドでは、「RAG + 手動コンテキスト注入」がメモリと誤って呼ばれてきた一方で、実際の永続的なセッション知識は依然として不十分であると @kimmonismus が指摘している。関連する研究スレッドでは、AdaCoM のような再利用可能なコンテキスト管理ポリシーが紹介された。これは凍結されたエージェントのために RL（強化学習）を用いて別の LLM を訓練し、コンテキストの剪定・保持を行うものである @dair_ai。

セキュリティはエンタープライズエージェントにおけるボトルネックとなっている：Microsoft Security Intelligence からは、90 以上の redhat-cloud-services パッケージに影響を与える大規模な npm サプライチェーン侵害に関する重要な警告が出された。これには、npm/GitHub/AWS/SSH の認証情報を盗む自己増殖型ワームも含まれている @MsftSecIntel。同時に、エンタープライズエージェントベンダーは、デプロイの前提条件としてサンドボックス化、ランタイム分離、セキュリティスタックとの統合を強調した。これには NVIDIA OpenShell や LangChain のサンドボックス基調講演に関する議論も含まれており、@shannholmberg、@LangChain が言及している。

Codex, Claude Code, および競争的なコーディングエージェントの戦い

OpenAI は Codex をより多くの場所に拡張しました：OpenAI は、フロンティアモデルと Codex が現在 AWS / Amazon Bedrock で一般利用可能になったと発表し、既存の AWS セキュリティ/コンプライアンスワークフロー内に OpenAI の機能を導入したい企業を明確に狙っています @OpenAI, @OpenAIDevs。また、スレッド、ターン、ストリーミング、再開、画像、サンドボックス制御をサポートする Codex Python SDK をリリースし、Bedrock 設定における Bedrock ベースの Codex ワークフローもサポートしました @reach_vb, @reach_vv。

Claude Code で実際の運用インシデントが発生しました：Anthropic は、一部の Opus 4.8 セッションで並列サブエージェント/ツール呼び出しが過剰に生成され、予期せぬ使用量が消費されるバグを修正した後、Pro および Max ユーザーに対して 5 時間および週間のレート制限をリセットしました @ClaudeDevs, follow-up。これは、コーディング・エージェント製品の品質が、単なるモデルの知能（IQ）だけでなく、オーケストレーションの動作によってますます決定されるという重要な reminder です。

コーディングモデル間での振る舞いの違いは依然として重要です：開発者は、ProgramBench や WeirdML などのベンチマークにおいて、GPT、Claude、および他のモデル間に大きな質的な差異があると指摘しました。Opus は場合によってはスコア最大化よりも探索を好む傾向を示したり、ベンチマーク固有の癖を見せたりします @OfirPress, @htihle。別の長いスレッドでは、新しい Claude Opus 4.6–4.8 バリアントが非コーディング領域において、妥当だが架空の概念を捏造する可能性があり、これは通常のハルシネーションではなく、真実性やアライメントの退行を示唆しているという主張がありました @distributionat。

インフラ、ハードウェア、およびローカル AI システム

NVIDIA が PC に本格的に参入：最も議論を呼んだハードウェア発表は、Grace と Blackwell を基盤とした NVIDIA/Microsoft 共同の「パーソナル AI コンピューター」RTX Spark です。最大 128GB の統合メモリを搭載し、FP4 で 1 PFLOP の性能を謳っています。重要な戦略的示唆：NVIDIA はもはやアクセラレーターを販売するだけでなく、Apple Silicon、x86 PC、Qualcomm と同時に競合するエンドツーエンドのローカル AI システムを提供するようになりました @kimmonismus, @swyx。

クラスター/ネットワーク関連の最新情報：データセンター側では、Lambda が NVIDIA Quantum-X InfiniBand Photonics Q3450-LD スイッチを採用した最初の企業となり、大規模 AI クラスターにおけるネットワーク電力と障害を削減するためにコパッケージド光学（co-packaged optics）を推進しています @LambdaAPI。また OpenAI は、密閉型冷却システムを採用し、人材育成・教育へのコミットメントとセットで計画されている 1GW データセンター「Stargate Michigan」を発表しました @OpenAINewsroom。

ローカル向けオープンモデルツールの進化が加速：MLX-VLM v0.6.0 のリリースは、推論およびツールリングに関する実りあるローカルアップデートの一つでした。これにはスペキュラティブ・ディコーディング（speculative decoding）、Anthropic 風およびレスポンススタイルの API、ツール呼び出し機能、多数の新規マルチモーダルモデルへの対応、画像・音声機能の追加が含まれており、Apple デバイスを「真のローカルエージェントマシン」へと変えることを明確な狙いとしています @Prince_Canuma。これは、ローカル NVFP4 MoE（Mixture of Experts）推論サービスにおける DGX Spark と vLLM の実験動向と相性が良く、@vllm_project が推進しています。

エンゲージメント上位のツイート（技術的関連性をフィルタリング）

アンソロピックの IPO への道筋：アンソロピックは、SEC に非公開でドラフト S-1 を提出したと発表し、審査次第で IPO（株式公開）への扉を開いたと述べています @AnthropicAI。

クロード・コード利用に関するインシデント：Opus 4.8 の並列サブエージェント/ツール呼び出しバグによりクォータが過剰に消費されたため、アンソロピックはユーザーのレート制限をリセットしました @ClaudeDevs。

Qwen3.7-Plus：アリババは、GUI/CLI 操作、コーディング、視覚タスクを跨ぐマルチモーダルエージェントモデル「Qwen3.7-Plus」を発表しました @Alibaba_Qwen。

OpenAI の Bedrock 対応：OpenAI のモデルおよび Codex が、エンタープライズワークフロー向けに Amazon Bedrock で利用可能になりました @OpenAI。

ARC-AGI-3 の動向：Claude Opus 4.8 は ARC-AGI-3 ベンチマークで新たな SOTA（最良の性能）を記録し、1.5% を達成しました。絶対値としては依然として微小ですが、このベンチマーク上では意味のある飛躍です @arcprize。

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. New Frontier Model Releases and Early Tests

MiniMax M3 - コーディングとエージェントの最前線、1M コンテキスト、マルチモーダル（アクティビティ：1090）: MiniMax M3 は、コーディングおよびエージェント機能に焦点を当てたオープンウェイトのフロンティアモデルとして発表されました。ネイティブのマルチモーダル性・ビジョン機能を備え、MiniMax Sparse Attention により最大 1M トークンのコンテキスト（最低でも 512K が保証）をサポートします（MiniMax M3）。主張されている長期的なエージェント機能の結果には、ICLR の論文を 12 時間かけて再現した事例や、Hopper FP8 GEMM CUDA/Triton 最適化において 147 回の反復後に 9.4 倍の高速化を達成したもの、PostTrainBench で Opus 4.7 と GPT-5.5 に次いで第 3 位を獲得した結果が含まれます。現在は API や MiniMax Code を通じてアクセス可能ですが、HuggingFace/GitHub でのウェイト公開やローカルデプロイは計画されています。コメント投稿者たちは、安価で効率的なビジョン機能と長文コンテキストを活用したエージェント型コーディングの組み合わせに慎重に関心を示していますが、「オープンウェイト」と呼びながらまだウェイトもパラメータ数すら公開されていないという点については懐疑的です。技術的な議論の一つとして、これらの結果が約 250B を大幅に超える巨大モデルを示唆しているのか、極端なベンチマーク最適化の結果なのか、それとも真のオープンウェイトにおける画期的突破なのかという点が挙げられています。

技術的に実質的な懸念点は、宣伝されている能力レベルが3つの可能性のいずれかを暗示していることです：予想よりはるかに大きなモデル、異常に強いベンチマーク最適化、あるいは主要なオープンウェイトのブレークスルーです。議論は、MiniMax M3 が実際には約250Bパラメータ程度なのか、それともさらに大幅に大きいのかが焦点であり、また重みと独立したベンチマークが利用可能になった時点で、そのコーディング/エージェント/マルチモーダルに関する主張が成立するかどうかでした。

原文を表示

a quiet day.

AI News for 5/30/2026-6/1/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

NVIDIA’s Cosmos 3, Nemotron 3 Ultra, and the Push for Open Physical AI

NVIDIA’s open-source week: NVIDIA dominated the open-model conversation with Cosmos 3, an open family of omnimodal world models for physical AI, plus the announcement of Nemotron 3 Ultra, a 550B open-weight model that several posters called the strongest U.S. open model so far. Cosmos 3 was framed as a full-stack release—weights, code, datasets, and fine-tuning recipes—with NVIDIA also launching the Cosmos Coalition alongside partners including Runway to build an open ecosystem for world models @NVIDIAAI ecosystem context, @runwayml coalition announcement, @kimmonismus Cosmos thread, @ClementDelangue on NVIDIA’s HF footprint.

Why Cosmos 3 mattered technically: Beyond robotics rhetoric, the more concrete details were that Cosmos 3 unifies language, image, video, audio, and action in a single Mixture-of-Transformers design pairing an autoregressive reasoner with a diffusion generator. Artificial Analysis said Cosmos 3 reached #1 among open-weight models on both their Text-to-Image and Image-to-Video leaderboards, noting the generator uses structured JSON prompts and can be driven either by an external prompt-upsampling harness or its own reasoner branch. Separately, NVIDIA’s hardware + software push extended to adoption of the OpenMDW framework and partner ecosystem integrations on platforms like fal @ArtificialAnlys, @fal.

Nemotron 3 Ultra reception: Community reaction to Nemotron 3 Ultra was unusually strong for a fresh open release. Posters highlighted both capability and serving characteristics, including claims that it is already topping some open evals and may be serving at 300+ tok/s in some setups, far faster than large DeepSeek/Kimi-class models @scaling01, @ctnzr, @caspar_br. There was also some technical discussion that Nemotron appears less sparse than peers like Kimi K2 / DeepSeek V4, with roughly 10% of parameters active per token versus roughly 3%, which could affect both economics and behavior @eliebakouch.
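To make the sparsity comparison concrete, here is a minimal arithmetic sketch. It uses only the 550B total and the approximate 10% and 3% active fractions quoted in the recap. The second figure is a hypothetical "same-size model at 3%" for scale, not the actual parameter count of Kimi K2 or DeepSeek V4, which the recap does not give.

```python
def active_params_b(total_params_b: float, active_fraction: float) -> float:
    """Parameters touched per token, in billions, for a sparse MoE model."""
    return total_params_b * active_fraction


NEMOTRON_TOTAL_B = 550.0  # total size quoted in the recap

# ~10% active, per the community estimate quoted above.
nemotron_active = active_params_b(NEMOTRON_TOTAL_B, 0.10)

# Hypothetical: a model of the same total size that is as sparse as ~3%.
same_size_at_3pct = active_params_b(NEMOTRON_TOTAL_B, 0.03)

# How much more per-token compute the denser routing implies.
ratio = nemotron_active / same_size_at_3pct

print(f"Nemotron 3 Ultra at ~10% active: {nemotron_active:.1f}B per token")
print(f"Same total at ~3% active:       {same_size_at_3pct:.1f}B per token")
print(f"Per-token active-parameter ratio: {ratio:.2f}x")
```

Under these assumptions the denser model activates about three times as many parameters per token, which is why posters expected the difference to show up in serving cost as well as behavior.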

MiniMax M3, Qwen3.7-Plus, and JetBrains Mellum2 Expand the Open Agent Model Field

MiniMax M3’s launch was the day’s biggest model release: M3 was presented as an open-weight multimodal agent/coding model with 1M context, native multimodality, and competitive agent benchmarks. The headline figures repeated across launch partners were 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, and 74.2% MCP Atlas @MiniMax_AI, @PBDTokenRouter, @kimmonismus. Multiple infra vendors shipped day-0 support—Novita, Vercel AI Gateway, Cloudflare AI Gateway, OpenClaude, Flowith, and others—suggesting unusually fast ecosystem adoption @MiniMax_AI on Novita, @rauchg, @gitlawb.

Benchmarks vs practical experience were mixed: M3 earned praise for frontend generation, visual/game tasks, and price-performance, with side-by-side demos showing strong one-shot UI/game outputs and notable benchmark placement for Next.js agent evals @notjazii, @lostinlatencyX, @rauchg. But several evaluators also reported high token consumption, verbose self-check loops, and occasional requirement drift on long tasks, making M3 look more like a “quality first, efficiency later” model @ZhihuFrontier review, @teortaxesTex skepticism.

Qwen3.7-Plus: Alibaba launched Qwen3.7-Plus as a multimodal interactive hybrid agent that unifies GUI and CLI operation, visual reasoning, coding, and search-augmented QA. It is API-available via Alibaba Cloud Model Studio and was quickly added to tools like Cline @Alibaba_Qwen launch, @cline. The launch reinforces the trend that open-ish Asian labs are no longer releasing “just chat models,” but full agent-capable multimodal systems.

JetBrains Mellum2: JetBrains released Mellum2, a 12B MoE model with 2.5B active parameters, trained on roughly 11T tokens and post-trained with RLVR, shipping base / SFT / RL checkpoints and a technical report @nv_pavlichenko, @jetbrains. The intended niche is especially interesting: ultra-low-latency inference for routing, RAG, sub-agents, and IDE use, and it landed in vLLM immediately @vllm_project. This looks like a serious “small fast open model for developer workflows” play rather than a benchmark-chasing frontier release.

Agents, Sandboxes, Memory, and Search Are Becoming the Real Product Surface

The stack is shifting from model calls to agent runtimes: Several launches converged on the idea that the main engineering leverage is now in the harness rather than the model. Perplexity’s “Search as Code” is the clearest example: instead of iterative search tool calls, the model writes Python against a search SDK, enabling custom ranking pipelines, map-reduce over indexes, batching, aggregation, and lower token overhead. Perplexity reports a jump on its internal WANDR benchmark from 0.152 to 0.386 with this architecture @perplexity_ai, @AravSrinivas.
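The "write code against a search SDK" idea can be illustrated with a toy sketch. Everything here is hypothetical: ToySearchClient, Hit, and search_as_code are invented stand-ins over an in-memory index, not Perplexity's SDK, whose interface the recap does not describe. The point is only the shape of the program: a batch of queries is mapped over the index, results are reduced into per-document scores, and a custom ranking is applied, all in one program rather than one tool call per query.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    text: str
    score: float


class ToySearchClient:
    """Hypothetical stand-in for a search SDK (not a real API)."""

    def __init__(self, docs: dict[str, str]):
        self.docs = docs

    def search(self, query: str, k: int = 3) -> list[Hit]:
        # Score each document by how often the query terms occur in it.
        terms = query.lower().split()
        hits = []
        for doc_id, text in self.docs.items():
            words = Counter(text.lower().split())
            score = float(sum(words[t] for t in terms))
            if score > 0:
                hits.append(Hit(doc_id, text, score))
        return sorted(hits, key=lambda h: (-h.score, h.doc_id))[:k]


def search_as_code(client: ToySearchClient, queries: list[str]) -> list[tuple[str, float]]:
    """Map a batch of queries over the index, then reduce to one ranking."""
    merged: Counter = Counter()
    for query in queries:                  # map: run every query in one program
        for hit in client.search(query):
            merged[hit.doc_id] += hit.score  # reduce: aggregate per document
    # Custom ranking step: highest combined score first, ties by doc id.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "a": "sparse attention enables long context",
    "b": "agent sandbox runs code in linux",
    "c": "long context agent coding with sparse attention",
}
client = ToySearchClient(docs)
ranked = search_as_code(client, ["sparse attention", "agent coding"])
print(ranked)
```

In this toy version only the final ranking would need to be returned to the model, rather than the raw hits of every intermediate query, which is one plausible reading of the "lower token overhead" claim.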

Managed agents + sandboxes are becoming standard: Google detailed Managed Agents in the Gemini API, where a single API call can spin up an agent that reasons, writes/runs code, manages files, and operates inside a hosted Linux sandbox @_philschmid, @GoogleAIStudio. LangChain pushed similar ideas around Deep Agents, Context Hub, and LangSmith Sandboxes/Engine, emphasizing persistent context, agent lifecycle tooling, and automated failure triage @LangChain, @hwchase17.

Memory remains a missing primitive: One recurring complaint was that enormous context windows still don’t solve cross-session memory. A thread on HydraDB argued that “RAG + manual context injection” has been misnamed as memory, while actual persistent session knowledge remains underserved @kimmonismus. Related research threads pointed to reusable context management policies like AdaCoM, which trains a separate LLM via RL to prune/preserve context for frozen agents @dair_ai.

Security remains the gating issue for enterprise agents: There was a notable warning from Microsoft Security Intelligence about a major npm supply chain compromise affecting 90+ redhat-cloud-services packages, including a self-propagating worm stealing npm/GitHub/AWS/SSH credentials @MsftSecIntel. At the same time, enterprise agent vendors highlighted sandboxing, runtime isolation, and security stack integration as prerequisites for deployment, including discussion of NVIDIA OpenShell and LangChain’s sandbox keynote @shannholmberg, @LangChain.

Codex, Claude Code, and the Competitive Coding-Agent Race

OpenAI extended Codex into more places: OpenAI announced that frontier models and Codex are now generally available on AWS / Amazon Bedrock, aimed squarely at enterprises that want OpenAI capabilities inside existing AWS security/compliance workflows @OpenAI, @OpenAIDevs. OpenAI also shipped a Codex Python SDK supporting threads, turns, streaming, resume, images, and sandbox control @reach_vb, plus support for Bedrock-backed Codex workflows @reach_vb on Bedrock config.

Claude Code had a real ops incident: Anthropic reset 5-hour and weekly rate limits for Pro and Max users after fixing a bug where some Opus 4.8 sessions spawned too many parallel subagents/tool calls, burning usage unexpectedly @ClaudeDevs, follow-up. That’s a notable reminder that coding-agent product quality is increasingly determined by orchestration behavior, not just raw model IQ.

Behavioral differences across coding models remain material: Developers highlighted large qualitative differences between GPT, Claude, and other models on benchmarks like ProgramBench and WeirdML, with Opus sometimes preferring exploration over score-maximization or showing benchmark-specific quirks @OfirPress, @htihle. A separate long thread argued newer Claude Opus 4.6–4.8 variants can fabricate plausible but fictional concepts in non-coding domains, suggesting possible truthfulness/alignment regressions rather than ordinary hallucinations @distributionat.

Infra, Hardware, and Local AI Systems

NVIDIA is coming for the PC: The most-discussed hardware launch was RTX Spark, an NVIDIA/Microsoft “personal AI computer” built around Grace + Blackwell, with up to 128GB unified memory and claimed 1 PFLOP FP4. The key strategic read: NVIDIA is no longer just selling accelerators, but an end-to-end local AI system that competes with Apple Silicon, x86 PCs, and Qualcomm simultaneously @kimmonismus, @swyx.
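As a rough sanity check on what 128GB of unified memory means for local serving, the sketch below estimates weight storage at 4 bits per parameter. The assumptions are not from the source: decimal gigabytes, exactly 4 bits per weight with no scale-factor overhead, and no memory reserved for KV cache, activations, or the operating system, so real capacity would be lower.

```python
def weight_bytes(params: float, bits_per_param: float) -> float:
    """Bytes needed to store the weights alone at a given precision."""
    return params * bits_per_param / 8


UNIFIED_MEMORY_BYTES = 128e9  # "up to 128GB", taken as decimal GB (assumption)
FP4_BITS = 4                  # idealized 4-bit weights, no scaling metadata

# Upper bound on parameter count whose weights alone would fit in memory.
max_params_fp4 = UNIFIED_MEMORY_BYTES * 8 / FP4_BITS

# Weight footprint of a 550B-parameter model (the Nemotron 3 Ultra size
# quoted earlier in this issue) at the same idealized 4-bit precision.
nemotron_fp4_gb = weight_bytes(550e9, FP4_BITS) / 1e9

print(f"Max parameters at 4-bit in 128GB: {max_params_fp4 / 1e9:.0f}B")
print(f"550B model at 4-bit: {nemotron_fp4_gb:.0f} GB of weights")
```

On these idealized numbers a single 128GB machine tops out at roughly 256B parameters of 4-bit weights, so a 550B model would not fit on one such device without further compression or multiple devices.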

Cluster/networking updates: On the datacenter side, Lambda said it is first to adopt NVIDIA Quantum-X InfiniBand Photonics Q3450-LD switches, pushing co-packaged optics to reduce network power and failures in large AI clusters @LambdaAPI. OpenAI also announced Stargate Michigan, a planned 1GW data center using closed-loop cooling and paired with workforce/education commitments @OpenAINewsroom.

Local open-model tooling is improving fast: The MLX-VLM v0.6.0 release was one of the more substantive local inference/tooling updates, adding speculative decoding, Anthropic-style and responses-style APIs, tool calls, support for many new multimodal models, and image/audio features with the explicit pitch of turning Apple devices into “real local agent machines” @Prince_Canuma. That pairs well with growing DGX Spark + vLLM experimentation for local NVFP4 MoE serving @vllm_project.

Top Tweets (by engagement, filtered for technical relevance)

Anthropic’s IPO path: Anthropic said it has confidentially submitted a draft S-1 to the SEC, opening the door to an IPO pending review @AnthropicAI.

Claude Code usage incident: Anthropic reset user rate limits after an Opus 4.8 parallel subagent/tool-call bug caused excessive quota burn @ClaudeDevs.

Qwen3.7-Plus: Alibaba launched a multimodal agent model spanning GUI/CLI operation, coding, and visual tasks @Alibaba_Qwen.

OpenAI on Bedrock: OpenAI models and Codex are now available through Amazon Bedrock for enterprise workflows @OpenAI.

ARC-AGI-3 movement: Claude Opus 4.8 posted a new SOTA on ARC-AGI-3 at 1.5%, still tiny in absolute terms but a meaningful jump on that benchmark @arcprize.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. New Frontier Model Releases and Early Tests

MiniMax M3 - Coding & Agentic Frontier, 1M Context, Multimodal (Activity: 1090): MiniMax M3 is announced as an open-weight frontier model with coding/agentic focus, native multimodality/vision, and MiniMax Sparse Attention for up to 1M tokens of context with a guaranteed 512K minimum (MiniMax M3). Claimed long-horizon agentic results include 12-hour ICLR paper reproduction, Hopper FP8 GEMM CUDA/Triton optimization reaching 9.4× speedup after 147 iterations, and PostTrainBench ranking third behind Opus 4.7 and GPT-5.5; access is currently via API/MiniMax Code, with HuggingFace/GitHub weights/local deployment planned. Commenters are cautiously interested in the combination of cheap/efficient vision plus long-context agentic coding, but skeptical because the announcement calls it “open-weight” while not yet exposing weights or even parameter count. One technical debate is whether the results imply a much larger-than-~250B model, extreme benchmark optimization, or a genuine open-weight breakthrough.

A technically substantive concern was that the advertised capability level implies one of three possibilities: a much larger-than-expected model, unusually strong benchmark optimization, or a major open-weights breakthrough. The speculation centered on whether MiniMax M3 is actually around 250B parameters or significantly larger, and whether its coding/agentic/multimodal claims will hold once weights and independent benchmarks are available.
