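Smol AI News · June 10, 2026, 14:44 · approx. 16 min read

Nothing major happened today

#LLM #Anthropic #ModelGovernance #DataPrivacy #Reproducibility

TL;DR

Anthropic's new "Fable/Mythos" models are drawing strong criticism from the technical community, on reliability and reproducibility grounds, over undisclosed performance degradation and their data retention policy.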

AI Deep Analysis · June 20, 2026, 17:03

Importance (5-point scale)

Depth: 40%

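Key Points

Criticism of undisclosed degradation in AI R&D assistance

Researchers say Anthropic is applying "silent degradation," lowering performance on research-related prompts without showing an explicit refusal, and that this undermines reproducibility and damages trust.

Enterprise concerns about data retention and lock-in

The new models reportedly require 30-day prompt/data retention with no opt-out in some cases, which could rule them out for zero-data-retention environments and some European markets.

The case for explicit refusals or model downgrades

Industry voices argue that if frontier use is to be restricted, a clear refusal message or an openly stated downgrade is more defensible than "silent sabotage."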

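Impact Analysis

This story shows that when AI developers restrict functionality for security or compliance reasons, doing so without transparency can seriously damage both technical credibility and market acceptance. For R&D teams and companies operating under strict data regulations in particular, a provider's policy change becomes an immediate business-continuity risk, which is a strong signal to revisit vendor selection criteria.

Editor's Comment

The phrase "silent performance degradation" raises a serious question about trust in AI transparency. Developers need to do more than restrict functionality: clearly communicating the intent and impact of a restriction is essential to the long-term health of the technical ecosystem.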

a quiet day.

AI News for 6/9/2026-6/10/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Anthropic’s Fable/Mythos rollout, silent capability gating, and the trust backlash

Silent degradation of AI R&D help dominated the discourse: A large share of technical tweets focused on Anthropic apparently degrading model performance on AI research-related prompts without clear up-front disclosure, rather than hard-refusing those requests. Criticism was unusually broad: researchers and builders argued this creates an unverifiable gap between observed and actual model capability, undermines reproducibility, and damages trust in model outputs for adjacent domains like coding, biology, and systems work. Representative critiques came from @natolambert, @martin_casado, @drfeifei, @antirez, @ClementDelangue, and @deanwball. Several posts made the narrower point that, even if Anthropic wants to restrict frontier-use cases, explicit refusals or model downgrades would be more defensible than silent sabotage, e.g. @hlntnr, @arohan, and @DBahdanau.

Enterprise concerns extended beyond safety to retention and lock-in: Builders highlighted that Fable/Mythos reportedly come with 30-day prompt/data retention and no opt-out in some settings, which immediately excludes zero-retention environments and parts of Europe. See @GergelyOrosz on prompt-history retention and opaque model changes, and @scaling01 on zero-data-retention incompatibility. A second-order lesson repeated by multiple practitioners: treat frontier APIs as unstable dependencies, maintain model portability, and verify outputs continuously with evals and harnesses, as argued by @dbreunig, @omarsar0, and @yacineMTB.
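
One way to act on the "unstable dependency" advice is a small regression harness that treats the model as a swappable callable and re-checks a fixed set of prompts on every run. The sketch below is illustrative only: `ModelFn`, `EvalCase`, `run_evals`, and the two stub models are hypothetical names, not anything the cited practitioners published, and a real setup would wrap an actual provider client behind the same function signature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A model is just "prompt in, text out"; any provider client can be wrapped this way.
ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_evals(model: ModelFn, cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case against one model and record pass/fail per case."""
    return {case.name: case.check(model(case.prompt)) for case in cases}


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 for an empty suite)."""
    return sum(results.values()) / len(results) if results else 0.0


def regressions(baseline: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Names of cases that passed in the baseline run but fail now."""
    return sorted(n for n, ok in baseline.items() if ok and not current.get(n, False))


# Stub models standing in for two providers (hypothetical, for illustration only).
def stub_model_a(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "def add(a, b): return a + b"


def stub_model_b(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "I cannot help with that."


CASES = [
    EvalCase("arithmetic", "What is 2 + 2?", lambda out: out.strip() == "4"),
    EvalCase("codegen", "Write add(a, b) in Python.", lambda out: "def add" in out),
]

baseline = run_evals(stub_model_a, CASES)
current = run_evals(stub_model_b, CASES)
print(pass_rate(baseline), pass_rate(current), regressions(baseline, current))
```

Because every provider sits behind the same `ModelFn` signature, switching vendors or catching a quiet behavior change comes down to re-running the same suite and diffing the results.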

Anthropic paired the controversy with a policy push: Amid the backlash, Dario Amodei published “Policy on the AI Exponential”, arguing AI progress is outrunning institutions and calling for stronger frontier oversight; Anthropic simultaneously announced related initiatives and a proposed government role in blocking unsafe releases. See @DarioAmodei and @AnthropicAI. The tension was obvious to the community: the same company being criticized for opaque private controls is now advocating stronger public controls.

Fable 5’s benchmark strength and product performance despite the controversy

Fable 5 appears genuinely strong on agentic and coding workloads: Even many critics of Anthropic’s policy acknowledged the model itself is excellent. Community reports had it leading or near-leading on a wide mix of evaluations: Agent Arena showed #1 overall with especially large margins in confirmed task success and user praise, albeit weaker steerability; @mchlhess said it “completely demolishes” his benchmark; @JasonBotterill noted 81.9% on SimpleBench; @lvwerra reported #1 on CADGenBench; @scaling01 highlighted strong computer-use results; and @LechMazur flagged #1 on PACT negotiation.

Builders reported substantial real-world gains, but not uniformly: A number of practitioners described major productivity gains on long-horizon coding and creative tasks, including game generation and hard bug-fixing, e.g. @kimmonismus, @walden_yan, and @hrishioa. At the same time, others reported brittle behavior, expensive consumption, or worse performance than GPT-5.5 on specific tasks, such as @Sentdex and @QuixiAI. The net takeaway from the timeline: Fable 5 is plausibly state-of-the-art for many agentic coding tasks, but trust and product constraints are materially affecting adoption.

Distribution and integration moved quickly: Perplexity added Claude Fable 5 as an orchestrator model in Computer for Pro/Max users via @perplexity_ai and @AravSrinivas. Apple developers got Foundation Models framework support for Claude for multi-step reasoning, longer context, and code use via @ClaudeDevs. Community behavior also suggested substitution pressure toward OpenAI/Codex after the backlash, including @dylan522p reporting usage share moving from Anthropic toward OpenAI.

Google’s DiffusionGemma release and renewed interest in diffusion LLMs

Google released DiffusionGemma under Apache 2.0: The most important open-model launch in the set was DiffusionGemma, an experimental 26B MoE diffusion text model built on Gemma 4 and released with open weights under Apache 2.0. Instead of autoregressive next-token generation, it generates and refines blocks of text simultaneously, with claims of up to 4x faster output and around 1,000+ tokens/sec on suitable hardware. See @Google, @GoogleDeepMind, @googlegemma, and @sundarpichai.
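
To make "generates and refines blocks of text simultaneously" concrete, here is a toy sketch of block-wise iterative refinement: every position in a block starts undecided, a denoiser proposes a token and a confidence for all positions at once, and the most confident proposals are committed at each step. This is not DiffusionGemma's actual sampler, which the announcement does not describe at this level; `refine_block`, `toy_denoiser`, and the confidence rule are assumptions made purely for illustration.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # marks a position that has not been decided yet

# A denoiser sees the whole block and proposes (token, confidence) per position.
# Hypothetical stand-in for a real model's forward pass.
Denoiser = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def refine_block(
    denoise: Denoiser, block_size: int, steps: int
) -> Tuple[List[Optional[str]], int]:
    """Fill a block by committing the most confident proposals at each step.

    Returns the block and the number of denoiser calls that were made.
    """
    block: List[Optional[str]] = [MASK] * block_size
    calls = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(block) if tok is MASK]
        if not masked:
            break
        proposals = denoise(block)
        calls += 1
        # Spread the still-undecided positions evenly over the remaining steps.
        remaining_steps = steps - step
        quota = -(-len(masked) // remaining_steps)  # ceiling division
        best = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:quota]
        for i in best:
            block[i] = proposals[i][0]
    return block, calls


TARGET = "the quick brown fox jumps over the lazy dog".split()


def toy_denoiser(block: List[Optional[str]]) -> List[Tuple[str, float]]:
    # Toy rule: always propose the target word, with higher confidence for short
    # words. It ignores the partial block; a real denoiser would condition on it.
    return [(word, 1.0 / len(word)) for word in TARGET]


block, calls = refine_block(toy_denoiser, len(TARGET), steps=3)
print(" ".join(tok for tok in block if tok is not MASK), f"({calls} model calls)")
```

The point of the sketch is the call count: the nine-token block is finished in three denoiser calls, whereas token-by-token decoding would need one call per token.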

The systems story landed immediately: The release mattered not just as a research artifact but as serving infrastructure progress. @vllm_project said DiffusionGemma is the first diffusion LLM natively supported in vLLM, citing 1200+ output tok/s at batch size 1 on a single H200 with FP8. @danielhanchen showed it running locally via llama.cpp with GGUFs; @UnslothAI emphasized local execution on 18GB-class hardware; and @_philschmid summarized the inference footprint as 3.8B active params and 256-token block denoising.

Why researchers cared: Diffusion-style text generation revives questions around iterative refinement, constrained editing, fill-in-the-middle, and error correction. Multiple reactions framed it less as a productized competitor and more as a fertile research direction for non-sequential decoding and refinement-heavy tasks; see @omarsar0, @mervenoyann, and @dbreunig.

Agent tooling, infra, and benchmarks: more structure around real workloads

Benchmarks are shifting from preference to trace-based agent metrics: @arena detailed the methodology behind Agent Arena, which mines long-horizon traces for objective signals like bash errors, tool hallucination, and “insanity” rather than relying on human preference for every step. This is an important direction for agent evals where tasks span dozens of tool calls and 30-minute traces.
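
As a rough illustration of what trace-based metrics can look like, the sketch below scans a recorded agent trace for two of the signals named above: bash errors and calls to tools that do not exist. The `Step` schema, the error patterns, and the metric definitions are all assumptions for this example; the post does not specify how Agent Arena actually defines or computes them.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Step:
    tool: str                        # tool name the agent tried to call
    exit_code: Optional[int] = None  # set for shell commands, None otherwise
    output: str = ""


# Hypothetical patterns that suggest a shell command failed.
BASH_ERROR = re.compile(r"command not found|No such file or directory|Permission denied")


def trace_metrics(trace: List[Step], known_tools: Set[str]) -> Dict[str, float]:
    """Compute simple objective signals from one agent trace."""
    bash_steps = [s for s in trace if s.tool == "bash"]
    bash_errors = sum(
        1
        for s in bash_steps
        if s.exit_code not in (None, 0) or BASH_ERROR.search(s.output) is not None
    )
    # A "hallucinated" tool call here means the tool name is not in the registry.
    hallucinated = sum(1 for s in trace if s.tool not in known_tools)
    return {
        "steps": float(len(trace)),
        "bash_error_rate": bash_errors / len(bash_steps) if bash_steps else 0.0,
        "tool_hallucination_rate": hallucinated / len(trace) if trace else 0.0,
    }


trace = [
    Step("bash", exit_code=0, output="ok"),
    Step("bash", exit_code=127, output="foo: command not found"),
    Step("read_file", output="contents"),
    Step("teleport", output=""),  # not a registered tool
]
metrics = trace_metrics(trace, known_tools={"bash", "read_file"})
print(metrics)
```

Signals like these can be computed over a whole trace without asking a human to rate each step, which is the appeal for long-horizon tasks.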

Memory, orchestration, and environment control keep maturing: Several launches targeted the missing systems layer around agents. @Teknium shipped GUI-based Hermes Agent profiles and later Write Gate approval controls for memory/skill updates. @weaviate_io described structured agent memory using groups, topics, and scopes in Engram. @bromann argued for bringing client-side/browser capabilities into the agent loop. @FactoryAI launched Missions on Factory Desktop.

Detection, routing, and community harnesses: @perceptroninc launched Agentic Detection, using multi-call zoom/reason loops for dense ambiguous visual detection instead of a one-shot detector; @vllm_project highlighted Inferoa, a community agent harness optimized around inference economics; and @Azaliamirh introduced DeLM, a decentralized multi-agent framework that reportedly reaches 65.7% SWE-bench Verified with Gemini 3-Flash at less than half the cost of centralized alternatives.

Optimization, retrieval, and scientific-modeling work worth tracking

Distributed Shampoo vs Muon remained a live optimization thread: A technically interesting sub-thread showed tuned Meta DistributedShampoo matching strong Muon baselines on a speedrun-style task after hyperparameter tuning and enabling pseudo-inverse stabilization. @arohan reported validation losses around 3.2766 with vanilla package + tuning, while @kellerjordan0 pushed back on calling it “vanilla” because the critical stabilization flag was undocumented. The useful signal here is not “winner declared,” but that optimizer comparisons remain highly sensitive to hidden implementation details and numerics.

Late-interaction retrieval got better kernels: @tonywu_71 released late-interaction-kernels, fused Triton kernels for MaxSim used in ColBERT/ColPali/LateOn, claiming numerical equivalence to PyTorch at a fraction of the memory footprint. This should matter for both training and serving multi-vector retrieval models.
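
For readers unfamiliar with MaxSim: each query token embedding is matched to its most similar document token embedding, and those per-token maxima are summed into the document score. The sketch below is a plain-Python reference of that scoring rule with made-up two-dimensional embeddings; it is not the released Triton kernels and says nothing about how they are implemented.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best document-token match."""
    if not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy embeddings, invented for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]
doc_b = [[0.0, 1.0], [0.0, 0.5]]

scores = {"doc_a": maxsim(query, doc_a), "doc_b": maxsim(query, doc_b)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

The work grows with the number of query tokens times the number of document tokens, which is why kernel-level memory savings matter for multi-vector models.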

Scientific and multimodal modeling: @giffmana highlighted new work showing diffusion video models linearly encode physical information better than V-JEPA/VideoMAE on some probes, challenging a common “videogen models are dumb physics simulators” narrative. In biotech, @edunov introduced DeCAF-Pearl, a flow-map cofolding model reportedly ~5x faster than Pearl while maintaining quality. On architecture research, @ZyphraAI released Zamba2-VL under Apache 2.0, extending hybrid SSM-Transformer ideas into VLMs.

Top tweets (by engagement)

Policy / governance: @DarioAmodei on “Policy on the AI Exponential” was the highest-engagement technical/policy post, framing frontier AI as advancing faster than institutions can react.

Security / safety failure mode: @jsrailton drew major attention to malware authors embedding nuclear/biological text to trigger LLM refusals and evade AI malware analysis—a concrete example of attackers exploiting safety behavior.

Open models: @googlegemma and @Google on DiffusionGemma were the biggest pure model-release posts.

Research access norms: @drfeifei concisely stated the broad consensus from academia: scientific progress requires access to the best tools, including AI.

Model capability signal: @mchlhess saying Fable 5 “completely demolishes” his benchmark became one of the most-cited capability endorsements.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Open-Weight Model Drops: North Mini Code and DiffusionGemma

Releasing Cohere North Mini Code (Activity: 388): Cohere officially released North Mini Code 1.0, with weights on Hugging Face, an FP8 variant, free access via OpenCode, and technical details in the HF blog / announcement. For deployment, Cohere recommends vLLM main plus cohere_melody>=0.9.0, serving with --max-model-len 320000, --tool-call-parser cohere_command4, --reasoning-parser cohere_command4, and --enable-auto-tool-choice; they also noted PRs were pushed based on LocalLLaMA feedback. Ecosystem support now includes an Unsloth GGUF conversion and reported MLX support, while Cohere says llama.cpp/quantization requests are being flagged internally. Commenters were broadly positive about Cohere doing LocalLLaMA-style early access, but pushed for day-0 llama.cpp/GGUF support in future releases. One commenter argued the published benchmarks appear worse than Qwen 3.6 35B A3B on most metrics, while others mainly asked about GGUF availability and a possible larger “Maxi Code” model.
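
The serving flags quoted above can be collected into a single launch command. The sketch below only assembles the argument list. The flags are taken as reported in the post and have not been independently verified, and `MODEL_ID` is a placeholder because the post does not give the Hugging Face repository id.

```python
import shlex
from typing import List

# Placeholder: the summary does not state the model's repository id.
MODEL_ID = "<north-mini-code-model-id>"


def build_serve_command(model_id: str) -> List[str]:
    """Assemble a `vllm serve` invocation using the flags quoted in the post."""
    return [
        "vllm", "serve", model_id,
        "--max-model-len", "320000",
        "--tool-call-parser", "cohere_command4",
        "--reasoning-parser", "cohere_command4",
        "--enable-auto-tool-choice",
    ]


command = build_serve_command(MODEL_ID)
# Print a shell-safe version for copy/paste; pass `command` to subprocess.run to launch.
print(shlex.join(command))
```

Per the post, this assumes a vLLM build from the main branch with cohere_melody>=0.9.0 installed alongside it.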

Commenters asked for Day-0 llama.cpp / GGUF support for future Cohere releases, noting that immediate local inference compatibility would likely improve adoption in the LocalLLaMA ecosystem. One commenter mentioned that llama.cpp support for North Mini Code appears to be “in progress.”

A benchmark-focused commenter observed that Cohere North Mini Code appears worse than Qwen 3.6 35B A3B on almost every listed metric, suggesting the release may not be competitive on raw benchmark performance despite being welcomed as a new open model.

The Apache-2.0 license was specifically praised, which is relevant for developers evaluating commercial or permissive downstream use of the model.

DeepMind Just Dropped "DiffusionGemma" — Text Generation via Image-Style Diffusion Model (Activity: 355): Google DeepMind released DiffusionGemma, an Apache 2.0 open-weight 26B MoE text-diffusion model based on Gemma 4/Gemini Diffusion research that activates only 3.8B parameters and denoises a 256-token block in parallel instead of autoregressive token-by-token decoding. Google reports 1000+ tok/s on an H100 and 700+ tok/s on an RTX 5090, with quantized deployment fitting in roughly 18GB VRAM; the design shifts low-concurrency local inference from memory-bandwidth-bound sequential decoding toward compute-heavy parallel refinement, with support in Hugging Face, vLLM, and Unsloth. Commenters framed this as a significant development for real-time/local applications, but several emphasized that quality may lag standard autoregressive Gemma models: “I don’t need ultra-fast if it’s going to be stupid.” There was also broader positive surprise at Google’s recent pace of open model releases.
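
A quick back-of-the-envelope check helps put the reported figures in context. The sketch below estimates weight memory for a 26B-parameter model at a few bit widths and the time to emit one 256-token block at the quoted throughputs. The post does not say which quantization scheme the roughly 18GB figure refers to, so the bit widths are assumptions, and the estimate ignores activations, caches, and runtime overhead.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def seconds_for_tokens(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to emit `num_tokens` at a steady throughput."""
    return num_tokens / tokens_per_second


TOTAL_PARAMS = 26e9  # all experts must be resident, even though only 3.8B are active
BLOCK_TOKENS = 256   # block size quoted in the post

for bits in (16, 8, 4):  # assumed bit widths, not stated in the post
    print(f"{bits}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")

for label, tps in (("H100, 1000 tok/s", 1000.0), ("RTX 5090, 700 tok/s", 700.0)):
    print(f"{label}: {seconds_for_tokens(BLOCK_TOKENS, tps):.3f} s per 256-token block")
```

At 4 bits the weights alone come to about 13 GB, which leaves some headroom under an 18 GB budget. At the quoted speeds, a full block arrives in roughly a quarter to a third of a second.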

Commenters highlight the reported 700+ tok/s generation speed as potentially important for agentic workflows, where a diffusion text model could generate candidate actions and a smaller autoregressive model could verify them within the same latency budget. One technical angle raised is that bidirectional attention may make code infilling more natural without requiring special FIM tokens, which could benefit local coding agents.

Several comments frame DiffusionGemma as a speed/quality tradeoff versus standard autoregressive models.

Smol AI News·2026年6月10日 14:44·約16分で読める

今日は何も大きな出来事はありませんでした

#LLM #Anthropic #モデルガバナンス #データプライバシー #再現性

TL;DR

AI深層分析2026年6月20日 17:03

重要/ 5段階

深度40%

キーポイント

AI R&D 支援機能の非明示的劣化への批判

エンタープライズにおけるデータ保持とロックイン懸念

明示的拒否またはモデルダウングレードの必要性

影響分析・編集コメントを表示

影響分析

編集コメント

静かな一日。

AI ツイートリキャップ

Anthropic の Fable/Mythos ロールアウト、サイレントな能力制限、そして信頼への反発**

AI 研究開発支援におけるAIの性能低下（silent degradation）が議論を支配しました：技術系ツイートの多くは、Anthropic が明確な事前開示なしにAI研究関連のプロンプトに対するモデルのパフォーマンスを意図的に低下させているという点に集中しており、単にリクエストを拒否するのではなく、その行為自体が問題視されました。批判は非常に広範で、研究者や開発者はこれが観測された能力と実際のモデル能力の間に検証不可能なギャップを生み出し、再現性を損ない、コーディング、生物学、システム関連などの隣接ドメインにおけるモデル出力への信頼を損なうと主張しました。代表的な批判者には @natolambert, @martin_casado, @drfeifei, @antirez, @ClementDelangue, @deanwball がいます。いくつかの投稿では、より限定的な視点として、Anthropic がフロンティアユースケースを制限したい場合でも、明示的な拒否やモデルのダウングレードの方が、サイレントな妨害行為よりも正当化されやすいという点が指摘されました（例：@hlntnr, @arohan, @DBahdanau）。

エンタープライズにおける懸念は安全性だけでなく、データ保持とロックインにも及んでいた：ビルダーたちは、Fable/Mythos は報告によると 30 日間のプロンプト・データ保持を伴い、一部の設定ではオプトアウトが不可能であると指摘し、これによりゼロ保持環境や欧州の一部の地域が即座に除外されると述べている。@GergelyOrosz のプロンプト履歴保持と不透明なモデル変更に関する見解、および @scaling01 のゼロデータ保持との互換性欠如に関する指摘を参照のこと。複数の実務者によって繰り返された第 2 次の教訓：フロンティア API を不安定な依存関係として扱い、モデルの移植性を維持し、@dbreunig、@omarsar0、@yacineMTB が主張するように、評価（evals）とハーン（harnesses）を用いて出力を継続的に検証すること。

アンソロピックは論争に政策推進を組み合わせた：批判の高まりの中、ダリオ・アモダイが「AI 指数関数的成長に関するポリシー」を発表し、AI の進展が制度の追従を超えていると主張してより強力なフロンティア監督を呼びかけると同時に、アンソロピックは関連するイニシアチブの発表と、不安全なリリースを阻止するための政府の役割の提案を行った。@DarioAmodei と @AnthropicAI を参照のこと。コミュニティにとってこの緊張関係は明白だった：不透明な私的制御で批判されている同じ企業が、今度はより強力な公的制御を主張しているのだ。

論争にもかかわらず、Fable 5 のベンチマークにおける強さと製品パフォーマンス

Fable 5 は、エージェント型およびコーディングのワークロードにおいて本格的に強力であることが示されています：Anthropic の方針に対する批判者の中にも、モデル自体は優秀であると認める者が多くいます。コミュニティからの報告では、幅広い評価で首位またはそれに準ずる位置を占めているとされており、Agent Arena では総合 1 位となり、特にタスクの成功確認やユーザーからの称賛において大きな差をつけましたが、操作性についてはやや劣りました。@mchlhess は「自分のベンチマークを完全に打ち破った」と述べており、@JasonBotterill は SimpleBench で 81.9% のスコアを記録したと指摘し、@lvwerra は CADGenBench で 1 位だと報告しました。また、@scaling01 はコンピュータ操作に関する結果の強さを強調し、@LechMazur は PACT 交渉タスクで 1 位であると指摘しています。

ビルダーからは実世界での大幅な成果が報告されましたが、一律ではありませんでした。多くの実践者は、ゲーム生成や難易度の高いバグ修正など、長期にわたるコーディングおよび創造的タスクにおいて大きな生産性向上を体験したと述べています（例：@kimmonismus, @walden_yan, @hrishioa）。一方で、@Sentdex や @QuixiAI などは、特定のタスクにおいて挙動が脆い、消費コストが高い、あるいは GPT-5.5 よりも性能が劣ると報告しています。タイムライン全体からの結論として、Fable 5 は多くのエージェント型コーディングタスクにおいて最先端である可能性がありますが、信頼性や製品上の制約が採用に実質的な影響を与えていることがわかります。

配布と統合は迅速に進みました：Perplexity は @perplexity_ai と @AravSrinivas を通じて、Pro/Max ユーザー向けに Computer に Claude Fable 5 をオーケストレーターモデルとして追加しました。Apple の開発者たちは、@ClaudeDevs を介して、多段階推論、長いコンテキスト、およびコード利用のための Foundation Models フレームワークサポートを Claude で得ました。コミュニティの行動もまた、反発の後、OpenAI/Codex への置換圧力を示唆しており、@dylan522p が Anthropic から OpenAI へ利用シェアが移動しているとの報告が含まれています。

Google の DiffusionGemma リリースと拡散型大規模言語モデル（Diffusion LLMs）への関心の再燃

Google は Apache 2.0 ライセンスの下で DiffusionGemma をリリースしました：このセットにおける最も重要なオープンモデルの発表は、Gemma 4 に基づいて構築された実験的な 26B モデル（MoE: Mixture of Experts）である DiffusionGemma で、Apache 2.0 の下でオープンウェイトとして公開されました。自己回帰的な次トークン生成ではなく、テキストブロックを同時に生成・精緻化し、適切なハードウェアでは最大 4 倍の高速な出力と約 1,000+ トークン/秒の実現を謳っています。@Google、@GoogleDeepMind、@googlegemma、および @sundarpichai を参照してください。

システムの話題は即座に広まりました：今回のリリースは研究用アーティファクトとしての意義だけでなく、インフラストラクチャの進展にも寄与するものとして捉えられています。@vllm_project は、DiffusionGemma が vLLM でネイティブサポートされる初の拡散型大規模言語モデル（diffusion LLM）であると指摘し、単一の H200 GPU 上で FP8 精度を用いたバッチサイズ 1 の場合、出力速度が秒間 1200 トークン以上であることを示しました。@danielhanchen は llama.cpp を介して GGUF 形式でローカル環境で動作している様子を紹介し、@UnslothAI は 18GB クラスのハードウェア上でのローカル実行を強調しました。また @_philschmid は、推論に必要なリソースとしてアクティブパラメータが 38 億個、ブロックサイズは 256 トークンのノイズ除去（denoising）であると要約しています。

なぜ研究者たちが注目したのか：拡散型テキスト生成は、反復的な精緻化（iterative refinement）、制約付き編集、途中埋め込み（fill-in-the-middle）、および誤り修正に関する問いを再び浮上させました。複数の反応は、これを製品化された競合他社としてではなく、非逐次的なデコーディングや精緻化重視のタスクにおける有望な研究方向性として捉えています。詳細は @omarsar0、@mervenoyann、@dbreunig の投稿を参照してください。

エージェントのツールリング、インフラ、ベンチマーク：実 workload に関する構造がさらに明確に

ベンチマークの評価基準が「人間の好み」から「トレースベースのエージェント指標」へとシフトしています：@arena は Agent Arena の背後にある手法を詳細に説明しました。これは、各ステップで人間の好みに頼るのではなく、bash エラー、ツールの幻覚（hallucination）、そして「狂気」といった客観的な信号を捉えるために、長期のトレースデータを掘り起こすものです。数十回のツール呼び出しと 30 分にも及ぶトレースにわたるタスクを対象とするエージェント評価において、これは重要な方向性です。

メモリ、オーケストレーション、環境制御の成熟が進んでいる：エージェントを取り巻く欠落していたシステム層を対象とした複数のローンチが行われた。@Teknium は GUI ベースの Hermes エージェントプロファイルと、メモリ/スキル更新のための Write Gate 承認コントロールを @Teknium を通じて提供した。@weaviate_io は Engram において、グループ、トピック、スコープを用いた構造化エージェントメモリについて説明した。@bromann は、クライアントサイドやブラウザの機能をエージェントループに組み込むべきだと主張した。@FactoryAI は Factory Desktop で Missions をローンチした。

検出、ルーティング、コミュニティハネス：@perceptroninc は Agentic Detection をローンチし、単発型検出器ではなく、多段階のズーム/推論ループを用いて密度の高い曖昧な視覚検出を行うアプローチを採用した。@vllm_project は、推論経済性を中核に最適化されたコミュニティエージェントハネスである Inferoa を紹介した。また @Azaliamirh は、分散型マルチエージェントフレームワーク DeLM を導入し、これは集中型の代替案と比較してコストが半分以下で Gemini 3-Flash を用いた SWE-bench Verified で 65.7% の性能を達成したと報告されている。

追跡に値する最適化、検索、科学モデル化の取り組み

Distributed Shampoo と Muon の比較は最適化の議論として継続中：技術的に興味深いサブスレッドでは、ハイパーパラメータの調整と擬似逆行列安定化機能を有効にした後、Meta 製の DistributedShampoo が速度競走スタイルのタスクにおいて強力な Muon ベースラインに匹敵する結果を示しました。@arohan は、標準パッケージ＋調整後の検証損失が約 3.2766 であると報告しましたが、@kellerjordan0 は重要な安定化フラグが文書化されていないため「標準」と呼ぶことに異議を唱えました。ここで得られる有用な信号は「勝者が宣言された」ことではなく、最適器の比較がいかに隠された実装詳細や数値計算に敏感であるかという点です。

後期相互作用型検索（late-interaction retrieval）のためのより高性能なカーネルが公開されました：@tonywu_71 が ColBERT/ColPali/LateOn で使用される MaxSim 向けの融合された Triton カーネル「late-interaction-kernels」をリリースし、メモリフットプリントはピクチャーの几分の一でありながら PyTorch と数値的に等価であると主張しています。これは、マルチベクトル検索モデルのトレーニングと推論（サービング）の両方において重要な意味を持ちます。

科学計算および多モーダルモデリング：@giffmana は、拡散動画モデルが V-JEPA や VideoMAE に比べて一部のプローブで物理情報をより線形的に符号化していることを示す新しい研究を指摘し、「動画生成モデルは物理シミュレータとして愚かである」という一般的な通説に異議を唱えました。バイオテック分野では、@edunov が DeCAF-Pearl を紹介しました。これはフローマップ共折りたたみ（flow-map cofolding）モデルであり、品質を維持しつつ Pearl よりも約 5 倍高速であると報告されています。アーキテクチャ研究については、@ZyphraAI が Apache 2.0 ライセンスの下で Zamba2-VL をリリースし、ハイブリッド SSM-Transformer のアイデアを VLM（Vision-Language Model）に拡張しました。

エンゲージメント上位のツイート

ポリシー/ガバナンス：@DarioAmodei による「AI の指数関数的成長に関するポリシー」が、最もエンゲージメントの高い技術・政策投稿となり、フロンティア AI が機関の対応速度を超えて急速に進展しているという枠組みを示しました。

セキュリティ/安全性の失敗モード：@jsrailton は、マルウェア作成者が LLM の拒絶反応をトリガーして AI マルウェア分析を回避するために、核兵器や生物学的テキストを組み込む事例に大きな注目を集めました。これは攻撃者が安全性機能を悪用する具体的な例です。

オープンモデル：@googlegemma と @Google による DiffusionGemma の投稿が、純粋なモデル公開に関する最も大きな投稿となりました。

リサーチアクセスの規範：@drfeifei は、学界からの広範な合意を簡潔に表明しました。すなわち、科学の進展には AI を含む最良のツールへのアクセスが必要であるという点です。

モデル能力のシグナル：@mchlhess が Fable 5 のベンチマーク結果について「完全に打ち破る」と発言したことは、最も引用された能力評価の一つとなりました。

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. オープンウェイトモデルのリリース：North Mini Code と DiffusionGemma

Cohere North Mini Code のリリース（アクティビティ：388）：Cohere は公式に North Mini Code 1.0 をリリースしました。重みは Hugging Face に公開され、FP8 バリアントも用意されています。OpenCode を通じて無料アクセスが可能で、技術詳細は HF ブログおよび発表記事に記載されています。デプロイについては、Cohere は vLLM のメインブランチに cohere_melody>=0.9.0 を組み合わせることを推奨しており、--max-model-len 320000、--tool-call-parser cohere_command4、--reasoning-parser cohere_command4、--enable-auto-tool-choice というパラメータでサービングを行うよう案内しています。また、LocalLLaMA のフィードバックに基づいて PR がプッシュされたことも明記されています。エコシステムサポートとしては、Unsloth による GGUF 変換が追加され、MLX サポートも報告されています。一方、Cohere は llama.cpp や量子化（quantization）に関する要望は内部でフラグ付けされていると述べています。コメント投稿者は、Cohere が LocalLLaMA スタイルの早期アクセスを提供した点については概ね好意的でしたが、今後のリリースでは Day-0 の llama.cpp/GGUF サポートを強く求める声が上がりました。ある投稿者は、公開されたベンチマークが Qwen 3.6 35B A3B と比較してほとんどの指標で劣っているように見えると指摘し、他の投稿者たちは主に GGUF の利用可能性や、より大きな「Maxi Code」モデルの存在について質問していました。

ベンチマークに焦点を当てたコメント投稿者は、Cohere North Mini Code がリストされたほぼすべての指標において Qwen 3.6 35B A3B よりも劣っていると指摘し、新しいオープンモデルとして歓迎されているにもかかわらず、純粋なベンチマーク性能では競争力がない可能性があると示唆しました。

Apache-2.0 ライセンスは特に称賛され、これはモデルの商用または許容された下流利用を評価する開発者にとって重要な要素です。

DeepMind が「DiffusionGemma」を発表 — 画像スタイル拡散モデルによるテキスト生成（アクティビティ：355）: ****Google DeepMind は、Gemma 4/Gemini Diffusion の研究に基づいた Apache 2.0 オープンウェイトの 26B モデル（MoE: モジュール型エキスパート）である「DiffusionGemma」をリリースしました。このモデルは、3.8B パラメータのみを活性化し、自己回帰的なトークンごとのデコードではなく、256 トークンのブロックを並列でノイズ除去します。Google によると、H100 では 1000+ トークン/秒、RTX 5090 では 700+ トークン/秒の速度を実現し、量子化されたデプロイでは約 18GB の VRAM で収まります。この設計は、低同時実行数のローカル推論を、メモリ帯域幅に依存する逐次デコードから、計算集約型の並列リファインメントへとシフトさせるものであり、Hugging Face、vLLM、Unsloth でのサポートも提供されています。コメント投稿者はこれをリアルタイム/ローカルアプリケーションにとって重要な進展と捉えましたが、いくつかの意見では、品質は標準的な自己回帰型 Gemma モデルに劣る可能性があることが強調されました。「愚かなものになるなら、超高速である必要はない」という声もありました。また、Google の最近のオープンモデルリリースペースに対する全般的な驚きと称賛の声も聞かれました。

いくつかのコメントでは、DiffusionGemmaを標準的な自己回帰モデルとの速度と品質のトレードオフとして位置づけています。

原文を表示

a quiet day.

AI News for 6/9/2026-6/10/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Anthropic’s Fable/Mythos rollout, silent capability gating, and the trust backlash

Silent degradation of AI R&D help dominated the discourse: A large share of technical tweets focused on Anthropic apparently degrading model performance on AI research-related prompts without clear up-front disclosure, rather than hard-refusing those requests. Criticism was unusually broad: researchers and builders argued this creates an unverifiable gap between observed and actual model capability, undermines reproducibility, and damages trust in model outputs for adjacent domains like coding, biology, and systems work. Representative critiques came from @natolambert, @martin_casado, @drfeifei, @antirez, @ClementDelangue, and @deanwball. Several posts made the narrower point that, even if Anthropic wants to restrict frontier-use cases, explicit refusals or model downgrades would be more defensible than silent sabotage, e.g. @hlntnr, @arohan, and @DBahdanau.

Enterprise concerns extended beyond safety to retention and lock-in: Builders highlighted that Fable/Mythos reportedly come with 30-day prompt/data retention and no opt-out in some settings, which immediately excludes zero-retention environments and parts of Europe. See @GergelyOrosz on prompt-history retention and opaque model changes, and @scaling01 on zero-data-retention incompatibility. A second-order lesson repeated by multiple practitioners: treat frontier APIs as unstable dependencies, maintain model portability, and verify outputs continuously with evals and harnesses, as argued by @dbreunig, @omarsar0, and @yacineMTB.
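The "unstable dependency" advice can be made concrete with a small regression harness. The sketch below is illustrative only: `ModelFn`, `EvalCase`, `pass_rate`, and `detect_regression` are hypothetical names, not any vendor's API, and the 0.05 tolerance is an arbitrary assumption. Every provider is wrapped as a plain "prompt in, text out" callable so the same cases can be rerun against any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any provider is wrapped as "prompt in, text out",
# which keeps the harness portable across vendors.
ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True when the output is acceptable


def pass_rate(model: ModelFn, cases: List[EvalCase]) -> float:
    """Fraction of cases whose model output passes its checker."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for case in cases if case.check(model(case.prompt)))
    return passed / len(cases)


def detect_regression(baseline: float, current: float,
                      tolerance: float = 0.05) -> bool:
    """Flag a pass-rate drop larger than `tolerance` (an assumed threshold)."""
    return (baseline - current) > tolerance


def compare_providers(models: Dict[str, ModelFn],
                      cases: List[EvalCase]) -> Dict[str, float]:
    """Run the same cases against several providers to keep a fallback ready."""
    return {name: pass_rate(fn, cases) for name, fn in models.items()}
```

Running the same fixed cases on a schedule and storing the baseline pass rate is one way to notice an unannounced behavior change. It cannot explain why a score moved, only that it did.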

Anthropic paired the controversy with a policy push: Amid the backlash, Dario Amodei published “Policy on the AI Exponential”, arguing AI progress is outrunning institutions and calling for stronger frontier oversight; Anthropic simultaneously announced related initiatives and a proposed government role in blocking unsafe releases. See @DarioAmodei and @AnthropicAI. The tension was obvious to the community: the same company being criticized for opaque private controls is now advocating stronger public controls.

Fable 5’s benchmark strength and product performance despite the controversy

Fable 5 appears genuinely strong on agentic and coding workloads: Even many critics of Anthropic’s policy acknowledged the model itself is excellent. Community reports had it leading or near-leading on a wide mix of evaluations: Agent Arena showed #1 overall with especially large margins in confirmed task success and user praise, albeit weaker steerability; @mchlhess said it “completely demolishes” his benchmark; @JasonBotterill noted 81.9% on SimpleBench; @lvwerra reported #1 on CADGenBench; @scaling01 highlighted strong computer-use results; and @LechMazur flagged #1 on PACT negotiation.

Builders reported substantial real-world gains, but not uniformly: A number of practitioners described major productivity gains on long-horizon coding and creative tasks, including game generation and hard bug-fixing, e.g. @kimmonismus, @walden_yan, and @hrishioa. At the same time, others reported brittle behavior, expensive consumption, or worse performance than GPT-5.5 on specific tasks, such as @Sentdex and @QuixiAI. The net takeaway from the timeline: Fable 5 is plausibly state-of-the-art for many agentic coding tasks, but trust and product constraints are materially affecting adoption.

Distribution and integration moved quickly: Perplexity added Claude Fable 5 as an orchestrator model in Computer for Pro/Max users via @perplexity_ai and @AravSrinivas. Apple developers got Foundation Models framework support for Claude for multi-step reasoning, longer context, and code use via @ClaudeDevs. Community behavior also suggested substitution pressure toward OpenAI/Codex after the backlash, including @dylan522p reporting usage share moving from Anthropic toward OpenAI.

Google’s DiffusionGemma release and renewed interest in diffusion LLMs

Google released DiffusionGemma under Apache 2.0: The most important open-model launch in the set was DiffusionGemma, an experimental 26B MoE diffusion text model built on Gemma 4 and released with open weights under Apache 2.0. Instead of autoregressive next-token generation, it generates and refines blocks of text simultaneously, with claims of up to 4x faster output and around 1,000+ tokens/sec on suitable hardware. See @Google, @GoogleDeepMind, @googlegemma, and @sundarpichai.

The systems story landed immediately: The release mattered not just as a research artifact but as serving infrastructure progress. @vllm_project said DiffusionGemma is the first diffusion LLM natively supported in vLLM, citing 1200+ output tok/s at batch size 1 on a single H200 with FP8. @danielhanchen showed it running locally via llama.cpp with GGUFs; @UnslothAI emphasized local execution on 18GB-class hardware; and @_philschmid summarized the inference footprint as 3.8B active params and 256-token block denoising.
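A rough way to see why block denoising helps at batch size 1 is to count sequential forward passes. The sketch below is back-of-the-envelope arithmetic, not a description of DiffusionGemma's actual sampler: the 256-token block size comes from the posts above, while `steps_per_block=16` is a purely hypothetical number of refinement steps chosen for illustration.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Autoregressive decoding needs one sequential forward pass per token."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens


def block_diffusion_passes(num_tokens: int, block_size: int = 256,
                           steps_per_block: int = 16) -> int:
    """Sequential passes when each block is refined `steps_per_block` times.

    `steps_per_block` is an assumed value; the source does not state it.
    """
    if num_tokens < 0 or block_size <= 0 or steps_per_block <= 0:
        raise ValueError("invalid arguments")
    blocks = math.ceil(num_tokens / block_size)
    return blocks * steps_per_block
```

Under these assumptions a 1,024-token completion takes 1,024 sequential passes autoregressively and 64 with block denoising. Each diffusion pass processes a whole block and therefore costs more compute, so the pass count is not a wall-clock speedup.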

Why researchers cared: Diffusion-style text generation revives questions around iterative refinement, constrained editing, fill-in-the-middle, and error correction. Multiple reactions framed it less as a productized competitor and more as a fertile research direction for non-sequential decoding and refinement-heavy tasks; see @omarsar0, @mervenoyann, and @dbreunig.

Agent tooling, infra, and benchmarks: more structure around real workloads

Benchmarks are shifting from preference to trace-based agent metrics: @arena detailed the methodology behind Agent Arena, which mines long-horizon traces for objective signals like bash errors, tool hallucination, and “insanity” rather than relying on human preference for every step. This is an important direction for agent evals where tasks span dozens of tool calls and 30-minute traces.
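As a rough illustration of trace-based scoring, the sketch below counts two of the signals named above, bash errors and tool hallucination, from a recorded trace. The `Step` schema and the `trace_signals` function are hypothetical; Agent Arena's real trace format and its definition of "insanity" are not given in the source, so that signal is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Step:
    """One tool call in an agent trace (hypothetical schema)."""
    tool: str
    exit_code: Optional[int] = None  # only meaningful for shell-like tools


def trace_signals(steps: List[Step], declared_tools: Set[str]) -> Dict[str, int]:
    """Count objective failure signals without any human preference labels."""
    bash_errors = sum(
        1 for s in steps
        if s.tool == "bash" and s.exit_code not in (None, 0)
    )
    # A call to a tool the agent was never given counts as a hallucinated tool.
    hallucinated = sum(1 for s in steps if s.tool not in declared_tools)
    return {
        "steps": len(steps),
        "bash_errors": bash_errors,
        "hallucinated_tool_calls": hallucinated,
    }
```

Counts like these can be aggregated over many long traces, which is cheaper than asking a human to judge every step.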

Memory, orchestration, and environment control keep maturing: Several launches targeted the missing systems layer around agents. @Teknium shipped GUI-based Hermes Agent profiles and later Write Gate approval controls for memory/skill updates via @Teknium. @weaviate_io described structured agent memory using groups, topics, and scopes in Engram. @bromann argued for bringing client-side/browser capabilities into the agent loop. @FactoryAI launched Missions on Factory Desktop.

Detection, routing, and community harnesses: @perceptroninc launched Agentic Detection, using multi-call zoom/reason loops for dense ambiguous visual detection instead of a one-shot detector; @vllm_project highlighted Inferoa, a community agent harness optimized around inference economics; and @Azaliamirh introduced DeLM, a decentralized multi-agent framework that reportedly reaches 65.7% SWE-bench Verified with Gemini 3-Flash at less than half the cost of centralized alternatives.

Optimization, retrieval, and scientific-modeling work worth tracking

Distributed Shampoo vs Muon remained a live optimization thread: A technically interesting sub-thread showed tuned Meta DistributedShampoo matching strong Muon baselines on a speedrun-style task after hyperparameter tuning and enabling pseudo-inverse stabilization. @arohan reported validation losses around 3.2766 with vanilla package + tuning, while @kellerjordan0 pushed back on calling it “vanilla” because the critical stabilization flag was undocumented. The useful signal here is not “winner declared,” but that optimizer comparisons remain highly sensitive to hidden implementation details and numerics.

Late-interaction retrieval got better kernels: @tonywu_71 released late-interaction-kernels, fused Triton kernels for MaxSim used in ColBERT/ColPali/LateOn, claiming numerical equivalence to PyTorch at a fraction of the memory footprint. This should matter for both training and serving multi-vector retrieval models.
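For readers unfamiliar with MaxSim, here is a plain-Python reference of the scoring rule used in ColBERT-style late interaction: for each query token embedding, take the maximum dot product over all document token embeddings, then sum over query tokens. This is an unoptimized sketch for clarity, not the released Triton kernels, and the function names are illustrative.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: List[Vector], doc: List[Vector]) -> float:
    """Sum over query tokens of the best-matching document token similarity."""
    if not query or not doc:
        raise ValueError("query and doc must each contain at least one vector")
    return sum(max(dot(q, d) for d in doc) for q in query)


def rank(query: List[Vector], docs: List[List[Vector]]) -> List[int]:
    """Document indices sorted by descending MaxSim score."""
    scores = [maxsim(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

A batched tensor implementation typically materializes the full query-by-document similarity matrix before reducing it. Fusing the max and the sum into one kernel is one way to avoid storing that matrix, which is presumably where memory savings of the kind claimed would come from.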

Scientific and multimodal modeling: @giffmana highlighted new work showing diffusion video models linearly encode physical information better than V-JEPA/VideoMAE on some probes, challenging a common “videogen models are dumb physics simulators” narrative. In biotech, @edunov introduced DeCAF-Pearl, a flow-map cofolding model reportedly ~5x faster than Pearl while maintaining quality. On architecture research, @ZyphraAI released Zamba2-VL under Apache 2.0, extending hybrid SSM-Transformer ideas into VLMs.

Top tweets (by engagement)

Policy / governance: @DarioAmodei on “Policy on the AI Exponential” was the highest-engagement technical/policy post, framing frontier AI as advancing faster than institutions can react.

Security / safety failure mode: @jsrailton drew major attention to malware authors embedding nuclear/biological text to trigger LLM refusals and evade AI malware analysis—a concrete example of attackers exploiting safety behavior.

Open models: @googlegemma and @Google on DiffusionGemma were the biggest pure model-release posts.

Research access norms: @drfeifei concisely stated the broad consensus from academia: scientific progress requires access to the best tools, including AI.

Model capability signal: @mchlhess saying Fable 5 “completely demolishes” his benchmark became one of the most-cited capability endorsements.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Open-Weight Model Drops: North Mini Code and DiffusionGemma

Releasing Cohere North Mini Code (Activity: 388): Cohere officially released North Mini Code 1.0, with weights on Hugging Face, an FP8 variant, free access via OpenCode, and technical details in the HF blog / announcement. For deployment, Cohere recommends vLLM main plus cohere_melody>=0.9.0, serving with --max-model-len 320000, --tool-call-parser cohere_command4, --reasoning-parser cohere_command4, and --enable-auto-tool-choice; they also noted PRs were pushed based on LocalLLaMA feedback. Ecosystem support now includes an Unsloth GGUF conversion and reported MLX support, while Cohere says llama.cpp/quantization requests are being flagged internally. Commenters were broadly positive about Cohere doing LocalLLaMA-style early access, but pushed for day-0 llama.cpp/GGUF support in future releases. One commenter argued the published benchmarks appear worse than Qwen 3.6 35B A3B on most metrics, while others mainly asked about GGUF availability and a possible larger “Maxi Code” model.

A benchmark-focused commenter observed that Cohere North Mini Code appears worse than Qwen 3.6 35B A3B on almost every listed metric, suggesting the release may not be competitive on raw benchmark performance despite being welcomed as a new open model.

The Apache-2.0 license was specifically praised, which is relevant for developers evaluating commercial or permissive downstream use of the model.

DeepMind Just Dropped "DiffusionGemma" — Text Generation via Image-Style Diffusion Model (Activity: 355): Google DeepMind released DiffusionGemma, an Apache 2.0 open-weight 26B MoE text-diffusion model based on Gemma 4/Gemini Diffusion research that activates only 3.8B parameters and denoises a 256-token block in parallel instead of autoregressive token-by-token decoding. Google reports 1000+ tok/s on an H100 and 700+ tok/s on an RTX 5090, with quantized deployment fitting in roughly 18GB VRAM; the design shifts low-concurrency local inference from memory-bandwidth-bound sequential decoding toward compute-heavy parallel refinement, with support in Hugging Face, vLLM, and Unsloth. Commenters framed this as a significant development for real-time/local applications, but several emphasized that quality may lag standard autoregressive Gemma models: “I don’t need ultra-fast if it’s going to be stupid.” There was also broader positive surprise at Google’s recent pace of open model releases.

Several comments frame DiffusionGemma as a speed/quality tradeoff versus standard autoregressive models.
