Smol AI News · April 24, 2026, 14:44 · approx. 13 min read

DeepSeek V4 release and AI news roundup

#LLM #OpenSource #MITLicense #LongContext #DeepSeek

TL;DR

DeepSeek released V4 Pro and V4 Flash under the MIT license, raising the bar for open-weight models with a 1M-token context window and stronger agentic capabilities.

AI in-depth analysis · April 29, 2026, 14:08

Importance: highest / 5-level scale

Depth: 40%

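Key points

New architecture and lineup

The first architecture refresh since V3, released as a two-tier lineup: "Pro" with 1.6T total parameters (49B active) and "Flash" with 284B total parameters (13B active).

Technical advances and performance

Both models offer a 1M-token context window and hybrid reasoning/non-reasoning modes, with the largest gains reported in long-context processing and agentic coding.

Industry impact

The MIT license removes most usage restrictions on the open weights. On independent benchmarks V4 Pro sits alongside Kimi K2.6 and Claude Sonnet-class models, close to but still somewhat behind the top closed models.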


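Impact analysis

The release changes how the industry views the performance and availability of open-weight AI, and the permissive MIT license in particular lowers barriers for the developer community. Stronger long-context and agentic capabilities make production use of LLMs feasible more broadly and at lower cost, and could mark a turning point that reduces dependence on closed models.

Editorial comment

DeepSeek-V4 combines a permissive MIT license with a 1M-token context window, which marks a significant shift for open-weight AI. Developers should consider now whether they can reduce their dependence on closed models.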


a quiet day.

AI News for 4/23/2026-4/24/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Top Story: DeepSeek V4

What happened

DeepSeek released DeepSeek-V4 Pro and DeepSeek-V4 Flash, its first major architecture refresh since V3 and first clear two-tier lineup, with 1M-token context, hybrid reasoning/non-reasoning modes, an MIT license, and a technical report detailed enough that multiple researchers called it one of the most important or best-written model papers of the year. Across the reactions, the factual consensus is that V4 materially advances open-weight long-context and agentic coding performance while remaining somewhat behind the top closed frontier models overall. Independent benchmarkers place V4 Pro around the #2 open-weights tier, roughly near Kimi K2.6 / GLM-5.1 / strong Claude Sonnet-class to Opus-ish depending on benchmark and mode, with especially strong long-context and agentic performance; opinions diverge on how close it is to GPT-5.x / Opus 4.7 and on whether this is “democratizing” progress or an architecture so complex that few open labs can realistically reproduce it. Key sources include deep-dive commentary from @ArtificialAnlys, @scaling01, @nrehiew_, @ben_burtenshaw, @TheZachMueller, @ZhihuFrontier, and infra/vendor posts from @vllm_project, @NVIDIAAI, and @Togethercompute.

Core facts and technical details

The most concrete technical claims repeated across the discussion:

Two models

V4 Pro: 1.6T total parameters / 49B active

V4 Flash: 284B total / 13B active

Reported by @ArtificialAnlys, @teortaxesTex, @baseten, @NVIDIAAI

Context

1M tokens, up from 128K in V3.2 per @ArtificialAnlys

Multiple posters frame this as the headline achievement: “solid ultra-long context” @teortaxesTex

Training scale

32T–33T tokens cited repeatedly

@nrehiew_ notes 32T tokens over 1.6T parameters, i.e. roughly 20 tokens/parameter

@teortaxesTex cites 33T

@nrehiew_ estimates pretraining compute at ~1e25 FLOPs
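The figures above can be sanity-checked with simple arithmetic. The sketch below assumes the common 6·N·D rule of thumb for training FLOPs and assumes N is the active parameter count (49B) rather than the total, which is the usual convention for MoE models. Neither assumption is stated in the thread, and the function names are illustrative.

```python
def tokens_per_parameter(tokens: float, params: float) -> float:
    """Ratio of training tokens to parameters."""
    return tokens / params


def approx_training_flops(active_params: float, tokens: float) -> float:
    """Rule-of-thumb estimate: about 6 FLOPs per active parameter per token."""
    return 6.0 * active_params * tokens


TOTAL_PARAMS = 1.6e12   # V4 Pro total parameters (from the thread)
ACTIVE_PARAMS = 49e9    # V4 Pro active parameters (from the thread)
TOKENS = 32e12          # training tokens cited by @nrehiew_

ratio = tokens_per_parameter(TOKENS, TOTAL_PARAMS)
flops = approx_training_flops(ACTIVE_PARAMS, TOKENS)

print(f"{ratio:.0f} tokens per total parameter")
print(f"{flops:.2e} training FLOPs (6*N_active*D approximation)")
```

Under these assumptions the estimate lands just under 1e25 FLOPs, consistent with the order of magnitude quoted above.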

Reasoning / modes

DeepSeek exposes three reasoning modes per @Togethercompute

Hybrid “thinking/non-thinking” positioning noted by @ArtificialAnlys

Long-context architecture

Several threads summarize a new hybrid attention system:

shared KV vectors

compressed KV streams

sparse attention over compressed tokens

local/sliding-window attention for nearby context
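To make the four ingredients concrete, here is a toy sketch of how a single query could pick what it attends to: a short local window of raw positions plus the top-k blocks of a compressed KV stream. This is not DeepSeek's implementation. Mean pooling as the compressor, dot-product scoring, and every name and default value below are assumptions chosen for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool_blocks(keys, block):
    """Compress keys by averaging each run of `block` consecutive vectors."""
    pooled = []
    for start in range(0, len(keys), block):
        chunk = keys[start:start + block]
        dim = len(chunk[0])
        pooled.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return pooled


def select_context(query, keys, q_pos, block=4, top_k=2, window=3):
    """Return (local positions, selected compressed block ids) for one query."""
    # Local attention: the last `window` raw positions up to and including q_pos.
    local = list(range(max(0, q_pos - window + 1), q_pos + 1))
    # Sparse attention: score only blocks that are complete at q_pos (causal).
    n_complete = (q_pos + 1) // block
    pooled = mean_pool_blocks(keys[:n_complete * block], block)
    ranked = sorted(range(len(pooled)), key=lambda i: dot(query, pooled[i]), reverse=True)
    return local, sorted(ranked[:top_k])


# Twelve toy 2-D keys in three blocks of four.
keys = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4 + [[0.5, 0.5]] * 4
query = [0.0, 1.0]
local, blocks = select_context(query, keys, q_pos=11)
print(local, blocks)  # [9, 10, 11] [1, 2]
```

In this toy case the query reads 3 raw positions and 2 compressed entries instead of all 12 keys, which is the general shape of the saving the threads describe.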

@ZhihuFrontier gives the most compact public summary:

2× KV reduction via shared key-value vectors

c4a ≈ 4× compression

c128a ≈ 128× compression

top-k sparse attention on compressed tokens

128-token sliding window

1M context KV cache = 9.62 GiB/sequence (bf16)

8.7× smaller than DeepSeek V3.2’s 83.9 GiB

FP4 index cache + FP8 attention cache gives another ~2× reduction

@ben_burtenshaw condenses this to “10× smaller KV cache”

@TheZachMueller describes CSA + HCA layer patterns: the two layer types alternate, and V4 Flash uses sliding-window layers instead of HCA in some places
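The KV-cache numbers quoted above are easy to cross-check. The sketch below recomputes the reduction factor and a per-token footprint. It assumes "1M" means 10**6 tokens rather than 2**20, and it takes the "another ~2×" figure literally. Both are assumptions, not statements from the thread.

```python
GIB = 2**30  # bytes per GiB

V4_CACHE_GIB = 9.62      # 1M-context KV cache per sequence, bf16 (from the thread)
V32_CACHE_GIB = 83.9     # DeepSeek V3.2 equivalent (from the thread)
CONTEXT_TOKENS = 1_000_000  # assumption: "1M" read as 10**6


def kv_bytes_per_token(cache_gib: float, context_tokens: int) -> float:
    """Average KV-cache bytes stored per token of context."""
    return cache_gib * GIB / context_tokens


reduction = V32_CACHE_GIB / V4_CACHE_GIB
per_token = kv_bytes_per_token(V4_CACHE_GIB, CONTEXT_TOKENS)
quantized_gib = V4_CACHE_GIB / 2  # taking the "~2x" FP4/FP8 saving literally

print(f"reduction vs V3.2: {reduction:.1f}x")
print(f"bf16 cache per token: {per_token / 1024:.1f} KiB")
print(f"with FP4/FP8 caches: about {quantized_gib:.2f} GiB per sequence")
```

The ratio comes out at about 8.7×, matching the quoted figure, and the bf16 cache works out to roughly 10 KiB per token of context.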

Quantization / checkpoint format

@LambdaAPI: checkpoint is mixed FP4 + FP8

MoE expert weights in FP4

attention / norm / router in FP8

claim: the full model fits on a single 8×B200 node
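A rough memory estimate shows why the single-node claim is plausible. The sketch below assumes 0.5 bytes per FP4 weight and 1 byte per FP8 weight, and it ignores quantization scale factors. The share of parameters held in MoE experts (0.97) and the 180 GB of usable memory per B200 are hypothetical inputs, not figures from the thread.

```python
def checkpoint_bytes(total_params: float, expert_fraction: float) -> float:
    """Weight bytes for a mixed checkpoint: experts in FP4, the rest in FP8."""
    fp4_bytes = total_params * expert_fraction * 0.5
    fp8_bytes = total_params * (1.0 - expert_fraction) * 1.0
    return fp4_bytes + fp8_bytes


TOTAL_PARAMS = 1.6e12    # V4 Pro total parameters (from the thread)
EXPERT_FRACTION = 0.97   # hypothetical: share of parameters in MoE experts
GPUS_PER_NODE = 8
HBM_PER_GPU_GB = 180     # assumption: usable memory per B200, check vendor specs

weights_gb = checkpoint_bytes(TOTAL_PARAMS, EXPERT_FRACTION) / 1e9
node_gb = GPUS_PER_NODE * HBM_PER_GPU_GB
headroom_gb = node_gb - weights_gb

print(f"weights: about {weights_gb:.0f} GB, node memory: {node_gb} GB")
print(f"headroom for KV cache and activations: about {headroom_gb:.0f} GB")
```

Even if every parameter were stored in FP8, the weights would total 1.6 TB, so it is the FP4 expert weights that bring the checkpoint under the memory of one node.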

Inference hardware / serving

@NVIDIAAI: on Blackwell Ultra, V4 Pro can deliver 150+ TPS (tokens per second) per user for agentic workflows

@NVIDIAAI: published day-0 V4 Pro performance pareto using vLLM

@SemiAnalysis_: day-0 support and benchmarking across H200, MI355, B200, B300, GB200/300

@Prince_Canuma: DeepSeek4-Flash on 256GB Mac

@Prince_Canuma: MLX quants published

@simonw asks about smaller-RAM Mac viability, implying community interest but incomplete support story

@QuixiAI reminds users that many local stacks still lack tensor parallel, relevant because V4-class models strongly stress inference infra

License / availability / pricing

MIT license per @ArtificialAnlys

first-party API plus rapid third-party availability via @Togethercompute, @baseten, @NousResearch, @Teknium

V4 Pro pricing: $1.74 / $3.48 per 1M input/output tokens

V4 Flash pricing: $0.14 / $0.28

cache-hit pricing also given by @ArtificialAnlys

@scaling01 views the pricing as a glimpse of future “Mythos-level” cheap coding models

Reuters quote posted by @scaling01: DeepSeek said Pro pricing could fall sharply once Huawei Ascend 950 supernodes are deployed at scale in H2
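The list prices translate into per-request costs as follows. This is a minimal sketch: the dictionary keys are illustrative labels rather than real API model IDs, the 200K-input / 20K-output request is a hypothetical workload, and cache-hit pricing is not modeled.

```python
# USD per 1M tokens as (input, output), taken from the prices quoted above.
PRICES = {
    "v4-pro": (1.74, 3.48),
    "v4-flash": (0.14, 0.28),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD of one request, ignoring cache-hit discounts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical long-context agentic request.
pro = request_cost("v4-pro", 200_000, 20_000)
flash = request_cost("v4-flash", 200_000, 20_000)

print(f"Pro: ${pro:.4f}  Flash: ${flash:.4f}  ratio: {pro / flash:.1f}x")
```

Because Pro's input and output prices are both about 12.4 times Flash's, the cost ratio stays near 12.4× whatever the input/output mix.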

Independent evaluations and where V4 lands

The most useful independent benchmark synthesis came from @ArtificialAnlys:

V4 Pro Max: 52 on Artificial Analysis Intelligence Index

up 10 points from V3.2 at 42

becomes #2 open weights reasoning model, behind Kimi K2.6 (54)

V4 Flash Max: 47

positioned around strong mid/high open models, “Claude Sonnet 4.6 max level intelligence”

GDPval-AA (agentic real-world work):

V4 Pro: 1554, leading open-weight models

ahead of Kimi K2.6 (1484), GLM-5.1 (1535), MiniMax-M2.7 (1514)

AA-Omniscience

V4 Pro: -10, an 11-point improvement over V3.2

but still paired with 94% hallucination rate

V4 Flash: 96% hallucination rate

Cost to run AA Index

V4 Pro: $1,071

V4 Flash: $113

Output tokens used on AA Index

V4 Pro: 190M

V4 Flash: 240M

This is a major caveat: cheap per-token pricing does not imply cheap total task cost if the model emits huge token volumes
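The caveat can be quantified from the numbers above. The sketch below multiplies the reported output tokens by the list output price and compares the result with the reported total cost of the run. Attributing the remainder to input and cached tokens is an assumption, since the cost breakdown is not given.

```python
# Reported AA Index figures; output prices are USD per 1M tokens.
RUNS = {
    "V4 Pro": {"output_tokens_m": 190, "output_price": 3.48, "reported_total": 1071},
    "V4 Flash": {"output_tokens_m": 240, "output_price": 0.28, "reported_total": 113},
}


def output_cost(run: dict) -> float:
    """USD spent on output tokens at list price."""
    return run["output_tokens_m"] * run["output_price"]


for name, run in RUNS.items():
    cost = output_cost(run)
    share = cost / run["reported_total"]
    print(f"{name}: output tokens cost ${cost:.2f} ({share:.0%} of ${run['reported_total']})")

total_ratio = RUNS["V4 Pro"]["reported_total"] / RUNS["V4 Flash"]["reported_total"]
print(f"Pro / Flash total cost: {total_ratio:.1f}x")
```

On these figures output tokens account for roughly 60% of each run's cost. Flash's heavier token use narrows the gap between the two models from about 12.4× in list price to about 9.5× in total cost.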

Additional eval perspectives:

@arena:

#2 open in Text Arena overall at debut

category wins/placements:

#1 Medical & Healthcare

#15 Creative Writing

#18 Multi-Turn

thinking variant:

#8 Math

#9 Life/Physical/Social Science

@arena emphasizes the Pro vs Flash tradeoff:

Pro ranks ~30 places higher

costs 12× more

Flash is still competitive in Chinese, medicine, math

@scaling01:

“~Opus 4.5 estimate holds for now, at least on SimpleBench”

@scaling01:

V4 is “definitely better than GLM-5.1 but not quite Opus 4.7, GPT-5.4 or Gemini 3.1 Pro”

@scaling01 lists the scores that would confirm a gap of under 6 months:

ARC-AGI-1 ~75%

ARC-AGI-2 ~35%

GSO ~26%

METR 4.5–5 hours

WeirdML ~63%

@TheZachMueller:

on his evals, Flash@max ≈ Pro@high on reasoning

Pro focuses more on knowledge (SimpleQA)

@VictorTaelin:

after fixing benchmark bugs and letting long-running models run longer, DeepSeek and Kimi improved materially

@mbusigin:

a simple negative early impression with no detail

@petergostev:

on BullshitBench, which measures refusal/pushback behavior rather than capability, GPT-5.5 underperformed; included here because many readers are comparing V4 in an eval-skeptical environment

Facts vs opinions

Facts / relatively well-supported claims

V4 Pro / Flash were released with the specs above, MIT-licensed, 1M context, and open technical documentation: @ArtificialAnlys, @TheZachMueller

The architecture introduces a new long-context attention system with dramatic KV-cache reduction: @ZhihuFrontier, @ben_burtenshaw

Independent benchmarkers broadly place V4 Pro near the very top of open weights but below the best proprietary models overall: @ArtificialAnlys, @arena, @scaling01

DeepSeek V4 is heavily token-intensive in some evaluations: @ArtificialAnlys

The checkpoint uses FP4/FP8 mixed precision and can fit on an 8×B200 node: @LambdaAPI

Rapid ecosystem support arrived via vLLM and other providers day 0: @vllm_project, @SemiAnalysis_

Opinions / interpretation

“V4 is ~4–5 months behind the frontier” from @scaling01 is an informed estimate, not a measured fact

“Top three open” vs “only open model close to frontier” debate from @teortaxesTex is partly about benchmark trust and framing

“Strongest pretrained model we have” from @teortaxesTex is an opinion hinging on scale + architecture, not direct benchmark supremacy

“Most significant AI paper of the year” from @Dorialexander is enthusiasm, not consensus

“This is what research should look like” from @scaling01 speaks to transparency/style rather than only capability

“Not exactly a democratizing technology” from @teortaxesTex is a strong architectural/political interpretation

Different opinions and fault lines

1) Is V4 near frontier, or clearly behind?

More favorable

@scaling01: puts it at roughly GPT-5.2 / Opus 4.5+ tier

@scaling01: SimpleBench supports ~Opus 4.5

@teortaxesTex: argues it is the strongest pretraining base among opens and implies people are underestimating what post-training can do

More skeptical

@scaling01: below Opus 4.7 / GPT-5.4 / Gemini 3.1 Pro

@scaling01: the gap may widen again because closed labs have bigger models, better science/law/medicine coverage, faster inference with GB200s

@mbusigin: early impressions “not great”

@teortaxesTex: says polished models like K2.6 and GLM 5.1 may still feel better in coding despite lower intrinsic capacity

2) Is V4’s real contribution model quality, or long-context systems design?

A big split in reactions is that many technical readers think the long-context architecture matters more than the raw benchmark position.

@teortaxesTex: “They've completed their quest: Solid Ultra-Long Context”

@ben_burtenshaw: first open model where long context and agentic post-training “meet”

@scaling01: expects other open labs to adopt pieces of the architecture

@Dorialexander: frames Huawei/sovereignty constraints as an opportunity to reshape hardware and memory/interconnect design

@jukan05: reads the paper as evidence that NVIDIA’s hardware roadmap is unusually well aligned to where MoE/long-context models are going

3) Is V4 “open democratization,” or too hard to copy?

This was one of the sharpest strategic disagreements.

@teortaxesTex: says V4 is “not exactly a democratizing technology” because the architecture is too difficult for most labs to replicate

@teortaxesTex: suggests even DeepSeek may not want to do this exact architecture again without refactoring

@stochasticchasm: notes the sheer hyperparameter complexity is daunting

Against that, @Prince_Canuma's posts show that the ecosystem is already compressing and adapting Flash for near-local use on Apple Silicon, which softens the “not democratizing” claim on the inference side, if not on the training side

4) Are people underrating Flash?

Several reactions suggest Flash may be more important than Pro for practical adoption.

@arena: Flash shifts the price/performance frontier


Smol AI News·2026年4月24日 14:44·約13分

DeepSeek v4 の発表と AI ニュースまとめ

#LLM #オープンソース #MIT ライセンス #長文コンテキスト #DeepSeek

TL;DR

AI深層分析2026年4月29日 14:08

最重要/ 5段階

深度40%

キーポイント

新アーキテクチャとラインナップ

V3 以来初の刷新となり、1.6T パラメータ（49B アクティブ）の「Pro」と 284B（13B アクティブ）の「Flash」の 2 層構成でリリースされた。

技術的革新と性能

100 万トークンのコンテキストウィンドウ、ハイブリッド推論モードを採用し、特に長文処理とコーディングエージェント性能が飛躍的に向上した。

業界へのインパクト

影響分析・編集コメントを表示

影響分析

編集コメント

静かな一日。

AI Twitter recap

トップストーリー：DeepSeek V4**

何が起きたか

コアな事実と技術的詳細

議論全体で繰り返し言及されている、最も具体的な技術的な主張は以下の通りです：

2つのモデル

V4 Pro：総パラメータ数1.6T / アクティブパラメータ数49B

V4 Flash：総パラメータ数284B / アクティブパラメータ数13B

@ArtificialAnlys、@teortaxesTex、@baseten、@NVIDIAAI によって報告

コンテキスト（文脈）

1Mトークン。@ArtificialAnlys によると、V3.2の128Kから拡大

複数の投稿者がこれを見どころとなる成果と位置づけている：「堅実な超長文コンテキスト」@teortaxesTex

トレーニング規模

32T〜33Tトークンが繰り返し言及

@nrehiew_ 氏は、1.6Tパラメータに対して32Tトークンを使用しており、つまりパラメータあたり約20トークンであると指摘

@teortaxesTex 氏は33Tを引用

@nrehiew_ 氏は事前学習の計算量を約1e25 FLOPsと推定

推論 / モード

@Togethercompute によると、DeepSeekは3つの推論モードを提供

@ArtificialAnlys によって、「思考あり/なし」のハイブリッドな位置づけが指摘

長文コンテキストアーキテクチャ

複数のスレッドで、新しいハイブリッド注意機構（アテンション）システムが要約されている：

共有KVベクトル

圧縮されたKVストリーム

圧縮トークンに対するスパース注意（アテンション）

近接コンテキストに対するローカル/スライディングウィンドウ注意（アテンション）

@ZhihuFrontier による最もコンパクトな公開要約：

共有キー・バリュー（key-value）ベクトルによるKVの2倍削減

c4a ≈ 4倍圧縮

c128a ≈ 128倍圧縮

圧縮トークンに対するtop-kスパース注意（アテンション）

128トークンのスライディングウィンドウ

1MコンテキストKVキャッシュ = 9.62 GiB/シーケンス（bf16）

DeepSeek V3.2の83.9 GiBよりも8.7倍小さい

FP4インデックスキャッシュとFP8注意（アテンション）キャッシュにより、さらに約2倍の削減

@ben_burtenshaw による要約：「KVキャッシュが10倍小さい」

@TheZachMueller氏と@TheZachMueller氏は、CSA（Contextual Self-Attention）とHCA（Hybrid Context Attention）のレイヤーパターンについて説明しており、V4 Flashでは一部のHCAの代わりにスライディングウィンドウ（sliding-window）レイヤーを用いた交互のレイヤー構成を採用しています。

量子化 / チェックポイント形式

@LambdaAPI氏によると、チェックポイントはFP4とFP8の混合です。

MoE（Mixture of Experts）のエキスパート重みはFP4

アテンション / ノルム / ルーターはFP8
主張：フルモデルは単一の8×B200ノードに収まる

推論ハードウェア / サービング

@NVIDIAAI氏は、vLLMを用いたV4 Proのパレート最適（pareto）なday-0パフォーマンスを発表しました。
@SemiAnalysis_氏は、H200、MI355、B200、B300、GB200/300 acrossのday-0サポートとベンチマークについて報告しています。
@Prince_Canuma氏は、256GBのMac上でDeepSeek4-Flashを実行しています。
@Prince_Canuma氏は、MLX量子化（quantization）が公開されたことを伝えています。
@simonw氏は、小RAM Macの実行可能性について質問しており、コミュニティの関心を示唆していますが、サポート状況は不完全であることを示しています。
@QuixiAI氏は、多くのローカルスタックがまだテンソル並列（tensor parallel）を欠いていることをユーザーに思い出させ、V4クラスモデルが推論インフラストラクチャに強い負荷をかけるという点で関連性があると指摘しています。

ライセンス / 利用可能性 / 価格

@ArtificialAnlys氏によると、MITライセンスです。

公式APIに加え、@Togethercompute、@baseten、@NousResearch、@Teknium経由で第三者による迅速な利用が可能になりました。
V4 Proの価格：入力/出力トークン100万個あたり$1.74 / $3.48
V4 Flashの価格：$0.14 / $0.28
@ArtificialAnlys氏によると、キャッシュヒット時の価格も提供されています。
@scaling01氏は、この価格設定を、将来の「Mythosレベル」の安価なコーディングモデルの一瞥と見なしています。

Reuters経由で投稿された@scaling01の引用：DeepSeekは、後半期にHuawei Ascend 950 スーパーノードが大量展開されれば、Pro版の価格が大幅に下落する可能性があると述べた。

独立した評価とV4の位置づけ

最も有用な独立ベンチマークの合成結果は、@ArtificialAnlysによるものであった：

V4 Pro Max: Artificial Analysis Intelligence Indexで52点

V3.2の42点から10ポイント上昇

第2位オープンウェイト推論モデルとなり、Kimi K2.6（54点）に次ぐ

V4 Flash Max: 47点

強力なミドル/ハイエンドのオープンモデルの周辺に位置づけられ、「Claude Sonnet 4.6 maxレベルの知能」と評価されている

GDPval-AA（エージェント型実世界作業）：

V4 Pro: 1554点、オープンウェイトモデルの中で首位

Kimi K2.6（1484点）、GLM-5.1（1535点）、MiniMax-M2.7（1514点）を上回る

AA-Omniscience

V4 Pro: -10点、V3.2より11ポイント改善

ただし、幻覚発生率は依然として94%

V4 Flash: 幻覚発生率96%

AA Indexの実行コスト

V4 Pro: 1,071ドル

V4 Flash: 113ドル

AA Indexで使用された出力トークン数

V4 Pro: 1億9,000万トークン

V4 Flash: 2億4,000万トークン

これは重要な注意点である：モデルが膨大なトークン量を出力する場合、トークン単価が安くても、総タスクコストは安くなるとは限らない

追加の評価視点：

@arena:

デビュー時、Text Arena全体でオープンモデル第2位

カテゴリ別勝利/順位：

#1 医療・ヘルスケア

#15 クリエイティブライティング

#18 多ターン対話

思考型バリアント：

#8 数学

#9 生命/物理/社会科学

@arenaはPro版とFlash版のトレードオフを強調している：

Pro版は約30位上位にランクイン

コストは12倍高い

Flashは中国語、医療、数学の分野で依然として競争力がある

@scaling01:

「~Opus 4.5の推定値は現時点で維持されている。少なくともSimpleBenchにおいては」

@scaling01:

V4は「GLM-5.1より明らかに優れているが、Opus 4.7、GPT-5.4、Gemini 3.1 Proにはまだ及ばない」

@scaling01は、6ヶ月以内のギャップを確認するスコアをリストアップしている：

ARC-AGI-1 約75%

ARC-AGI-2 約35%

GSO 約26%

METR 4.5〜5時間

WeirdML 約63%

@TheZachMueller:

自身の評価において、Flash@maxはPro@highと推論能力が同等

Proは知識（SimpleQA）により焦点を当てている

@VictorTaelin:

ベンチマークのバグを修正し、長時間実行するモデルにより長い実行時間を許容した後、DeepSeekとKimiは大幅に改善した

@mbusigin:

詳細のない単純な否定的な初期印象

@petergostev:

事実と意見

事実 / 比較的裏付けのある主張

V4 Pro / Flashは上記の仕様でリリースされ、MITライセンス、1Mコンテキスト、オープンな技術文書を提供：@ArtificialAnlys, @TheZachMueller

本アーキテクチャは、KVキャッシュ（Key-Value Cache）を劇的に削減する新しい長期コンテキスト注意機構を導入：@ZhihuFrontier, @ben_burtenshaw

独立したベンチマーク実施者は、V4 Proをオープンウェイト（Open Weights）の最高峰に近い位置に置くが、総合的なベストなプロプライエタリ（Proprietary）モデルには及ばないと広く見なしている：@ArtificialAnlys, @arena, @scaling01

DeepSeek V4は、いくつかの評価においてトークン使用量が非常に多い：@ArtificialAnlys

チェックポイントはFP4/FP8の混合精度を使用しており、8×B200ノードに収まるサイズです：@LambdaAPI

vLLMやその他のプロバイダーを通じて、エコシステムのサポートがDay 0から迅速に提供されました：@vllm_project, @SemiAnalysis_

意見／解釈

「V4はフロンティア層から約4〜5ヶ月遅れている」という@scaling01の見解は、@scaling01, @scaling01による情報に基づいた推定であり、測定された事実ではありません。

「トップ3のオープンモデル」対「フロンティアに近い唯一のオープンモデル」という議論は、@teortaxesTexによるものであり、ベンチマークへの信頼性とフレームングに関する部分が含まれています。

「私たちが持っている中で最も強力な事前学習モデル」という@teortaxesTexの主張は、規模とアーキテクチャに依存する意見であり、直接的なベンチマークでの優位性を示すものではありません。

「今年最も重要なAI論文」という@Dorialexanderの評価は、熱意を示すものであり、コンセンサス（合意）を意味するものではありません。

「これが研究のあるべき姿である」という@scaling01の発言は、能力だけでなく透明性やスタイルについて語ったものです。

「必ずしも民主化を促進する技術ではない」という@teortaxesTexの主張は、強力なアーキテクチャ的・政治的な解釈です。

異なる意見と分断

1) V4はフロンティアに近いのか、それとも明らかに遅れているのか？

より肯定的な見解

@scaling01：GPT-5.2 / Opus 4.5+のティアに位置づけるとする

@scaling01：SimpleBenchは約Opus 4.5をサポートする

@teortaxesTex：オープンソースの中で最も強力な事前学習の基盤であると主張し、ポストトレーニングが人々が考えている以上に効果的である可能性を示唆する

より懐疑的な見解

@scaling01：Opus 4.7 / GPT-5.4 / Gemini 3.1 Proを下回るとする

@scaling01: 閉鎖系ラボはより大規模なモデル、優れた科学・法律・医療分野のカバレッジ、GB200を用いた高速推論を持っているため、格差が再び拡大する可能性がある
@mbusigin: 初期の印象は「あまり良くない」
@teortaxesTex: K2.6やGLM 5.1のような磨き上げられたモデルは、内在する能力が低くとも、コーディングにおいては依然として使い心地が良いと感じられると述べている

2) V4の真の貢献はモデル品質か、それとも長文コンテキストシステムの設計か？

反応に大きな分断が見られ、多くの技術的な読者は生のベンチマークの順位よりも長文コンテキストアーキテクチャの方が重要だと考えている。

@teortaxesTex: 「彼らの目標は達成された：堅固な超長文コンテキスト」
@ben_burtenshaw: 長文コンテキストとアジェンティック（自律型エージェント）ポストトレーニングが「出会う」初のオープンモデル
@scaling01: 他のオープンラボがこのアーキテクチャの一部を採用すると予想している
@Dorialexander: Huaweiや主権に関する制約を、ハードウェアおよびメモリ/インターコネクト設計を再構築する機会として位置づけている
@jukan05: この論文は、NVIDIAのハードウェアロードマップがMoE（Mixture of Experts）や長文コンテキストモデルが進む方向性と unusually に整合していることを示す証拠として読んでいる

3) V4は「オープンな民主化」か、それとも複製が難しすぎるか？

これは最も鋭い戦略的な意見の相違点の一つだった。

@teortaxesTex: V4はアーキテクチャがほとんどのラボにとって複製に難しすぎるため、「厳密には民主化を促進する技術ではない」と述べている
@teortaxesTex: DeepSeek自身でさえ、リファクタリングを行わない限り、この特定のアーキテクチャを再度採用したくないかもしれないと示唆している
@stochasticchasm: 膨大なハイパーパラメータの複雑さが畏怖すべきものであることに言及している

それに対して、@Prince_Canuma氏と@Prince_Canuma氏は、エコシステムがすでに圧縮され、ローカルに近いApple Siliconでの使用向けにFlashを適応させており、推論側ではあってもトレーニング側でなくても「民主化していない」という主張を和らげていることを示している。

4）人々はFlashを見下しすぎているか？

いくつかの反応は、実用的な採用においてFlashがProよりも重要である可能性を示唆している。

@arena：Flashは価格対性能のフロンティアをシフトする

原文を表示

a quiet day.

AI News for 4/23/2026-4/24/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Top Story: DeepSeek V4

What happened

Core facts and technical details

The most concrete technical claims repeated across the discussion:

Two models

V4 Pro: 1.6T total parameters / 49B active

V4 Flash: 284B total / 13B active

Reported by @ArtificialAnlys, @teortaxesTex, @baseten, @NVIDIAAI

Context

1M tokens, up from 128K in V3.2 per @ArtificialAnlys

Multiple posters frame this as the headline achievement: “solid ultra-long context” @teortaxesTex

Training scale

32T–33T tokens cited repeatedly

@nrehiew_ notes 32T tokens over 1.6T parameters, i.e. roughly 20 tokens/parameter

@teortaxesTex cites 33T

@nrehiew_ estimates pretraining compute at ~1e25 FLOPs

Reasoning / modes

DeepSeek exposes three reasoning modes per @Togethercompute

Hybrid “thinking/non-thinking” positioning noted by @ArtificialAnlys

Long-context architecture

Several threads summarize a new hybrid attention system:

shared KV vectors

compressed KV streams

sparse attention over compressed tokens

local/sliding-window attention for nearby context

@ZhihuFrontier gives the most compact public summary:

2× KV reduction via shared key-value vectors

c4a ≈ 4× compression

c128a ≈ 128× compression

top-k sparse attention on compressed tokens

128-token sliding window

1M context KV cache = 9.62 GiB/sequence (bf16)

8.7× smaller than DeepSeek V3.2’s 83.9 GiB

FP4 index cache + FP8 attention cache gives another ~2× reduction

@ben_burtenshaw condenses this to “10× smaller KV cache”

@TheZachMueller and @TheZachMueller describe CSA + HCA layer patterns, with alternating layers and V4 Flash using sliding-window layers instead of HCA in some places

Quantization / checkpoint format

@LambdaAPI: checkpoint is mixed FP4 + FP8

MoE expert weights in FP4

attention / norm / router in FP8

claim: the full model fits on a single 8×B200 node

Inference hardware / serving

@NVIDIAAI: on Blackwell Ultra, V4 Pro can deliver 150+ TPS/user interactivity for agentic workflows

@NVIDIAAI: published day-0 V4 Pro performance pareto using vLLM

@SemiAnalysis_: day-0 support and benchmarking across H200, MI355, B200, B300, GB200/300

@Prince_Canuma: DeepSeek4-Flash on 256GB Mac

@Prince_Canuma: MLX quants published

@simonw asks about smaller-RAM Mac viability, implying community interest but incomplete support story

@QuixiAI reminds users that many local stacks still lack tensor parallel, relevant because V4-class models strongly stress inference infra

License / availability / pricing

MIT license per @ArtificialAnlys

first-party API plus rapid third-party availability via @Togethercompute, @baseten, @NousResearch, @Teknium

V4 Pro pricing: $1.74 / $3.48 per 1M input/output tokens

V4 Flash pricing: $0.14 / $0.28

cache-hit pricing also given by @ArtificialAnlys

@scaling01 views the pricing as a glimpse of future “Mythos-level” cheap coding models

Reuters-via-posted quote from @scaling01: DeepSeek said Pro pricing could fall sharply once Huawei Ascend 950 supernodes are deployed at scale in H2

Independent evaluations and where V4 lands

The most useful independent benchmark synthesis came from @ArtificialAnlys:

V4 Pro Max: 52 on Artificial Analysis Intelligence Index

up 10 points from V3.2 at 42

becomes #2 open weights reasoning model, behind Kimi K2.6 (54)

V4 Flash Max: 47

positioned around strong mid/high open models, “Claude Sonnet 4.6 max level intelligence”

GDPval-AA (agentic real-world work):

V4 Pro: 1554, leading open-weight models

ahead of Kimi K2.6 (1484), GLM-5.1 (1535), MiniMax-M2.7 (1514)

AA-Omniscience

V4 Pro: -10, an 11-point improvement over V3.2

but still paired with 94% hallucination rate

V4 Flash: 96% hallucination rate

Cost to run AA Index

V4 Pro: $1,071

V4 Flash: $113

Output tokens used on AA Index

V4 Pro: 190M

V4 Flash: 240M

This is a major caveat: cheap per-token pricing does not imply cheap total task cost if the model spills huge token volumes

Additional eval perspectives:

@arena:

#2 open in Text Arena overall at debut

category wins/placements:

#1 Medical & Healthcare

#15 Creative Writing

#18 Multi-Turn

thinking variant:

#8 Math

#9 Life/Physical/Social Science

@arena emphasizes the Pro vs Flash tradeoff:

Pro ranks ~30 places higher

costs 12× more

Flash is still competitive in Chinese, medicine, math

@scaling01:

“~Opus 4.5 estimate holds for now, at least on SimpleBench”

@scaling01:

V4 is “definitely better than GLM-5.1 but not quite Opus 4.7, GPT-5.4 or Gemini 3.1 Pro”

@scaling01 lists what scores would confirm <6 month gap:

ARC-AGI-1 ~75%

ARC-AGI-2 ~35%

GSO ~26%

METR 4.5–5 hours

WeirdML ~63%

@TheZachMueller:

on his evals, Flash@max ≈ Pro@high on reasoning

Pro focuses more on knowledge (SimpleQA)

@VictorTaelin:

after fixing benchmark bugs and letting long-running models run longer, DeepSeek and Kimi improved materially

@mbusigin:

a simple negative early impression with no detail

@petergostev:

on BullshitBench, not about capability but refusal/pushback behavior, GPT-5.5 underperformed; included here because many readers compare V4 in an eval-skeptical environment

Facts vs opinions

Facts / relatively well-supported claims

V4 Pro / Flash were released with the specs above, MIT-licensed, 1M context, and open technical documentation: @ArtificialAnlys, @TheZachMueller

The architecture introduces a new long-context attention system with dramatic KV-cache reduction: @ZhihuFrontier, @ben_burtenshaw

Independent benchmarkers broadly place V4 Pro near the very top of open weights but below the best proprietary models overall: @ArtificialAnlys, @arena, @scaling01

DeepSeek V4 is heavily token-intensive in some evaluations: @ArtificialAnlys

The checkpoint uses FP4/FP8 mixed precision and can fit on an 8×B200 node: @LambdaAPI

Rapid ecosystem support arrived via vLLM and other providers day 0: @vllm_project, @SemiAnalysis_

Opinions / interpretation

“V4 is ~4–5 months behind the frontier” from @scaling01, @scaling01, @scaling01 is an informed estimate, not a measured fact

“Top three open” vs “only open model close to frontier” debate from @teortaxesTex is partly about benchmark trust and framing

“Strongest pretrained model we have” from @teortaxesTex is an opinion hinging on scale + architecture, not direct benchmark supremacy

“Most significant AI paper of the year” from @Dorialexander is enthusiasm, not consensus

“This is what research should look like” from @scaling01 speaks to transparency/style rather than only capability

“Not exactly a democratizing technology” from @teortaxesTex is a strong architectural/political interpretation

Different opinions and fault lines

1) Is V4 near frontier, or clearly behind?

More favorable

@scaling01: puts it at roughly GPT-5.2 / Opus 4.5+ tier

@scaling01: SimpleBench supports ~Opus 4.5

@teortaxesTex: argues it is the strongest pretraining base among open models and implies people are underestimating what post-training can do

More skeptical

@scaling01: below Opus 4.7 / GPT-5.4 / Gemini 3.1 Pro

@scaling01: the gap may widen again because closed labs have bigger models, better science/law/medicine coverage, faster inference with GB200s

@mbusigin: early impressions “not great”

@teortaxesTex: says polished models like K2.6 and GLM-5.1 may still feel better in coding despite lower intrinsic capacity

2) Is V4’s real contribution model quality, or long-context systems design?

A big split in reactions is that many technical readers think the long-context architecture matters more than the raw benchmark position.

@teortaxesTex: “They've completed their quest: Solid Ultra-Long Context”

@ben_burtenshaw: first open model where long context and agentic post-training “meet”

@scaling01: expects other open labs to adopt pieces of the architecture

@Dorialexander: frames Huawei/sovereignty constraints as an opportunity to reshape hardware and memory/interconnect design

@jukan05: reads the paper as evidence that NVIDIA’s hardware roadmap is unusually well aligned to where MoE/long-context models are going

3) Is V4 “open democratization,” or too hard to copy?

This was one of the sharpest strategic disagreements.

@teortaxesTex: says V4 is “not exactly a democratizing technology” because the architecture is too difficult for most labs to replicate

@teortaxesTex: suggests even DeepSeek may not want to do this exact architecture again without refactoring

@stochasticchasm: notes the sheer hyperparameter complexity is daunting

Against that, @Prince_Canuma (in two posts) shows that the ecosystem is already compressing and adapting Flash for local-ish Apple Silicon use, which softens the “not democratizing” claim on the inference side, if not on the training side

4) Are people underrating Flash?

Several reactions suggest Flash may be more important than Pro for practical adoption.

@arena: Flash shifts the price/performance front
