
Simon Willison Blog · April 24, 2026 15:01 · about 3 min read

DeepSeek V4: near-frontier performance at a fraction of the price

#LLM #MoE (Mixture of Experts) #OpenSourceModels #DeepSeek #LocalInference

TL;DR

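DeepSeek has released two open-weights Mixture of Experts models under the MIT license: V4-Pro (1.6T total parameters) and V4-Flash (284B total parameters). Both come close to frontier performance, Flash may be small enough to run locally once quantized, and API prices are a fraction of what comparable models cost.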

AI deep analysis · April 24, 2026 15:34

Importance: top priority (5-level scale)

Depth: 40%

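Key points

Model specifications and architecture

Both V4 models are Mixture of Experts with a 1 million token context. Pro has 1.6T total parameters with 49B active; Flash has 284B total with 13B active, so only a small fraction of the weights is used for each token.

Open license and local deployment

The MIT license permits commercial use. With quantization, local inference on a machine with 128GB of RAM (for example an M5 MacBook Pro) looks plausible for Flash, though the post has not tested this yet.

Hands-on test and quality comparison

In a CLI test through OpenRouter, Flash produced a more coherent SVG than Pro on the pelican-on-a-bicycle prompt, a reminder that the larger model is not automatically the better choice for every task. The post also links to earlier V3-series results for comparison.

Disruptive pricing

Flash costs $0.14 input / $0.28 output per million tokens and Pro costs $1.74 / $3.48. Each is the cheapest model in its tier of the comparison table, with output prices several times lower than comparable models from OpenAI, Google and Anthropic.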

Key quotes

Both models are 1 million token context Mixture of Experts.

I think this makes DeepSeek-V4-Pro the new largest open weights model.

So the pelicans are pretty good, but what's really notable here is the cost. DeepSeek V4 is a very, very inexpensive model.


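Impact analysis

By combining an open license with aggressive pricing, the DeepSeek V4 series substantially lowers the barrier to enterprise AI adoption. As MoE architectures and quantization mature, local inference without high-end GPUs becomes more feasible, which makes broader access to AI infrastructure look realistic. The price competition is likely to affect both incumbent cloud AI providers and the open-source community, and it may reset expectations for what the next generation of models should cost.

Editorial comment

By offering performance, an open license and low prices together, the V4 series could trigger price disruption in the AI market and wider adoption of local inference.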

Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) last December. They have now released the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-V4-Pro and DeepSeek-V4-Flash.

Both models are 1 million token context Mixture of Experts (MoE). Pro is 1.6T total parameters, 49B active. Flash is 284B total, 13B active. They're using the standard MIT license.

I think this makes DeepSeek-V4-Pro the new largest open weights model. It's larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B) and more than twice the size of DeepSeek V3.2 (685B).

Pro is 865GB on Hugging Face, Flash is 160GB. I'm hoping that a lightly quantized Flash will run on my 128GB M5 MacBook Pro. It's *possible* the Pro model may run on it if I can stream just the necessary active experts from disk.
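The size figures above can be turned into a rough back-of-the-envelope check. The sketch below is only arithmetic on the numbers quoted in the post; it assumes GB means 10^9 bytes and that the published files consist almost entirely of weights, and the function names are made up for illustration.

```python
# Figures quoted in the post; GB is taken to mean 10**9 bytes (an assumption).
MODELS = {
    "DeepSeek-V4-Pro": {"total_params": 1.6e12, "active_params": 49e9, "file_bytes": 865e9},
    "DeepSeek-V4-Flash": {"total_params": 284e9, "active_params": 13e9, "file_bytes": 160e9},
}
RAM_BYTES = 128e9  # the 128GB machine mentioned in the post


def bits_per_param(file_bytes: float, total_params: float) -> float:
    """Average storage cost of one parameter in the published files."""
    return file_bytes * 8 / total_params


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the weights that take part in computing a single token."""
    return active_params / total_params


def max_bits_to_fit(total_params: float, ram_bytes: float) -> float:
    """Upper bound on bits per parameter if every weight must sit in RAM.

    Ignores the KV cache, activations and the operating system, so the
    real budget is lower than this bound.
    """
    return ram_bytes * 8 / total_params


for name, m in MODELS.items():
    print(
        f"{name}: {bits_per_param(m['file_bytes'], m['total_params']):.2f} bits/param on disk, "
        f"{active_fraction(m['active_params'], m['total_params']):.1%} of weights active per token, "
        f"at most {max_bits_to_fit(m['total_params'], RAM_BYTES):.2f} bits/param to fit in 128GB"
    )
```

Under these assumptions the published files average roughly 4.3 to 4.5 bits per parameter, and Flash would need to average under about 3.6 bits per parameter to sit entirely in 128GB. Pro's 49B active parameters come to roughly 26GB at its on-disk density, which is why streaming experts from disk is at least conceivable, although different tokens activate different experts, so the real working set would be larger.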

For the moment I tried the models out via OpenRouter, using llm-openrouter:

    llm install llm-openrouter
    llm openrouter refresh
    llm -m openrouter/deepseek/deepseek-v4-pro 'Generate an SVG of a pelican riding a bicycle'
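The same prompt can also be sent from Python through the llm library's API, which makes it easier to save the drawing. This is a minimal sketch, assuming the llm package and the llm-openrouter plugin are installed and an OpenRouter key has already been stored; `extract_svg` and `pelican_svg` are hypothetical helper names, not part of llm.

```python
import re
from typing import Optional


def extract_svg(text: str) -> Optional[str]:
    """Return the first <svg>...</svg> element found in a model reply, if any."""
    match = re.search(r"<svg\b.*?</svg>", text, flags=re.DOTALL | re.IGNORECASE)
    return match.group(0) if match else None


def pelican_svg(model_id: str) -> Optional[str]:
    """Send the pelican prompt to one model and pull the SVG out of the reply.

    Makes a network call, so it needs the llm package, the llm-openrouter
    plugin and a stored OpenRouter key.
    """
    import llm  # imported here so extract_svg works without llm installed

    model = llm.get_model(model_id)
    response = model.prompt("Generate an SVG of a pelican riding a bicycle")
    return extract_svg(response.text())
```

Calling `pelican_svg("openrouter/deepseek/deepseek-v4-pro")` uses the model ID from the commands above. The post does not show the Flash ID, so any Flash model ID passed here would be an assumption to check against the refreshed model list.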

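Here's the pelican for DeepSeek-V4-Flash:

Image: Excellent bicycle, with a good frame shape, a nice chain and even a reflector on the front wheel. The pelican has a mean looking expression but has its wings on the handlebars and feet on the pedals. The pouch is a little sharp.

And for DeepSeek-V4-Pro:

Image: Another solid bicycle, albeit the spokes are a little jagged and the frame is compressed a bit. The pelican has gone a bit wrong: it has a VERY large body, only one wing, a weirdly hairy backside and generally looks like it was drawn by a different artist from the bicycle.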

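For comparison, take a look at the pelicans I got from DeepSeek V3.2 in December, V3.1 in August, and V3-0324 in March 2025.

So the pelicans are pretty good, but what's really notable here is the *cost*. DeepSeek V4 is a very, very inexpensive model.

Here's DeepSeek's pricing page. They're charging $0.14/million tokens input and $0.28/million tokens output for Flash, and $1.74/million input and $3.48/million output for Pro.

Here's a comparison table with the frontier models from Gemini, OpenAI and Anthropic: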

| Model | Input ($/M) | Output ($/M) |
|---|---|---|
| DeepSeek V4 Flash | $0.14 | $0.28 |
| GPT-5.4 Nano | $0.20 | $1.25 |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 |
| Gemini 3 Flash Preview | $0.50 | $3 |
| GPT-5.4 Mini | $0.75 | $4.50 |
| Claude Haiku 4.5 | $1 | $5 |
| DeepSeek V4 Pro | $1.74 | $3.48 |
| Gemini 3.1 Pro | $2 | $12 |
| GPT-5.4 | $2.50 | $15 |
| Claude Sonnet 4.6 | $3 | $15 |
| Claude Opus 4.7 | $5 | $25 |
| GPT-5.5 | $5 | $30 |

DeepSeek-V4-Flash is the cheapest of the small models, beating even OpenAI's GPT-5.4 Nano. DeepSeek-V4-Pro is the cheapest of the larger frontier models.
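To see what the per-million-token prices mean for a concrete bill, the sketch below applies the table's figures to one workload. The prices are copied from the comparison table; the workload of 10 million input tokens and 2 million output tokens is a made-up example, and `cost_usd` is a hypothetical helper.

```python
# (input $/M tokens, output $/M tokens), copied from the comparison table.
PRICES = {
    "DeepSeek V4 Flash": (0.14, 0.28),
    "GPT-5.4 Nano": (0.20, 1.25),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3 Flash Preview": (0.50, 3.00),
    "GPT-5.4 Mini": (0.75, 4.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "DeepSeek V4 Pro": (1.74, 3.48),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price of one workload in US dollars at the listed per-million rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload, chosen only for illustration.
INPUT_TOKENS = 10_000_000
OUTPUT_TOKENS = 2_000_000

# Print every model from cheapest to most expensive for this workload.
for model in sorted(PRICES, key=lambda m: cost_usd(m, INPUT_TOKENS, OUTPUT_TOKENS)):
    print(f"{model:<24} ${cost_usd(model, INPUT_TOKENS, OUTPUT_TOKENS):>7.2f}")
```

The ranking depends on the input-to-output ratio, so it is worth rerunning with a token mix that matches the real workload. The sketch also ignores caching discounts and any tiered pricing, which the table does not cover.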

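This note from the DeepSeek paper helps explain why they can price these models so low: they've focused a great deal on efficiency with this release, especially for longer context prompts:

"In the scenario of 1M-token context, even DeepSeek-V4-Pro, which has a larger number of activated parameters, attains only 27% of the single-token FLOPs (measured in equivalent FP8 FLOPs) and 10% of the KV cache size relative to DeepSeek-V3.2. Furthermore, DeepSeek-V4-Flash, with its smaller number of activated parameters, pushes efficiency even further: in the 1M-token context setting, it achieves only 10% of the single-token FLOPs and 7% of the KV cache size compared with DeepSeek-V3.2."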
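The percentages in that note read more easily as reduction factors. The sketch below only restates the quoted ratios; the 100GB baseline used at the end is a hypothetical placeholder for scale, not a figure from the paper or the post.

```python
# Ratios relative to DeepSeek-V3.2 at 1M-token context, as quoted from the paper.
RELATIVE_TO_V32 = {
    "DeepSeek-V4-Pro": {"flops_per_token": 0.27, "kv_cache": 0.10},
    "DeepSeek-V4-Flash": {"flops_per_token": 0.10, "kv_cache": 0.07},
}


def reduction_factor(ratio: float) -> float:
    """Turn 'X% of the baseline' into 'N times smaller than the baseline'."""
    return 1 / ratio


def scaled(baseline: float, ratio: float) -> float:
    """Apply a quoted ratio to some baseline quantity."""
    return baseline * ratio


HYPOTHETICAL_V32_KV_CACHE_GB = 100.0  # placeholder value, for scale only

for name, r in RELATIVE_TO_V32.items():
    print(
        f"{name}: {reduction_factor(r['flops_per_token']):.1f}x fewer FLOPs per token, "
        f"{reduction_factor(r['kv_cache']):.1f}x smaller KV cache "
        f"({scaled(HYPOTHETICAL_V32_KV_CACHE_GB, r['kv_cache']):.0f}GB "
        f"for every {HYPOTHETICAL_V32_KV_CACHE_GB:.0f}GB V3.2 would need)"
    )
```

Read this way, Pro needs about 3.7 times fewer FLOPs per token and a 10 times smaller KV cache than V3.2 at 1M tokens, while Flash needs 10 times fewer FLOPs and a cache about 14 times smaller.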

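DeepSeek's self-reported benchmarks in their paper show their Pro model competitive with those other frontier models, albeit with this note:

"Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months."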

I'm keeping an eye on huggingface.co/unsloth/models as I expect the Unsloth team will have a set of quantized versions out pretty soon. It's going to be very interesting to see how well that Flash model runs on my own machine.

Tags: ai, generative-ai, llms, llm, llm-pricing, pelican-riding-a-bicycle, deepseek, llm-release, openrouter, ai-in-china


