Together AI Blog · April 29, 2026, 09:00 · approx. 16 min read

DeepSeek-V4 Pro is now available on Together AI

#LLM #MoE #Reasoning #Long-context #Serverless Inference #DeepSeek

TL;DR

Together AI has begun offering DeepSeek-V4 Pro, making its 1.6T-parameter MoE architecture and a 512K-token context window available through serverless inference.

AI in-depth analysis · July 6, 2026, 01:09

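Key points

Large-scale MoE architecture

DeepSeek V4 Pro is a 1.6T-parameter Mixture-of-Experts model designed so that 49B parameters are activated at inference time.

Controllable reasoning modes

Three modes, Non-Think, Think High, and Think Max, let teams trade response speed against reasoning depth depending on the task.

Long-context support and pricing

The model supports 512K tokens of context on Together AI and 1M tokens at the model level, with a lower price for cached input tokens.

Migration path from serverless to dedicated infrastructure

Teams can start on serverless inference and, as needed, scale up to dedicated infrastructure with 1M-token context and reserved capacity.

Streaming inference

The code example streams DeepSeek-V4-Pro's output so the response can be read as it is generated.

Separate handling of reasoning and answer

The example reads the 'reasoning' field from each streamed delta and handles the reasoning steps and the final answer content in separate branches.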

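Impact analysis

By offering an ultra-large MoE model with long-context processing as a cloud-native service, this launch removes the barrier of building on-premises infrastructure, which could have a significant effect on the industry. In particular, controllable reasoning depth and cached-input pricing let enterprises develop and deploy high-quality AI applications for complex tasks cost-effectively.

Editorial comment

Being able to control reasoning depth flexibly on serverless infrastructure at the 1.6T-parameter scale is very attractive for enterprise use cases that need long-context processing. Cached-input pricing has the potential to sharply reduce the cost of reusing reference material and history data.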

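A 1.6T-parameter MoE reasoning model with 512K context is now available on Together AI, with controllable reasoning modes and cached-input pricing for long-context workloads.

What's New

DeepSeek V4 Pro on Together AI: DeepSeek V4 Pro is now available on Together AI with a 512K-token context window for long-context reasoning workloads.
Large-scale MoE architecture: DeepSeek V4 Pro uses a 1.6T-parameter Mixture-of-Experts (MoE) architecture with 49B activated parameters.
Controllable reasoning modes: Non-Think, Think High, and Think Max let teams choose between fast responses, deeper reasoning, and maximum reasoning effort.
Transparent serverless pricing: DeepSeek V4 Pro is available at $2.10 per 1M input tokens, $0.20 per 1M cached input tokens, and $4.40 per 1M output tokens.

Long-context reasoning changes what teams can ask a model to do. Entire repositories, large document sets, long agent traces, and tool outputs can fit into the model's working context instead of being compressed into brittle summaries. But the models that can use that much context are also the hardest to serve: a 1.6T-parameter MoE with million-token context is not something most teams want to deploy, tune, and operate themselves.

DeepSeek-V4 Pro is now available on Together AI, the AI Native Cloud, so teams can start with Serverless Inference at 512K context and move to dedicated infrastructure for full 1M context, reserved capacity, and production control. DeepSeek-V4 Flash is coming soon, giving teams another V4 option for workloads where speed and cost matter more than maximum reasoning depth.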

At a glance

Spec	Value
Model	DeepSeek V4 Pro on Together AI
Endpoint	deepseek-ai/DeepSeek-V4-Pro
Architecture	1.6T-parameter MoE (Mixture of Experts)
Activated parameters	49B
Context on Together AI	512K tokens
Model-level context	1M tokens
Reasoning modes	Non-Think, Think High, Think Max
Deployment	Serverless, Monthly Reserved
Input price	$2.10 / 1M tokens
Cached input price	$0.20 / 1M tokens
Output price	$4.40 / 1M tokens
Best-fit workloads	Code agents, document intelligence, long-context agents, research synthesis

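Built for long-context reasoning

DeepSeek V4 Pro is built for workloads where the model needs to reason over more than a short prompt: large repositories, long technical documents, dense retrieval bundles, tool-call histories, and research corpora.

DeepSeek V4 Pro supports million-token context at the model level; on Together AI, it is currently available with a 512K-token context window. That distinction matters because model capability and deployed serving profile are not always the same thing. Together AI is launching DeepSeek V4 Pro with a context window designed for reliable production serving, while still giving teams enough room for serious long-context workloads.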
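The gap between the served window and the model-level window can be made explicit in client code. The sketch below is a minimal, hypothetical pre-flight check. The function names, the reading of "512K" and "1M" as 512,000 and 1,000,000 tokens, and the idea that reserved output tokens count against the window are all assumptions for illustration, not values confirmed by Together AI or DeepSeek.

```python
# Hypothetical pre-flight check. The limits are assumed readings of "512K" and "1M".
SERVERLESS_CONTEXT_TOKENS = 512_000      # assumption: serverless window on Together AI
MODEL_LEVEL_CONTEXT_TOKENS = 1_000_000   # assumption: model-level window


def fits_window(prompt_tokens: int, max_output_tokens: int, window: int) -> bool:
    """Return True if the prompt plus the reserved output fits inside the window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: output tokens share the same window as the prompt.
    return prompt_tokens + max_output_tokens <= window


def choose_deployment(prompt_tokens: int, max_output_tokens: int) -> str:
    """Pick the smallest deployment profile whose window fits the request."""
    if fits_window(prompt_tokens, max_output_tokens, SERVERLESS_CONTEXT_TOKENS):
        return "serverless"
    if fits_window(prompt_tokens, max_output_tokens, MODEL_LEVEL_CONTEXT_TOKENS):
        return "dedicated"
    return "too-large"


print(choose_deployment(300_000, 8_000))  # serverless
print(choose_deployment(700_000, 8_000))  # dedicated
```

A check like this is only as good as the token count fed into it, so in practice the prompt would be counted with the model's own tokenizer rather than estimated.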

The architecture also matters because long context is not only a product spec. As context grows, serving cost, memory pressure, KV cache usage, latency, and concurrency all become part of the system design. DeepSeek V4 Pro uses hybrid attention, combining Compressed Sparse Attention and Heavily Compressed Attention. DeepSeek reports that, at million-token context, V4 Pro needs 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek V3.2.
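Because "27% of" and "reduced by 27%" are easy to confuse, a small arithmetic sketch may help. The two ratios are the figures DeepSeek reports; the helper names are hypothetical, and the capacity figure is an idealization that assumes the resource in question is the only bottleneck.

```python
# Ratios reported by DeepSeek for V4 Pro relative to V3.2 at million-token context.
FLOPS_RATIO = 0.27     # V4 Pro uses 27% of V3.2's single-token inference FLOPs
KV_CACHE_RATIO = 0.10  # V4 Pro uses 10% of V3.2's KV cache


def reduction_percent(ratio: float) -> float:
    """Convert 'X% of the baseline' into 'reduced by Y%'."""
    return round((1.0 - ratio) * 100, 1)


def relative_capacity(ratio: float) -> float:
    """Idealized multiplier: how much more work fits in the same budget
    if this resource were the only limit (an assumption, not a measurement)."""
    return round(1.0 / ratio, 2)


print(reduction_percent(FLOPS_RATIO))     # 73.0
print(reduction_percent(KV_CACHE_RATIO))  # 90.0
print(relative_capacity(KV_CACHE_RATIO))  # 10.0
```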

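Choose reasoning effort by workload

DeepSeek V4 Pro supports three reasoning modes, so teams can match reasoning depth to task difficulty instead of treating every request the same.

Mode	Use when	Tradeoff
Non-Think	Extraction, classification, simple Q&A, routine responses	Fastest path for lower-complexity tasks
Think High	Code planning, document analysis, multi-step reasoning	More reasoning depth for complex work
Think Max	Hard debugging, deep research synthesis, agentic decision points	Maximum reasoning effort; expect higher latency and token usage

A document assistant might use Non-Think for simple extraction, Think High for conflict analysis across policies, and Think Max only when the model needs to reason through a difficult decision. A code agent might use Think High for planning a migration and Think Max for debugging a subtle cross-service failure.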
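The routing idea in these examples can be sketched as a small lookup. The task labels and the `pick_mode` helper are hypothetical, and the mode names come from the table. This post does not show how a mode is selected in an API request, so the sketch stops at choosing a name and does not guess at a request parameter.

```python
# Hypothetical router: task labels are illustrative, mode names are from the post.
MODE_BY_TASK = {
    "extraction": "Non-Think",
    "classification": "Non-Think",
    "simple_qa": "Non-Think",
    "code_planning": "Think High",
    "document_analysis": "Think High",
    "multi_step_reasoning": "Think High",
    "hard_debugging": "Think Max",
    "research_synthesis": "Think Max",
    "agent_decision_point": "Think Max",
}


def pick_mode(task: str, default: str = "Think High") -> str:
    """Map a task label to a reasoning mode, falling back to the middle setting."""
    return MODE_BY_TASK.get(task, default)


print(pick_mode("extraction"))      # Non-Think
print(pick_mode("hard_debugging"))  # Think Max
```

Defaulting unknown tasks to Think High is a design choice for the sketch: it avoids paying Think Max latency and token usage by accident while still giving unclassified work some reasoning depth.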

DeepSeek reports benchmark results across coding, reasoning, long-context, and agentic tasks, including 93.5% on LiveCodeBench, 90.1% on GPQA Diamond, 80.6% on SWE-bench Verified, 83.5% on MRCR 1M, and 62.0% on CorpusQA 1M.

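Make repeated long-context queries cheaper with cached input pricing

Long-context systems often reuse the same large context across multiple questions: a repository snapshot, a document bundle, a policy archive, a retrieval payload, or a long agent trace. Cached input pricing makes those repeated workloads more practical.

DeepSeek V4 Pro is priced at $2.10 / 1M input tokens, with cached input at $0.20 / 1M tokens and output at $4.40 / 1M tokens. That is roughly a 90% price reduction for reused context, which matters when the expensive part of the request is a stable block of text that gets reused across follow-up analysis.

Example pattern:

Load a large stable context, such as a 300K-token repo summary, contract set, or policy archive.
Ask several follow-up questions over that same context.
Use cached input pricing where applicable to drastically reduce the cost of repeated analysis.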
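A worked version of this pattern, as a rough sketch: the helper below prices five questions over the same 300K-token context, once with every request billed at the full input rate and once with the repeated context billed at the cached rate. The three rates are the ones quoted above. The function names, the 500-token question and 1,000-token answer sizes, and the assumption that every follow-up gets a complete cache hit on the shared context are illustrative only.

```python
INPUT_PER_M = 2.10         # USD per 1M input tokens (from the post)
CACHED_INPUT_PER_M = 0.20  # USD per 1M cached input tokens (from the post)
OUTPUT_PER_M = 4.40        # USD per 1M output tokens (from the post)


def request_cost(fresh_in: int, cached_in: int, out: int) -> float:
    """Cost in USD of one request, given its token counts."""
    return (
        fresh_in * INPUT_PER_M
        + cached_in * CACHED_INPUT_PER_M
        + out * OUTPUT_PER_M
    ) / 1_000_000


def session_cost(context: int, questions: int, q_tokens: int,
                 a_tokens: int, use_cache: bool) -> float:
    """Cost of asking `questions` questions over the same `context` tokens."""
    total = 0.0
    for i in range(questions):
        # Assumption: the first request pays full price for the context and
        # every later request gets a complete cache hit on it.
        if use_cache and i > 0:
            total += request_cost(q_tokens, context, a_tokens)
        else:
            total += request_cost(context + q_tokens, 0, a_tokens)
    return total


without_cache = session_cost(300_000, 5, 500, 1_000, use_cache=False)
with_cache = session_cost(300_000, 5, 500, 1_000, use_cache=True)
print(f"without cache: ${without_cache:.2f}")  # without cache: $3.18
print(f"with cache:    ${with_cache:.2f}")     # with cache:    $0.90
```

The per-token discount on reused context is 1 - 0.20/2.10, about 90%. The session-level saving in this example is smaller, roughly 72%, because the first load and all output tokens are still billed at full rates.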

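Workload patterns

Code agents

Use DeepSeek V4 Pro when an agent needs to reason across repository slices, issue traces, internal documentation, prior tool calls, and proposed patches. Think High or Think Max is most useful for planning changes, debugging failures, or resolving cross-file dependencies.

Document intelligence

Use long context for contracts, policy sets, technical manuals, or research collections that need to be compared in one request. Non-Think can handle extraction and simple Q&A; Think High is better for conflict analysis, interpretation, and synthesis.

Long-context agent traces

Use DeepSeek V4 Pro to inspect long tool-call histories, intermediate results, and execution traces. Higher reasoning modes are most useful at decision points: when the agent needs to decide whether to continue, call another tool, revise a plan, or stop.

Research synthesis

Use DeepSeek V4 Pro for workflows that combine papers, notes, benchmark reports, retrieved documents, and prior analysis. Cached input pricing is especially useful when the same evidence set is reused across multiple questions.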

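Start serverless, move to reserved capacity

DeepSeek V4 Pro is available on Together AI Serverless Inference and Monthly Reserved infrastructure. Serverless is the right starting point for evaluation, development, and variable traffic. Monthly Reserved is better for steadier production demand where teams need more predictable capacity and cost control.

For long-context workloads, the deployment path matters. Teams are not only choosing a model; they are choosing how to manage throughput, concurrency, latency, KV cache pressure, and cost as context sizes grow. Together AI gives teams a path from evaluation to production without standing up the serving stack themselves.

Try it now

DeepSeek-V4 Pro is available today on Together AI Serverless Inference and Dedicated Endpoints.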

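python

from together import Together

# The client picks up the API key from the TOGETHER_API_KEY environment variable.
client = Together()

# Request a streamed completion so tokens arrive as they are generated.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[
        {
            "role": "user",
            "content": "Prove that the square root of 2 is irrational.",
        }
    ],
    stream=True,
)

for chunk in stream:
    # Skip chunks that carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    # Reasoning tokens, when present, arrive in a separate `reasoning` field.
    if hasattr(delta, "reasoning") and delta.reasoning:
        print(delta.reasoning, end="", flush=True)
    # Final answer tokens arrive in `content`.
    if hasattr(delta, "content") and delta.content:
        print(delta.content, end="", flush=True)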
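If the reasoning and the answer need to be stored separately rather than printed as they arrive, the same loop can accumulate them into two buffers. This is a sketch under assumptions: `collect_stream` is a hypothetical helper, it assumes deltas expose `reasoning` and `content` attributes as in the example above, and the stand-in chunks are built with `SimpleNamespace` purely so the function can be exercised without an API call.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def collect_stream(stream: Iterable) -> Tuple[str, str]:
    """Accumulate streamed deltas into (reasoning_text, answer_text)."""
    reasoning_parts, content_parts = [], []
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # getattr with a default covers deltas that lack either field.
        reasoning = getattr(delta, "reasoning", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


def _fake_chunk(**delta_fields):
    """Build a stand-in chunk shaped like the streamed objects (for testing only)."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    _fake_chunk(reasoning="Assume sqrt(2) = p/q. "),
    SimpleNamespace(choices=[]),  # a chunk with no choices is skipped
    _fake_chunk(reasoning="Then p is even. ", content=None),
    _fake_chunk(content="sqrt(2) is irrational."),
]
reasoning_text, answer_text = collect_stream(fake_stream)
print(answer_text)  # sqrt(2) is irrational.
```

With a real response, the `stream` object returned by `client.chat.completions.create(..., stream=True)` would be passed in place of `fake_stream`.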

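Start with Serverless Inference for development and evaluation. For production workloads that require full 1M context, reserved capacity, workload isolation, or more predictable throughput, contact sales to deploy DeepSeek-V4 Pro on Together AI Dedicated Inference.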
