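Together AI Blog · April 2, 2026 09:00 · about 15 min read

Deepgram's speech recognition and synthesis models now available natively on Together AI

#Speech-to-Text #Voice Agents #Deepgram #Together AI #Real-time Inference

TL;DR

Together AI now runs Deepgram's speech-to-text (STT) and text-to-speech (TTS) models natively on its own infrastructure, reducing latency and simplifying operations for teams building real-time voice agents.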

AI in-depth analysis · April 26, 2026 19:40

Depth: 40%

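Key points

Native execution of Deepgram models on Together AI

Deepgram STT and TTS models, including Nova-3, Flux, and Aura-2, now run directly on Together AI Dedicated Model Inference, removing the dependency on a separate external provider.

Architecture optimization for real-time voice agents

Keeping STT, LLM, and TTS on a single platform removes the added latency and fragility caused by network hops between multiple providers, which supports a more natural conversational experience.

Enterprise security and compliance

The offering includes the controls enterprises typically require: zero data retention, SOC 2 Type II, HIPAA-ready support, and data residency options.

Voice conveys emotion and detail

Voice is more than a string of words. It carries the feeling behind them, the quiet pauses, and the important details, and it has to deliver all of these clearly.

Applications in healthcare, finance, and multilingual support

Nova-3, Nova-3 Multilingual, and Aura-2 strengthen the foundation for voice agents in these fields through accurate recognition of medical terminology, more precise capture of financial details, and support for multiple languages and accents in a single pipeline.

Production infrastructure integration on Together AI

Deepgram models run on Together AI Dedicated Model Inference alongside LLM and TTS workloads on dedicated, isolated GPU capacity, with a 99.9% uptime SLA and SOC 2 Type II and HIPAA-ready support.

Unified developer experience

SDKs and authentication are shared across LLM, STT, and TTS endpoints, together with a single observability and logging surface, model switching via configuration, and one billing surface.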

Key quotes

"Voice agents live or die by latency, and every network hop between providers is a place where the experience breaks down. By hosting Deepgram’s STT and TTS natively on Together AI’s infrastructure, we’re giving developers production-grade transcription without the tradeoff. Fast, accurate, and co-located with the rest of the pipeline."

"Real-time voice agents often fail when speech is treated as transcription rather than conversation."

"It's not just the words that matter. It's the feeling behind them, the quiet moments of reflection, and the clarity to handle the details when they count."

A transcription error at the start of the pipeline can surface later as a fluent but incorrect response, which is exactly the kind of failure these systems cannot afford.

Keeping transcription, reasoning, and synthesis in the same production environment makes real-time systems easier to operate and gives teams tighter control over performance as they scale.

Same SDKs and authentication across LLM, STT, and TTS endpoints

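Impact analysis

This announcement is a concrete instance of the infrastructure consolidation trend in real-time voice agent development. It addresses the latency and operational cost that developers face when they combine multiple providers. As a result, lower-latency and more reliable voice AI applications are expected to spread, improving the quality of customer support and conversational interfaces.

Editor's comment

The piece carries the promotional tone typical of a press release, but it presents a concrete answer to a real pain point, the latency that comes with multi-provider setups, and is practically useful for teams building voice AI.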

Production STT and TTS from Deepgram, available on Together AI Dedicated Model Inference for real-time voice agents.

Deepgram Nova-3, Nova-3 Multilingual, Flux, and Aura-2 now run natively on Together AI Dedicated Model Inference
Deepgram covers both ends of the voice pipeline, from transcription to synthesis, in one model lineup
Together AI gives teams a single production surface for real-time voice agents, with STT, LLM, and TTS on one platform
Enterprise controls include zero data retention, SOC 2 Type II, HIPAA-ready support, and data residency options

Real-time voice agents often fail when speech is treated as transcription rather than conversation. Getting the words right is only part of the challenge: the system also has to detect turn boundaries, handle interruptions and overlap, and respond quickly enough to keep the exchange feeling natural. When teams try to patch those gaps with endpointing logic, routing layers, and extra providers, they often add latency and operational fragility right back into the system. Deepgram’s models are purpose-built for that layer, where transcription, turn-taking, and responsiveness have to work together in real time.

Deepgram’s STT and TTS model lineup now runs natively on Together AI, the AI Native Cloud for building real-time voice agents, so teams can pair Deepgram transcription and synthesis with any LLM in the Together catalog and run the full voice pipeline on one production platform. For the broader architecture, see our real-time voice agents announcement.

“Voice agents live or die by latency, and every network hop between providers is a place where the experience breaks down. By hosting Deepgram’s STT and TTS natively on Together AI’s infrastructure, we’re giving developers production-grade transcription without the tradeoff. Fast, accurate, and co-located with the rest of the pipeline.”
- Abe Pursell, VP of Partnerships, Deepgram
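The latency argument in the quote above can be made concrete with a rough budget. The sketch below is only an illustration: the `Stage` type, the function name, and every timing are hypothetical values chosen to show the shape of the calculation, not measurements of any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of a voice pipeline, with illustrative timings in milliseconds."""
    name: str
    compute_ms: int      # time spent inside the model
    network_hop_ms: int  # time spent reaching the service that hosts it


def response_latency_ms(stages: list[Stage]) -> int:
    """Total time from the end of user speech to the start of agent speech."""
    return sum(s.compute_ms + s.network_hop_ms for s in stages)


# Hypothetical numbers: three providers, each reached over the public network.
multi_provider = [
    Stage("stt", 150, 40),
    Stage("llm", 300, 40),
    Stage("tts", 120, 40),
]

# Hypothetical numbers: one hop into the platform, then short internal hand-offs.
co_located = [
    Stage("stt", 150, 40),
    Stage("llm", 300, 2),
    Stage("tts", 120, 2),
]

print(response_latency_ms(multi_provider))  # 690
print(response_latency_ms(co_located))      # 614
```

The model compute time is identical in both lists, so the whole difference comes from the hops between stages, which is the part co-location is meant to remove.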

Flux: Conversational STT with turn detection

Accurate transcription is only part of the job. A voice agent also has to know when the speaker is actually finished, because if it misreads the turn, it either talks over the caller or waits too long and feels unresponsive.

Flux is Deepgram’s conversational STT model for real-time agents, built not just to transcribe speech but to produce turn signals from conversational context rather than silence alone. That matters because many teams still rely on extra endpointing logic to bridge this gap, which adds complexity and makes latency harder to control. Flux simplifies that part of the stack and helps keep turn-taking more predictable in production with 250ms end-of-turn detection.
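To see why silence alone is a weak turn signal, consider the kind of extra endpointing logic the paragraph above describes. The following is a minimal sketch of a silence-timer endpointer, assuming per-frame voice activity flags as input; it is not how Flux works, and the function name, frame size, and thresholds are all hypothetical.

```python
from typing import Optional


def silence_endpoint(frames_is_speech: list[bool], frame_ms: int, silence_ms: int) -> Optional[int]:
    """Return the time (ms) at which a silence-only endpointer declares end of turn.

    The turn ends once `silence_ms` of continuous non-speech follows some speech.
    Returns None if no end of turn is detected.
    """
    needed = silence_ms // frame_ms  # consecutive quiet frames required
    heard_speech = False
    quiet = 0
    for i, is_speech in enumerate(frames_is_speech):
        if is_speech:
            heard_speech = True
            quiet = 0
        elif heard_speech:
            quiet += 1
            if quiet >= needed:
                return (i + 1) * frame_ms
    return None


# 20 ms frames: 400 ms of speech, a 300 ms mid-sentence pause,
# 400 ms more speech (ending at 1100 ms), then trailing silence.
frames = [True] * 20 + [False] * 15 + [True] * 20 + [False] * 50

print(silence_endpoint(frames, frame_ms=20, silence_ms=250))  # 640: fires inside the pause
print(silence_endpoint(frames, frame_ms=20, silence_ms=700))  # 1800: correct turn, 700 ms late
```

A short threshold interrupts the caller mid-sentence, while a long one makes the agent feel unresponsive. A model that also uses conversational context is meant to avoid having to choose between the two.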

Nova-3: Production transcription for real-world audio

Production audio is messier than benchmark audio. Calls come with background noise, overlapping speakers, accents, telephony compression, and interruptions, and the model still has to return text the rest of the pipeline can trust. Nova-3 is built for those conditions, with support for vocabulary customization so teams can improve recognition of domain-specific terms without retraining.

Nova-3 Multilingual extends that approach across multiple languages, which matters in deployments where callers switch languages mid-conversation.

Aura-2: Enterprise TTS for production voice agents

Aura-2 covers the synthesis side of the pipeline for business environments where clarity and consistency matter. Teams can use Deepgram STT and TTS together while keeping output stable for domain-specific terms and structured entities.

That difference shows up in delivery. The voice has to stay clear, direct, and reliable when it reads structured information or specialized language back to the user. A voice that sounds fine in a demo is not enough if it starts to stumble once the interaction becomes operational.

Audio sample: Deepgram Aura-2, Thalia voice in English

"Welcome to the show. Today we're exploring something truly fascinating: the power of voice. It's not just the words that matter. It's the feeling behind them, the quiet moments of reflection, and the clarity to handle the details when they count.

Like this: Dr. Sarah Chen, 450 Park Avenue, New York, 10022. Your confirmation is BX-4072 with a $14.99 copay.

That's a lot of detail, and every bit of it needs to land clearly. That's what a great voice can do."

Use cases

Healthcare voice agents

Healthcare voice agents need accurate transcription of medication names, procedure terms, and clinical language, along with output that stays clear when reading the same terms back to patients. A transcription error at the start of the pipeline can surface later as a fluent but incorrect response, which is exactly the kind of failure these systems cannot afford. Nova-3 helps teams adapt recognition to clinical language, while Aura-2 keeps patient-facing output clear and consistent.

Financial services

Financial voice systems depend on precision. Account numbers, routing numbers, trade confirmations, and structured financial language need to be captured correctly the first time, because a single transcription miss can create a failed transaction, compliance issue, or broken customer interaction. Deepgram’s speech models give teams a stronger foundation for these regulated workflows.

Multilingual customer support

Global support teams need speech models that hold up when callers move between languages and accents in the same interaction. Nova-3 Multilingual helps teams serve those conversations without building separate STT pipelines for every market, which makes multilingual support easier to scale and easier to operate.

Production infrastructure on Together AI

Deepgram models run on Together AI Dedicated Model Inference alongside LLM and TTS workloads on isolated capacity. Keeping transcription, reasoning, and synthesis in the same production environment makes real-time systems easier to operate and gives teams tighter control over performance as they scale.

Together AI is the AI Native Cloud for production inference, and Dedicated Model Inference gives teams the control and reliability they need to run voice agents at scale.

Infrastructure

Dedicated GPU capacity with isolated workloads

99.9% uptime SLA

SOC 2 Type II and HIPAA-ready support, with PCI support where applicable

Global regions with data residency options
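As a quick worked example of what a 99.9% uptime figure allows, the sketch below converts the percentage into a downtime budget. The function name is hypothetical, and the announcement does not say how the SLA is measured (window, exclusions), so treat the simple calendar-time model here as an assumption.

```python
from fractions import Fraction


def allowed_downtime_minutes(uptime_percent: str, days: int) -> Fraction:
    """Maximum downtime, in minutes, that still meets the uptime percentage over `days`."""
    # Fraction parses the decimal string exactly, avoiding float rounding noise.
    downtime_share = (100 - Fraction(uptime_percent)) / 100
    return downtime_share * days * 24 * 60


print(float(allowed_downtime_minutes("99.9", 30)))   # 43.2 minutes in a 30-day month
print(float(allowed_downtime_minutes("99.9", 365)))  # 525.6 minutes (8.76 hours) in a year
```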

Developer experience

Same SDKs and authentication across LLM, STT, and TTS endpoints

Single observability and logging surface for the voice pipeline

Model selection and swapping via configuration

One billing surface across your stack
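Model selection and swapping via configuration might look something like the sketch below on the application side. This is an assumption about how a team could structure its own code, not the platform's SDK: the `VoicePipelineConfig` type and every model identifier string are placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoicePipelineConfig:
    """Which model serves each stage. The identifier strings are placeholders."""
    stt_model: str
    llm_model: str
    tts_model: str


def describe(config: VoicePipelineConfig) -> str:
    """Render the pipeline as a readable chain, in processing order."""
    return " -> ".join([config.stt_model, config.llm_model, config.tts_model])


# Placeholder identifiers; real model IDs would come from the platform's catalog.
english = VoicePipelineConfig(
    stt_model="example/nova-3",
    llm_model="example/chat-llm",
    tts_model="example/aura-2",
)

# Swapping the STT stage is a configuration change; the other stages are untouched.
multilingual = replace(english, stt_model="example/nova-3-multilingual")

print(describe(english))
print(describe(multilingual))
```

Keeping the configuration immutable and deriving variants with `replace` means a swap never mutates a pipeline that is already serving calls.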

Together AI supports a broad voice catalog in one place, so teams can mix and match across the pipeline without adding vendors. That includes open-source and proprietary models deployed alongside the LLMs that power agent reasoning.

See the Together AI voice solutions

Get started

Deepgram’s announcement
Read STT documentation
Read TTS documentation
Read the voice agents announcement
Contact Sales for dedicated endpoint deployment and volume pricing
