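Google DeepMind · April 16, 2026, 01:03 · About a 4-minute read

Gemini 3.1 Flash TTS: The next generation of expressive AI speech

#TTS #SpeechSynthesis #MultimodalAI #Gemini #Google DeepMind #ExpressionControl

TL;DR

Google DeepMind has announced Gemini 3.1 Flash TTS, a next-generation speech model that enables expressive AI speech generation with precise control through granular audio tags.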

AI deep analysis · April 16, 2026, 02:41

Attention / 5-point scale

Depth 40%

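Key points

Expressive AI speech generation

Gemini 3.1 Flash TTS is a next-generation speech model that generates AI speech that is more natural and more emotionally expressive than before.

Precise control through fine-grained audio tags

A fine-grained tagging feature called granular audio tags lets users precisely control how AI speech is expressed.

Google DeepMind's latest speech technology

It is the latest speech model developed by Google DeepMind and shows how the company's AI speech technology is advancing.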


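Impact analysis

This announcement marks an important advance in both expressiveness and controllability for AI speech synthesis, and it may help improve the quality of content production and voice interfaces. However, because information on detailed technical specifications and real-world performance is limited, more information is needed to assess its actual impact.

Editorial comment

Amid growing interest in expressive AI speech generation, Google DeepMind has announced a new model with enhanced controllability. Expectations are high for progress toward practical use.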

Apr 15, 2026

8 min read

Our newest audio model introduces granular audio tags that give you precise control to direct AI speech for expressive audio generation.

Vilobh Meshram

Senior Product Manager

Max Gubin

Principal Research Engineer on behalf of the Gemini team

General summary

Gemini 3.1 Flash TTS is here, giving you improved AI speech quality and control. You can now use audio tags to adjust vocal style and pacing in over 70 languages. Test it out in Google AI Studio, Vertex AI, and Google Vids, and know that all audio is watermarked with SynthID to prevent misinformation.

Summaries were generated by Google AI. Generative AI is experimental.

Bullet points

"Gemini 3.1 Flash TTS" is a new AI speech model with better control, expressiveness, and quality.

This model has improved speech quality, making it sound more natural than previous versions.

Audio tags let you control vocal style, pace, and delivery using natural language commands.

Developers can use Google AI Studio to fine-tune voices and export settings for consistent use.

Gemini 3.1 Flash TTS supports 70+ languages and uses SynthID watermarking to identify AI-generated audio.


Basic explainer

Gemini 3.1 Flash TTS is a new AI that makes computer speech sound more real. It lets people change how the AI talks by using special commands in the text. This AI can speak in over 70 languages and adds a hidden watermark to the audio. This helps people know it's AI-generated and not a real person.



Today, we’re introducing Gemini 3.1 Flash TTS, the latest text-to-speech model that delivers improved controllability, expressivity and quality, empowering developers, enterprises and everyday users to build the next generation of AI-speech applications.

Starting today, 3.1 Flash TTS is rolling out:

For developers in preview via the Gemini API and Google AI Studio

For enterprises in preview on Vertex AI

For Workspace users via Google Vids

Improved speech quality and controllability

We’ve improved the overall speech quality of Gemini 3.1 Flash TTS, making it our most natural and expressive model to date. On the Artificial Analysis TTS leaderboard, a benchmark that captures thousands of blind human preferences, 3.1 Flash TTS achieved an impressive Elo score of 1,211.

Artificial Analysis has also positioned Gemini 3.1 Flash TTS within its “most attractive quadrant” for its ideal blend of high-quality speech generation and low cost. The model stands out further with native multi-speaker dialogue, support for 70+ languages, and granular creative control via natural language.
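To give a feel for what an Elo score such as 1,211 implies, here is a minimal sketch of the conventional Elo expected-score formula. It assumes the leaderboard uses the standard logistic curve with a 400-point scale, which the post does not state, and the 1,111 comparison rating is purely hypothetical.

```python
def expected_preference(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the conventional Elo model.

    The 400-point scale is the common default; whether the leaderboard
    uses it is an assumption.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    flash_tts = 1211.0      # Elo score quoted in the post
    hypothetical = 1111.0   # made-up comparison model, 100 points lower
    share = expected_preference(flash_tts, hypothetical)
    print(f"Expected preference share: {share:.2f}")
```

Under these assumptions, a 100-point lead corresponds to being preferred in roughly 64% of blind pairwise comparisons, and equal ratings give exactly 50%.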

New audio tags for more expressive speech generation

3.1 Flash TTS also introduces audio tags, an intuitive way to control vocal style, pace and delivery. By embedding natural language commands directly into the text input, you can steer AI-speech output with improved levels of granularity.
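As a rough illustration of embedding commands in the text input, the sketch below assembles a script string from tagged segments. The square-bracket syntax, the example tag wording, and the helper names `with_tag` and `build_script` are all assumptions made for illustration; the post does not specify the actual tag format.

```python
from typing import List, Optional, Tuple


def with_tag(tag: str, text: str) -> str:
    """Prefix text with a natural-language tag.

    The square-bracket syntax is an assumption, not a documented format.
    """
    tag = tag.strip()
    if not tag:
        raise ValueError("tag must not be empty")
    return f"[{tag}] {text.strip()}"


def build_script(segments: List[Tuple[Optional[str], str]]) -> str:
    """Join (tag, text) segments into one input string; a tag of None means untagged."""
    parts = []
    for tag, text in segments:
        parts.append(with_tag(tag, text) if tag else text.strip())
    return " ".join(parts)


if __name__ == "__main__":
    script = build_script([
        (None, "Welcome back."),
        ("whispering", "I have a secret."),
        ("excited, faster pace", "We launch today!"),
    ])
    print(script)
```

Keeping the tags as data rather than hand-typed markup makes it easier to swap in whatever syntax the model documentation actually prescribes.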

3.1 Flash TTS enables enterprises to utilize audio tags within Vertex AI, empowering the next generation of enterprise applications.

You can start experimenting with these audio tags along with other updates to the developer experience in Google AI Studio with configurable controls that place the developer in the “director’s chair”:

Scene direction: Set the stage by defining the environment and providing specific dialogue instructions. This world-building context helps characters remain “in-character” and react to one another naturally across multiple turns.

Speaker-level specificity: Cast characters using unique Audio Profiles, then specify Director’s Notes to toggle pace, tone and accent. Using inline tags, speakers can pivot from these high-level settings to change expression mid-sentence.

Seamless export: Once the performance is perfected, these exact parameters can be exported as Gemini API code to ensure consistent, recognizable voices across various projects and platforms.

With these new configurations, developers can enhance precision for specific scenarios, creating memorable characters and immersive audio experiences.
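The three controls above can be pictured as one structured prompt. The sketch below is a hypothetical layout only: the class names, field names, and the rendered text format are assumptions, not the format Google AI Studio exports. The exported Gemini API code remains the authoritative source for real parameters.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Speaker:
    name: str
    audio_profile: str    # hypothetical free-text description of the voice
    directors_notes: str  # hypothetical pace / tone / accent guidance


@dataclass
class Scene:
    direction: str                 # environment and dialogue instructions
    speakers: List[Speaker]
    lines: List[Tuple[str, str]]   # (speaker name, spoken text)


def render_prompt(scene: Scene) -> str:
    """Render scene direction, speaker settings and transcript as plain text."""
    known = {s.name for s in scene.speakers}
    out = [f"Scene: {scene.direction}"]
    for s in scene.speakers:
        out.append(f"{s.name} | profile: {s.audio_profile} | notes: {s.directors_notes}")
    out.append("Transcript:")
    for name, text in scene.lines:
        if name not in known:
            raise ValueError(f"line refers to unknown speaker: {name}")
        out.append(f"{name}: {text}")
    return "\n".join(out)


if __name__ == "__main__":
    scene = Scene(
        direction="A quiet library at night",
        speakers=[Speaker("Ava", "warm alto", "slow pace, hushed tone")],
        lines=[("Ava", "Did you hear that?")],
    )
    print(render_prompt(scene))
```

Validating speaker names before rendering catches a transcript line that has no matching character settings, which would otherwise surface only as an inconsistent voice.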

Built for global scale

Gemini 3.1 Flash TTS delivers high-fidelity speech and more precise control across more than 70 languages. These core optimizations bring advanced style, pacing and accent control to major markets, helping developers create localized, expressive speech experiences for users at global scale.

Early developer and enterprise testers are already seeing the impact of 3.1 Flash TTS, highlighting its impressive controllability and expressivity. They’ve told us how audio tags provide a new level of creative precision, transforming simple text into a high-fidelity vocal performance.

Watermarked with SynthID

All audio generated by Gemini 3.1 Flash TTS is watermarked with SynthID. This imperceptible watermark is interwoven directly into the audio output, allowing the reliable detection of AI-generated content to help prevent misinformation. For more information on our approach to safety and responsibility, you can review the model card.

