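Gemini 3.1 Flash Live: Making Voice AI More Natural and Reliable
Google DeepMind announced that its voice AI model "Gemini 3.1 Flash Live" improves precision and reduces latency, making voice interactions more fluid, natural, and accurate.
Key Points
Improved voice AI model performance
Google DeepMind announced its latest voice model, Gemini 3.1 Flash Live, which delivers improved precision and lower latency.
Goal: better voice interactions
These improvements aim to make spoken dialogue more fluid, natural, and accurate.
Technical focus
The core of the announcement is the improvement of two technical metrics: the model's precision and its latency.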
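Impact Analysis
This announcement is an advance that directly contributes to the practicality and user experience of voice AI. It could strengthen the underlying technology of any service or device that uses a voice interface. However, detailed technical information and benchmark data have not been released. As a result, its novelty and its impact on the industry as a whole can only be rated as limited for now.
Editorial Comment
This is a practical step forward in the core performance of voice AI. No detailed technical data or baselines for comparison are provided, though. For now, it is probably best read as an announcement of an update to Google's own products.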
Mar 26, 2026
8 min read
Our latest voice model has improved precision and lower latency to make voice interactions more fluid, natural and precise.
Valeria Wu
Product Manager
Yifan Ding
Software Engineer on behalf of the Gemini team
General summary
Gemini 3.1 Flash Live is Google's highest-quality audio model, designed for natural and reliable real-time dialogue. Developers can access it through the Gemini Live API in Google AI Studio, and enterprises can use it through Gemini Enterprise for Customer Experience. Everyone can experience it via Search Live and Gemini Live; Search Live is now available in more than 200 countries and territories.
Summaries were generated by Google AI. Generative AI is experimental.
Bullet points
"Gemini 3.1 Flash Live" is here, making AI audio sound more natural and reliable.
This new audio model is faster and better at understanding tone for natural conversations.
Developers can use it to build voice agents that handle complex tasks more reliably.
Gemini Live and Search Live now offer more helpful responses in many languages.
All audio from 3.1 Flash Live is watermarked to help prevent the spread of misinformation.
Today, we’re advancing Gemini’s real-time dialogue capabilities with Gemini 3.1 Flash Live, our highest-quality audio and voice model yet. It delivers the speed and natural rhythm needed for the next generation of voice-first AI, offering a more intuitive experience for developers, enterprises and everyday users.
3.1 Flash Live is available across Google products:
For developers, in preview via the Gemini Live API in Google AI Studio
For enterprises, in Gemini Enterprise for Customer Experience
For everyone, via Search Live and Gemini Live
For developers: Robust reasoning and task execution
We’ve improved 3.1 Flash Live’s overall quality, making it more reliable for developers and enterprises to build voice-first agents that can complete complex tasks at scale. On ComplexFuncBench Audio, a benchmark that captures multi-step function calling with various constraints, it leads with a score of 90.8%, outperforming our previous model.
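The post describes ComplexFuncBench Audio only as capturing "multi-step function calling with various constraints." The toy Python sketch below, under stated assumptions, illustrates what that phrase means. In it, one tool call's output feeds the next, and a user constraint (a price ceiling) must be enforced between the calls. The tool names `search_flights` and `book_flight`, the flight data, and the budget rule are all hypothetical. This is not the benchmark's task format and not the Gemini Live API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    flight_id: str
    origin: str
    destination: str
    price_usd: int


# Hypothetical in-memory data standing in for a real backend service.
FLIGHTS = [
    Flight("F100", "SFO", "NRT", 980),
    Flight("F200", "SFO", "NRT", 720),
    Flight("F300", "SFO", "HND", 650),
]


def search_flights(origin: str, destination: str) -> list[Flight]:
    """Hypothetical step-1 tool: return every flight on the requested route."""
    return [f for f in FLIGHTS if f.origin == origin and f.destination == destination]


def book_flight(flight_id: str) -> str:
    """Hypothetical step-2 tool: pretend to book and return a confirmation code."""
    return f"CONF-{flight_id}"


def run_request(origin: str, destination: str, max_price_usd: int) -> str | None:
    """Chain two tool calls while enforcing the user's budget constraint.

    The second call depends on the first call's result, and the constraint
    is applied in between; together these make the task "multi-step with
    constraints" in this toy example.
    """
    candidates = search_flights(origin, destination)
    affordable = [f for f in candidates if f.price_usd <= max_price_usd]
    if not affordable:
        # Skip the booking call entirely when the constraint cannot be met.
        return None
    cheapest = min(affordable, key=lambda f: f.price_usd)
    return book_flight(cheapest.flight_id)


if __name__ == "__main__":
    print(run_request("SFO", "NRT", max_price_usd=800))  # prints CONF-F200
    print(run_request("SFO", "NRT", max_price_usd=500))  # prints None
```

A real voice agent would receive these tool calls from the model rather than hard-coding the sequence. The point here is only the dependency between steps and the constraint check that a benchmark of this kind is said to exercise.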
On Scale AI’s Audio MultiChallenge, Gemini 3.1 Flash Live leads with a score of 36.1% with “thinking” on. The benchmark specifically tests complex instruction following and long-horizon reasoning amidst the interruptions and hesitations typical of real-world audio.
3.1 Flash Live also has improved tonal understanding to deliver more natural dialogue. In Gemini Enterprise for Customer Experience, it recognizes acoustic nuances such as pitch and pace even more effectively than 2.5 Flash Native Audio. It’s also better at dynamically adjusting its response to users' expressions of frustration or confusion.
3.1 Flash Live lets you build voice-ready agents that handle complex tasks in noisy environments.
Illustrative demonstration built with Gemini 3.1 Pro, powered by Gemini 3.1 Flash Live.
Companies like Verizon, LiveKit and The Home Depot have given positive feedback on 3.1 Flash Live in their workflows, highlighting its improved, natural conversation.
For everyone: More natural and intuitive interactions
In Gemini Live and Search Live, the 3.1 Flash Live model delivers more helpful and natural responses, whether you’re asking quick daily questions or engaging in more complex conversations.
With the 3.1 Flash Live model under the hood, Gemini Live responds faster than it did with the previous model. It can also follow the thread of your conversation for twice as long, keeping your train of thought intact during longer brainstorms.
3.1 Flash Live makes Gemini Live faster and more helpful
3.1 Flash Live is also inherently multilingual, which enables this week’s global expansion of Search Live. With this launch, people in more than 200 countries and territories can now have real-time, multimodal conversations with Search in their preferred language.
Get real-time troubleshooting help using 3.1 Flash Live in Search Live
Search Live is enabling real-time multimodal conversations in more languages
Try Gemini 3.1 Flash Live
All audio generated by 3.1 Flash Live is watermarked with SynthID. This imperceptible watermark is interwoven directly into the audio output, allowing reliable detection of AI-generated content to help prevent misinformation. For more information on our approach to safety and responsibility, see the model card.
Experience the naturalness and reliability of 3.1 Flash Live, starting today. We look forward to seeing how you interact and build with it.