
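Google DeepMind · May 18, 2026, 04:50 · approx. 8 min read

Introducing Gemini Omni

#Gemini Omni #Multimodal AI #Video generation #Google DeepMind #Reasoning

TL;DR

Google DeepMind has announced Gemini Omni, a new model that extends Gemini from image generation to video generation and can produce high-quality video from any input.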

AI in-depth analysis · July 4, 2026, 20:12


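Key points

Multimodal input integration and video generation

Gemini Omni can combine several input formats, including images, audio, text and video, and generate high-quality video from them.

Reasoning meets creativity

Gemini's existing reasoning capability is integrated with creative generation, making it possible to create video grounded in real-world knowledge.

Continuing the evolution from image generation

Following the success of last year's image generation and editing features in "Nano Banana", Google has declared video as the next step.

Video generation from diverse inputs

By combining images, audio, video and text, users can generate high-quality video grounded in Gemini's real-world knowledge.

Intuitive editing through conversation

Natural-language instructions can change or edit specific parts of a video, or the whole video, while keeping characters consistent and physics intact.

Rollout of Gemini Omni Flash

Gemini Omni Flash, the first model in the Omni family, is rolling out to the Gemini app, Google Flow and YouTube Shorts.

Reworking the action in a video

With Omni, users can give natural-language instructions on a video they shot to change the characters or objects in it, alter the action, or turn a moment into something unexpected.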

Key quotes

Gemini Omni Flash is a model that can create anything from any input – starting with video.

Omni is our new model that can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge.

Omni is our new model that can create anything from any input — starting with video.

Every instruction builds on the last. Your characters stay consistent, the physics hold up and the scene remembers what came before.

Take a video you shot and just ask Omni to change what's happening.

Change the environment, angle, style or even specific details, without ever losing the thread of your original scene.


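Impact analysis

This announcement marks a technical leap from text and still images to dynamic video generated from complex, integrated multimodal input, and it could reshape video production workflows in the creative industries. In particular, building reasoning into video generation means the model does more than composite footage: it can produce high-quality content that reflects an understanding of context, which is the most significant aspect of the release.

Editorial comment

Google is quickly extending its success in image generation into video, which suggests that the practical deployment of multimodal AI is accelerating. Building reasoning into the generation process is likely to be key to raising the quality and reliability of generated content.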


Gemini Omni Flash is a model that can create anything from any input – starting with video.

Last year, Nano Banana brought Gemini's intelligence to image generation and editing. Since then, it’s helped millions of people restore old photos, design from sketches and visualize ideas in ways that weren’t possible before. We built Gemini to be natively multimodal from the ground up, and now we’re taking the next step.

We’re introducing Gemini Omni, where Gemini’s ability to reason meets the ability to create. Omni is our new model that can create anything from any input — starting with video. With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge. You can also easily edit your videos through conversation.

Today, we’re rolling out the first model in the Omni family, Gemini Omni Flash, to the Gemini app, Google Flow and YouTube Shorts. In time, we will support output modalities like image and audio. Here’s some of what makes Omni special:

Edit your videos through conversation

Gemini Omni gives you an easier way to edit video — with natural language. Every instruction builds on the last. Your characters stay consistent, the physics hold up and the scene remembers what came before.

Transform the world around you. Change specific things, or change everything. Your video becomes the starting point for something you never could have filmed yourself.

Prompt: Make the sculpture out of bubbles.

Reimagine the action. Take a video you shot and just ask Omni to change what’s happening. Edit the action, add in new characters or objects, or transform a moment into something unexpected.

Prompt: When the person touches the mirror, make the mirror ripple beautifully like liquid, and the person's arm turns into reflective mirror material.

Refine your videos across multiple turns. Change the environment, angle, style or even specific details, without ever losing the thread of your original scene. Scroll through the carousel to see how edits build on each other.

Prompt: A video of a violinist playing a song.
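One way to picture "every instruction builds on the last" is as a growing history of edits attached to a single clip. The sketch below is purely illustrative and is not how Omni works internally or how any Google API is shaped: the `EditSession` class, its `add_turn` method and the two follow-up instructions are hypothetical, chosen to mirror the kinds of changes named above (environment, angle).

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Illustrative only: records the prompts given for one clip, in order."""

    initial_prompt: str
    instructions: list[str] = field(default_factory=list)

    def add_turn(self, instruction: str) -> list[str]:
        """Record a new instruction and return the full history it builds on."""
        self.instructions.append(instruction)
        return [self.initial_prompt, *self.instructions]


# The initial prompt is the one quoted above; the follow-ups are made up.
session = EditSession("A video of a violinist playing a song.")
session.add_turn("Change the environment to a forest.")
history = session.add_turn("Switch to a low camera angle.")

for turn, text in enumerate(history):
    print(turn, text)
```

The point of the sketch is only that each turn is interpreted against everything before it, rather than as a fresh request.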

Bring ideas to life, grounded in Gemini’s world knowledge

Gemini Omni doesn't just build scenes that look real; it reasons about what should happen next. It combines an intuitive understanding of physics with Gemini's knowledge of history, science and cultural context, bridging the gap from photorealism to meaningful storytelling.

Create visuals with more accurate physics. Omni has an improved intuitive understanding of forces like gravity, kinetic energy and fluid dynamics, allowing you to create more realistic scenes.

Prompt: A marble rolling fast on a chain reaction style track, continuous smooth shot.

Blend knowledge and creativity. Omni draws on Gemini's knowledge to connect language, imagery and meaning in ways that go far beyond pattern matching.

Prompt: The video shows items of the alphabet. An unusual item starting with each letter is shown sitting on a table (like a Capybara for C, disco globe for D and Lava Lamp for L). All 26 letters must be represented by 26 items with matching lower thirds displaying the letter. Only one item and lower third at a time. Each lower third must look like a black marker written on a slip of paper in the bottom left. Rapid fire, roughly 9 frames per item at 24FPS. Last frame is a slip of paper "THE END". The whole video is accompanied by calm smooth music.
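The timing constraints in this prompt imply a fairly short clip. As a rough check, the sketch below works out the length of the item sequence from the three numbers stated in the prompt (26 items, roughly 9 frames each, 24 FPS). It ignores the closing "THE END" frame, and the function name `clip_length` is just an illustrative helper.

```python
from fractions import Fraction

FPS = 24              # frame rate stated in the prompt
FRAMES_PER_ITEM = 9   # "roughly 9 frames per item"
ITEMS = 26            # one item per letter of the alphabet


def clip_length(items: int, frames_per_item: int, fps: int) -> tuple[int, Fraction]:
    """Return (total frames, duration in seconds) for the item sequence."""
    total_frames = items * frames_per_item
    return total_frames, Fraction(total_frames, fps)


frames, seconds = clip_length(ITEMS, FRAMES_PER_ITEM, FPS)
per_item = Fraction(FRAMES_PER_ITEM, FPS)
print(f"{frames} frames, about {float(seconds):.2f} s, {float(per_item):.3f} s per item")
# prints: 234 frames, about 9.75 s, 0.375 s per item
```

Each item is on screen for well under half a second, which is what makes the "rapid fire" instruction demanding: every item and its lower third have to be legible and correctly matched within 9 frames.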

Complex ideas made visual. Omni can create compelling explainers from short prompts, generating visuals that break down more complex ideas.

Prompt: claymation explainer of protein folding, everything is made out of clay, no hands, stop motion, accurate

Create videos from any combination of inputs

Reference anything. Omni turns any reference — image, text, video or audio — into a single, cohesive output. While only voice references will be supported for audio to start, we’ll roll out other types of audio inputs soon.

Prompt: Dynamic sci-fi film style video based on image_0.png. Elements light up similar to video_0.mp4 synchronized to the beat of the music from audio_0.wav
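When a prompt names several reference files like this, it is easy to mention a file that was never attached. The sketch below is a small pre-flight check under stated assumptions: the `image_0.png` / `video_0.mp4` / `audio_0.wav` naming pattern is inferred only from the example prompt, and `referenced_names` and `missing_references` are hypothetical helpers, not part of any Gemini or Google API.

```python
import re

# Assumption: references look like image_0.png, video_0.mp4, audio_0.wav or image-1,
# as in the example prompts. This is not a documented naming scheme.
REFERENCE_PATTERN = re.compile(r"\b(?:image|video|audio)[_-]\d+(?:\.\w+)?")


def referenced_names(prompt: str) -> list[str]:
    """Return the reference names mentioned in the prompt, in order of appearance."""
    return REFERENCE_PATTERN.findall(prompt)


def missing_references(prompt: str, supplied: set[str]) -> list[str]:
    """Return names mentioned in the prompt that are not among the supplied files."""
    return [name for name in referenced_names(prompt) if name not in supplied]


prompt = (
    "Dynamic sci-fi film style video based on image_0.png. Elements light up "
    "similar to video_0.mp4 synchronized to the beat of the music from audio_0.wav"
)

print(referenced_names(prompt))
# prints: ['image_0.png', 'video_0.mp4', 'audio_0.wav']
print(missing_references(prompt, {"image_0.png", "video_0.mp4"}))
# prints: ['audio_0.wav']
```

A check like this only compares names; it says nothing about whether a given file type is accepted, which matters here because audio references are limited to voice at launch.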

Start from what you have. With input references, you can use images of characters, scenes or drawings to create in a way that matches your vision.

Prompt: Imagine the world gradually changing into retro futuristic style (grainy and moody as image-1) as I walk. Use the audio for a retro-futuristic background music. 10s.

Apply styles, motion or effects. Define the visual language by using input references, or just describe it with natural language. Omni blends the input references to create a cohesive clip.

Prompt: edit this keeping everything the same. add animated motion effects coming out of the skateboard

Create videos with your own digital avatar

We're committed to developing AI responsibly, and we have clear policies that protect users from harm and govern the use of our AI tools. To start, you can create videos with your own voice by using Avatars, which create a digital version of yourself so you can generate videos that look and sound like you. Beyond the avatar feature, we are still testing video edits that change audio and speech, to better understand how we can bring this capability to users responsibly.

All videos created with Omni include our imperceptible SynthID digital watermark. You can easily verify that videos were generated with Gemini Omni through the Gemini app, Gemini in Chrome and Google Search. Our blog post explains how we're expanding our content transparency and verification tools to help you understand how content was created and edited across the web.

Try Gemini Omni now

Today, we’re launching the first model in the Omni family — Gemini Omni Flash. Gemini Omni Flash is rolling out today to all Google AI Plus, Pro and Ultra subscribers globally through the Gemini app and Google Flow. It’s also rolling out at no cost to users on YouTube Shorts and YouTube Create App starting this week.

In the coming weeks, we'll also be rolling it out to developers and enterprise customers via APIs.

