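Vercel Blog · February 19, 2026, 22:00 · about 8 min read

Video Generation with AI Gateway

#VideoGenerationAI #MultimodalAI #AIPlatform #GenerativeAI #UnifiedAIAPI #Vercel

TL;DR

Vercel's AI Gateway has released video generation in beta. Four major AI models (Grok Imagine, Wan, Kling, and Veo) are available through a unified API, so text, images, and video can be generated through one consistent interface.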

AI in-depth analysis · March 18, 2026, 07:44

Importance (scale of 5)

Depth: 40%

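Key Points

AI Gateway video generation launches in beta

Vercel's AI Gateway has added video generation. Through AI SDK 6 and the Playground, it can produce cinematic videos with photorealistic quality and synchronized audio.

Four major video generation models integrated

Four models (17 variations) are integrated: Grok Imagine from xAI, Wan from Alibaba, Kling, and Veo from Google. Each brings its own strength: instruction following (Grok Imagine), identity preservation (Wan), image-to-video (Kling), and physical realism (Veo).

Four types of video generation supported

Four generation methods are offered (text-to-video, image-to-video, first and last frame, and reference-to-video), covering use cases from ad creative to brand character production.

Access for both developers and no-code users

Programmatic generation through AI SDK 6 and no-code experimentation in the AI Gateway Playground are both available, making it possible to compare providers and tune prompts.

Main techniques for AI video generation

The techniques on offer include text-to-video, image-to-video, transition generation from specified start and end frames, character generation from reference videos or images, and style transfer on existing videos.

Concrete applications

Uses include production-quality footage, programmatic video generation for applications, creative content for social media, animated product images, and before/after comparison videos.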


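Impact Analysis

The announcement signals that AI video generation is moving quickly toward practical use. By giving developers a unified interface to state-of-the-art models from multiple providers, it encourages wider adoption and a broader range of applications. It also extends Vercel's platform strategy from text and images into video, strengthening its competitiveness in enterprise AI solutions.

Editorial Comment

A landmark announcement: AI video generation is now seriously usable on a developer platform. Being able to manage the major models in one place could significantly change enterprise AI video production workflows.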

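Title: Video Generation with AI Gateway

AI Gateway now supports video generation, so you can create cinematic videos with photorealistic quality and synchronized audio, and generate personalized content with consistent identity, all through AI SDK 6.

Two ways to get started

Video generation is in beta and is currently available on Pro and Enterprise plans and to paid AI Gateway users.

AI SDK 6: Generate videos programmatically with the same interface you use for text and images. One API, one authentication flow, and one observability dashboard cover your entire AI pipeline.
AI Gateway Playground: Experiment with video models without writing code in the configurable AI Gateway playground embedded in each model page. Compare providers, tweak prompts, and download results. To access it, click any video generation model in the model list.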

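Four initial video models; 17 variations

Grok Imagine from xAI is fast and strong at instruction following. Create and edit videos with style transfer, all in seconds.
Wan from Alibaba specializes in reference-based generation and multi-shot storytelling, and can preserve identity across scenes.
Kling excels at image-to-video and native audio. The new 3.0 models support multi-shot video with automatic scene transitions.
Veo from Google delivers high visual fidelity and physical realism. It offers native audio generation alongside cinematic lighting and physics.

Understanding video requests

Video models require more than just a description of what you want. Unlike image generation, video prompts can include motion cues (camera movement, object actions, timing) and, optionally, audio direction. Each provider exposes different capabilities through providerOptions, which unlock fundamentally different generation modes. See the documentation for model-specific options.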
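To make the idea of motion cues concrete, here is a minimal sketch that assembles a video prompt from separate parts. The `VideoPromptParts` type and the `composeVideoPrompt` helper are hypothetical names introduced for illustration and are not part of AI SDK 6; the "Camera:", "Action:", "Timing:", and "Audio:" labels are likewise an assumed convention, not something any provider requires.

```typescript
// Hypothetical helper (not part of AI SDK 6): builds one text prompt from a
// scene description plus the motion and audio cues described above.
interface VideoPromptParts {
  scene: string;           // what is in the shot
  cameraMovement?: string; // e.g. "slow dolly-in"
  objectAction?: string;   // what the subject does
  timing?: string;         // pacing of the motion
  audio?: string;          // optional audio direction
}

function composeVideoPrompt(parts: VideoPromptParts): string {
  const segments: string[] = [parts.scene.trim()];
  if (parts.cameraMovement) segments.push(`Camera: ${parts.cameraMovement.trim()}`);
  if (parts.objectAction) segments.push(`Action: ${parts.objectAction.trim()}`);
  if (parts.timing) segments.push(`Timing: ${parts.timing.trim()}`);
  if (parts.audio) segments.push(`Audio: ${parts.audio.trim()}`);
  // Strip trailing periods from each segment, then join as sentences.
  return segments.map((s) => s.replace(/\.+$/, "")).join(". ") + ".";
}

const coffeePrompt = composeVideoPrompt({
  scene: "A ceramic mug of coffee on a wooden table at sunrise",
  cameraMovement: "slow dolly-in",
  objectAction: "steam rises and curls",
  audio: "quiet cafe ambience",
});
console.log(coffeePrompt);
```

Keeping the cues as separate fields makes it easy to vary one of them (for example, only the camera movement) while comparing results across providers.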

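Generation types

AI Gateway initially supports four types of video generation:

Type	Inputs	Description	Example use cases
Text-to-video	Text prompt	Describe a scene, get a video	Ad creative, explainer videos, social content
Image-to-video	Image, text prompt (optional)	Animate a still image with motion	Product showcases, logo reveals, photo animation
First and last frame	2 images, text prompt (optional)	Define start and end states; the model fills in between	Before/after reveals, time-lapse, transitions
Reference-to-video	Images or videos	Extract a character from reference images or videos and place it in new scenes	Spokesperson content, consistent brand characters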
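The table above can be read as a set of input shapes, one per generation type. The sketch below encodes that reading as a TypeScript discriminated union with a small validator. All type and field names here (`VideoInput`, `firstFrame`, `references`, and so on) are hypothetical and chosen for illustration; they are not the AI SDK 6 request format.

```typescript
// Hypothetical types mirroring the table above; not an AI SDK 6 API.
type VideoInput =
  | { type: "text-to-video"; prompt: string }
  | { type: "image-to-video"; image: string; prompt?: string }
  | { type: "first-and-last-frame"; firstFrame: string; lastFrame: string; prompt?: string }
  // The table lists only images or videos as inputs here; the optional prompt
  // is included because the Wan example in this article also passes one.
  | { type: "reference-to-video"; references: string[]; prompt?: string };

// Returns a list of problems; an empty list means the input matches the table.
function validateVideoInput(input: VideoInput): string[] {
  const problems: string[] = [];
  switch (input.type) {
    case "text-to-video":
      if (input.prompt.trim() === "") problems.push("text-to-video needs a text prompt");
      break;
    case "image-to-video":
      if (input.image === "") problems.push("image-to-video needs an image");
      break;
    case "first-and-last-frame":
      if (input.firstFrame === "" || input.lastFrame === "") {
        problems.push("first and last frame needs two images");
      }
      break;
    case "reference-to-video":
      if (input.references.length === 0) {
        problems.push("reference-to-video needs at least one reference image or video");
      }
      break;
  }
  return problems;
}

const missingEndFrame = validateVideoInput({
  type: "first-and-last-frame",
  firstFrame: "before.png",
  lastFrame: "",
});
console.log(missingEndFrame); // one problem: "first and last frame needs two images"
```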

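The current capabilities of each model creator's models on AI Gateway are listed below:

Model creator	Capabilities
xAI	Text-to-video, image-to-video, video editing, audio
Wan	Text-to-video, image-to-video, reference-to-video, audio
Kling	Text-to-video, image-to-video, first and last frame, audio
Veo	Text-to-video, image-to-video, audio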
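When choosing a model for a given generation mode, the capability table can be treated as a simple lookup. The sketch below copies the table into a map and filters it; the `capabilitiesByCreator` structure and the `creatorsSupporting` function are illustrative names, not something AI Gateway exposes.

```typescript
type Capability =
  | "text-to-video"
  | "image-to-video"
  | "first-and-last-frame"
  | "reference-to-video"
  | "video-editing"
  | "audio";

// Data copied from the table above; the structure itself is illustrative.
const capabilitiesByCreator: Record<string, Capability[]> = {
  xAI: ["text-to-video", "image-to-video", "video-editing", "audio"],
  Wan: ["text-to-video", "image-to-video", "reference-to-video", "audio"],
  Kling: ["text-to-video", "image-to-video", "first-and-last-frame", "audio"],
  Veo: ["text-to-video", "image-to-video", "audio"],
};

// Lists the model creators whose models support the given capability.
function creatorsSupporting(capability: Capability): string[] {
  return Object.keys(capabilitiesByCreator).filter(
    (creator) => capabilitiesByCreator[creator].indexOf(capability) !== -1,
  );
}

console.log(creatorsSupporting("first-and-last-frame")); // only Kling
console.log(creatorsSupporting("video-editing"));        // only xAI
```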

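Text-to-video

Describe what you want and get a video. The model handles visuals, motion, and optionally audio. This mode suits hyperrealistic, production-quality footage from just a simple text prompt.

Example: Programmatic video at scale. Generate videos on demand for your app, platform, or content pipeline. No licensing fees or production work are required, just prompts and outputs.

This example uses klingai/kling-v2.6-t2v to generate video from a text prompt with a specified aspect ratio and duration.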
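The original code for this example is not included on this page. As a stand-in, the following is a minimal sketch of what such a request might carry. Only the model ID comes from the text; the `TextToVideoRequest` shape, the `aspectRatio` and `duration` field names, and the validation rules are assumptions, and the sketch stops short of calling any SDK function.

```typescript
// Hypothetical request shape: `aspectRatio` and `duration` are assumed field
// names for the "aspect ratio and duration" mentioned above.
interface TextToVideoRequest {
  model: string;
  prompt: string;
  aspectRatio: string; // "width:height", e.g. "16:9"
  duration: number;    // seconds
}

function buildTextToVideoRequest(
  promptText: string,
  aspectRatio: string,
  duration: number,
): TextToVideoRequest {
  if (!/^\d+:\d+$/.test(aspectRatio)) {
    throw new Error(`aspectRatio must look like "16:9", got "${aspectRatio}"`);
  }
  if (!(duration > 0)) {
    throw new Error("duration must be a positive number of seconds");
  }
  return { model: "klingai/kling-v2.6-t2v", prompt: promptText, aspectRatio, duration };
}

const t2vRequest = buildTextToVideoRequest(
  "A paper boat drifting down a rain-soaked street, camera following at water level",
  "16:9",
  5,
);
console.log(JSON.stringify(t2vRequest, null, 2));
```

Validating the settings before sending anything is a design choice of this sketch: it catches malformed values locally rather than after a round trip to the provider.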

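Example: Creative content generation. Turn a simple prompt into polished video clips for social media, ads, or storytelling, with natural motion and cinematic quality.

Given a very specific and descriptive prompt, google/veo-3.1-generate-001 generates video with rich detail and precisely the desired motion.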

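Image-to-video

Provide a starting image and animate it. You control the initial composition, then let the model generate the motion.

Example: Animate product images. Turn existing product photos into interactive videos.

The klingai/kling-v2.6-i2v model animates a product image once you pass an image URL and a motion description in the prompt.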
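A minimal sketch of how an image URL and a motion description might be paired in one prompt is shown below. The model ID comes from the text; the `ImageToVideoRequest` shape with its `{ image, text }` prompt object is an assumption made for illustration, and the example.com URL is a placeholder.

```typescript
// Hypothetical request shape, not an AI SDK 6 definition: the image URL and
// the motion description are assumed to travel together in `prompt`.
interface ImageToVideoRequest {
  model: string;
  prompt: { image: string; text: string };
}

function buildImageToVideoRequest(imageUrl: string, motion: string): ImageToVideoRequest {
  if (!/^https?:\/\//.test(imageUrl)) {
    throw new Error("imageUrl must be an http(s) URL");
  }
  return {
    model: "klingai/kling-v2.6-i2v",
    prompt: { image: imageUrl, text: motion.trim() },
  };
}

// example.com is a placeholder URL.
const productRequest = buildImageToVideoRequest(
  "https://example.com/sneaker.png",
  "The sneaker rotates slowly on a pedestal while studio lights sweep across it",
);
console.log(productRequest.prompt.text);
```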

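Example: Animated illustrations. Bring static artwork to life with subtle motion. Well suited to thematic content or marketing at scale.
Example: Lifestyle and product photography. Add subtle motion to food, beverage, or lifestyle shots for social content.

Here, a picture of coffee is rendered as a more interactive video, with lighting direction and minute details.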

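First and last frame

Define the start and end states, and the model generates a seamless transition between them.

Example: Before/after reveals. Outfit swaps, product comparisons, changes over time. Upload two images and get a seamless transition.

The start and end states are defined here by two images, which are used in the prompt and the provider options.

In this example, klingai/kling-v3.0-i2v lets you define the start frame in image and the end frame in lastFrameImage. The model generates the transition between them.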
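The split between prompt and provider options can be sketched as follows. The names `image` and `lastFrameImage` and the model ID come from the text. Everything else is an assumption: in particular, placing `lastFrameImage` under a `klingai` key inside `providerOptions` is a guess at the layout, and the example.com URLs are placeholders.

```typescript
// Hypothetical layout: `image` sits in the prompt and `lastFrameImage` in the
// provider options, as the text describes. The `klingai` key is an assumption.
interface FirstLastFrameRequest {
  model: string;
  prompt: { image: string; text?: string };
  providerOptions: { klingai: { lastFrameImage: string } };
}

function buildFirstLastFrameRequest(
  startImageUrl: string,
  endImageUrl: string,
  text?: string,
): FirstLastFrameRequest {
  if (startImageUrl === "" || endImageUrl === "") {
    throw new Error("both a start frame and an end frame are required");
  }
  const framePrompt: { image: string; text?: string } = { image: startImageUrl };
  if (text !== undefined) framePrompt.text = text; // the text prompt is optional
  return {
    model: "klingai/kling-v3.0-i2v",
    prompt: framePrompt,
    providerOptions: { klingai: { lastFrameImage: endImageUrl } },
  };
}

// Placeholder URLs for a before/after reveal.
const revealRequest = buildFirstLastFrameRequest(
  "https://example.com/room-before.jpg",
  "https://example.com/room-after.jpg",
  "The room transforms as furniture slides into place",
);
console.log(revealRequest.providerOptions.klingai.lastFrameImage);
```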

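Reference-to-video

Provide reference videos or images of a person or character, and the model extracts their appearance and voice to generate new scenes starring them with a consistent identity.

In this example, two reference images of dogs are used to generate the final video.

With alibaba/wan-v2.6-r2v-flash, you can instruct the model to use the people or characters in the prompt. For multi-reference-to-video, Wan suggests referring to them as character1, character2, and so on in the prompt to get the best results.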
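The character1, character2 naming convention can be checked mechanically before a request is sent. The sketch below labels each reference and reports labels the prompt never mentions. Both helpers are hypothetical, and the assumption that labels follow the order in which references are supplied is not confirmed by the text; the example.com URLs are placeholders.

```typescript
// Hypothetical helpers: pair each reference with the label Wan suggests
// (character1, character2, ...) and report labels the prompt never mentions.
// Assumption: labels follow the order in which references are supplied.
interface LabeledReference {
  label: string;
  url: string;
}

function labelReferences(urls: string[]): LabeledReference[] {
  return urls.map((url, index) => ({ label: `character${index + 1}`, url }));
}

function unusedLabels(promptText: string, refs: LabeledReference[]): string[] {
  return refs
    .map((ref) => ref.label)
    // \b keeps "character1" from matching inside "character10".
    .filter((label) => !new RegExp(`\\b${label}\\b`).test(promptText));
}

// Placeholder URLs for two dog reference images.
const dogRefs = labelReferences([
  "https://example.com/dog-a.jpg",
  "https://example.com/dog-b.jpg",
]);

console.log(unusedLabels("character1 and character2 chase a ball across a beach", dogRefs));
console.log(unusedLabels("character1 naps on a sofa", dogRefs)); // character2 is unused
```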

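Video editing

Transform existing videos with style transfer. Provide a video URL and describe the transformation you want. The model applies the new style while preserving the original motion.

Here, xai/grok-imagine-video takes a source video from a previous generation and edits it into a watercolor style.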
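Chaining a previous generation into an edit could look like the sketch below. The model ID comes from the text; the `GeneratedVideo` and `VideoEditRequest` shapes, the `sourceVideoUrl` field name, and the prompt wording are assumptions made for illustration, and the example.com URL is a placeholder.

```typescript
// Hypothetical shapes: how a source video URL is actually passed to the model
// is not shown in the text, so `sourceVideoUrl` is an assumed field name.
interface GeneratedVideo {
  url: string;
}

interface VideoEditRequest {
  model: string;
  sourceVideoUrl: string;
  prompt: string;
}

function buildStyleTransferRequest(previous: GeneratedVideo, style: string): VideoEditRequest {
  return {
    model: "xai/grok-imagine-video",
    sourceVideoUrl: previous.url,
    prompt: `Restyle this video as ${style}, keeping the original motion.`,
  };
}

// Placeholder for the output of an earlier generation.
const earlierVideo: GeneratedVideo = { url: "https://example.com/generated/clip.mp4" };
const watercolorEdit = buildStyleTransferRequest(earlierVideo, "a watercolor painting");
console.log(watercolorEdit.prompt);
```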

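Get started

For more examples and detailed configuration options for video models, check out the Video Generation Documentation. Simple getting-started scripts are available in the Video Generation Quick Start.

Check out the changelogs for these video models for more detailed examples and prompts: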

Grok Imagine
Alibaba Wan
Veo
Kling
