The Decoder · April 14, 2026, 02:31 · about a 1-minute read

New AI model generates 45-minute lip-sync videos in real time from a single photo

#GenerativeAI #VideoGeneration #LipSync #RealTimeAI #AvatarGeneration #Multimodal

TL;DR

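The Decoder reports that a new AI model, "LPM 1.0," has been announced that can generate lip-synced video up to 45 minutes long in real time from a single photo. The model also reproduces facial expressions and emotional reactions, but for now it is a research project.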

AI in-depth analysis · April 14, 2026, 03:40

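Key points

Novel generation capability

From a single still image, the model can generate up to 45 minutes of lip-synced video in real time.

Expression and emotion

The generated video covers more than mouth movement: it includes varied facial expressions and reactions that follow the emotional context.

Real-time processing

Video generation runs in real time, which is the main difference from earlier techniques.

Still at the research stage

LPM 1.0 is currently a research project, and the team has announced no plans to release weights, code, or a public demo.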

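Impact analysis

This technology could bring major change to many fields, including avatar generation, virtual assistants, content creation, and remote communication. Its real-time capability in particular makes live streaming and interactive applications feasible, and it marks an important step that could set a new baseline for AI-generated media.

Editorial comment

Generating long video from a single image in real time is a notable advance. It remains a research project, however, and the ethical risks the original article raises, such as real-time deepfakes, deserve attention.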

One image becomes a talking character: LPM 1.0 generates real-time video with lip sync, facial expressions, and emotional reactions. For now, it is purely a research project.

The article "New AI model generates 45-minute lip-sync videos in real time from a single photo" first appeared on The Decoder.

Original text

Researchers have introduced LPM 1.0, an AI model that generates real-time video of a speaking, listening, or singing figure from a single image.

The model processes text, audio, and reference images simultaneously, producing lip-synchronized speech, subtle facial expressions like hesitation or shifts in gaze, and emotional transitions. It can plug directly into voice-audio AI models from ChatGPT or Doubao to create a visual conversation partner in real time.

LPM 1.0 works across different image styles (photorealistic faces, anime, and 3D game characters) without any additional training. The entire video generation runs as a streaming process in real time rather than rendering a finished video all at once. Videos up to 45 minutes long should remain stable.
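
The difference between streaming generation and all-at-once rendering can be illustrated with a minimal Python sketch. None of this is LPM 1.0 code, since none has been released: `render_placeholder_frame`, `generate_streaming`, and `generate_offline` are hypothetical names, and each "frame" is only a text label standing in for real video output.

```python
from typing import Iterable, Iterator, List


def render_placeholder_frame(index: int, audio_chunk: bytes) -> str:
    """Hypothetical stand-in for a video model: returns a label, not pixels."""
    return f"frame {index} from {len(audio_chunk)} bytes"


def generate_streaming(audio_chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield one frame per audio chunk as soon as that chunk arrives."""
    for index, chunk in enumerate(audio_chunks):
        yield render_placeholder_frame(index, chunk)


def generate_offline(audio_chunks: Iterable[bytes]) -> List[str]:
    """Consume all of the audio first, then return the finished frame list."""
    return list(generate_streaming(audio_chunks))
```

With the streaming version, a viewer can show the first frame before the rest of the audio exists, which is what a live conversation needs. The offline version only returns once every chunk has been processed.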

LPM 1.0 uses what the researchers call "multi-granularity identity conditioning": alongside a main image, the model also receives reference images from different angles and with varying facial expressions. This means it doesn't have to invent details like teeth, wrinkles tied to specific emotions, or profile views on its own; it can pull them directly from the reference material.
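
One way to picture this kind of conditioning input is as a main image plus optional reference images grouped by viewing angle and by expression. The Python sketch below is an illustration only and does not reflect the model's actual interface: the class `IdentityConditioning`, its field names, the file names, and the fallback to the main image when a reference is missing are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IdentityConditioning:
    """Hypothetical container for identity references (not the real API)."""
    main_image: str
    view_refs: Dict[str, str] = field(default_factory=dict)        # e.g. "profile"
    expression_refs: Dict[str, str] = field(default_factory=dict)  # e.g. "smile"

    def reference_for(self, kind: str, key: str) -> str:
        """Return the matching reference image, or the main image if absent.

        The fallback is an assumption made for this sketch.
        """
        if kind == "view":
            refs = self.view_refs
        elif kind == "expression":
            refs = self.expression_refs
        else:
            raise ValueError(f"unknown reference kind: {kind!r}")
        return refs.get(key, self.main_image)


identity = IdentityConditioning(
    main_image="front.png",
    view_refs={"profile": "profile.png"},
    expression_refs={"smile": "smile.png"},
)
```

Here `identity.reference_for("view", "profile")` returns the supplied profile picture, while a request for an expression that was never provided falls back to `front.png`.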

The model recognizes three conversational states. When listening, it generates reactive facial expressions like nodding or gaze shifts based on incoming audio. When speaking, the response audio drives lip movements and body language. During pauses, LPM generates natural idle behavior based on text instructions.
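
The three states and the signal that drives each one can be sketched as a small state selector in Python. This is a hypothetical illustration, not the researchers' implementation: `TurnInput`, `select_state`, `driving_signal`, the default idle prompt, and the rule that the character's own speech takes priority over incoming audio are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple, Union


class State(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"
    IDLE = "idle"


@dataclass
class TurnInput:
    """Signals available at one moment (all fields are hypothetical)."""
    user_audio: Optional[bytes] = None      # incoming audio from the user
    response_audio: Optional[bytes] = None  # audio the character is speaking
    idle_prompt: str = "neutral, relaxed"   # text instruction used in pauses


def select_state(turn: TurnInput) -> State:
    # Assumed priority: the character's own speech wins over incoming audio.
    if turn.response_audio:
        return State.SPEAKING
    if turn.user_audio:
        return State.LISTENING
    return State.IDLE


def driving_signal(turn: TurnInput) -> Tuple[State, Union[bytes, str]]:
    """Pair the chosen state with the input that drives the animation."""
    state = select_state(turn)
    if state is State.SPEAKING:
        return state, turn.response_audio  # drives lips and body language
    if state is State.LISTENING:
        return state, turn.user_audio      # drives nods and gaze shifts
    return state, turn.idle_prompt         # text-guided idle behavior
```

For example, `driving_signal(TurnInput(user_audio=b"hello"))` selects the listening state and passes the user's audio through, while an empty `TurnInput()` falls back to the idle prompt.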

Beyond real-time conversation, LPM 1.0 also supports offline video generation from existing audio, useful for podcasts or movie dialogs, according to project manager Ailing Zeng. This opens the door to content creation outside of live chats. Video-based input control isn't included in this version, but Zeng says the framework could support it in the future.

Still a research project with no public release planned

The development team stresses that LPM 1.0 is purely a research project. There are no plans to release weights, code, or a public demo. All faces shown are AI-generated, not real people. The researchers acknowledge that the generated videos still contain visible artifacts, and a quantitative analysis confirmed a noticeable gap compared to real video quality.

The team also says they'd only consider opening access "if and when adequate safeguards and responsible-use frameworks are firmly in place." More details are available on the project page and in the technical report.

Even as a research project, LPM 1.0 points to where things are heading: AI systems that don't just communicate through text or voice, but show up as visually believable characters with facial expressions, eye contact, and emotional reactions. That could prove valuable for education, gaming, customer service, or virtual companions.

At the same time, the technology carries serious risks. It edges dangerously close to real-time deepfake infrastructure that bad actors could exploit for fraud, manipulation, or impersonation. All of those things are already happening; what keeps shrinking is the barrier to entry. The researchers are explicit that the system is not meant to mislead, deceive, or impersonate real people.
