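Jay Alammar · January 1, 2023, 09:00 · approx. 9 min read

Remaking Old Computer Graphics With AI Image Generation

#ImageGenerationAI #Stable Diffusion #PromptEngineering #Midjourney #Dall-E #RetroGames

TL;DR

An experiment used AI image generation tools (Stable Diffusion, DALL-E, and Midjourney) to recreate the opening cinematic of the 1987 video game Nemesis 2 in higher resolution, and showed how much prompt tuning and style keywords matter.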

AI deep analysis · February 27, 2026, 22:44

Attention: / 5

Depth: 40%

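Key points

Recreating retro game graphics with AI image generation

Stable Diffusion, DALL-E, and Midjourney were used in an attempt to regenerate the opening cinematic of the 1987 game Nemesis 2 in higher resolution.

The importance and difficulty of prompt tuning

Describing the subject of the image is not enough: the prompt must include "arcane keywords" that steer the model toward a specific style, and the process took more than 30 generated images and repeated prompt tweaks.

Using Lexica to search for prompts

To find effective prompts, the author searched a gallery like Lexica, which holds millions of examples and their corresponding prompts.

Midjourney's high aesthetic quality

Midjourney produced especially beautiful results even from the original subject-only prompt, and its quality stood out.

Limits of AI image generation tools

Reproducing text accurately and placing specific elements is difficult, so supplementary tools such as Photoshop are sometimes needed.

Workflow challenges

Reproducing the same character or object consistently across multiple panels is hard, and the more advanced methods are still difficult to use.

Choosing and adapting tools

Different AI tools (Midjourney, DALL-E outpainting, and so on) are used for different purposes, and prompts must be adjusted for each.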


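Impact analysis

This article is a worked example of how AI image generation could be applied to preserving and reinterpreting entertainment assets, and it brings the practical challenges of prompt engineering into concrete focus. For now, however, much of the work depends on manual adjustment, and the path to full automation is unclear.

Editorial comment

A solid hands-on report that illustrates the practical challenges of AI image generation with concrete examples. It conveys how laborious prompt tuning is and helps readers understand the current state of the technology.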


Remaking Old Computer Graphics With AI Image Generation

Can AI image generation tools make re-imagined, higher-resolution versions of old video game graphics?

Over the last few days, I used AI image generation to reproduce one of my childhood nightmares. I wrestled with Stable Diffusion, DALL-E and Midjourney to see how these commercial AI generation tools can help retell an old visual story - the intro cinematic to an old video game (Nemesis 2 on the MSX). This post describes the process and my experience in using these models/services to retell a story in higher-fidelity graphics.

This fine-looking gentleman is the villain in a video game. Dr. Venom appears in the intro cinematic of Nemesis 2, a 1987 video game. This image, in particular, comes at a dramatic reveal in the cinematic.

Let’s update these graphics with visual generative AI tools and see how they compare and where each succeeds and fails.


Here’s a side-by-side look at the panels from the original cinematic (left column) and the final ones generated by the AI tools (right column):

This figure does not show the final Dr. Venom graphic because I want you to witness it as I had, in the proper context and alongside the appropriate music. You can watch that here:

The final image was generated by Stable Diffusion using Dream Studio.

The road to this image, however, goes through generating over 30 images and tweaking prompts. The first kind of prompt I’d use is something like:

fighter jets flying over a red planet in space with stars in the black sky

This leads DALL-E to generate these candidates:

Pasting a similar prompt into Dream Studio generates these candidates:

This showcases a reality of the current batch of image generation models. It is not enough for your prompt to describe the subject of the image. Your image creation prompt/spell needs to mention the exact arcane keywords that guide the model toward a specific style.

Searching for prompts on Lexica

The current solution is to either go through a prompt guide and learn the styles people found successful in the past, or search a gallery like Lexica that contains millions of examples and their respective prompts. I go for the latter as learning arcane keywords that would work on specific versions of specific models is not a winning strategy for the long term.

From here, I find an image that I like and edit its prompt, swapping in my subject while keeping the style portion, so that it finally looks like:

fighter jets flying over a red planet in space flaming jets behind them, stars on a black sky, lava, ussr, soviet, as a realistic scifi spaceship!!!, floating in space, wide angle shot art, vintage retro scifi, realistic space, digital art, trending on artstation, symmetry!!! dramatic lighting.
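The subject-plus-style structure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any of the services parse prompts: the `build_prompt` helper is hypothetical, and the style keywords are a subset of the ones in the prompt above.

```python
def build_prompt(subject, style_keywords):
    """Join a subject description and style keywords into one comma-separated prompt."""
    # Drop empty entries so stray commas never end up in the prompt.
    parts = [subject.strip()] + [k.strip() for k in style_keywords if k.strip()]
    return ", ".join(parts)


# The subject is what the image shows; the style keywords are borrowed from a
# gallery prompt whose look you liked.
subject = "fighter jets flying over a red planet in space"
style = ["vintage retro scifi", "digital art", "trending on artstation", "dramatic lighting"]

prompt = build_prompt(subject, style)
print(prompt)
```

Keeping the two parts separate makes it easy to reuse one style across all the panels of a story while changing only the subject.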

The results of Midjourney have always stood out as especially beautiful. I tried it with the original prompt containing only the subject. The results were amazing.

While these look incredible, they don’t capture the essence of the original image as well as the Stable Diffusion one does. But this convinced me to try Midjourney first for the remainder of the story. I had about eight images to generate and only a limited time to get an okay result for each.

Original Image:

Failed attempts

While Midjourney could approximate the appearance of Dr. Venom, it was difficult to get the pose and restraint. My attempts at that looked like this:

That’s why I tweaked the image to show him behind bars instead.

Original Image:

To have Midjourney generate a wide image, append the --ar 3:2 parameter to the prompt to specify the desired aspect ratio.
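For tools that ask for pixel dimensions instead of a ratio, a 3:2 ratio has to be turned into a width and height. The sketch below is one way to do that arithmetic. The `size_for_aspect_ratio` helper, the 512-pixel short side, and the rounding to multiples of 64 are all assumptions made for illustration, not something the services above document.

```python
def size_for_aspect_ratio(ratio, short_side=512, multiple=64):
    """Turn a 'W:H' ratio string into (width, height) in pixels.

    The short side is fixed at `short_side`; the long side is rounded to the
    nearest multiple of `multiple` (both values are illustrative assumptions).
    """
    w_part, h_part = (int(p) for p in ratio.split(":"))
    if w_part <= 0 or h_part <= 0:
        raise ValueError("ratio parts must be positive")

    def snap(value):
        # Round to the nearest allowed size, never below one step.
        return max(multiple, round(value / multiple) * multiple)

    if w_part >= h_part:
        return snap(short_side * w_part / h_part), short_side
    return short_side, snap(short_side * h_part / w_part)


print(size_for_aspect_ratio("3:2"))
```

For 3:2 this gives a 768 x 512 canvas under the stated assumptions; a portrait ratio such as 2:3 swaps the two values.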

Original Image:

Midjourney really captures the cool factor in a lot of fighter jet schematics. The text will not make sense, but that can work in your favor if you’re going for something alien.

In this workflow, it’ll be difficult to reproduce the same plane in future panels. Recent, more advanced methods like textual inversion or DreamBooth could aid in this, but at this time they are more difficult to use than text-to-image services.

Original Image:

This image shows a limitation in what is possible with the current batch of AI image tools:

1- Reproducing text correctly in images is not yet widely available (although it is technically possible, as demonstrated by Google’s Imagen)

2- Text-to-image is not the best paradigm if you need a specific placement or manipulation of elements

So to get this final image, I had to import the stars image into Photoshop and add the text and lines there.

Original Image:

I failed at reproducing the most iconic portion of this image, the three eyes. The models wouldn’t generate the look using any of the prompts I tried.

I then proceeded to try in-painting in Dream Studio.

In-painting instructs the model to generate content for only a portion of the image. In this case, it’s the portion I deleted with the brush inside of Dream Studio above.

I couldn’t get to a good result in time, although judging by the gallery, the models are quite capable of generating horrific imagery involving eyes.
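The masking contract behind in-painting can be sketched without any model at all: pixels under the brushed mask come from the new generation, and everything else is kept from the original. The `inpaint_composite` function and the tiny grayscale grids below are hypothetical. A real in-painting model also conditions its output on the unmasked pixels, which this sketch does not attempt.

```python
def inpaint_composite(original, generated, mask):
    """Keep original pixels where mask is 0 and take generated pixels where mask is 1."""
    return [
        [g if m else o for o, g, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]


# 3x3 grayscale "images": 10 = original content, 99 = newly generated content.
original = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
generated = [[99, 99, 99], [99, 99, 99], [99, 99, 99]]
# The mask marks the area erased with the brush (1 = regenerate).
mask = [[0, 0, 0], [0, 1, 1], [0, 0, 0]]

result = inpaint_composite(original, generated, mask)
print(result)
```

Only the two masked pixels change, which is why in-painting suits fixing a small region such as the eyes while leaving the rest of the image untouched.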

Original Image:

Candidate generations:

Original Image:

Candidate generations:

This image provided a good opportunity to try out DALL-E’s outpainting tool to expand the canvas and fill in the surrounding space with content.

Expanding the Canvas with DALL-E Outpainting

Say we decided to go with this image for the ship’s captain:

We can upload it to DALL-E’s outpainting editor and over a number of generations continue to expand the imagery around the image (taking into consideration a part of the image so we keep some continuity).

The outpainting workflow differs from text-to-image in that the prompt has to be changed at each step to describe the portion of the image you’re crafting.
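One way to picture this step-by-step process is as a plan in which each generation frame overlaps the existing canvas for continuity and carries its own prompt. The sketch below only models that bookkeeping along one axis. The `plan_outpainting` helper, the frame and overlap sizes, and the example prompts are all hypothetical, and none of it reflects how DALL-E’s editor is implemented.

```python
def plan_outpainting(start_width, frame, overlap, prompts):
    """Plan rightward canvas expansion: one generation frame per prompt.

    Each frame starts `overlap` pixels inside the existing canvas so the model
    can see some of the current image, and adds `frame - overlap` new pixels.
    """
    if not 0 < overlap < frame:
        raise ValueError("overlap must be between 0 and the frame size")
    steps = []
    canvas_width = start_width
    for prompt in prompts:
        left = canvas_width - overlap  # frame begins inside existing content
        right = left + frame
        steps.append({"prompt": prompt, "frame": (left, right), "new_px": frame - overlap})
        canvas_width = right  # the canvas now extends to the frame's right edge
    return steps, canvas_width


# Hypothetical prompts: each one describes only the region being added.
steps, final_width = plan_outpainting(
    start_width=1024,
    frame=1024,
    overlap=256,
    prompts=["spaceship bridge with control panels", "large window showing stars"],
)
for step in steps:
    print(step)
print(final_width)
```

A larger overlap gives the model more context and usually better continuity, at the cost of adding less new canvas per generation.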

My Current Impressions of Commercial AI Image Generation Tools

It’s been a few months since the vast majority of people started having broad access to AI image generation tools. The major milestone here is the open source release of Stable Diffusion (although some people had access to DALL-E before, and models like OpenAI GLIDE were publicly available but slower and less capable). During this time, I’ve gotten to use three of these image generation services.

Dream Studio by Stability AI

This is what I have been using the most over the last few months.

They made Stable Diffusion and serve a managed version of it – a major convenience and improvement in workflow.

They have an API, so the models can be accessed programmatically. That is a key point for extending the capability and building more advanced systems that use an image generation component.

As the makers of Stable Diffusion, they will likely continue to be the first to offer managed versions of upcoming releases, which are expected to keep getting better.

The fact that Stable Diffusion is open source is another big point in their favor. The managed model can be used as a prototyping ground (or a production tool for certain use cases), yet you know that if your use case requires fine-tuning your own model, you can revert to the open source versions.

Currently the best user interface with the most options (without being overwhelming like some of the open source UIs). It has the key sliders you need to tweak and you can pick how many candidates to generate. They were quick to add user interface components for advanced features like in-painting.

Dream Studio still does not robustly keep a history of all the images the user generates.

Older versions of Stable Diffusion (e.g. 1.4 and 1.5) remain easier to get better results with (aided by galleries like Lexica). The newer models are still being figured out by the community, it seems.

Midjourney

By far the best generation quality with the least amount of prompt tweaking.

The UI saves an archive of your generations.

The community feed tab on the website is a great showcase of the artwork the community is pumping out. In a way, it is Midjourney’s own Lexica.

Can only be accessed via Discord, as far as I can tell. I don’t find that to be a compelling channel. As a trial user, you need to generate images in public “Newbie” channels (which didn’t work for me when I tried them a few months ago – understandable given the meteoric growth the platform has experienced). I revisited the service only recently and paid for a subscription that would allow me to directly generate images using a bot.

No UI components to pick image size or other options. Options are offered as commands to add to the prompt. I found that to be less discoverable than Dream Studio’s UI which shows the main sliders and describes them.

Can’t access it via API (as far as I can tell) or generate images in the browser.

DALL-E

DALL-E was the first to dazzle the world with the capabilities of this batch of image generation models.

Inpainting and outpainting support

Keeps the entire history of generated images

Feels a little slower than Stable Diffusion, but good that it generates four candidate images

Because it lags behind Midjourney in quality of images generated in response to simple prompts, and behind Stable Diffusion in community adoption and tooling (in my perception), I haven’t found a reason to spend a lot of time exploring DALL-E. Outpainting feels kinda magical, however. I think that’s where I may spend some more time exploring.

That said, do not discount DALL-E just yet. OpenAI are quite the pioneers and I’d expect the next versions of the model to dramatically improve generation quality.

This is a good place to end this post although there are a bunch of other topics I had wanted to address. Let me know what you think on @JayAlammar or @JayAlammar@sigmoid.social.

![](/images/image-gen/2464700474_Two_astronauts_exploring_the_dark__cavernous_interior_of_a_huge_derelict_spacecraft__digital_art__ne.png)

