
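Simon Willison Blog · April 22, 2026 05:32 · about 4 min read

Where's the raccoon with the ham radio? (ChatGPT Images 2.0)

#ImageGeneration #MultimodalModels #PromptEngineering #OpenAI #Google Gemini

TL;DR

Simon Willison tests OpenAI's new image generation model, ChatGPT Images 2.0, on a complex "Where's Waldo"-style prompt. He compares it with the older model, whose image had no raccoon he could find, and with Gemini's newer model, which succeeded, and concludes that at its highest quality setting the new OpenAI model takes the crown from Gemini.

AI Deep Analysis · April 22, 2026 06:08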


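Key points

ChatGPT Images 2.0 release and performance claims

OpenAI released gpt-image-2, and Sam Altman claimed on the launch livestream that the jump was equivalent to going from GPT-3 to GPT-5.

Hands-on test with a complex spatial task

A "Where's Waldo"-style prompt asking where the raccoon holding a ham radio is hidden was used to assess how well the models handle fine detail.

Comparison with the older model and a competitor

No raccoon could be found in gpt-image-1's output, while Google's Nano Banana 2 (via Gemini) drew one clearly, showing the strength of the competing model.

Using an LLM to analyze generated images

Claude Opus 4.7 reviewed Gemini's output and pointed out deliberate touches such as the raccoon's placement in the booth and the "W6HAM" callsign pun.

Key quotes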

OpenAI released ChatGPT Images 2.0 today, their latest image generation model.

Sam Altman said that the leap from gpt-image-1 to gpt-image-2 was equivalent to jumping from GPT-3 to GPT-5.

Yes — there's at least one raccoon in the picture, but it's very well hidden

Honestly, this one wasn't really hiding — he's the star of the booth.


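Impact analysis

This article suggests that, at default settings, OpenAI's latest image model did not reliably handle a complex spatial-layout task (no raccoon could be found in its first output), and that only the high-quality 3840x2160 run produced the requested scene. It also sets the model against competitors such as Gemini's Nano Banana 2. For developers and end users, it underlines the importance of benchmark design and offers a way to assess the current state of multimodal AI objectively. For the industry as a whole, it suggests that fine-grained control and spatial reasoning in image generation models will be the next focus of competition.

Editorial comment

A practical test that weighs OpenAI's official claims against hands-on results, suggesting that complex spatial-layout tasks are key to evaluating multimodal models. Developers should pay attention to careful benchmark design and to how competing models compare.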


OpenAI released ChatGPT Images 2.0 today, their latest image generation model. On the livestream Sam Altman said that the leap from gpt-image-1 to gpt-image-2 was equivalent to jumping from GPT-3 to GPT-5. Here's how I put it to the test.

My prompt:

Do a where's Waldo style image but it's where is the raccoon holding a ham radio

gpt-image-1

First as a baseline here's what I got from the older gpt-image-1 using ChatGPT directly:

I wasn't able to spot the raccoon - I quickly realized that testing image generation models on Where's Waldo style images (Where's Wally in the UK) can be pretty frustrating!

I tried getting Claude Opus 4.7 with its new higher resolution inputs to solve it but it was convinced there was a raccoon it couldn't find thanks to the instruction card at the top left of the image:

Yes — there's at least one raccoon in the picture, but it's very well hidden. In my careful sweep through zoomed-in sections, honestly, I couldn't definitively spot a raccoon holding a ham radio. [...]

Nano Banana 2 and Pro

Next I tried Google's Nano Banana 2, via Gemini:

That one was pretty obvious, the raccoon is in the "Amateur Radio Club" booth in the center of the image!

Claude said:

Honestly, this one wasn't really hiding — he's the star of the booth. Feels like the illustrator took pity on us after that last impossible scene. The little "W6HAM" callsign pun on the booth sign is a nice touch too.

I also tried Nano Banana Pro in AI Studio and got this, by far the worst result from any model. Not sure what went wrong here!

gpt-image-2

With the baseline established, let's try out the new model.

I used an updated version of my openai_image.py script, which is a thin wrapper around the OpenAI Python client library. Their client library hasn't yet been updated to include gpt-image-2 but thankfully it doesn't validate the model ID so you can use it anyway.
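For readers who want to see roughly what such a thin wrapper involves, here is a minimal Python sketch. It is not the actual openai_image.py script: the function names (`build_image_request`, `decode_image`, `generate`) are hypothetical, and it assumes the `openai` package's `OpenAI().images.generate()` call and that gpt-image-2, like gpt-image-1, returns base64-encoded image data. As the post notes, the client passes the model ID through without validating it.

```python
import base64
from pathlib import Path

# Hypothetical default; the client does not validate this model ID.
DEFAULT_MODEL = "gpt-image-2"


def build_image_request(prompt, model=DEFAULT_MODEL, quality=None, size=None):
    """Assemble keyword arguments for client.images.generate().

    Optional settings are only included when given, so the API
    applies its own defaults otherwise.
    """
    request = {"model": model, "prompt": prompt}
    if quality is not None:
        request["quality"] = quality  # e.g. "high"
    if size is not None:
        request["size"] = size  # e.g. "3840x2160"
    return request


def decode_image(b64_data):
    """Turn the base64 payload from the API response into raw image bytes."""
    return base64.b64decode(b64_data)


def generate(prompt, out_path, **options):
    """Generate one image and save it; needs the openai package and OPENAI_API_KEY."""
    # Imported lazily so the helpers above work without the package installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.images.generate(**build_image_request(prompt, **options))
    # Assumption: like gpt-image-1, the model returns base64 data, not a URL.
    image_bytes = decode_image(response.data[0].b64_json)
    Path(out_path).write_bytes(image_bytes)
    # Token usage, if the installed client version exposes it.
    return getattr(response, "usage", None)
```

Calling `generate(prompt, "raccoon.png", quality="high", size="3840x2160")` would mirror the high-quality command-line run, with the prompt string from the post. Keeping the request-building step separate from the network call makes it easy to inspect exactly what gets sent.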

Here's how I ran that:


OPENAI_API_KEY="$(llm keys get openai)" \
  uv run https://tools.simonwillison.net/python/openai_image.py \
  -m gpt-image-2 \
  "Do a where's Waldo style image but it's where is the raccoon holding a ham radio"

Here's what I got back. I don't *think* there's a raccoon in there - I couldn't spot one, and neither could Claude.

The OpenAI image generation cookbook has been updated with notes on gpt-image-2, including the outputQuality setting and available sizes.

I tried setting outputQuality to high and the dimensions to 3840x2160 - I believe that's the maximum - and got this - a 17MB PNG which I converted to a 5MB WEBP:


OPENAI_API_KEY="$(llm keys get openai)" \
  uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
  -m gpt-image-2 "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
  --quality high --size 3840x2160

That's pretty great! There's a raccoon with a ham radio in there (bottom left, quite easy to spot).

The image used 13,342 output tokens, which are charged at $30/million so a total cost of around 40 cents.
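The numbers from the high-quality run can be sanity-checked with a little arithmetic. This Python sketch uses only the figures quoted in the post; the function names are made up for illustration, it counts output tokens only (ignoring the short prompt's input tokens), and it assumes 8-bit RGB pixels with no alpha channel when estimating the uncompressed image size.

```python
def image_cost_usd(output_tokens, usd_per_million_tokens):
    """Cost of the output tokens alone, at a per-million-token price."""
    return output_tokens * usd_per_million_tokens / 1_000_000


def raw_pixel_bytes(width, height, channels=3):
    """Size of the uncompressed pixel data, at one byte per channel."""
    return width * height * channels


cost = image_cost_usd(13_342, 30)  # figures quoted in the post
raw = raw_pixel_bytes(3840, 2160)  # assumes RGB with no alpha channel

print(f"output-token cost: ${cost:.2f}")  # output-token cost: $0.40
print(f"raw RGB data: {raw / 1_000_000:.1f} MB")  # raw RGB data: 24.9 MB
```

13,342 tokens at $30 per million works out to $0.40026, matching the "around 40 cents" above. A 3840x2160 image holds about 24.9 MB of raw RGB data, which puts the 17MB PNG and the 5MB WEBP in context.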

Takeaways

I think this new ChatGPT image generation model takes the crown from Gemini, at least for the moment.

Where's Waldo style images are an infuriating and somewhat foolish way to test these models, but they do help illustrate how good they are getting at complex illustrations combining both text and details.

Update: asking models to solve this is risky

rizaco on Hacker News asked ChatGPT to draw a red circle around the raccoon in one of the images in which I had failed to find one. Here's an animated mix of their result and the original image:

The circle appears around a raccoon with a ham radio who is definitely not there in the original image!

Looks like we definitely can't trust these models to usefully solve their own puzzles!

Tags: ai, openai, generative-ai, chatgpt, llms, text-to-image, llm-release, nano-banana
