The Verge AI · April 30, 2026, 22:42 · approx. 3 min read

OpenAI addresses its models' "goblin" ban

#LLM #OpenAI #ModelBehavior #Transparency #SafetyMeasures

TL;DR

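After a Wired report revealed that OpenAI's coding model is instructed not to mention "goblins" and other creatures, OpenAI published an official explanation of where the behavior came from and how it responded.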

AI Deep Analysis · April 30, 2026, 23:06

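Key Points

A strange prohibition comes to light

Wired's reporting revealed that OpenAI's coding model is instructed to avoid mentioning creatures and animals such as goblins, gremlins, raccoons, and trolls.

OpenAI's official position

OpenAI calls these references a "strange habit," a behavior pattern its models developed inadvertently through their training data and fine-tuning process.

Efforts toward greater transparency

In the wake of the episode, the company has signaled a willingness to discuss its internal restrictions and model behavior more openly, in an effort to build trust with users.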

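Impact Analysis

This news has rekindled concerns that the internal behavior of large language models is becoming a black box, and it shows that user demand for transparency from AI developers is growing. It also highlights, as a practical operational risk, the technical problem of which biases and odd rules an AI picks up unintentionally from its training data.

Editor's Comment

At first glance this looks like a trivial story about fantasy creatures, but it is an important example of how unexpected biases and restrictions can take shape during an AI model's training. It suggests that when developers fail to explain why a model behaves as it does, the lack of transparency can put user trust at risk.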

References to goblins and gremlins spiked with the release of GPT-5.1's "Nerdy" personality, and then spread to other models.

by Emma Roth

April 30, 2026, 10:42 PM GMT+9

[Image: Vector illustration of the ChatGPT logo. Credit: The Verge]

Emma Roth is a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO.

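OpenAI is opening up about its goblin problem. After a report from Wired [1] revealed instructions to OpenAI's coding model to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures," the AI startup published an explanation on its website, calling references to the creatures a "strange habit" its models developed as a result of their training.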

[1] https://www.wired.com/story/openai-really-wants-codex-to-shut-up-about-goblins/

[2] https://openai.com/index/where-the-goblins-came-from/

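As outlined in the blog post [2], OpenAI began noticing metaphors referencing goblins and other creatures starting with its GPT-5.1 model, specifically when using the "Nerdy" personality option. OpenAI says the problem continued to worsen with subsequent model releases, until it found that its reinforcement learning rewarded the quirky metaphors under the Nerdy personality, and that newer models were then being trained on those outputs.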

The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them. Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.
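
As a rough illustration of how a reward applied in one condition can leak into others, here is a minimal toy sketch. It is not OpenAI's training setup: the policy shape (a shared logit plus a per-condition offset), the `train` function, the learning rate, and the starting values are all assumptions chosen for illustration. The reward is given only under the "nerdy" condition, yet the shared parameter moves too, so the "default" condition also drifts toward the rewarded style.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(steps: int = 200, lr: float = 0.5) -> dict:
    """Toy policy-gradient loop (hypothetical, for illustration only).

    The probability of emitting a "goblin metaphor" in a condition is
    sigmoid(shared + offset[condition]). Reward is 1 for a goblin metaphor,
    0 otherwise, and is applied ONLY in the "nerdy" condition.
    """
    shared = -2.0                              # parameter shared by all conditions
    offset = {"nerdy": 0.0, "default": 0.0}    # per-condition parameters

    def probs() -> dict:
        return {c: sigmoid(shared + offset[c]) for c in offset}

    before = probs()
    for _ in range(steps):
        p = sigmoid(shared + offset["nerdy"])
        # Exact gradient of expected reward w.r.t. the nerdy logit: p * (1 - p).
        grad = p * (1.0 - p)
        # The logit is a sum, so both terms receive the same gradient.
        shared += lr * grad            # this update is what leaks elsewhere
        offset["nerdy"] += lr * grad
        # No reward in the "default" condition, so its offset never changes.
    return {"before": before, "after": probs()}


if __name__ == "__main__":
    result = train()
    for phase in ("before", "after"):
        print(phase, {c: round(p, 3) for c, p in result[phase].items()})
```

In this sketch the "default" probability rises even though it was never rewarded, because part of what was learned sits in the shared parameter. Real models have no such clean split between shared and per-condition parameters, which is one reason a rewarded habit is hard to keep contained.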

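Though references to goblins and gremlins dropped off after OpenAI discontinued the Nerdy personality in March, they didn't disappear completely. They persisted in GPT-5.5 inside the Codex coding tool, because OpenAI started training that model before finding the "root cause." As a result, the company had to give Codex very specific instructions not to talk about the mythological creatures. But if you'd prefer to have your AI code with some goblin sprinkled in, OpenAI has shared a way to reverse its instructions.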
