Berkeley AI Research · November 12, 2024, 18:00 · approx. 8 min read

Virtual Personas for Language Models via an Anthology of Backstories

#LLM #Prompt Engineering #Agent Simulation #Social Science AI

TL;DR

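Berkeley AI Research has presented Anthology, a method that conditions LLMs on detailed life backstories rather than bare demographic data, enabling more individualized simulation of virtual personas with applications to social science research.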


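Key Points

Conditioning on detailed backstories

Instead of relying on demographic variables alone (age, gender, and so on), the method generates and uses naturalistic life backstories that include individual values and experiences, so that the LLM behaves as a specific individual.

Moving from stereotypes to individuals

It mitigates the stereotypical responses that arise when conditioning on bare variables alone, and reproduces the distributions and consistency of individual human samples more faithfully.

Backstories generated by the LLM itself

To efficiently build a large set of backstories covering diverse demographics, the authors also propose generating the backstories with LLMs themselves.

Approximating individuals with detailed backstories

Using rich backstories generated from open-ended prompts such as “Tell me about yourself,” the model captures explicit and implicit markers of personal identity, including demographic traits and cultural background.

Evaluation results that surpass existing methods

On all three metrics (Wasserstein distance, Frobenius norm between correlation matrices, and Cronbach's alpha), Anthology outperformed the other conditioning methods and baselines, suggesting that it elicits more nuanced responses.

Effect of the matching method

Greedy matching tends to achieve a better average Wasserstein distance than maximum weight matching. The authors attribute this to the strict one-to-one constraint of maximum weight matching, which lowers the weights assigned to matched pairs.

Ethical risks and careful use

Because risks of perpetuating biases and infringing on privacy remain, generated results must be interpreted and used with caution.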


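Impact Analysis

This research could mark an important shift from treating LLMs as mere information-processing tools to having them act as "agents" that faithfully imitate specific individuals. In social science research and market research in particular, the ability to simulate diverse human samples at low cost, and in some cases more ethically, could redefine how empirical studies are conducted.

Editorial Comment

By going beyond plain prompt engineering and weaving narrative into the formation of an LLM's "personality," this approach is a notable development as a foundation for next-generation assistants and research support tools.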

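We introduce Anthology, a method for conditioning LLMs to representative, consistent, and diverse virtual personas by generating and utilizing naturalistic backstories with rich details of individual values and experience.

What does it mean for large language models (LLMs) to be trained on massive text corpora, collectively produced by millions and billions of distinctive human authors?

In “Language Models as Agent Models”, compelling evidence suggests that recent language models could be considered models of agents: provided with a textual context, LLMs are capable of generating conditional text that represents the characteristics of an agent likely to have produced that context. This suggests that, with appropriate conditioning, LLMs could be guided to approximate the responses of a particular human voice, rather than the mixture of voices that otherwise emerges. If realized, this capability of LLMs would have significant implications for user research and social sciences: language models conditioned as virtual personas of human subjects could serve as cost-effective pilot studies and support best practices in human studies, e.g. the Belmont principles of justice and beneficence.

In this work, we introduce Anthology, an approach for steering LLMs to representative, consistent, and diverse virtual personas by providing richly detailed life narratives of individuals as conditioning context to models. In doing so, we also present methods to generate backstories from LLMs themselves as a means to efficiently produce massive sets covering a wide range of human demographics. By grounding language models in naturalistic backstories, Anthology allows LLMs to simulate individual human samples with increased fidelity, measured in terms of matching the distributions and consistencies of human responses.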

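Our Approach: Anthology

Conditioning Language Model Generation with Individual Life Narratives

A significant limitation of earlier methods in steering LLMs to virtual personas has been the inability to reliably approximate individual human samples. Prior approaches prompt LLMs with broad demographic information, e.g., “I am a 25-year-old from California. My highest level of education is less than high school,” which are essentially bodies of text generated from a tuple of demographic variables. With these methods, we are only able to approximate human samples at a population level, not at the individual level, which results in:

Responses prone to LLMs defaulting to stereotypical and/or prototypical portrayals, as they are only conditioned on demographic variables (e.g., race and gender)

Inability to provide important metrics of interest such as covariance and statistical significance, as individual responses are required for such computations

Anthology enables the approximation of individual subjects by conditioning with richly detailed backstories. Through these backstories, the model captures implicit and explicit markers of personal identity, including demographic traits and spontaneous references to cultural and socioeconomic backgrounds and life philosophies. Our approach involves generating a vast set of backstories representing a wide range of demographic attributes via language models queried with unrestricted, open-ended prompts such as, “Tell me about yourself.” We then match virtual personas conditioned by each backstory to real-world survey samples.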
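To make the contrast between the two conditioning styles concrete, here is a minimal sketch. The `Demographics` class, the function names, and the exact prompt layout (a "Question/Answer" transcript with lettered options) are assumptions made for illustration; the post does not specify the prompt template used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Demographics:
    """Hypothetical container for the demographic tuple used by baseline prompts."""
    age: int
    state: str
    education: str


def demographic_prompt(d: Demographics) -> str:
    """Baseline: a short text generated from a tuple of demographic variables."""
    return (
        f"I am a {d.age}-year-old from {d.state}. "
        f"My highest level of education is {d.education}."
    )


def backstory_prompt(backstory: str, question: str, options: list[str]) -> str:
    """Backstory conditioning: prepend a free-form life narrative, then ask a survey item.

    The transcript layout below is an assumption, not the paper's template.
    """
    lettered = "\n".join(
        f"({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(options)
    )
    return (
        "Question: Tell me about yourself.\n"
        f"Answer: {backstory}\n\n"
        f"Question: {question}\n{lettered}\nAnswer:"
    )


if __name__ == "__main__":
    print(demographic_prompt(Demographics(25, "California", "less than high school")))
    print(backstory_prompt(
        "I grew up in a small town and worked nights to help my family...",
        "How much do you trust national news organizations?",
        ["A lot", "Some", "Not too much", "Not at all"],
    ))
```

In this sketch the baseline prompt carries only the demographic tuple, while the backstory prompt lets demographic and cultural cues appear implicitly inside the narrative that precedes the survey question.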

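Results: Closer Approximation of Public Opinion Polls

For evaluation, we compare the effectiveness of different methods for conditioning virtual personas in the context of approximating three Pew Research Center ATP surveys: Waves 34, 92, and 99.

Results on approximating human responses for Pew Research Center ATP surveys. Boldface and underlined results indicate values closest and the second closest to those of humans, respectively.

As measures of success in approximating human samples with virtual personas, we consider the following metrics:

Average Wasserstein distance (WD) between response distributions as a measure of representativeness

Frobenius norm (Fro.) between correlation matrices as a measure of consistency

Cronbach’s alpha as an additional measure of internal consistency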

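Prior to analyzing virtual subjects, we estimate the lower bounds of each evaluation metric by repeatedly dividing the human population into two equal-sized groups at random and calculating these metrics between the subgroups. We take averaged values from 100 iterations to represent the lower-bound estimates.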
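The split-half procedure described above can be sketched as follows. The function name, the `metric` callback signature, and the toy `mean_gap` metric are assumptions for illustration; in practice `metric` would be one of the evaluation metrics computed between the two subgroups.

```python
import random
from statistics import fmean


def split_half_lower_bound(respondents, metric, n_iter=100, seed=0):
    """Average a between-group metric over repeated random equal-sized splits.

    respondents: list of per-respondent records (any type the metric accepts).
    metric: callable taking two groups and returning a number (hypothetical signature).
    """
    rng = random.Random(seed)
    half = len(respondents) // 2
    values = []
    for _ in range(n_iter):
        shuffled = list(respondents)
        rng.shuffle(shuffled)
        # Two disjoint, equal-sized groups; one leftover respondent is dropped if the count is odd.
        group_a, group_b = shuffled[:half], shuffled[half:2 * half]
        values.append(metric(group_a, group_b))
    return fmean(values)


def mean_gap(group_a, group_b):
    """Toy stand-in metric: absolute difference of mean answers."""
    return abs(fmean(group_a) - fmean(group_b))


if __name__ == "__main__":
    answers = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # toy single-question responses
    print(split_half_lower_bound(answers, mean_gap))
```

Because both halves come from the same human population, the averaged value estimates how small each metric can get from sampling noise alone, which is why it serves as a lower bound for the virtual personas.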

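We consistently observe that Anthology outperforms other conditioning methods with respect to all metrics, for both Llama-3-70B and Mixtral-8x22B. When comparing the two matching methods, greedy matching tends to show better performance on the average Wasserstein distance across all Waves. We attribute the differences between matching methods to the one-to-one correspondence condition of maximum weight matching and the limited number of virtual users available. Specifically, the weights assigned to matched virtual subjects in maximum weight matching are inevitably lower than those in greedy matching, as the latter relaxes the constraint on one-to-one correspondence. This discrepancy can result in a lower demographic similarity between matched human and virtual users compared to the counterpart from greedy matching. These results suggest that the richness of the generated backstories in our approach elicits more nuanced responses compared to baselines.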
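A small sketch illustrates why the one-to-one constraint lowers the assigned weights. The weight matrix holds hypothetical similarity scores between human respondents (rows) and virtual personas (columns); how the paper actually computes these weights is not described in the post. The maximum weight matching here is a brute-force search over permutations, which is only feasible for tiny examples and stands in for a proper assignment algorithm.

```python
from itertools import permutations


def greedy_matching(weights):
    """Each human independently takes the best-scoring virtual persona.
    A persona may be reused, so one-to-one correspondence is relaxed."""
    return [max(range(len(row)), key=row.__getitem__) for row in weights]


def max_weight_matching(weights):
    """One-to-one matching maximizing total weight (brute force; needs
    at least as many personas as humans and a very small problem size)."""
    n_humans, n_virtual = len(weights), len(weights[0])
    best = max(
        permutations(range(n_virtual), n_humans),
        key=lambda perm: sum(weights[i][j] for i, j in enumerate(perm)),
    )
    return list(best)


def total_weight(weights, matching):
    """Sum of the weights of the matched (human, persona) pairs."""
    return sum(weights[i][j] for i, j in enumerate(matching))


if __name__ == "__main__":
    # Hypothetical similarities: both humans resemble persona 0 the most.
    weights = [
        [0.9, 0.2, 0.1],
        [0.8, 0.3, 0.1],
    ]
    g = greedy_matching(weights)
    m = max_weight_matching(weights)
    print(g, total_weight(weights, g))
    print(m, total_weight(weights, m))
```

In this toy case both humans pick persona 0 under greedy matching, while the one-to-one constraint forces the second human onto a less similar persona, so the total matched weight can only stay the same or drop.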

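Anthology marks a promising new direction in conditioning virtual personas in LLMs that could potentially reshape how we conduct user research, public opinion surveys, and other social science applications by offering a scalable, and at times, ethical alternative to traditional human surveys. However, the use of Anthology, as in any other application of language models in the social sciences, also brings several considerations to the forefront: although the generated backstories help create more representative personas, there remains a risk of perpetuating biases or infringing on privacy, so results should be used and interpreted with caution.

In terms of future steps, we envision our approach benefiting from a more expansive and diverse set of backstories, each representing a consistent life narrative of an individual. Additionally, a valuable extension of the work would be to consider free-form response generation, enabling more natural and nuanced persona simulations beyond structured survey formats such as multiple-choice. Finally, an exciting next dimension in applying LLMs in behavioral studies would involve simulating longer-term effects, allowing virtual personas to model and retrospectively examine changes over time.

All of these directions present multitudes of technical challenges; please let us know if you are interested in collaborating or want to discuss our work further!

Learn more about our work in the full paper (arXiv:2407.06576).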

@article{moon2024virtual,
  title={Virtual personas for language models via an anthology of backstories},
  author={Moon, Suhong and Abdulhai, Marwa and Kang, Minwoo and Suh, Joseph and Soedarmadji, Widyadewi and Behar, Eran Kohen and Chan, David M},
  journal={arXiv preprint arXiv:2407.06576},
  year={2024}
}

