有害な操作から人々を保護する
Google DeepMindは、金融や医療などの分野におけるAIによる有害な操作リスクを研究し、新たな安全対策を開発している。
キーポイント
有害操作リスクの研究
Google DeepMindがAIによる有害な操作リスクを研究しており、金融や医療などの分野で具体的な調査が行われている。
新たな安全対策の開発
研究結果に基づいて、AIの有害操作から人々を保護するための新たな安全対策が開発されている。
複数分野への影響
金融や医療など複数の重要な分野でAIの操作リスクが調査されており、幅広い社会的影響が懸念されている。
影響分析・編集コメントを表示
影響分析
この研究はAIの倫理的・安全面での課題に取り組む重要な一歩であり、AI技術の社会的受容性向上に貢献する可能性がある。特に金融や医療といったセンシティブな分野でのAI応用において、安全性確保の枠組み構築に影響を与えるだろう。
編集コメント
AIの安全性研究の進展を示す記事だが、具体的な技術内容や実装方法の詳細が不足しているため、実用性の評価は控えめとなった。
Google DeepMindは、金融や医療などの分野におけるAIの有害な操作リスクを研究し、新たな安全対策を導いています。
原文を表示
March 26, 2026 Responsibility & SafetyHelen KingAs AI models get better at holding natural conversations, we must examine how these interactions affect people and society.Building on a breadth of scientific research, today, we are releasing new findings on the potential for AI to be misused for harmful manipulation*, specifically, its ability to alter human thought and behavior in negative and deceptive ways. With this latest study, we have created the first empirically validated toolkit to measure this kind of AI manipulation in the real world, which we hope will help protect people and advance the field as a whole. We’re publicly releasing all materials necessary to run human participant studies using the same methodology. (Note: The behaviors observed during this study took place in a controlled lab setting, and do not necessarily predict real-world behaviors.)Why harmful manipulation mattersConsider two scenarios: One AI model gives you facts to make a well-informed healthcare decision that improves your well-being. Another AI model uses fear to pressure you to make an ill-informed decision that harms your health. The first educates and helps you; the second tricks and harms you.These scenarios highlight the difference between two types of persuasion in human-AI interactions (also defined in earlier research):Beneficial (rational) persuasion: Using facts and evidence to help people make choices that align with their own interestHarmful manipulation: Exploiting emotional and cognitive vulnerabilities to trick people into making harmful choicesOur latest work helps us and the wider AI community better understand the risk of AI developing capabilities for harmful manipulation and build a scalable evaluation framework to measure this complex area. To do this effectively, we simulated misuse in high-stakes environments, explicitly prompting AI to try to negatively manipulate people's beliefs and behaviours on key topics.Developing new evaluations for a complex challengeTesting the outcomes of AI harmful manipulationTesting for harmful manipulation is inherently difficult because it involves measuring subtle changes in how people think and act, varying heavily by topic, culture and context.This is what motivated our latest research, which involved conducting nine studies involving over 10,000 participants across the UK, the US, and India. We focused on high-stakes areas such as finance, where we used simulated investment scenarios to test if AI could influence how people would behave in complex decision-making environments, and health, where we tracked if AI could influence which dietary supplements people preferred. Interestingly, the AI was least effective at harmfully manipulating participants on health-related topics.Our findings show that success in one domain does not predict success in another, validating our targeted approach to testing for harmful manipulation in specific, high-stakes environments where AI could be misused.How could AI manipulate?In addition to tracking efficacy (whether the AI successfully changes minds), we also measured its propensity (how often it even tries to use manipulative tactics). We tested propensity in two scenarios: when we explicitly told the model to be manipulative, and when we didn’t.As detailed in our research, we counted manipulative tactics in experimental transcripts, confirming the AI models were most manipulative when explicitly instructed to be.Our results also suggest that certain manipulative tactics may be more likely to result in harmful outcomes, though further research is required to understand these mechanisms in detail.By measuring both efficacy and propensity, we can better understand how AI manipulation works and build more targeted mitigations.Putting research into practiceAs AI becomes a part of our everyday lives, we need to know it can’t be misused to harmfully manipulate people.Beyond this latest study, we recently introduced an exploratory Harmful Manipulation Critical Capability Level (CCL) within our Frontier Safety Framework to help us track models with capabilities which could be misused to systematically change beliefs and behaviors in direct human-AI interactions in ways which could lead to severe harm.These evaluations also serve as the foundation for how we test our models, including Gemini 3 Pro, for harmful manipulation. You can read more about this in this safety report. Like all our safety evaluations, this is an ongoing process. We will continue to refine our models and methodologies to keep pace with advancing AI.Looking aheadUnderstanding and mitigating harmful manipulation is a complex challenge. As model capabilities evolve, so too must our evaluation and mitigation techniques. For example, we’re currently exploring how to ethically evaluate the efficacy of harmful manipulation in even higher-stakes situations—like discussions involving deeply held personal beliefs—where users might be more susceptible to influence. Next, we will be expanding our research to investigate how audio, video, and image inputs as well as agentic capabilities, factor into AI manipulation.We’ll continue to share findings and iterate based on feedback from the Frontier Model Forum and academic community. Our goal is to lead collective progress to prevent harmful manipulation, advancing AI models that prioritize safety and empower people.*Notes: The scope of this particular research focuses exclusively on demonstrating general manipulation capabilities to help further the scientific study of evaluating harmful manipulation. This does not relate to testing safeguards around model outputs or manipulation in policy-violating and dangerous topics (e.g. terrorism and child safety) as this work is covered elsewhere and tested separately.You can also read more about our harmful manipulation work in this interview with our researchers and in the Gemini 3 Pro Frontier Safety Report.AcknowledgmentsCanfer Akbulut, Rasmi Elasmar, Abhishek Roy, Anthony Payne, Priyanka Suresh, Lujain Ibrahim, Seliem El-Sayed, Charvi Rastogi, Ashyana Kachra, Will Hawkins, Kristian Lum, Laura Weidinger, William Isaac, Dawn Bloxwich, Lewis Ho, Eva Lu, Jenny Brennan, Mahmoud Hassan, Mark Graham
関連記事
今日のまとめ
AI日報で今日の重要ニュースをまとめ読み