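Stability AI's Annual Integrity Transparency Report
Stability AI has published a transparency report on the safety measures for its video, image, 3D and audio models covering April 2024 to April 2025. The report details its concrete efforts to prevent misuse and to combat CSAM.
Key Points
Safety by Design throughout
The company works to prevent harmful content through three pillars: removing harmful content from training data, preventing users from generating harmful content, and enforcing its Acceptable Use Policy (AUP). Datasets are assessed and risk assessments are conducted before any model release, so safety is built in from the design stage.
CSAM reporting and law enforcement cooperation
The company states that its policy is to report any child sexual abuse material (CSAM) to the US National Center for Missing and Exploited Children (NCMEC) via its CyberTipline, which forwards the reports to the appropriate law enforcement agencies.
Transparency about data sources
The company disclosed that the data used to develop its foundation models comes from three sources: data publicly available on the internet, data accessed through third-party partners, and synthetic data generated by its researchers.
Training data sources and filtering
Models are built on public, partner and synthetic data. The company does not collect data from sources that proliferate harmful content, such as the dark web or adult websites, and does not intentionally collect data from paywalled sources. Training data is filtered with NSFW classifiers, and industry CSAM hash lists run across a subset of the current training data have detected no CSAM to date.
Multi-layered safety measures
At the platform API level, real-time content filters and CSAM hash-matching systems are applied; at the model level, fine-tuning and safety LoRAs informed by red teaming are used to prevent harmful generation.
Red teaming with external partners
Models are tested with internal and external experts and with OCCIT, a UK law enforcement unit. Pre-release red teaming of Stable Diffusion 3 did not succeed in generating CSAM. If harmful capabilities are found, the model undergoes further safety fine-tuning before release.
CSAM/CSEM measures and results
During the reporting period, 100% of the company's generative AI models were stress-tested for child sexual abuse and exploitation material (CSAM/CSEM) capabilities, using prompts depicting adult nudity and adult sexual activity as indicators, and 0% were found to have related issues.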
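Impact Analysis
This report reaffirms the importance of transparency and accountability in the generative AI industry and is a notable example of a development approach that puts safety and ethics first. As regulation tightens, Stability AI appears to be making its risk management processes visible in order to build trust with users and policymakers.
Editorial Comment
Rather than being mere promotional material, this report clearly lays out concrete processes for ensuring AI safety and the company's framework for cooperating with law enforcement. It offers one answer to the trust challenges facing the industry as a whole.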
Purpose:
At Stability AI, we are committed to building and deploying generative AI responsibly, and we believe that transparency is foundational to safe and ethical AI. This transparency report is part of our ongoing effort to share meaningful information about how our models are developed and released with Safety by Design principles at the forefront. We want to provide visibility into our safety practices, including how we design, test, and monitor our AI systems. We also share how we prevent and respond to misuse. Through this report, we aim to foster accountability and build trust with users, developers, researchers, and policymakers.
Scope:
Video, Image, 3D and Audio models, also available through our Application Programming Interface (API).
Time Period:
April 2024 - April 2025
Model Safety Approach
Stability AI is deeply committed to preventing the misuse of our technology. We take our ethical responsibilities very seriously and have implemented robust safeguards that strengthen our safety standards and protect against bad actors.
Our mission to prevent harmful content starts when we assess datasets and conduct risk assessments prior to the release of any new model. Our approach to preventing harmful content focuses on three key areas: (1) eliminating harmful content from our training data, (2) preventing users from using our models to generate harmful content, and (3) enforcing our Acceptable Use Policy (AUP), which prohibits harmful content.
Our policy is to report any Child Sexual Abuse Material (CSAM) to the National Center for Missing and Exploited Children (NCMEC) via its CyberTipline, which triages these reports and disseminates them to the appropriate law enforcement agencies.
Safety and Responsible AI Practices
Our foundational models are developed using three primary sources of data: (1) data that is publicly available on the internet, (2) data that we partner with third parties to access, and (3) synthetic data that our researchers generate.
Our training data used for our image, video, and 3D models is derived from open datasets and from responsibly sourced and publicly available websites. Model cards are available online. We do not collect data from sources that proliferate harmful content, like the dark web or adult websites. We also do not intentionally collect data from sources that are behind paywalls.
We use not-suitable-for-work (NSFW) classifiers, both built in-house and open source, to filter training data. We have also run industry CSAM hash lists from Thorn's Safer and from the Internet Watch Foundation (IWF) across a subset of our current training data and have not detected any CSAM to date.
Here are our training data metrics for the reporting period:
Number of instances of CSAM and CSEM detected in our training datasets: 0 (0%)
Model and Platform API Safety
With respect to our efforts to ensure that our models do not generate harmful content, we apply multiple layers of mitigation, both at the platform API level and model level.
At the platform API level, we implement real-time safeguards such as content filters and classifiers to detect policy-violating inputs and outputs. We also integrate CSAM hashing systems to detect, block and report known CSAM. Together, these layered mitigations help enforce our safety policies and support responsible use of our technology.
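To make the idea of layering concrete, here is a minimal Python sketch of an input check followed by an output check. It is not Stability AI's implementation: the keyword blocklist, the `moderate` function, the stub generator and classifier, and the 0.5 threshold are all hypothetical, and production prompt filters are usually trained classifiers rather than keyword lists.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical blocklist; real prompt filters are typically trained classifiers.
BLOCKED_TERMS = {"blocked_term_a", "blocked_term_b"}


@dataclass
class Decision:
    allowed: bool
    stage: Optional[str] = None  # which layer rejected the request, if any


def prompt_filter(prompt: str) -> bool:
    """Input layer: return True if the prompt passes the toy keyword check."""
    words = set(prompt.lower().split())
    return words.isdisjoint(BLOCKED_TERMS)


def moderate(
    prompt: str,
    generate: Callable[[str], bytes],
    output_classifier: Callable[[bytes], float],
    threshold: float = 0.5,
) -> Decision:
    # Layer 1: block policy-violating inputs before any generation happens.
    if not prompt_filter(prompt):
        return Decision(False, "prompt_filter")
    output = generate(prompt)
    # Layer 2: score the generated output; scores at or above the threshold are blocked.
    if output_classifier(output) >= threshold:
        return Decision(False, "output_classifier")
    return Decision(True)
```

Putting the cheap input check first means a rejected prompt never reaches the generator, while the output classifier catches violations that slip past the prompt filter.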
At the model level, we use techniques such as fine-tuning and safety LoRAs, informed by insights from structured red teaming (probing the model for policy-violating or harmful outputs), prior to releasing the model.
Red Teaming
Our Integrity team assesses each model's risks through red teaming. Red teaming is a core part of our safety evaluation process and focuses on identifying and mitigating severe risks. It involves engaging both internal and external experts to test our models for potential harms. These structured evaluations help us uncover potential failure modes, improve our safeguards, and inform our deployment decisions. Red teaming is an ongoing process that evolves alongside our models, allowing us to proactively address emerging risks as capabilities advance.
We have developed an approach to assess CSAM/CSEM generation capabilities through red teaming that uses adult nudity and sexual activity prompts as indicators. We also collaborated with the Online CSEA Covert Intelligence Team (OCCIT, a UK law enforcement unit) to conduct red teaming exercises on our Stable Diffusion 3 model prior to release, and no CSAM could be generated. If harmful capabilities are identified through our red teaming process, the model undergoes further safety fine-tuning to remove those concepts prior to any release.
Here are our red teaming metrics for the reporting period:
The percentage of generative AI models that have been stress-tested for CSAM and CSEM capabilities (leveraging prompts containing depictions of adult nudity and adult sexual activities): 100%
The percentage of generative AI models that were discovered to have CSAM and CSEM related issues, as a result of this stress-testing: 0%
Age Requirements
Consumers using any Stability AI technology to create content must first agree to the Company’s AUP. As outlined in the AUP, users must be 18 years of age or older and must agree to not use, or allow others to use, our technology to, among other things, (1) violate the law; (2) facilitate hateful or discriminatory content, exploit or harm children; or (3) deceive or mislead others, including facilitating disinformation.
Provenance
At Stability AI, we implement the Coalition for Content Provenance and Authenticity (C2PA) standard through our API to help users and content distribution platforms identify AI-generated content. Images, videos and WAV audio generated through our API (our audio generation focuses on sound effects and instrument riffs and carries no CSEM risk) are tagged with metadata indicating that the content was produced with a generative AI tool. This metadata includes the name and version number of the model used to generate the content. Once the content is generated, the metadata is digitally sealed with a cryptographic Stability AI certificate and stored within the file.
Content provenance has not been implemented during the content generation process for our openly released models. This is an area that requires further work to strengthen provenance and traceability across our systems.
While we have found challenges with other types of watermarking solutions (non-C2PA) that resulted in quality degradation of image output, we are continuously exploring more effective and reliable ways to address provenance and content authenticity. As we advance our research and deployment, we remain committed to improving provenance tools that are both robust and preserve the integrity of generated content.
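The sealing step described for API outputs binds metadata to the content so that tampering is detectable. The Python sketch below illustrates only that tamper-evidence idea and is explicitly not C2PA: C2PA uses certificate-based signatures in a standardized manifest, whereas this sketch swaps in an HMAC over a JSON manifest. The key, the field names and the `seal`/`verify` functions are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret. C2PA uses X.509 certificate-based signatures,
# not HMAC; this is only a stand-in to show tamper evidence.
SIGNING_KEY = b"demo-key-not-a-real-certificate"


def _sign(manifest: dict) -> str:
    # Canonical JSON (sorted keys) so the same manifest always signs identically.
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def seal(content: bytes, model_name: str, model_version: str) -> dict:
    manifest = {
        "produced_with_generative_ai": True,  # hypothetical field names
        "model_name": model_name,
        "model_version": model_version,
        "content_sha256": hashlib.sha256(content).hexdigest(),
    }
    return {"manifest": manifest, "signature": _sign(manifest)}


def verify(content: bytes, sealed: dict) -> bool:
    manifest = sealed["manifest"]
    # The content must match the hash bound into the manifest...
    if hashlib.sha256(content).hexdigest() != manifest["content_sha256"]:
        return False
    # ...and the manifest itself must not have been altered.
    return hmac.compare_digest(_sign(manifest), sealed["signature"])
```

Changing either the content bytes or any manifest field (for example the model version) makes `verify` return `False`.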
Content Moderation
Our Integrity team moderates content using both automated tools and human review to evaluate suspected or attempted misuse of our products and to enforce our policies.
Automated Detection: We enforce our policies through model refusals that block violating content. We have built in-house text filters and NSFW image classifiers to detect prompts, images and videos that violate our policies. We focus on controls that operate at the point where a user seeks to upload or generate an image:
We have implemented prompt filters, which apply to the textual prompts and instructions a user provides to generate an image. These filters seek to block users from creating images that would potentially violate our AUP, including CSAM.
We have developed an NSFW image classifier that flags image and video uploads that could potentially violate the AUP and blocks content generation from them.
Stability AI compares all uploaded images to a hash database of known CSAM images maintained by third-party service provider Thorn. If a user attempts to upload an image that matches, the image gets rejected.
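The upload check above amounts to a set-membership test on a hash of each uploaded file. The Python sketch below assumes exact SHA-256 matching for simplicity; production hash-matching services typically also use perceptual hashes so that resized or re-encoded copies still match. `HashMatcher` and `handle_upload` are hypothetical names, not part of Thorn's or Stability AI's APIs.

```python
import hashlib
from typing import Iterable


class HashMatcher:
    """Toy exact-match lookup against a set of known hashes (hypothetical)."""

    def __init__(self, known_hashes: Iterable[str]):
        # Normalize to lowercase hex so lookups are case-insensitive.
        self._known = {h.lower() for h in known_hashes}

    def is_match(self, data: bytes) -> bool:
        return hashlib.sha256(data).hexdigest() in self._known


def handle_upload(data: bytes, matcher: HashMatcher) -> str:
    # Reject a matching upload before it reaches any generation step.
    if matcher.is_match(data):
        return "rejected"
    return "accepted"
```

Because a cryptographic hash changes completely when a single byte changes, this exact-match version only catches byte-identical copies, which is why the perceptual-hash caveat matters.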
Human Review: To monitor user activity, we maintain both in-house and external content moderation teams. Our content moderators review flagged prompts and images as well as a subset of non-flagged content, and apply enforcement actions as needed. When CSAM is detected in a user's Stability AI account, we take appropriate action, including submitting a CyberTipline report to NCMEC. We may also take additional measures on the account, such as issuing warnings or disabling the account entirely. Our content moderation specialists also work directly with business customers when their downstream users attempt to misuse our product. For example, our API allows businesses to pass a unique identifier that helps them trace activity back to specific end users and take action.
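The per-end-user identifier lets a business group moderation flags by its own users. The Python sketch below shows that grouping; the `FlagEvent` record, the `end_user_id` field, `users_to_review` and the two-flag threshold are hypothetical and are not the actual parameter names of Stability AI's API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagEvent:
    business_id: str
    end_user_id: str  # hypothetical opaque ID the business passes with each request
    reason: str


def users_to_review(events: list[FlagEvent], business_id: str, min_flags: int = 2) -> list[str]:
    """Return one business's end-user IDs with at least `min_flags` flags, most-flagged first."""
    counts = Counter(e.end_user_id for e in events if e.business_id == business_id)
    return [uid for uid, n in counts.most_common() if n >= min_flags]
```

Keeping the identifier opaque means the platform can point a business at a specific account without needing that end user's personal details.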
Notices and Appeals: We believe in transparent communication when enforcement actions are taken. We communicate decisions to the user in writing, and also provide the user an option to appeal the decision.
NCMEC Reporting
Stability AI is dedicated to combating online CSAM, which is prohibited by our AUP. We report all instances of CSAM to the National Center for Missing and Exploited Children (NCMEC), who then forward these reports to law enforcement agencies globally. To uphold this commitment, comprehensive policies and rigorous training programs have been established to ensure that any instance of detected CSAM through our APIs is promptly and accurately reported to NCMEC.
All Integrity employees are trained to identify CSAM and on the critical steps for reporting it immediately. This training covers the legal obligations surrounding its detection and the precise procedures for submitting reports to NCMEC. Through close collaboration with NCMEC, we are actively contributing to the global fight against child exploitation.
Here are our NCMEC metrics for the reporting period:
Total number of reports sent from Stability AI to NCMEC: 13
Note: Multiple reports may be submitted for the same user such as when more than one image upload attempt was detected.
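The note means the report total is an upper bound on the number of distinct users involved. The small Python sketch below separates the two counts; the `Report` record and `summarize` function are hypothetical, and the sketch makes no claim about how the 13 reports above map to users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    report_id: str
    user_id: str  # hypothetical internal account ID


def summarize(reports: list[Report]) -> dict[str, int]:
    # Total reports can exceed distinct users, since one user may trigger several reports.
    return {
        "total_reports": len(reports),
        "distinct_users": len({r.user_id for r in reports}),
    }
```

For example, two upload attempts by one account plus one by another would yield three reports but only two distinct users.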
User Reporting
Anyone can report misuse that they may suspect is taking place on our platform and provide feedback to our safety team.
There have been no user reports submitted to Stability AI for CSAM- and CSEM-related model violations.
Collaboration
We have established leading collaborations across industry and government to prevent misuse, including:
In April 2024, we announced our commitment to join Thorn and All Tech Is Human to enact child safety commitments for Gen AI through Safety by Design.
In July 2024, we announced our partnership with the Internet Watch Foundation (IWF), to tackle the creation of AI generated child sexual abuse imagery online.
In July 2024, we joined Tech Coalition's Pathways program for expert advice, resources and opportunities to further build capacity to combat online child sexual exploitation and abuse.
Looking Ahead
As part of our ongoing commitment to responsible AI development and deployment, we are actively taking steps to align our practices with emerging responsible AI frameworks. This includes conducting internal audits, updating risk management processes, scaling our technology, and refining our transparency, safety, and human oversight protocols to meet evolving ethical standards. We are also closely monitoring regulatory developments and will continue to adapt our systems, documentation, and operational practices to ensure our compliance.
You can read the full report on our Child Safety page.