Amazon Science · April 30, 2026, 02:59 · approx. 16 min read

Protecting the privacy of AI training data

#LLM #Federated Learning #Differential Privacy #Data Privacy #HIPAA

TL;DR

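Amazon Science analyzes the serious risk of training-data leakage from large language models and federated learning, and argues that differential privacy and secure multiparty computation are needed as countermeasures.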

AI deep analysis · May 5, 2026, 05:04

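Key points

Attack risk for single models

The article shows that adversarial inference queries can extract information about individuals from specialized models in fields such as healthcare and finance, where the training data itself is the asset.

Vulnerabilities in federated learning (FL)

Training data can be reconstructed when the central server aggregates updates, or when the global model is shared among participants, which breaks the privacy promise of FL.

Demonstrated cases and regulatory violations

The article points out that practical threats already exist, citing a Google DeepMind paper showing that GPT-3.5-turbo can be prompted to regurgitate training data, along with the risk of HIPAA and GDPR violations.

Technical solutions

The article indicates that combining differential privacy with secure multiparty computation can defend against the attack scenarios above.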

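Impact analysis

This article highlights how real and urgent the problem of training-data privacy violations has become as AI adoption spreads. Companies and healthcare organizations should treat it as a compliance requirement when deploying and operating AI. The article goes beyond presenting theoretical risk: by laying out concrete attack scenarios and deployable defenses such as differential privacy, it serves as a useful guide for raising security standards across the industry.

Editorial comment

As AI use expands, the importance of data protection deserves renewed attention. As Amazon Science shows, privacy-preserving techniques are no longer optional; they are a prerequisite for responsible AI development.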

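Large language models, the highest-profile machine learning (ML) models in use today, are trained on huge corpora of public data. But many ML models are trained on smaller, proprietary datasets, which can be highly sensitive and should be kept private. Examples include a hospital fine-tuning a diagnostic model on patient radiology scans, a bank training a fraud detector on transaction histories, or a pharmaceutical company building a drug interaction model from clinical trial records. In each case, the training data itself is the asset that must be protected, but a well-constructed attack on these models can potentially extract information about their underlying training data.

Such attacks are possible even when the attacker is restricted to submitting adversarial inference queries to a model trained by a single data owner. Alternatively, when multiple data owners collaborate to train a model through federated learning (FL), in which a central server produces a global model by aggregating model updates generated from siloed datasets (instead of collocating the raw data), there exist attacks in which an adversarial server can reconstruct training data from the model updates. Consider three hospitals collaborating to train a shared cancer-screening model without pooling patient records. If the aggregation server can reconstruct one hospital's training images, then the privacy promise of federated learning is broken, and so is each hospital's compliance with patient consent agreements. Finally, an adversarial FL participant could even potentially reconstruct an honest participant's private training data from the global model.

These risks are not hypothetical. A 2023 paper from Google DeepMind demonstrated that GPT-3.5-turbo could be prompted to regurgitate verbatim training data, including personally identifiable information. Smaller, domain-specific models trained on concentrated, sensitive datasets are even more vulnerable. As organizations increasingly train models on sensitive financial records, patient health data, and proprietary business intelligence, the attack surface grows with them.

A successful attack against a healthcare model could reveal whether a specific patient's records were used in training, a violation of regulations such as the US Health Insurance Portability and Accountability Act (HIPAA) and the EU's General Data Protection Regulation (GDPR). An attack against a federated-learning system could reconstruct raw training samples that should never have left their source. For any organization training on private data, understanding and mitigating these threats is no longer optional; it is necessary for responsible AI deployment.

In this post, we walk through three escalating attack scenarios: membership inference against a single model, data reconstruction from federated-learning gradients, and training-data extraction from a shared global model. We show how differential privacy and secure multiparty computation defeat each one.

An attack on model inference

Anyone with query access to a model can potentially determine whether a specific record was used to train it, an attack known as membership inference. Imagine that a hospital deploys a diagnostic model as an API for referring physicians. A malicious actor could probe the API to determine whether a particular patient's records were included in the training data. This would confirm that the patient was treated at the hospital and reveal details about their medical history.

In a 2023 paper at the Conference on Neural Information Processing Systems (NeurIPS), Amazon Web Services researchers showed how this works in practice. A trained model tends to produce higher-confidence predictions for inputs it was trained on, a form of overfitting the attacker can exploit. The attacker first generates a dataset that approximates the distribution of the model's training data, then records the model's confidence scores on those samples. Using these scores as labels, the attacker trains a proxy model that learns a confidence-score cutoff separating training data from non-training data. Given a candidate record, the attacker evaluates the proxy model to obtain a cutoff, then queries the target model. If the target model's confidence score exceeds the cutoff, the record was likely in the training set. The authors demonstrated this against a ResNet-50 model trained on ImageNet-1k: 97% of the records their attack flagged as training data were indeed training data.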
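
The final decision step of the attack can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the record IDs, confidence scores, and per-record cutoffs are made-up values, and the proxy model that would produce the cutoffs is not shown. The `precision` helper computes the same kind of figure as the 97% reported above (the fraction of flagged records that really were training data).

```python
from dataclasses import dataclass


@dataclass
class Guess:
    record_id: str
    confidence: float
    flagged_as_member: bool


def infer_membership(confidences, cutoffs):
    """Flag a record as a likely training member when the target model's
    confidence on it exceeds the per-record cutoff from the proxy model."""
    return [
        Guess(rid, conf, conf > cutoffs[rid])
        for rid, conf in confidences.items()
    ]


def precision(guesses, true_members):
    """Fraction of flagged records that really were in the training set."""
    flagged = [g for g in guesses if g.flagged_as_member]
    if not flagged:
        return 0.0
    hits = sum(1 for g in flagged if g.record_id in true_members)
    return hits / len(flagged)


# Hypothetical target-model confidences and proxy-model cutoffs.
confidences = {"r1": 0.99, "r2": 0.97, "r3": 0.71, "r4": 0.64, "r5": 0.95}
cutoffs = {"r1": 0.90, "r2": 0.92, "r3": 0.85, "r4": 0.80, "r5": 0.93}
true_members = {"r1", "r2", "r3"}  # ground truth, unknown to a real attacker

guesses = infer_membership(confidences, cutoffs)
attack_precision = precision(guesses, true_members)
```

In this toy data the attack flags r1, r2, and r5, of which two are true members. A real attacker has no ground truth and would rely on the proxy model's calibration instead.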

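Mitigation through differential privacy

We'll show how to mitigate such membership inference attacks with differential privacy (DP), a mathematical framework for computing aggregate statistics (e.g., an average) while bounding how much any single input can influence the result. The core idea: if we can randomize the function so that adding or removing one record from the dataset barely changes the distribution of the function output, an attacker cannot confidently determine whether that record was included. Formally, a randomized function is differentially private if, for any single record added to or removed from the input dataset, the probability of any given output changes by at most a factor of e^ε, where e is the base of the natural logarithm and ε is the privacy budget. A smaller ε means tighter privacy but more noise in the computation, and vice versa. While NIST guidance suggests that ε < 1 will generally enforce a low enough privacy risk, many real-world deployments operate between 1 and 10, with situation-dependent privacy outcomes. Empirical studies indicate that ε as high as 3 can still provide meaningful data privacy against attacks like membership inference, though our understanding of the effective guarantees of DP against such attacks continues to evolve.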
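
Written out, the condition above is the standard ε-differential-privacy inequality. The symbols M (the randomized function), D and D′ (two datasets that differ in exactly one record), and S (any set of possible outputs) are notation introduced here for illustration; they do not appear in the original post.

```latex
\Pr\bigl[M(D) \in S\bigr] \;\le\; e^{\varepsilon} \cdot \Pr\bigl[M(D') \in S\bigr]
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S
```

Because D and D′ can swap roles, the bound holds in both directions. As a rough sense of scale, the factor is e^1.5 ≈ 4.48 at ε = 1.5 and e^3 ≈ 20.1 at ε = 3.0.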

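DP defeats membership inference because the attack relies on a gap between the model's confidence on training data and on unseen data. DP narrows that gap by ensuring the model would have learned nearly the same parameters whether or not any particular record was included in its training data.

How can this approach be applied to ML? Neural networks are trained using stochastic gradient descent (SGD), in which the difference between the model's output on a training sample and the target output for the sample is propagated back through the model, and the model parameters are adjusted to reduce the difference; the adjustment corresponding to the sample is called a gradient. In practice, the model parameters are typically adjusted according to a batch gradient, the average of the sample-specific gradients for a batch of samples. In a landmark 2016 paper, Google researchers introduced DP-SGD, which adds calibrated Gaussian noise to each batch gradient during training. We implemented DP-SGD and trained a neural network on the EMNIST handwritten-letter dataset. The DP model achieved 78% test accuracy at ε = 1.5 and 82% at ε = 3.0, compared to 90% without DP.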
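
A single DP-SGD update can be sketched as follows. This is a simplified, pure-Python illustration rather than the implementation used in the experiments. It includes the per-sample gradient clipping step from the DP-SGD paper, which is what lets the Gaussian noise be calibrated to a known bound. The gradients, `max_norm`, `noise_multiplier`, and learning rate are hypothetical values, and translating the noise level into a privacy budget ε requires a separate privacy accountant that is not shown.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_batch_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise std scales with the clip bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


def sgd_step(params, grad, lr):
    """Plain gradient-descent update using the (noisy) batch gradient."""
    return [p - lr * g for p, g in zip(params, grad)]


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
per_sample_grads = [[3.0, 4.0], [0.3, -0.4], [-6.0, 8.0]]  # hypothetical
params = [0.5, -0.5]

noisy_grad = dp_batch_gradient(per_sample_grads, max_norm=1.0,
                               noise_multiplier=1.1, rng=rng)
params = sgd_step(params, noisy_grad, lr=0.1)

# With noise_multiplier=0 the result is just the mean of the clipped gradients.
clean_grad = dp_batch_gradient(per_sample_grads, 1.0, 0.0, rng)
```

Clipping bounds how much any one sample can move the batch gradient, and the noise then masks whatever influence remains. A larger noise multiplier corresponds to a smaller ε and, as the accuracy figures above suggest, a larger utility cost.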

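DP addresses attacks on a single model, but what happens when multiple organizations collaborate to train one? Federated learning introduces a different attack surface, one that targets the training process itself.

Data leakage from federated learning

Federated learning is a method of decentralized ML in which a global model is trained on datasets distributed across multiple parties, without direct sharing of the datasets. Each party trains an initial model on a local training batch, obtaining a local gradient. The local gradients are then sent to a central server, which averages them into a global gradient. The parties then produce copies of the global model by updating their local models with the global gradient.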
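
One round of this protocol can be sketched as below. It is a bare-bones illustration with hypothetical gradient values and learning rate; real systems add batching, multiple local steps, and communication handling.

```python
def average_gradients(local_grads):
    """Server side: average the parties' local gradients coordinate-wise."""
    n = len(local_grads)
    dim = len(local_grads[0])
    return [sum(g[i] for g in local_grads) / n for i in range(dim)]


def apply_gradient(params, grad, lr):
    """Party side: update the local model copy with the global gradient."""
    return [p - lr * g for p, g in zip(params, grad)]


# Three hypothetical parties, each holding a gradient from its own local batch.
local_grads = [[0.2, -0.4], [0.4, 0.0], [0.0, 0.1]]

# The server receives every local gradient in the clear and averages them.
global_grad = average_gradients(local_grads)

# Every party applies the same global gradient to the same starting parameters,
# so all parties end up with identical copies of the global model.
shared_params = [1.0, 1.0]
party_models = [apply_gradient(shared_params, global_grad, lr=0.5)
                for _ in local_grads]
```

Note that `average_gradients` takes each party's gradient as a plain argument: in this basic form the server sees every local gradient unprotected.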

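However, in a 2019 NeurIPS paper, a team of MIT researchers demonstrated a surprising result: the parties' local gradients leak information about the training samples from which they're computed, enabling model inversion attacks in which the server can reconstruct the parties' training samples. Even in scenarios in which the server is not viewed as adversarial, this attack demonstrates that the gradients leak the parties' training data, defeating the privacy goals of FL.

This attack relies on the observation that a gradient directly contains data about the sample from which it is computed. Consequently, a sample can generally be reconstructed from its gradient, and two semantically distinct training batches are unlikely to admit the same batch gradient. Therefore, the attacker frames the problem of reconstructing a party's batch samples from its local gradient as an optimization problem: find the training batch whose gradient is minimally distant from the target gradient. The attacker can then approximately compute the solution (the training batch) by applying SGD.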
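
To see why a gradient "directly contains data" about its sample, consider a toy case where the leak has a closed form. The sketch below is not the optimization-based attack from the paper. It assumes a one-output linear model with a bias term, squared-error loss, and a single-sample batch; the parameters and the sample are made up. Under those assumptions the weight gradient is the input scaled by the residual, and the bias gradient is the residual itself, so dividing one by the other returns the input.

```python
def linear_model_gradient(w, b, x, y):
    """Gradient of 0.5 * (w.x + b - y)^2 with respect to (w, b)."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) + b - y
    grad_w = [residual * xi for xi in x]
    grad_b = residual
    return grad_w, grad_b


def reconstruct_sample(grad_w, grad_b):
    """Recover x from a single-sample gradient, using grad_w = grad_b * x."""
    if grad_b == 0:
        raise ValueError("zero residual: gradient carries no signal")
    return [gw / grad_b for gw in grad_w]


w, b = [0.5, -1.0, 2.0], 0.25        # hypothetical model parameters
x_private, y = [1.0, 2.0, 3.0], 1.0  # the party's private sample and label

# What the party would send to the server.
grad_w, grad_b = linear_model_gradient(w, b, x_private, y)

# What the server can compute from that gradient alone.
x_recovered = reconstruct_sample(grad_w, grad_b)
```

For deeper models and batches larger than one, the gradient is an average over samples and no such one-line formula applies, which is why the attack falls back on searching for a batch whose gradient matches the observed one.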

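In our experiments on the EMNIST dataset, the attack recovered single-sample batches exactly and three samples from a batch of size seven. Preventing this data leakage requires ensuring that no party, including the server, ever sees another party's gradient in the clear.

Mitigation through secure multiparty computation

Secure multiparty computation (MPC) is a cryptographic protocol that lets multiple parties jointly compute a function over their private inputs, without revealing anything beyond the function's output. Intuitively, the parties exchange only encrypted intermediate values, so no party ever sees another's raw input.

A simple example illustrates the core idea: suppose three parties hold private values x, y, and z. Each party splits its value into three random shares that sum to it, then distributes one share to each party. Each party sums the shares it receives. The resulting sums are themselves random, but they add up to x + y + z. After exchanging these sums, all parties learn the total but nothing about each other's individual inputs.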
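
The three-party example can be sketched directly. Two details here are assumptions not stated in the text: shares are integers taken modulo a large prime (`MODULUS`), which is a common way to make each share uniformly random, and the demo uses a seeded `random.Random` for repeatability where a real protocol would need a cryptographically secure generator. The private values are made up.

```python
import random

MODULUS = 2**61 - 1  # assumption: shares live in integers modulo a large prime


def make_shares(value, n_parties, rng):
    """Split value into n_parties random shares that sum to it mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values, rng):
    """Run the share-and-sum protocol; return the total and the partial sums."""
    n = len(private_values)
    # dealt[i][j] is the share party i gives to party j (it keeps dealt[i][i]).
    dealt = [make_shares(v, n, rng) for v in private_values]
    # Each party j adds up the shares it holds; this partial sum looks random.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    # Parties publish only their partial sums, which add up to the total.
    return sum(partial_sums) % MODULUS, partial_sums


rng = random.Random(7)  # demo only; not suitable for real secret sharing
x, y, z = 12, 30, 7     # hypothetical private inputs
total, partial_sums = secure_sum([x, y, z], rng)
```

Here `total` equals x + y + z, while each entry of `partial_sums` on its own is a random-looking number that reveals nothing about any single input.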

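Private federated learning (PFL) applies this secure-sum technique to FL: instead of sending raw local gradients to a server, the parties secret-share their gradients and aggregate them via MPC, so the server only ever sees the summed result. More efficient PFL protocols exist, including one presented in a 2023 paper coauthored by Amazon senior principal scientist Tal Rabin, but the core security principle is the same.

We ran our model inversion attack against a party's local gradient computed under our PFL protocol, again using the EMNIST dataset. The attack was unable to reconstruct any training samples. MPC protects the gradients exchanged during FL, but the global model itself is shared with all participants. Can an adversarial participant exploit the model to recover others' data? We'll explore this problem in the next section.

An attack on FL global models and mitigation with DP

We've seen how PFL enables n parties to securely compute a global FL model. However, the 2022 paper of Fowl et al. and the 2025 paper of Shi et al. together describe an attack that enables an adversarial FL participant to reconstruct another participant's training data from the global model itself.

In this attack, the attacker adds a preprocessing layer with ReLU activation (a common neural-network activation function that outputs positive inputs verbatim but outputs zeros for negative inputs) to the model. That layer consists of nB neurons, where B is the batch size. This is because each of the n parties produces a local gradient that is an average of B sample-specific gradients, so the global FL gradient is an average of nB sample-specific gradients; each of the nB neurons in the preprocessing layer will be used to reconstruct a distinct training sample.

The attacker carefully crafts the preprocessing layer's parameters so that ReLU activates the signals of all samples in the first neuron of the global gradient, all but one sample in the second neuron of the global gradient, all but two samples in the third neuron of the global gradient, and so on. Therefore, the attacker simply examines the entries of the global gradient corresponding to the nB neurons and successively subtracts the components between adjacent neurons to tease apart the nB sample-specific gradients.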
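
The subtraction step can be illustrated with a simulation. The sketch below does not backpropagate through a real network; it directly constructs the gradient entries that the description above implies, under several simplifying assumptions that are not from the original post. Each neuron compares a sample's mean value against a cutoff, the cutoffs happen to fall between the samples' measurements, and each sample contributes the same nonzero upstream signal (`deltas`) to every neuron it activates. All numeric values are hypothetical.

```python
def simulate_layer_gradient(samples, deltas, cutoffs):
    """Simulated averaged gradient entries for the crafted ReLU layer.

    Neuron i is active for a sample when the sample's measurement (here, the
    mean of its entries) exceeds cutoffs[i]. Each active sample contributes
    delta * x to the neuron's weight gradient and delta to its bias gradient.
    """
    total = len(samples)
    dim = len(samples[0])
    grad_w, grad_b = [], []
    for c in cutoffs:
        gw, gb = [0.0] * dim, 0.0
        for x, d in zip(samples, deltas):
            if sum(x) / dim > c:  # ReLU passes this sample's signal
                gw = [a + d * xi for a, xi in zip(gw, x)]
                gb += d
        grad_w.append([v / total for v in gw])
        grad_b.append(gb / total)
    return grad_w, grad_b


def separate_samples(grad_w, grad_b):
    """Subtract adjacent neurons' entries, then divide weight by bias part."""
    dim = len(grad_w[0])
    grad_w = grad_w + [[0.0] * dim]  # sentinel: a neuron no sample activates
    grad_b = grad_b + [0.0]
    recovered = []
    for i in range(len(grad_b) - 1):
        db = grad_b[i] - grad_b[i + 1]
        dw = [a - b for a, b in zip(grad_w[i], grad_w[i + 1])]
        recovered.append([v / db for v in dw])
    return recovered


# nB = 3 hypothetical private samples pooled across the parties.
samples = [[0.9, 0.7], [0.1, 0.3], [0.5, 0.4]]
deltas = [0.5, -1.0, 2.0]  # per-sample upstream signals, assumed nonzero
cutoffs = [0.0, 0.3, 0.6]  # assumed to separate the sample measurements

grad_w, grad_b = simulate_layer_gradient(samples, deltas, cutoffs)
recovered = separate_samples(grad_w, grad_b)
```

Each difference between adjacent neurons isolates exactly one sample's contribution, so `recovered` contains the three private samples, ordered by their measurement. Noise added to the global gradient would corrupt these differences.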

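As we mentioned earlier, a training sample can be directly recovered from its gradient. In our experiments on the EMNIST dataset, the attack recovered all but one of the parties' local batch samples from the global gradient. But after altering our private FL protocol to instead output a differentially private global gradient (computed via DP-SGD with a privacy budget of 1.5), the attack failed to recover any meaningful information from the global gradient.

Taken together, DP and MPC form complementary layers of defense: MPC protects what is exchanged during training, and DP protects what the final model reveals.

Building defenses before attacks scale

The experiments above have clear implications: attacks on ML training data are practical today, and the private-computing tools to defeat them are mature enough to deploy. The privacy-utility tradeoff is real: our DP-SGD models retained 78–82% accuracy at meaningful privacy budgets, compared to 90% without DP.

It is worth noting that the accuracy impact of DP depends heavily on the task and dataset. Our EMNIST experiments used a relatively small model on handwritten letters, where the noise has an outsized effect. In practice, larger models trained on richer datasets absorb DP noise more gracefully.

NIST SP 800-226 notes that large models pretrained on public data show strong privacy-utility tradeoffs when fine-tuned with DP-SGD. For many production use cases, such as fraud detection or clinical risk scoring, a modest accuracy reduction is an acceptable cost when the alternative is exposing protected data to the attacks described above.

The right privacy budget is ultimately application dependent: a model screening radiology scans may tolerate less accuracy loss than one flagging suspicious transactions, and organizations should calibrate ε to their specific risk and regulatory requirements.

These techniques are already in use at Amazon. We are building private-computing capabilities (differentially private training pipelines and secure aggregation for federated learning across organizational boundaries) into production systems. For instance, our fraud prevention teams use differentially private training to protect customer financial data while maintaining detection accuracy.

If your organization trains models on sensitive data, we invite you to explore AWS's privacy-preserving ML capabilities and connect with our team.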
