InfoQ·2026年3月2日 18:01·約51分

プレゼンテーション：AIにおけるプライバシーとセキュリティの神話を打ち破り、現実を受け入れる

#AIセキュリティ #プライバシー保護 #設計パターン #責任あるAI #AI倫理

TL;DR

Katharine Jarmulは、AIにおけるプライバシーとセキュリティに関する一般的な神話を解体し、より安全でプライベートなAIシステムを構築するための設計パターンを探求する基調講演を行った。

AI深層分析2026年3月2日 19:40

注目/ 5段階

深度40%

キーポイント

AIのプライバシー・セキュリティ神話の解体

講演では、AIシステムのプライバシーとセキュリティに関する一般的な誤解や神話を特定し、それらを解体することを目的としている。

現実的な課題と対応策の探求

神話を超えた現実の課題をカバーし、それらに対処するための実践的なアプローチを探求している。

安全・プライベートなAIの設計パターン

より安全で、よりプライベートなAIシステムを構築するのに役立つ具体的な設計パターンについて議論している。

影響分析・編集コメントを表示

影響分析

この講演は、AI開発における重要な盲点であるプライバシーとセキュリティの実践的課題に光を当て、理論ではなく実装レベルの解決策を提示することで、より責任あるAI開発の実現に貢献する可能性がある。業界の意識向上と実践ガイドの両面で影響を与える内容と言える。

編集コメント

AIの急速な普及に伴い、セキュリティとプライバシーは喫緊の課題となっており、神話を排した現実的な対処法を説く本講演のテーマはタイムリーで重要である。ただし、内容の詳細が記事からは限定的にしか伝わらないため、実際の設計パターンの具体性や革新性の評価は講演資料や動画に依存する。

InfoQ ホームページ

プレゼンテーション

プライバシーとセキュリティにおける AI の誤解を解き、現実を受け入れる

プレゼンテーションを見る

速度:

ダウンロード

46:48

image/presentations/ai-systems-privacy-security/en/slides/Kat-1771585587944.jpg)

概要

カサリン・ヤムルは、AI におけるプライバシーとセキュリティに関する一般的な誤解について基調講演を行い、その現実を探索します。より安全でプライバシーに配慮した AI システムを構築するための設計パターンについても取り上げます。

バイオグラフィー

カサリン・ジャームルは、データサイエンス、ディープラーニング、AI におけるプライバシーとセキュリティに研究と業務の焦点を当てています。彼女は好評を得たオライリー社の書籍『Practical Data Privacy』の著者であり、10 年以上にわたり機械学習/AI の分野で活動し、プライバシーとセキュリティが組み込まれた大規模 AI システムの構築を支援してきました。

コンファレンスについて

InfoQ Dev Summit Munich ソフトウェア開発カンファレンスは、シニア開発チームが今日直面する重要なソフトウェア課題に焦点を当てています。20 名以上のシニアソフトウェア開発者から貴重な実務的な技術的洞察を得て、スピーカーや同業者と交流し、社交イベントも楽しめます。

INFOQ イベント

2026 年 5 月 12 日（木）午後 1:30 EDT

エージェント AI のためのデータレイヤー設計：スケールにおける状態、メモリ、調整のパターン

登壇者：Karthik Ranganathan - YugabyteDB 共同 CEO 兼共同創業者、および Aditi Gupta - AWS GTM Data and AI シニア GenAI/ML スペシャリストソリューションアーキテクト

2026 年 5 月 21 日（水）午後 12:00 EDT

デザインによるポータビリティ：マルチクラウドシステムのためのデータ移動性と回復のパターン

登壇者：Liore Shai - Eon ソリューションアーキテクト

2026 年 5 月 28 日、午後 1 時（東部夏時間）

AI の時代における配送システムの再考：より迅速な出荷と、より多くの破壊

登壇者: エリック・ミニック氏 - Harness 社 DevOps ソリューションシニアディレクター、アロン・ニューカム氏 - Harness 社シニアプロダクトマーケティングマネージャー

議事録

カサリン・ヤムール: AI および機械学習システムにおけるプライバシーとセキュリティについて、現実と誤解についてお話しします。ここで Anthropic ベースの何らかのアシスタントを使用している方はいますか？最新の Anthropic のレポートによると、初めて自動化が補完を上回ったことが示されました。これは何を意味するのでしょうか？それは、「このテキストをより良くできますか？」という依頼が減り、「この画像を生成してもらえますか？」や「X は何ですか？」といった質問も減ることを意味します。代わりに、「A、B、C、D を実行して、結果を持って戻ってきて」という指示が増えるということです。これは素晴らしいことです。AI システムが持つ多くの点での約束はまさにこれでした。4 日制の労働週日が実現し、リラックスした時間が訪れ、コンピュータが私たちのために作業をしてくれるようになるというのです。それが私たちがこれを構築する理由そのものです。

ここでプライバシーやセキュリティに関わっている方がいらっしゃるかどうかはわかりませんが、この状況についてどうお感じでしょうか？今の気持ちは少しこんな感じです。まだ確定的ではなく、ベストプラクティスも確立されていないからです。プライバシーとセキュリティの分野では数十年にわたるベストプラクティスがありますが、自動化やエージェントといった機能を導入しつつも、プライバシーやセキュリティの形を保つ方法をまだ完全に把握できていません。すべてのプライバシーおよびセキュリティチームは機能強化を望んでいますが、同時にリスクにも直面しています。なぜこれが依然として問題となっているのでしょうか？

これからお話しするのは、現在の機械学習（Machine Learning）や AI システムにおけるプライバシーとセキュリティにおいて、何が現実的な脅威で、何が関連する脅威なのかを判断することが難しいという点です。これは現在、AI を取り巻くバブルの中で多くの面で直面している大きな課題です。私はさまざまな企業で助言、コンサルティング、トレーニングを行っていますが、プライバシーおよびセキュリティチームから常に寄せられる質問に「本当に AI の専門家とは誰か？私たちは彼らを必要とするのか？」というものがあります。もちろん、私の仕事の多くはディープラーニングモデルの訓練に関わってきました。そのため、モデルを利用する方々と比較して、AI に対する理解が異なる可能性があります。

もし御社で実際にモデルをトレーニングするのではなく、多くのモデルを利用している場合、モデルのトレーニング方法を熟知した人材をそこに置く必要があるのでしょうか？私は議論の余地があると思いますが、必要ないと思います。組織において AI に関する専門性とは何かを決定し、その専門性を発揮してプライバシーやセキュリティに関する意思決定を支援しようとする人を誰にするかを決める必要があります。

また残念ながら、プライバシーとセキュリティの分野には大きな問題があります。これは大声で言わせていただきますが、私は賛成しません——ものを売るために恐怖をあおることです。あなたの LinkedIn のフィードがどのようなものかはわかりませんが、私の場合はもう「明日は AI によって全員がハッキングされる」あるいは「何か別の理由で」と叫ぶだけで、ただ叫んでいる状態です。

もし毎回叫べば、最終的にどうなるでしょうか？誰もあなたに耳を傾けなくなります。売るために叫び、誰かがそれを買ったとしても、それがすべての問題を解決しない場合、人々はプライバシーやセキュリティのトピックに関与しなくなる可能性が高まります。もう一つの問題は、セキュリティとプライバシーにおける責任転嫁文化です。私が組織に入り、プライバシー・セキュリティチームに「組織内での状況はどうですか？」と尋ねたときに見つけた最も良い質問は、「月間にどのくらいのインシデントがありますか？」というものです。正しい答えは何でしょうか？ゼロが正解でしょうか？いいえ、なぜでしょう？

参加者 1: それは、おそらくセキュリティインシデントを無視していることを意味するからです。

カサリン・ジャームル: その通りです。人々が「これが正しいやり方なのか分かりません」「どこかでこのキーをうっかり漏らしてしまったかもしれません」などと報告することを恐れている場合、報告されたインシデントがゼロだからといって、実際にインシデントがゼロだったわけではありません。それはむしろ、人々が報告できる信頼文化が存在していないことを意味します。プライバシーやセキュリティに関する心理的安全性がないのです。

もしかすると、意図的か偶然かは別として、そのような「責任追及文化」があるかもしれません。つまり、「これを正しく行う方法が分かりません」と言うこと、あるいは「ミスをしました」と言うことが、パフォーマンス評価で悪評がつくこと、職を失うこと、会社での尊敬を失うことを招くと人々が恐れ、報告することを躊躇する状況です。私たちはこの文化と戦わなければなりません。では、どうやって戦えばよいのでしょうか？責任感の醸成、主体性の確立、そして所有意識（オーナーシップ）の育成について話し合うことです。私が主に焦点を当てているのはまさにここであり、これからお話しすることでもあります。プライバシーやセキュリティが奇妙で恐ろしいものではなく、職場での通常の会話の一部として自然に組み込まれるような、責任と所有意識を持つ文化をどう築いていくかについてです。

神話1：ガードレールが私たちを救ってくれる

必ずJSON形式で返してください。translation フィールドのみ。他のフィールド(technical_terms 等)は一切追加しないこと — 余計なフィールドを書こうとして本文翻訳がトークン上限で打ち切られる事故を防ぐため:

{"translation": "翻訳全文"}

私たちがこの分野で非常に大きな誤解の一つとして取り上げようとしている最初の神話は、「ガードレールが私たちを救ってくれる」というものです。ここで「ガードレール」と言ったときに何を意味しているか知っている人はいますか？少し漠然とした感覚を持っている人、用語は聞いたことがあるがどこに位置し、何をするものなのかはっきりわからないという人はいますか？はい、私もその一人です。私はこの分野で働いていますが、ナンバーツーだと思っています。なぜなら「ガードレール」という用語は、私たちがこれから一つずつ確認していくように、多くの異なる用途に使われているからです。したがって、この用語を明確に区別し、よりよく理解できるようにする必要があります。

ガードレールは、モデルにおける安全性とプライバシーを創出するために使用される、あるいは少なくとも試みられるものです。しかし、現在では非常に多様な意味で使われているため、この用語を明確に区別し、私たちがそれをより良く理解できるようにする必要があります。

ある種のガードレール、おそらく大規模に最初に導入されたのはソフトウェアベースのガードレールです。これは基本的に、LLM（大規模言語モデル）や何らかのシステムを持ち、その前後に入力・出力フィルタを設け、さらにその向こう側にソフトウェアを配置するという構成になります。この手法は最初のコードアシスタントで実装されました。後ほど詳しく説明しますが、コードアシスタントにおいて著作権侵害や非公開リポジトリのコードを出力することは問題があることが判明したためです。これらのシステムは非常に優秀であり、他人のコードをそのまま丸ごと繰り返す傾向が強くありました。

そこで起こったのは、ブルームフィルタのような非常に知的なメモリシステムや、同様の知的なメモリアーキテクチャを用いて、トレーニングデータを調査し、「このデータは特定のライセンスの下にある」「このデータは著作権で保護されている」「このデータの使用方法について確信が持てない」といった判断を行い、一致する部分を特定してフィルタリングするというものです。そして基本的に、一定のトークン数に達したら出力を停止させます。「著作権で保護されたコンテンツや、奇妙なライセンス、あるいは不明瞭なライセンスの下にあるコンテンツの出力は止めてください」という指示です。理にかなっており、機能するはずです。良い解決策のように思えます。

誰も、これが機能しない可能性について何かお考えですか？どうやってこれを破るのでしょうか。おそらく、ソフトウェアエンジニアがプライベートリポジトリにあるべきコードをパブリックにコミットしてしまうようなケースが一つ挙げられます。機械学習システムのプライバシー分野の研究者である Chiyuan Zhang は、変数名をフランス語に変えるだけで、この仕組みを非常に簡単に回避しました。これは著作権侵害の事例で、Google Code のものだと思います（当時、彼はまだ Google で研究または勤務していたと記憶しています）。単に変数名を「nombre」に変更しただけで、問題なく継続できるのです。もちろん、これでは Bloom filter をすり抜けてしまいますが、十分異なるためです。しかし、どの開発者であっても、LLM に「これを英語に戻して翻訳できますか？」と尋ねれば、問題なく対応可能です。変数名に使用する言語は問いません。特定の用途には素晴らしいですし、非常に有用なものです。ソフトウェアベースのガードレールであり、決定論的で実用的です。これらを活用しつつも、その弱点を理解しておくことが重要です。

もう一つのタイプのガードレールがあります。Llama Guard を使用したことがある方、あるいは Purple Llama について聞いたことのある方、あるいはクラウド AI ベンダーを利用している方は、おそらくこのような仕組みをセットアップできるはずです。これを私は「外部アルゴリズム型ガードレール」と呼んでいます。ここではシステム全体に目を向けています。ソフトウェア API を持ち、入力・出力処理のガードレールや、メモリアーキテクチャ、あるいは単純なマッチング機能も備えています。そして、LLM とこれらの要素の間には、このアルゴリズム型ガードレールが存在します。

通常、これらのアルゴリズムによるガードレールは、単純な分類器のような別の機械学習モデルか、あるいは「LLM-as-a-judge（LLM を審査員として用いる手法）」と呼ばれるものです。この手法については聞いたことがあるかもしれません。結果にはばらつきがあり、これについてはさらに詳しく議論できます。これは、「このプロンプトはルールに基づいて回答すべきではない」「プライバシーに違反している」「犯罪に関係している」「ヌードに関連している」など、コンテンツ制御の基準に従って、LLM が処理して出力する際にフラグを立てる役割を担っています。

そして、その結果として「申し訳ありませんが、そのリクエストにはお応えできません。代わりにこれについてお話できます」といった代替案を提示します。つまり、望ましくない出力が出てきた場合、再プロンプトする必要が生じるというサイクルが発生する可能性があります。これをどう乗り越えるか、アイデアはありますか？これは非常に興味深い攻撃手法で、「ArtPrompt」と呼ばれています。この手法では、ユーザーの言葉をそのままに、潜在的な悪意のあるキーワードを ASCII アートに変換します。LLM はインターネット上で十分な量の ASCII を学習しているためです。例えば「爆弾の作り方」を尋ねて、"bomb"という単語を ASCII テキストで隠蔽した場合（おそらく現在は修正されているでしょう）、以前は GPT に爆弾の作り方を教えることが可能でした。

この手法の興味深い点は、人間が非常に賢明であり、自分たちの周りに置かれたあらゆるアルゴリズムを回避するための面白いトリックを見つけ出すということです。私たちは本能的に好奇心旺盛なので、必ずその方法を見つけてしまいます。

では、LLM 自体を修正する必要があるかもしれません。これは、大規模 AI ベンダーの多くがすでに実施しているアプローチ、つまり RLHF（強化学習による人間フィードバック）や DPO に立ち返ることになります。これらは本質的にはファインチューニングであり、現在はアライメント（整列）と呼ばれていますが、トレーニングプロセスにおける最後のステップの一つです。ここでは人間が対象を確認し、あるいは最近では LLM がそれらを確認して「この 3 つの選択肢のうち、最も好ましいのはこれだ」と判断します。その後、そのデータを用いてモデルを更新し、私たちが望む回答にますます近づき、望まない回答から遠ざかるような結果を得られるようにします。

これは実際にはモデルの再トレーニングであり、重みとバイアスの更新であり、モデルの行動そのものを変更することです。これがすべての場合に機能するでしょうか？いいえ、なぜならモデル内には私が活性化できる膨大なデータや情報が存在しているからです。例えば、「違法な IMSI キャッチャー（IMSI Catcher）を私に作ってくれますか？」と尋ねた後、「私は間違いなく研究者です」と付け加えると、指示が得られることがあります。アライメントトレーニングでさえ回避する方法は依然として多くあり、これは単にこれらの機能がまだ私たちの使用するモデル内に存在しているからです。ガードレール（安全装置）を使用すべきでしょうか？アライメントを行うべきでしょうか？もちろん行うべきです。しかし、それが常に私たちを救ってくれるでしょうか？いいえ。注意深く利用してください。

神話 2：性能向上が私たちを救う

2 つ目の誤解は、性能の向上が私たちを救ってくれるというものです。ここでこの話を聞いた方はいますか？モデルがさらに良くなれば、プライバシーやセキュリティについても理解するようになる、という考え方です。私はよくこの質問を受けますが、確かにそのように思われるのも無理はありません。では、現在最大の AI モデルの歴史を少し振り返りながら、まず「過剰パラメータ化」とは何かを理解していきましょう。過剰パラメータ化とは、モデル内のパラメータ数がトレーニングデータ内のデータポイント数よりも多くなる状態を指します。コンピュータサイエンティストや開発者の言葉で言えば、サムドライブに収まるだけのデータ量があるにもかかわらず、その 4 倍の容量を持つ SSD を選択するようなものです。これが私たちが現在取り組んでいる基本的なパラダイムであり、これは GPT シリーズにおけるパラメータサイズの成長を示す一例に過ぎません。私たちはデータを持っており、データを保存するよりもさらに多くのスペースを確保しています。では、何が起きるのでしょうか？

興味深いことに、この現象とともに「過学習の死」と私が呼ぶものも起こりました。私たちは基本的に過学習を止めたのです。

以前は左側に示すような状態でした。深層学習モデルを訓練する際、テスト誤差を見ながら、その誤差が上昇し始めたら、単なる一時的な変動ではないことを確認した上で、早期打ち切り（early stopping）を行っていました。訓練データに過学習してしまい、新しい情報を見た際にうまく一般化できなくなるのを恐れて停止していたのです。

しかし今はもうそうではありません。現在では、何らかの理由でモデルが一定の程度まで過学習したり、少量のデータを大量に訓練しても、驚くほどよく一般化するようなモデルが存在します。これは科学や数学の観点から見れば特異な現象です。一体何が起きているのでしょうか？

張智遠（Chiyuan Zhang）氏をはじめとする多くの非常に賢く、魅力的な研究者がこの問題に取り組んできており、現在問われているのは「この大規模なスケールにおいて、記憶なしで学習することは可能か」という点です。その答えは明確に「不可能」です。

その暗記は起こり、実際に起こっており、問題はどの程度の暗記が行われ、どのような情報が記憶されるかという点です。張氏と研究者たちは過剰パラメータ化テストを行いました。彼らはこれらの層数を持つニューラルネットワーク、つまり深層学習ネットワークを、7 だけを用いて訓練しました。具体的には、左側の 7 のみを対象として、7 を繰り返し提示しました。彼らが期待したのは、深層学習モデルが恒等関数を学習することでした。「何かを与えれば、それを返す」というものです。線形代数をご存知の方なら、単に単位行列を学ぶだけで済む話です。これが彼らの訓練データであり、その後、小さな浅い学習ネットワーク、つまり最大で 7〜9 層程度のネットワークでは恒等関数を学習することができました。「はい、4 が見えましたね、これが 4 です」と言えるようになります。

次にシャツが見えれば「これがシャツです」、というように続きます。しかし、20 層のネットワークに達すると、私たちは単に 7 を学習しただけになります。これは私たちの最も巨大で過剰パラメータ化されたモデルが動作するまさにその仕組みですが、実際にはうまく機能しています。なぜなら、再び言うならば、私たちがこれだけのデータを持ち、これをこれだけの空間に投入したからです。一部のデータは一般化してよく機能し、一部は暗記されますが、時には暗記を望むこともあります。「この曲の歌詞を教えて」と言いたい場合、その曲に適切な歌詞が表示されることを期待するのです。

トレーニングデータには実際何が含まれているのでしょうか？ここで、誰か実際にトレーニングデータセットを調べてみた方はいますか？ダウンロードして触ってみたことはありますか？Hugging Face のアカウントを、ただの遊びでいいので作ってみて、いくつかのトレーニングデータをダウンロードしてみてください。これはドイツのある組織によって収集された大規模なデータセットの一つからのもので、女性の医療ケアが「職場向けではない」とラベル付けされています。私は実際にこれらの人々の顔を削除しています。また、逮捕写真、路上で亡くなった人々、そしてこのようなものも含まれています。透かしが入った画像や広告もあり、さらに彼らが公開していない個人の医療データも含まれています。多くの人が、同意書に基づいて自分のデータの削除を求めてきました。そこには「どうかこれを公開しないでください」と書かれていたにもかかわらず、何らかの理由でそれが忘れ去られ、その情報がインターネットにアップロードされ、スクレイピングされてしまったのです。

ある程度、なぜ過剰パラメータ化や記憶、そしてより大きく優れたモデルについて懸念する必要があるのでしょうか。それは、私たちがより多くの記憶されたデータを保有する可能性があり、そのデータがプライバシーに関わるものであり、かつ潜在的に問題を引き起こす可能性があるからです。これに対するいくつかの回避策があります。差分プライバシー（differential privacy）です。差分プライバシーには多くの理論と実践が存在し、これは記憶を減らすことを保証できる一つの方法です。Gemma チームに感謝します。彼らはまさに、最初から最後まで差分プライバシーでトレーニングされた最初の Gemma モデルをリリースしました。その名は VaultGemma です。ぜひご覧ください。おそらくどこかで、「差分プライバシーを試したがうまくいかなかった」という話を聞いたことがあるかもしれません。それで私たちは諦めてしまった、と。しかし、それは正確には真実ではありません。ここで見てみると、これらも VaultGemma とともにリリースされました。左側の線が VaultGemma です。

中央の線は、差分プライバシーを適用していない同じ Gemma モデルです。明らかに、Trivia のようなタスクではスコアが非常に低くなります。なぜなら、Trivia には記憶が必要だからです。一方、PIQA のようなタスクでは、比較するとかなり良いパフォーマンスを発揮します。私が考えたい、あるいは問いかけたいのは、いつ記憶が必要で、いつ一般化（generalization）を求め、誰かのプライバシーデータを誤って出力する可能性を避けたいのかということです。これは私たちが考えるべき質問です。また、プライバシーとセキュリティの観点からは、より優れたパフォーマンスが私たちを救うわけではないという点も再確認されます。

神話 3: 新しいリスク分類体系さえあれば十分である

必ず JSON 形式で返してください。translation フィールドのみ。他のフィールド (technical_terms 等) は一切追加しないこと — 余計なフィールドを書こうとして本文翻訳がトークン上限で打ち切られる事故を防ぐため:

{"translation": "翻訳全文"}

3 つ目の誤解は、新しいリスク分類体系さえあれば十分だということです。"Attention Is All We Need"のように、今や私たちは単に新しいリスク分類体系を必要とするだけだと考えがちです。ここで、分類体系（taxonomy）に取り組んだことがある方はいますか？もし初めて聞く用語であれば、少し荒涼とした旅にお付き合いください。AI リスクの分野で活動中で、その中身を調べているなら、MIT のリポジトリや NIST のリポジトリ、あるいは EU AI 法（EU Artificial Intelligence Act）にアクセスできます。これまでに、おそらく自分で約 800 ページ分の資料を読み進めているはずです。これを自由時間で行うことは現実的でしょうか？自己啓発のため、いつか AI 法を詳しく読み解くつもりだとしても、おそらく無理でしょう。ここで私が伝えたいのは、状況はさらに悪化しているということです。AI リスクベンチマーク（benchmark）もあります。リスク分野で働く方にとっては非常に興味深い論文ですが、これは世界中のリスクフレームワークを分類し、異なる規制環境間などで比較しようとする試みです。

最終的に、40 から 50 種類のリスクに直面することになります。では、プライバシーやセキュリティの取り組みを、それが仕事に対して良い気分をもたらすからという理由で行っている人がほとんどで、それが唯一の仕事ではない状況において、どのように管理すべきなのでしょうか？また、この状況をどう乗り越えていくのでしょうか。これは素晴らしい内容です。私は分類体系（タクソノミー）に関する専門家ではありませんので、もしあなたがその分野の専門家であれば、おそらくこれらの内容は非常に価値あるものと感じられるでしょう。色付きのバインダーをすべてに使用しているような人々がまさに分類体系の専門家だと感じます。チーム内に分類体系の専門家を置くことは非常に有益ですが、私のように実際に手を動かして構築する立場の人にとっては、それは非常に難しいことです。では、緩和策に焦点を当ててみましょう。OWASP が推奨する主要な AI リスクに対する緩和策を詳しく見てみると、「異常を検出するための自動スキャンの実装」や「保存データの暗号化検証」などが挙げられます。あなたがどのようなチームで働いてきたか存じ上げませんが、私が知るほとんどのチームは、ゼロから独自の異常検知システムを実装することはできませんし、クラウドプロバイダーがそれを提供しているかどうかに関わらず、データを容易に暗号化検証できるかどうかは必ずしも保証されていません。

これは、おそらく何らかの形で AI セキュリティに取り組みたいと願っている多くのチームにとって、手の届かない領域のようなものです。私たちは引き続き進めますが、知識の伝播を制限し、エージェントが信頼性の低い入力を使用しないようにする必要があります。先ほどご覧いただいたトレーニングデータについてはどうでしょうか？初期のトレーニングデータにどのような信頼性の低い入力が含まれていたかをどのように制御すればよいのでしょうか。それを制御することはできません。Anthropic にチケットを開いて、「信頼性の低いデータを使わないようご確認いただけますか」とお伝えするつもりです。しかし、これは多くのチームが実際に実行できることではありません。私が管理しているシステムであれば、おそらくそれを制御できるでしょう。OWASP を責めたいわけではありませんので、ここでは非常に有用な例を一つ挙げます。ツールアクセスについて話せますし、権限についても話せます。これらの点についても言及できます。これらは有用なものですが、私が言いたいのは、多くのリスクフレームワークにおいて、これらの一部の要素は関連性があり、一部の実施可能な緩和策もある一方で、他のものは実行不可能であるということです。私たちは単にそれらに対して準備ができていないのです。

何ができるでしょうか？私が最も推奨するのは、私が「学際的リスクレーダー」と呼ぶものを実際に設定することです。私は長年、Thoughtworks のプリンシパルとしてこの分野で働き、セキュリティやプライバシーの他のステークホルダーと共に、この AI ガバナンスゲームを開発する機会がありました。そこでは、「開発者、データ担当者、プライバシーおよびセキュリティ担当者を同じ部屋に集めれば、私たちに本当に必要なものについて理解できる対話ができるだろうか？神話を打ち破れるだろうか？」と議論しました。なぜなら、時々人々が私に「これはセキュリティにおける最大の問題だと聞いた」と言うからです。私はこう答えます。「もしあなたが独自のモデルを開発していないのであれば、結局何もできません。不可能なこともあります。」そうすることで、実際にあなたが直面している脅威を明らかにし、あなたのチームや組織が持つ能力に対してどのような解決策が適切かを暴露することができます。これを定期的に実行すれば、この筋肉、つまり習慣が養われます。フィードに何か現れたとき、あるいは誰かが何かを転送してきたときに、「これは私たちに関連するだろうか？次のリスクレーダーで議論すべきことだろうか？私たちが取り組んでいる AI の種類にとって有用か無用か」を判断できるようになるのです。

神話 4: 一度レッドチームングを行ったので、もう大丈夫だ

4 つ目の誤解は、「一度レッドチームングを行ったから、もう大丈夫だ」というものです。ここで少なくとも一度レッドチームングを行ったことがある方はいますか？いいえ？レッドチームングのやり方を理解するための無料コンテンツとして、YouTube にレッドチームングに関するコースがあります。ここにいらっしゃる方全員が、レッドチームングとは何かを知っていますか？はい？システムを攻撃して、どこで破綻するかを探るものです。素晴らしい点は、新しい攻撃手法を開発できることです。研究から得られた攻撃手法を利用できます。多くの研究に基づく攻撃は現在オープンソース化されています。攻撃する能力を育み、理解を深める意識を持つことができます。おそらく一度はレッドチームングを行うべきですが、もしかすると私は、一度だけでなく何度も行うよう説得しに来ているのかもしれません。これは楽しい製品演習として行えます。なぜなら、AI が導入される製品やサービスを実際に知っているチームこそが、最も効果的なレッドチームングを行えるからです。あなたが構築しようとしているものを回避する方法を、実際に誰よりもよく知っているからです。

セキュリティの分野で長く働いてきた方ならこのパラダイムはご存知でしょうが、セキュリティが貴社の能力において新しい領域である人々にとっては有用な視点です。私が思うに、サイバーセキュリティと聞いて多くの人が思い浮かべるのは国家レベルの攻撃です。もしあなたが国家レベルのシステムで働いているのであれば、おそらくあらゆる種類の奇妙な攻撃を心配すべきでしょう。しかし、実際のところ、サイバー攻撃、あるいは主要なサイバー脅威のほとんどは自動化されたものや、良質なデータ収集によるものです。適切なチャネルにアクセスし、誰かのパスワードが流出したことを知り、それを新たな標的に対して試すという行為です。これが侵害が発生する手段の 99% を占めています。あるいは、新しい脆弱性を見つけてしまい、それが価値ある何かをヒットさせるまでインターネット全体にスパムを送り続けるケースもあります。なぜそうなるのでしょうか？それは、多くの点で攻撃者のように考えなければならないからです。つまり、私たちが実際に狙っているものは何なのか？LLM に爆弾の作り方を出力させたいのか、それともはるかに価値のあるものを目指しているのか、という問いです。

答えは通常、私たちがより価値ある何かを求めているからです。通常、私たちはデータを求めています。私たちが人質に取れるデータ、再販売できるデータ、あるいは利用可能なデータを捜しているのです。誰かが私たちにお金を払うようにするために、サービスへの DDoS 攻撃を試みたり、サービスを停止させたり、品質を低下させたりしています。そうしてインターネット上の隙間時間を作ったり、あるいは何らかの目的を達成したりするのです。ソフトウェアを盗んだり、インフラストラクチャに侵入してデータや他のシステムにアクセスしようとしているかもしれません。ブランドを混乱させる、非常に標的を絞った攻撃を試みている場合もあります。あるいはコストを増大させようとしているのかもしれません。彼らの人的時間や計算リソース（compute time）のコストを増やすことで痛みを与えたいと考えているのです。

レッドチーム演習を行う際、私はまずここから始めて、最大のターゲットは何かを決定してほしいと思います。今日は何に焦点を当てるつもりですか？サービスの混乱を試みますか？データを入手しようとしていますか？ソフトウェアを盗もうとしていますか？何を実現しようとしているのでしょうか？

その後、攻撃を行い、反復し、テストし、緩和策を講じ、再び繰り返します。攻撃のモデルを作成し、その攻撃をテストし、そこから学びます。緩和策が一つまたは二つあるかもしれません。そして、それを繰り返すのです。これこそが、すべての人にとってセキュリティの実践と理解を構築する方法です。なぜこれを反復的に行うのかというと、新しい攻撃も現れるからです。また、アーキテクチャや実装が変化するからです。さらに、複数のモデルを試している場合もあるからです。そして、影響を与えたり制御したりできるシステムの部分に焦点を当てているからです。シンプルに保つのです。ソフトウェアベースのガードレールのようなシンプルな保護策で機能する場合は、最も複雑な解決策を求める前に、それを使用します。これを定期的に実施すれば、自分たちの知識と理解が向上するだけでなく、時間とともに再利用可能なインフラストラクチャも構築されることになります。

AI システムに対してどのようにこれを実行できるでしょうか。まずは脅威モデリングから始めることができます。PLOT4AI があります。これはオープンソースであり、ダウンロード可能で無料です。脅威モデリングのための一連の AI リスクカテゴリを網羅しています。また、何か追加したい場合は STRIDE や LINDDUN も利用可能です。アーキテクチャを整え、ターゲットを特定し、脅威を明らかにし、ターゲットへの潜在的な侵入経路を特定した上で、実際のテストを MLOps インフラストラクチャに統合します。AIOps を実施していない場合でも現時点では問題ありませんが、他人の機械学習モデルや AI モデルを使用している場合であっても、時間経過に伴うそのエンドポイントに対する統合テストおよびテストの実施方法について考えることをお勧めします。なぜなら、将来そのモデルを別のものに置き換えたいとなった際に、すでにテストが稼働している状態にしておくことができるからです。そこでの状況を把握しようとする試みも既に開始できます。これにはこれらのスキルが必要です。

これらのスキルを一つでもお持ちであれば、MLOps や AIOps の支援が可能です。さらに、他社の AI モデルを組み込んだ製品を提供している場合、コストテストを実施し、負荷分散を行う必要があります。ここで参加されている方々がすでに LLM 負荷分散やその他の種類の負荷分散を行っているかどうかは存じ上げませんが、コストやトークン使用量を複数のモデルに分散させることができます。ストレステストも実施可能です。システムがストレス下にある際にどうなるかを決定できます。また、評価（evals）も行えます。評価とは何かご存知でしょうか？評価とは、AI モデルや AI エンドポイントに対して反復可能なテストをセットアップし、モデル A とモデル B、そしてモデル C を比較評価できるようにするものです。お約束しますが、小さなモデルバージョンであっても出力が大きく変化することがあります。アップデートのミニバージョンでさえも出力を変化させる可能性があります。

これは、実際の生産システムで使用している場合や、単にコードを書く際にも、ご自身で評価を行うことで「自分にとって有用かどうか」を判断したいという点に関わる事項です。最後に、もちろん MLOps の一部として監視があります。ご自身が構築したものであれ、外部のサービスを利用するものであれ、使用する監視システムを通じてシステム内で何が起きているかを監視し、特定の脅威が実際にシステムで出現していることに気づいた場合、レッドチーム（攻撃側シミュレーション）を実施し、それらを次のリスクレーダーに追加し、議論し、テストに統合する必要があります。

神話 5：次のモデルバージョンがこの問題を解決する

必ず JSON 形式で返してください。translation フィールドのみ。他のフィールド (technical_terms 等)は一切追加しないこと — 余計なフィールドを書こうとして本文翻訳がトークン上限で打ち切られる事故を防ぐため:

{"translation": "翻訳全文"}

最後の誤解は、次のモデルバージョンがこの問題を確実に解決するというものです。Anthropic からも聞いた話ですが、「Claude Code 5」が非常に素晴らしくてバグが出なくなり、もはやハルシネーション（幻覚）も起きないだろうと。AI システムの使い方を調査した非常に興味深いレポートがありました。これは多くの異なる情報源から収集され、まとめられたものです。読むのにとても良い内容ですが、ここからはその中から特に有用なグラフを見てみましょう。

ここでは主要なケースのみを取り上げます。28.3% が実践的なアドバイスです。「これをどうすればいい？」「フィットネスルーチンを組んでください」「このことを教えてください」あるいは「学習計画を立ててください」といったものです。次に多いのはライティング関連で、「これを編集してください」「これについて考えを深める手助けをしてください」などです。

では、3 つ目に大きいのは、「X とは何か」という具体的な情報です。製品担当者はいますか？私は製品担当者として長く携わってきた人々を何人も知っています。私たち全員が頭の中に「製品担当者」を抱えています。「達成すべきタスク」や「ユーザーは blah blah blah を望んでいる」といった考えが、まさにそれです。これはあなたの頭の中にあるものです。私があなたに尋ねます。ユーザーがアドバイスを得たいと願う場合、「X とは何か」と問いかけたり、執筆の助けを求めたりする際、プライバシーとセキュリティはあなたの優先順位リストでどこにありますか？それが次のモデルリリースで最優先される事項になるでしょうか？いいえ。私たちは笑って、面白いと思うことができます。リラックスして笑うこともできます。しかし、答えは「いいえ」です。私は、どのように到達するかに関わらず、執筆においてさらに優れたものを作ろうとしています。私は、アドバイスを与える際に非常に優れており、親切でフレンドリーな何かを作ろうとしています。今では AI モデルを使うことが時々好きになることもあります。なぜなら、コンピューターからログオフした瞬間に、「自分はこれまでで最も賢い人間だ」と感じるからです。まるで「カサリン、それは素晴らしいアイデアですね」と言われているかのようです。「ええ、私もそう思いました」と返すのです。あるいは、それが Google 検索の完全な代替手段になるかもしれません。もしそれがあなたの製品における夢であるなら、それこそがあなたが構築しようとしているものです。それは全く問題ありません。私は誰かの製品目標を否定しに来たわけではありません。しかし、私たちが救済されるのを待っていることはできないのです。

おそらく、その他の製品目標もあるでしょう。私は誰かに特定のブラウザを使わないよう勧めるためにここにいるわけではありません。好きなブラウザを何でも使ってください。実際、シリコンバレーのパネルでペルプレキシティの CEO が「はい、私たちは優れた広告を行うためにブラウザを構築しています」と発言していました。これは公然と行われていることです。遠くを探す必要はありません。彼らだけがそうしているわけではありません。ニュースを追っていなければ、ChatGPT を非常に好むサイモン氏が、「私のメモリ機能の要約をいただけますか？」と話していたのを知らないかもしれません。そのメモリ機能は、まさに彼のプロファイリングを行っており、「ユーザーはこのことを好み、あのことも好み、これらのものを望んでいる」と述べていました。これは、メモリ機能がオンになっている場合に発生するプロファイリングです。欧州居住者や EU 居住者にはデフォルトでオフになっていますが、アメリカの友人たちやおそらく他の多くの地域ではオンになっています。これがプロファイリングです。

もし広告業界で働いた経験があれば、プロファイリングは広告やその他のサービスを提供するための非常に良い出発点であることがわかるでしょう。また、これは公然と行われるものです。

OpenAI は数年前から「モデルデザイナー」と呼ばれる職種の採用を開始しました。検索して確認できますし、彼らのキャリアページでも募集が活発です。もしかするとあなたもその一人になり、モデルにプライバシーやセキュリティの要素を加えることができるかもしれません。これらのモデルデザイナーは、まさに製品開発者でありデザイン担当者です。現在、機械学習チームを率い、「このモデルにはこのような人格を持たせたい」「この機能を実装したい」と主張しています。「このモデルが X、Y、Z の方法で人々と関わり合い、その反復的なプロセスを通じてエンゲージメントを高め、利用を増やし、アクティブユーザー数を増やすようにテストする」のです。

最近の LLM（大規模言語モデル）は、必ず最後に質問を投げかけることに気づいたことはありませんか？まるで LLM 特有の罠のようなものです。そうすると、人は答えようとしてしまいます。「実はもう答えが出ているし、ここにいる必要はない」と思うのにです。これが目的であり、全く問題ありません。もちろん、誰もが収益を得る必要があります。私たちは資本主義社会に生きています。その点は理解できますが、同時に、プライバシーやセキュリティが次のリリースにおける最優先事項になるとは決して思わないでほしいのです。

こちらはダルムシュタットでの私です。毎年ドイツで開催されるビッグデータ会議に参加していました。こちらが私です。これは私が本当に気に入っているゲーミングラップトップで、ゼロから自作しました。GPU が 30GB も搭載されています。私の隣には、他の主催者の一人のコンピューターがあります。私はそれらをセットアップし、サーバーとして稼働させました。そして、私が「フェミニスト AI LAN パーティー」と呼んだイベントを開催しました。LAN パーティーに参加したことがあるという方はいらっしゃいますか？私は LAN パーティーが大好きで、再び開催し始めました。スイッチも用意しました。

ある時点で、私の小さなマシン上で LLM を提供するために 30 人が接続されたことがありました。私がこれを取り上げる主な理由は二つあります。一つは、ぜひ多くの人に LAN パーティーを開催してほしいという願いからです。もう一つの理由は、モデルプロバイダーを多様化しようとする試みです。他のサービスも試してみてください。Hugging Face でアカウントを取得しましょう。ノートパソコンや自作のゲーミング PC を作る必要はありませんが、もし作りたいのであれば、そのためのハウツー記事を持っています。Ollama もお試しください。現在 Ollama はあらゆる環境で動作します。GPT4All も試してみましょう。これらはあなたのマシン上で実行できるローカルモデルです。Claude にもローカル限定のオプションが多くあり、Copilot や他のツールも同様です。いくつかのローカルモデルを試して、モデルプロバイダーを変更することに本当に興味を持ってください。ただ試しにやってみるのです。もしかしたら何年も前に Gemma や何かで悪い経験をしたことがあるかもしれませんが、もう一度試してみてください。さまざまなものをテストすることに慣れましょう。ローカルでテストすることに慣れるべきです。なぜなら、私たちにとって知っておくことが有益だと考えるからです。広告が現れて広告体験を望まないときは、ローカルで作業することに慣れてください。また、これに慣れてくると、「どのようにモデルを実行するか」という経験が築かれ始めます。

いつクラッシュするのでしょうか？どの程度のメモリを使用するのでしょうか？そうすれば、クールなオープンソースのオープンウェイトモデルを試すことができます。もちろん、すべてのオープンウェイトモデルはローカルで実行されます。EPFL とチューリッヒ工科大学（ETH Zurich）によって最近リリースされ、スイス政府からの支援も得ているとされる Apertus は、おそらく最初のオープンソースモデルの一つでしょう。なぜなら、彼らは実際に使用したトレーニングデータすべてをリストアップし、プライバシーおよびセキュリティのテスト結果も公開し、さらにトレーニングコード自体もオープンソース化したからです。これは非常に素晴らしいことです。また、ドイツ語で作業されている場合、スイスドイツ語か標準ドイツ語（Hochdeutsch）に対応しているかは不明ですが、ぜひ試してみてください。おそらく両方とも対応できるはずです。これらは、モデルプロバイダーを多様化し、ある程度のレジリエンスを提供する方法です。そして、プライバシーやセキュリティが組織全体または特定の側面で重要になったと判断した場合、モデル A と B と C を比較してテストし、自らの判断で決定を下すことができます。なぜなら、単一のモデルに縛られる必要はないからです。

結局のところ、AI ベンダーの誰かがプライバシーやセキュリティの観点から私たちを救ってくれるのを待つことはできません。誰かがスーパーヒーローのように現れて、「これらの問題の解決法を見つけた。著作権コードなどを絶対に提供しない新しいモデルだ」と言うようなことはありません。自分たち自身で自分を救うしかないのです。ここにいる全員が大人です。おそらく皆さんはすでにこのことを学んでいるでしょうが、繰り返す価値があります。私たちが問うべきことは、責任、主体性、所有権の問題だからです。

私はもともと南カリフォルニア出身で、そこで「スモーキー・ザ・ベア」の話をよく聞きました。「スモーキー・ザ・ベア」は、「森の中で喫煙しなければ、森林火災を防げるのはあなただけだ」というメッセージでした。私が 9 歳の頃、「私は森の中でタバコを吸わないのに、なぜそう言われるのか理解できない」と思ったものです。要するに、このリスクを減らすことができるのは、私たち自身の注意と介入だけなのです。

What Can You Take On?

何に取り組めるでしょうか？皆さんにお願いがあります。少し練習をしましょう。これまで話してきたさまざまな緩和策や対策を一つずつ確認していきます。何か「これなら取り組んでみたい」「これを試してみよう」と思えるものを見つけたら、拍手をしたり、歓声を上げたり、手を挙げたり、あるいはあなたがやりたいと思うことを何でもしてください。とにかく試しにやってみてください。やらなくても構いませんが、ぜひ試してみてください。何に取り組めるでしょうか？まず、ガードレールをテストして導入できるでしょうか？それに取り組む人はいますか？差分プライバシーを持つモデルを使ったり、場合によっては訓練したりすることはできますか？興味のある方は手を挙げてください。組織内で学際的なリスクレーダーを運用できるでしょうか？堅牢なセキュリティおよびプライバシーのテストを開発できるでしょうか？オープンウェイトおよびローカルモデルを評価したり、あるいは実際に使ったり（もしかしたら既に実施しているかもしれません）できるでしょうか？

Resources

私はニュースレターを発行しています。また YouTube も運営しており、最新の動画でレッドチームングの入門をお手伝いしています。O'Reilly から出版した書籍もあります。主に他の機械学習エンジニアやデータサイエンティスト向けに、プライバシーとセキュリティを通常のデータサイエンスおよび機械学習ワークフローにどう組み込むかを解説しています。ドイツ語版にもいくつか更新が加えられており、より最近の攻撃事例などが含まれています。

Questions and Answers

参加者2: ブルームフィルタやガードレールの仕組みについて詳しくお話しくださいましたが、モデルそのものの内部で何か対策は可能でしょうか。結局のところ、私たちはモデル自体により注力する必要があります。データをトレーニングデータセットとして安全に保存・利用するにはどうすればよいのでしょうか？

カサリン・ジャームル: これは素晴らしいアイデアです。最近、ルーティングの最適化に関する非常に興味深い研究結果が出ました。この面白い点は、実際にルーターを構築できるほど十分な数のモデルが利用可能になりつつあることです。このルーターはリクエストを受け取り、そのリクエストに正確に応答しつつ最もコストの低いモデルを選択します。ただし、プライバシーやセキュリティ、あるいはその他の懸念事項を組み込むことも可能です。本質的には、このルーターを訓練し、その後ルーターが決定を下します。初期段階ではまだ判断できない場合もあるため、モデルからサンプリングを行い、その結果に対してフィードバックを与えます。「私には機能した」「私には機能しなかった」といったフィードバックです。彼らが発見したのは、これによりクラウドコストの約 60% が削減されたという点です。なぜなら、多くの場合、安価なモデルやローカルモデルで十分だからです。にもかかわらず、私たちはプロフェッショナル版、あるいは最も高機能なエリート版などに支払い、利用しているのです。私はこの分野に関するいくつかの GitHub リポジトリを追加する予定です。これにより、プライバシーやセキュリティの評価も組み込むことができ、組織全体の方針として、機密情報を扱う内部データにはいつローカルモデルへ切り替えるか、その他の用途にはいつクラウドモデルへ切り替えるかを決定できるようになります。これは時間とともにさらに拡大していくと思いますが、非常に優れた直感だと思います。

トレース、データ、評価を保存することは、独自のガードレールや、同様にガードレールを実装できるルーターをトレーニングするための非常に良い最初のステップとなります。Purple Llama はオープンソースです。Meta からは Purple Llama と呼ばれる一連のモデルが提供されており、プロンプトインジェクション攻撃から、「これはプライバシーに関わる」「これは犯罪である」「これは不適切である」あるいは「ハラスメントである」といった判断に至るまで、あらゆるタスクに対応しています。これらすべてが選択肢として用意されています。また、LLM-as-a-judge（LLM を裁判官として活用する手法）やその他のアプローチでプロンプトを工夫することに関する優れた研究も数多く存在します。結論として、最終的には独自のガードレールをトレーニングすべきだと考えます。モデル自体をゼロからトレーニングすることは稀ですが、外部のアルゴリズム型フィルターを活用し、LLM に到達するデータとそうでないものをフィルタリングする仕組みを構築すればよいのです。

要約付きプレゼンテーションをさらに見る

原文を表示

InfoQ Homepage

Presentations

Busting AI Myths and Embracing Realities in Privacy & Security

View Presentation

Speed:

Download

46:48

/presentations/ai-systems-privacy-security/en/slides/Kat-1771585587944.jpg)

Summary

Katharine Jarmul keynotes on common myths around privacy and security in AI and explores what the realities are, covering design patterns that help build more secure, more private AI systems.

Bio

Katharine Jarmul focuses her work and research on privacy and security in data science, deep learning and AI. She is author of the well received O'Reilly book Practical Data Privacy and has more than 10 years experience in machine learning/AI where she has helped build large scale AI systems with privacy and security built in.

About the conference

InfoQ Dev Summit Munich software development conference focuses on the critical software challenges senior dev teams face today. Gain valuable real-world technical insights from 20+ senior software developers, connect with speakers and peers, and enjoy social events.

INFOQ EVENTS

May 12th, 2026, 1:30 PM EDT

Designing Data Layers for Agentic AI: Patterns for State, Memory, and Coordination at Scale

Presented by: Karthik Ranganathan - Co-CEO & Co-Founder at YugabyteDB, and Aditi Gupta - Snr. GenAI/ML Specialist Solutions Architect | GTM Data and AI at AWS

May 21st, 2026, 12 PM EDT

Portable by Design: Data Mobility & Recovery Patterns for Multi-Cloud Systems

Presented by: Liore Shai - Solutions Architect at Eon

May 28th, 2026, 1 PM EDT

Shipping Faster, Breaking More: Rethinking Delivery Systems in the Age of AI

Presented by: Eric Minick - Sr. Director of DevOps Solutions at Harness, and Aaron Newcomb - Senior Product Marketing Manager at Harness

Transcript

Katharine Jarmul: We're going to talk about realities and myths when we think about privacy and security in AI and machine learning systems. Who here uses some sort of Anthropic-based assistant? The most recent Anthropic report said, for the first time ever, Anthropic is seeing more automation than augmentation. What does that mean? It means less of, can you make this text better? Less of, can you generate this image for me? Less of, what is X? More of, I want you to do A, B, C, D, go do it and come back to me. This is great. This was the promise of AI systems in a lot of ways that we could have 4-day work weeks and we could have relaxed times, and computers will just do stuff for us. That's the whole reason why we're building this.

I don't know if anybody here works in privacy or security as well. How do you feel about this? What's the feeling like right now? It's a little bit like this. Because we're not quite sure yet, there's not best practices yet. We have best practices from privacy and security for many decades now, but it's not yet sure how do we allow things like automation or things like agents or something like this and still provide some semblance of privacy and security. Every privacy and security team, they want enablement, but they also are on the line. Why is this still a problem?

What we're going to talk about is it's difficult to decide in privacy and security in machine learning right now or AI systems, what's real threats and what's relevant threats. That's a real difficulty in today's bubble around AI in a lot of ways. I do a lot of advising, consulting, and trainings at different companies, and a question I always get from privacy and security teams is, who is really an AI expert and do we need them? Obviously, a lot of my work has been in training deep learning models. I have a different understanding of AI than maybe somebody that uses a model.

If at your company you're not actually training models and instead you're using many models, do you really need somebody there who knows how to train a model? I think we can debate, probably not. You have to decide what is AI expertise at your organization and who gets to then exercise that expertise and try to help make these privacy and security decisions. We also unfortunately have a big problem in the privacy and security field, and I will say it out loud, and I don't agree with it, of using fearmongering to sell things. I don't know what your LinkedIn feed is like, but mine is now like, we're all going to get hacked tomorrow by the AI or whatever, and just screaming.

If you scream every single time, eventually, what happens? Nobody listens to you anymore. If you scream just to sell and then somebody buys it and then it doesn't solve all their problems, then also people are less likely to engage with privacy and security topics. Another problem, security and privacy blame culture. The best question that I've found when I go in and I ask the privacy and security team, how's it going at their organization? The first question I ask is, how many incidents do you have per month? What's the right answer? Is the right answer zero? No, why?

Participant 1: Because that means that most likely you are ignoring security incidents.

Katharine Jarmul: Exactly. If people are afraid to come forward and say, I don't know if this is the way I'm supposed to do it. I think I've accidentally leaked this key somewhere or whatever happened, because things happen. If it's zero incidents that are reported, it doesn't mean there's actually zero incidents, it means you don't have a trust culture where people can come forward. You don't have psychological safety around privacy and security.

Perhaps, maybe, either on purpose or on accident, you have this blame culture where people are afraid either they're going to get a bad performance review or they're going to lose their job or they're going to lose respect at the company if they say either, I don't know how to do this right, or if they say, I've made a mistake. We go fight against that. How do we fight against that? We talk about building responsibility, building agency, and building ownership. That's exactly where I mainly focus in and that's what we're going to talk about. How do we build a culture of responsibility and ownership of privacy and security so that it's not weird and scary and not part of your job but instead can be like a normal part of conversations at your work?

Myth 1: Guardrails Will Save Us

The first myth we're going to talk about that I think is very big in this space is, guardrails are going to save us. Who here knows what I mean when I say guardrails? Who here feels a little fuzzy, like heard the term, not quite sure where does it live, what does it do? Yes, I'm with you. I work in this field and I feel like I'm number two because guardrails is a term, and we're going to go through all of them, but guardrails are used to create safety and privacy in models or at least to try. Guardrails, we need to disambiguate because it's used for many different things right now, and we need to disambiguate the term so that we can better understand it.

One type of guardrail, probably the first guardrail that really got launched at any scale was software-based guardrails. This is basically like you have an LLM or you have some system and then you basically have an input-output filter, and then you have the software on the other side. This was implemented in the first code assistants because it was found, which we'll get to later, that outputting copyright or private repository code was problematic in things like code assistants, that they were quite good at repeating other people's code verbatim. What happened is these things like very intelligent memory systems like a Bloom filter or whatever, just intelligent memory architecture used to look at the training data and say, this training data is under so-and-so license, or this training data is copyrighted, or we don't quite know if we can use this training data, and find matches and then filter those out, and basically say, stop after a certain amount of tokens. Please stop outputting this copyright or weirdly licensed or unclear licensed content. Sounds reasonable. Should work. Feels like a good solution.

Anybody have any idea how it might not work? How would you break this? Perhaps like the software engineer commits code that should be in private repo in a public, that's definitely one way. Chiyuan Zhang, who is a researcher in the space of privacy in machine learning systems, really easily bypassed this by just changing the variable names to French. This is a copyright, I think Google Code because at the time I believe that he was still researching or working with Google, and just changed the variables to nombre and then no problem, like no problem, can continue. Of course, this gets past the Bloom filter, because it's different enough, and yet for any developer, we could also just ask the LLM, can you translate back to English? It'd be no problem. Or whatever language you're using for your variable names. Great for some things, really good for some things, software-based guardrails, deterministic, useful. Use them, but know their weaknesses.

There's another type of guardrails. If you've ever used Llama Guard or heard of Purple Llama, or probably if you're using a cloud AI vendor, they probably have something like this that you can set up. This I call external algorithmic guardrails. Now we are looking more at the whole system. We have software APIs. We got those input and output processing guardrails, these memory architectures or simple matches that you're looking for. Then between the LLM and those, you have these algorithmic guardrails.

Usually these algorithmic guardrails, they're algorithmic, they're either another machine learning model, like a simple classifier, or they're an LLM-as-a-judge, you might have heard about something like this. Your results may vary, we can talk more about that. This is in charge of saying, I think either this prompt is something we shouldn't answer based on our rules. I think it violates privacy or I think it has to do with crime or I think it has to do with nudity, or whatever it is that your content control should be, or after the LLM processes to flag it on the way out.

Then to replace, I'm sorry, I can't do that request, here's some other stuff that I can talk about. Which means you might have a cycle there. If something comes out that you don't want, you have to re-prompt. How do we get past these? Any ideas? This was a really cool attack, I thought it's called ArtPrompt. It basically takes your words and turns the potential bad keywords into ASCII art. The LLM has seen enough ASCII because it's on the internet. If you ask how to build a bomb and you mask bomb into ASCII text — they've probably fixed this — but you used to be able to get GPT to teach you how to build a bomb. The interesting thing about this is that humans are really smart and they will figure out fun tricks to get around whatever algorithms you put around them. We're naturally curious, we're going to figure it out.

Then you're going, maybe we got to fix the LLM itself. That's where we get back to what most of the large AI vendors are already doing, RLHF or DPO, it's basically fine-tuning, so reinforcement learning with human feedback, and is called now alignment, but it's basically one of the last steps of training. There's a human, they look at things, or sometimes now there's an LLM that looks at things and decide, out of these three options, this is the one that we like the most, and then we use that data to then update the model, so that we get more and more answers that are more and more like what we want and less like what we don't want.

This is actually retraining the model, this is updating weights and biases, this is actually changing the model's behavior. Will it work for everything? No, because there's plenty of data, information in the model that I can activate. I'd asked it, can you build me an IMSI catcher, which is illegal, and then I say, I'm definitely a researcher, and I get the instructions. There are still many ways to bypass even alignment training, and this is just because these things are still in the models that we use. Should we use guardrails? Should we do alignment? Absolutely. Will it save us? Not all the time. Use with care.

Myth 2: Better Performance will Save Us

Myth number two, better performance is going to save us. Who here's heard this one? When the models get even better, they're going to also know about privacy and security. I get this a lot, it's fine. We're going to take a little bit of a walk through the history of today's largest AI models, and we're going to start with understanding what overparameterization is to some level. Overparameterization means I have more space, I have more parameters in the model than I have data points in my training data. It basically, like computer scientists, developers, it'd be like, you have enough data to fit on a thumb drive, but you instead choose an SSD that's like four times the size. This is essentially the paradigm that we're working in, and this is just an example of parameter size growth over just the GPTs. We have data. We have even more space to save information than we have data. What could happen?

Interestingly enough, as this happened, we also had the death of overfitting, is what I like to call it. We basically stopped overfitting. We used to have something that looked like this, the left side. When you're training a deep learning model, you are watching the test error, and as the test error started to rise, you would make sure it's not just a blip, and then you would do early stopping. You would stop, because you're worried that you would overfit on the training data, and you wouldn't be able to generalize well when you saw new information. That's over now. Now we have models somehow that can overfit to some degree or train a lot on the small amount of data, and yet generalize quite well. This is peculiar from just a science and math point of view. What is happening? Chiyuan Zhang and numerous other really smart, cool researchers have been looking at this problem for a while, and the question at hand is, is learning without memorization possible at this large of a scale? The answer is firmly no.

That memorization will and does happen, and it's just a matter of how much memorization and what information is memorized. Zhang and researchers did an overparameterization test. They trained neural networks or deep learning networks with these amount of layers on just the 7, so just using the 7 to the left. They just showed the 7 again and again, and what they hoped was that the deep learning model would learn the identity function. You give me something, I give you it back. If you know linear algebra, just learning the identity matrix. We have that, that is their training data, and then we can see small, shallow learning networks, so up to about seven-to-nine-layer networks, it learned the identity function. It could say, ok, now I see a 4, here's the 4.

Now I see a shirt, here's the shirt, and so on. When we get to 20-layer networks, we just learned the 7. This is exactly how our biggest and most overparameterized model works, but it actually works well because, again, we had this much data, we put it in this much space. If some of it generalizes well, and some of it memorizes, sometimes we want memorization. I want to say, tell me the lyrics to this song. I expect to see the appropriate lyrics to the song.

What's actually in the training data? Has anybody here actually looked at some of the training datasets? You ever downloaded them, played around with them? Get a Hugging Face account, just for funsies, and download some of the training data. This is from one of the big ones that was collected by an organization in Germany. It has women's healthcare labeled as not safe for work. I've actually removed these people's faces. It has mugshots, people who died in the street, and stuff like this. It has watermarked images and ads, and it also has people's medical data that they didn't release. Numerous people have had to ask for their data to be removed because they have their consent form. It says, please don't show it, and then somehow that got forgotten, and their stuff got loaded to the internet, and it got scraped.

To some degree, why do we need to worry about overparameterization and memorization, and bigger and better models? It's because we have the potential to have more memorized data that is also private, that's also potentially problematic. There are some ways around this. Differential privacy, there is many theories and practices around differential privacy, which is one way that we can guarantee less memorization. Thank you, Gemma team. They literally just released the first from beginning to end differentially private-trained Gemma model. It's called VaultGemma. You can take a look. Probably you've heard somewhere, somebody said they tried differential privacy once, it didn't work, so we just give up. That's not exactly true. When we take a look here, these were also released with VaultGemma. We see the line to the left is VaultGemma.

The line in the middle is the same Gemma model without differential privacy. Obviously, for something like Trivia, it's going to score really low because Trivia requires memorization. For something like PIQA, it does pretty well in comparison. One thing I want to ask or think about is, when do you need memorization, and when would you rather have generalization and the potential to not accidentally output somebody's private data? It's just a question for us to think about. Also, goes back to better performance is not going to save us when it comes to privacy and security.

Myth 3: A New Risk Taxonomy Is All We Need

Third myth, a new risk taxonomy is all that we need. Just like Attention Is All We Need, now we just need a new risk taxonomy. Who here has worked with taxonomies? If they're new to you, let me take you on a wild tour. If you're working in AI risk and you're having a look, you can go to the MIT repository. You can go to the NIST repository. You can go to the EU AI Act. By now, you've amassed probably about 800 pages of reading for yourself. Is this feasible for you to do in your free time? Just to inform yourself, like, no problem, just going to someday crack open the AI Act. Probably not. I'm here to tell you it gets even worse. We have the AI risk benchmark. This is actually a really cool paper if you work in risk, but it's trying to categorize risk frameworks from around the world and then compare them across different regulatory environments and so on.

We end up here with like 40 to 50 types of risk. It's like, how are we supposed to manage that when most people are doing privacy and security work because it makes them feel good about their work and not necessarily that that's their only job. How are we going to navigate this? This is good. I'm not really a taxonomy person, so if you're a taxonomy person, probably this stuff is great. I feel like the same people that use colored binders for everything are like the taxonomy people. It's very good to have a taxonomy person in the team, but it's very hard if you're like a doer, a builder like myself. Let's zoom in to the mitigations. OWASP, when we dive into the mitigations that OWASP recommends for the top AI risks, we see something like, implement automated scanning for anomalies and cryptographic validation of stored data. I don't know what teams you've been working with, but most teams I know cannot implement their own anomaly system from scratch, and probably whether or not their cloud provider offers it may or may not be able to easily do cryptographic validation of data.

This is like out of the reach of a lot of teams who probably want to do AI security to some degree. We keep going, and then we have, limit knowledge propagation and ensure an agent does not use low-trust inputs. What about the training data that we just saw? How am I supposed to control what low-trust inputs were in the initial training data? I can't control that. I'm going to open a ticket with Anthropic and say, could you please make sure you don't use low-trust data? That's not a real thing that most teams can do. The systems that I have, yes, I can control that perhaps. I don't want to pick on OWASP, so here's one that is really useful. I can talk about tool access. I can talk about permissions. I can talk about these things. There are useful ones, but what I'm saying is, with a lot of these risk frameworks, maybe some of these things are relevant and some of these mitigations are something you can do and others are not. We're just simply not prepared for.

What can you do? The number one thing that I recommend is actually setting up what I call interdisciplinary risk radar. I was for a long time a principal of Thoughtworks working in this space, and I had a chance to develop this AI governance game with some of the other stakeholders in security and privacy, where we said, if we got the developers and the data people and the privacy and security people in a room together, could we have a conversation where we actually understand what's relevant for us? Could we debunk myths? Because sometimes people will come to me and, I heard this is the biggest problem in security. I'm like, if you're not developing your own models, you can't do anything about that anyways. Some things are just not possible. Then you can actually expose what real threats that you have and what solutions make sense for the capabilities that you have on your team or in your organization. If you do this on a regular basis, you develop this muscle, this practice of, when you see something come across your feed or when somebody forwards something to you, you start to know, is that relevant for us? Is it something we should talk about on our next risk radar? Is this useful or is this not useful for the type of AI that we're doing?

Myth 4: We Did Red Teaming Once So We're Fine Now

Myth number four, we did red teaming once, so we're fine now. Who here has done red teaming at least once? No? I have a YouTube course on red teaming, if you want some free content to figure out how to do red teaming. Does everybody here know what red teaming is? Yes? We're like attacking systems to try to figure out, where do they break? Cool thing is we can develop even new attacks. We can take attacks from research. Many research attacks are now also open sourced. We can build an awareness, build this ability to attack things and understand. Hopefully, you do red teaming at least once, but maybe I'm here to convince you to do red teaming more than once. You can make this as like a fun product exercise because I think the best red teaming works from the team that actually knows what product or what service the AI is going into because you actually know how you might actually get around whatever it is you're trying to build.

If you've worked in security for a while, you know this paradigm, but this is useful for people where security might be new to your capability. I think when people think of cybersecurity, they think of nation state level attacks. Perhaps you work in nation state level systems, then probably you should be worried about all sorts of crazy attacks. Most of the time, cyberattacks, or even just major cyber threats are just automation and good data scraping. Being on the right channels, seeing so-and-so's passwords got leaked, and then trying them in new targets. This is 99% of how breaches happen. Or you found out this new vulnerability and now you just spam it across the entire internet until you hit something that might be valuable. Why is that? That's because we have to think like the attacker in a lot of ways. That means, what are we actually going after? Do we want the LLM to output how to build a bomb or are we actually after something much more valuable?

The answer is usually we're after something much more valuable. Usually, we're after data. We're looking for data that either we can hold hostage or we can resell or we can use. We're trying to DDoS services or take services down or reduce quality so that somebody will pay us, so we can have the lulls on the internet or whatever. We might be trying to steal software or get into infrastructure so we can get to the data, so we can get to other systems. We might be thinking about disrupting a brand, this very targeted attack. Or we might be going after increasing costs. We might want to cause them pain by increasing the costs either in their person time or their compute time or whatever. When you're red teaming, I actually want you to start here and decide, what's the biggest target? What are you going to focus on today? Are you going to try to disrupt a service? Are you going to try to get data? You're trying to steal software? What are you trying to do?

Then, you can attack, iterate, test, mitigate, repeat. You're going to model the attack. You're going to test the attack. You're going to learn from it. You might have a mitigation or two, and then you're going to repeat that. This is how we then build essentially security practice and security understanding for everybody. Why do we do this iteratively? It's because new attacks will also come. It's because our architectures and our implementations will change. It's because maybe you're testing out more than one model. It's also because we're focusing on the parts of the system that we can influence and control. We're keeping it simple. If a simple protection works, like the software-based guardrails, then we use that, before we go reach for the most complicated solution. If we do this regularly, not only are we improving our own knowledge and understanding, but we're also building infrastructure that we can reuse over time.

How can we do this for AI systems? We can start with threat modeling. There's PLOT4AI. It's open source. You can download it. It's free. It goes over a whole bunch of AI risk categories for threat modeling. There's also STRIDE, LINDDUN, if you want to add anything. We have our architecture, we've found the target, we've identified the threats, the potential ways to get in towards the target, then we integrate actual testing into our MLOps infrastructure. If you're not doing AIOps, that's fine for now, but even if you're using somebody else's machine learning model or AI model, I encourage you to start thinking about how you actually do integration testing and testing of that endpoint over time. Because if you ever want to switch out that model for something else, then you can have that testing already going. You can already be trying to see what's happening there. This requires these skills.

If you have any of these skills, you can help with MLOps or AIOps. In addition, if you're really offering products that even have somebody else's AI model in them, you need to be doing cost testing, so you can do load balancing. I don't know if people here are already doing LLM load balancing or other types of load balancing, but you can distribute your costs, your token spend across numerous models. You can do stress testing. You can decide what happens when the system is under stress. You can do evals. Who knows what evals is? Evals is like, I set up repeatable testing for my AI model or AI endpoint so that I can evaluate model A versus model B versus model C. Because I promise you, even small model versions can greatly change outputs. Even mini versions of an update can change an output.

This is something that if you're using it in a real production system, or even just to write your code, you probably want your own evaluations to figure out, is it useful for you or not? Then, finally, obviously part of MLOps is monitoring. Whatever monitoring system you use, whether it's one that you've built or one that you do, you want to monitor what's happening in your systems so that if you notice certain threats actually popping up in your system, you can then decide to red team them and to add them to your next risk radar and to talk about them and integrate them into your testing.

Myth 5: The Next Model Version Will Fix This

Final myth, the next model version is definitely going to fix this. Like I heard from Anthropic, definitely Claude Code number five is going to be super great and not give me any bugs. No hallucinations anymore. There was a really cool report on looking at how do people use AI systems. This was collected across many different things and put together. It's quite nice to read, but here's a really useful graphic from it. We're just going to look at the majority cases, 28.3% is practical advice. How do I do this? Make me a fitness routine. Teach me this thing, or build me a learning plan or something like this. Next biggest is writing. Edit this for me, help me think about this and so forth.

Then the third biggest is, what is X? Specific information. Do I have any product people? I have people that have been around product people long enough. We've all got the product person in the room in our head. We got the jobs to be done or the user wants to blah, blah, blah. That's in your head. I ask you, if the user wants to get advice, ask what is X, or help with writing, where's privacy and security on your priority list? Is it the number one thing that's going to get in the next model release? No. We can laugh, it's funny. We can relax and laugh. No. I'm going to make something that gets even better at writing, regardless of how we get there. I'm going to make something that's really good at giving advice and being really kind and friendly. I love sometimes using AI models now because I feel so brilliant, when I log off my computer, I'm like, I'm the smartest human ever. Because it's so like, Katharine, that's a brilliant idea. Yes, I thought so too. Or, that is basically a replacement for Google Search. If that's your product dream, that's what you're going to be building for. That's totally fine. I'm not here to harsh anybody's product goals. We can't be waiting for it to save us.

Maybe there's also some other product goals. I'm not here to tell anybody not to use any browsers that they want, use whatever browser you want. Literally on stage at a Silicon Valley panel, the Perplexity CEO was like, yes, we're building a browser so we can do really good ads. It's out there in the open. You don't have to look far. They're not the only ones. If you weren't following the news, the Simon guy who really likes ChatGPT was talking about, can you give me a summary of my memory features? The memory feature was literally profiling him and saying, the user likes this, the user likes that, the user wants these things. This is profiling that's happening if you have the memory feature turned on. It's not turned on by default for European residents or EU residents, but it is for our American friends and probably numerous other geographies. This is profiling.

If you've ever worked in advertising, profiling is a really good start to delivering ads or other services. It's also right out loud. OpenAI a few years ago started hiring for what they call model designer. You can look it up. It's active on their careers page. You can maybe even become one and add a little privacy and security flavor to the model. These model designers, they're really product people and design people. They now lead machine learning teams and say, we want to give this model this personality. We want to give this model these capabilities. We want this model to engage people in X, Y, Z ways, and test out this iterative thing to of course increase engagement, increase use, and increase our active thing. Have you ever noticed like an LLM now will always ask you a question at the end? It's like LLM bait. Because then you want to answer it. Then you're like, "I actually already got my answer. I don't need to be here anymore". This is the goal, which is totally fine. Again, everybody's got to make money. We live in capitalism. I get it, but at the same time, we shouldn't look at this and think privacy and security is going to be the number one priority for the next release.

Here's me at Darmstadt. I was there for a big data conference that happens every year in Germany. Here's me. This is my really cool gaming laptop. I built it myself from scratch. It has 30 gigs of GPU in it. Next door to my computer is one of the other organizers' computers. I set them up. I got them serving, and we threw what I called a feminist AI LAN party. Who's old enough to have ever been to a LAN party? I love LAN parties, and I've started throwing them again and I had a switch.

At one point in time, we got 30 people connected to me serving LLMs on my little machine. I mainly bring this up, A, because I really want people to host more LAN parties. The other reason is to try to diversify your model providers. Test out other things. Get an account on Hugging Face. You don't have to build a laptop or your own gaming computer, but if you want to, I have a how-to on how to do that. Try out Ollama. Ollama works on everything now. Try out GPT4All. These are local models that you can run on your machine. Claude also has a lot of local only options and so does Copilot and so does other things. Try out some local models and really get curious about switching up your model provider. Just test it out. Maybe once you had a bad experience with Gemma or with whatever many years ago, but try it again. Just get used to testing out different things. Get used to testing them locally because I think it's useful for us to know about. Whenever ads come and you don't want an ad experience, get used to working locally. Also, if you get used to this, you start to build the experience of, how do I run a model?

At what point in time does it crash? How much memory does it use? So that you can try out cool, open-source, open-weight models. Obviously, all of the open-weight models are running locally. I would call Apertus, which recently got released by EPFL and ETH Zurich, along with support, I think, from the Swiss government, was maybe the first open-source model because they actually also listed all the training data they used. They listed privacy and security testing, and they also open sourced their training code, which is pretty cool. Also, if you're working in German, I don't know if it speaks Swiss German or Hochdeutsch, I'm not sure, but give it a try. I'm sure it can do both. These are ways that we can diversify your model providers, provide some resiliency, and decide if privacy and security become important to your org or certain aspects, then you can test out model A versus B versus C, and you can make your decisions. Because you're not handcuffed to just one model.

At the end of the day, we can't wait for somebody else at an AI vendor to come save us from a privacy and security perspective. Nobody's going to swoop in like a superhero and say, "We've figured out how to solve all these problems. Here's your new model that definitely doesn't give you copyrighted code or whatever". Only we can save ourselves. Everybody here's a grownup. You probably already learned this, but it bears repeating. My question for us, because again, it's about responsibility, agency, and ownership. I come originally from Southern California and we grew up with a lot of Smokey the Bear. Smokey the Bear was like, only you can prevent forest fires by not smoking in the woods. I was like 9, like, "I don't smoke in the woods. I don't understand". The whole point is that only our own care and intervention is going to help reduce this risk.

What Can You Take On?

My ask for you, we're going to do a little exercise. We're going to go through all the different mitigations and things that we talked about. I'm going to ask you to clap, or whoop, or raise your hand, or do whatever it is you feel like doing if you see something that you're like, I'm willing to opt into this. I'm willing to try this out. Just try it out, you don't have to do it, just try it out. What can we take on? First, can we test and implement guardrails? Who's up for that? Can we use or maybe even train differentially private models, who's interested? Can we run an interdisciplinary risk radar at our organization? Can we develop robust security and privacy testing? Can we evaluate or maybe even use, and maybe you're already doing this, open weight and local models?

Resources

I have a newsletter. I have a YouTube. I get you started on red teaming in some of my latest YouTubes. I have a book from O'Reilly. It's mainly focused for other machine learning people or data scientists, how do we add privacy and security into normal data science and machine learning workflows? The German version also has some updates. It has some more recent attacks and things like this.

Questions and Answers

Participant 2: You talked a lot about Bloom filters and how can we put guardrails. Is there anything that can be done in the intrinsic model itself? Because at the end of the day, we all have to think more of the models. How can we save our data to be used as a training dataset?

Katharine Jarmul: This is a great idea. One really interesting piece of research recently came out on routing, so optimization of routing. The cool idea is that we're starting to have enough models available that we can think about an actual router. This router can operate of, it takes in a request. It decides which model is the cheapest model to still also accurately answer this request, but you could also add in privacy or security or any other concerns that you have for that. You essentially train this router, and then that router decides, or sometimes early on, it doesn't know yet, so it will sample from the models, and then you give feedback: it worked for me, it didn't work for me. What they found is this reduced like 60% of cloud costs, because more often than not, we're totally fine with the cheap model or the local model, but we're just paying and using the pro, or most pro, elite, whatever. I'm going to be adding some GitHub repos on this, that we can also add privacy and security evaluation into this, and we can decide, maybe even at an organizational-wide effort, when to shift to a local model for internal confidential information, and when to shift to maybe a cloud model for other things. I think this will only increase over time, but it's really good intuition.

Saving your traces, saving your data and your evaluations is a really good first starting point to then training your own guardrails or training your own router that can also implement guardrails. Purple Llama is open source. There's a whole class of models from Meta called Purple Llama. They do everything from prompt injection attacks to things like, we think this is private, we think this is crime, we think this is inappropriate, or harassment, or whatever. That's all an option. There's also plenty of good research on also prompting your own LLM-as-a-judge or something else. I think at the end of the day, you probably should eventually train your own guardrails. You won't train it into the model because you're probably not training models from scratch, but you will use that external algorithmic one and you just have a filter on what gets through to the LLM and whatnot.

See more presentations with transcripts

この記事をシェア

TechCrunch AI2026年7月4日 03:43

ブラウザ戦争はもはや検索が主役ではない — Chrome や Safari に代わる最良の代替案

InfoQ重要度42026年4月24日 00:36

React Navigation 8.0アルファ版：ネイティブ下部タブの標準化、TypeScript推論と履歴機能

InfoQ2026年4月24日 00:00

Google、Room 3.0を発表：Kotlinファーストの非同期マルチプラットフォーム永続化ライブラリ

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

InfoQ·2026年3月2日 18:01·約51分

プレゼンテーション：AIにおけるプライバシーとセキュリティの神話を打ち破り、現実を受け入れる

#AIセキュリティ #プライバシー保護 #設計パターン #責任あるAI #AI倫理

TL;DR

AI深層分析2026年3月2日 19:40

注目/ 5段階

深度40%

キーポイント

AIのプライバシー・セキュリティ神話の解体

講演では、AIシステムのプライバシーとセキュリティに関する一般的な誤解や神話を特定し、それらを解体することを目的としている。

現実的な課題と対応策の探求

神話を超えた現実の課題をカバーし、それらに対処するための実践的なアプローチを探求している。

安全・プライベートなAIの設計パターン

より安全で、よりプライベートなAIシステムを構築するのに役立つ具体的な設計パターンについて議論している。

影響分析・編集コメントを表示

影響分析

編集コメント

InfoQ ホームページ

プレゼンテーション

プライバシーとセキュリティにおける AI の誤解を解き、現実を受け入れる

プレゼンテーションを見る

速度:

ダウンロード

46:48

image/presentations/ai-systems-privacy-security/en/slides/Kat-1771585587944.jpg)

概要

バイオグラフィー

コンファレンスについて

INFOQ イベント

2026 年 5 月 12 日（木）午後 1:30 EDT

エージェント AI のためのデータレイヤー設計：スケールにおける状態、メモリ、調整のパターン

登壇者：Karthik Ranganathan - YugabyteDB 共同 CEO 兼共同創業者、および Aditi Gupta - AWS GTM Data and AI シニア GenAI/ML スペシャリストソリューションアーキテクト

2026 年 5 月 21 日（水）午後 12:00 EDT

デザインによるポータビリティ：マルチクラウドシステムのためのデータ移動性と回復のパターン

登壇者：Liore Shai - Eon ソリューションアーキテクト

2026 年 5 月 28 日、午後 1 時（東部夏時間）

AI の時代における配送システムの再考：より迅速な出荷と、より多くの破壊

議事録

参加者 1: それは、おそらくセキュリティインシデントを無視していることを意味するからです。

神話1：ガードレールが私たちを救ってくれる

{"translation": "翻訳全文"}

神話 2：性能向上が私たちを救う

興味深いことに、この現象とともに「過学習の死」と私が呼ぶものも起こりました。私たちは基本的に過学習を止めたのです。

神話 3: 新しいリスク分類体系さえあれば十分である

必ず JSON 形式で返してください。translation フィールドのみ。他のフィールド (technical_terms 等) は一切追加しないこと — 余計なフィールドを書こうとして本文翻訳がトークン上限で打ち切られる事故を防ぐため:

{"translation": "翻訳全文"}

神話 4: 一度レッドチームングを行ったので、もう大丈夫だ

神話 5：次のモデルバージョンがこの問題を解決する

必ず JSON 形式で返してください。translation フィールドのみ。他のフィールド (technical_terms 等)は一切追加しないこと — 余計なフィールドを書こうとして本文翻訳がトークン上限で打ち切られる事故を防ぐため:

{"translation": "翻訳全文"}

What Can You Take On?

Resources

Questions and Answers

要約付きプレゼンテーションをさらに見る

原文を表示

InfoQ Homepage

Presentations

Busting AI Myths and Embracing Realities in Privacy & Security

View Presentation

Speed:

Download

46:48

/presentations/ai-systems-privacy-security/en/slides/Kat-1771585587944.jpg)

Summary

Katharine Jarmul keynotes on common myths around privacy and security in AI and explores what the realities are, covering design patterns that help build more secure, more private AI systems.

Bio

About the conference

INFOQ EVENTS

May 12th, 2026, 1:30 PM EDT

Designing Data Layers for Agentic AI: Patterns for State, Memory, and Coordination at Scale

Presented by: Karthik Ranganathan - Co-CEO & Co-Founder at YugabyteDB, and Aditi Gupta - Snr. GenAI/ML Specialist Solutions Architect | GTM Data and AI at AWS

May 21st, 2026, 12 PM EDT

Portable by Design: Data Mobility & Recovery Patterns for Multi-Cloud Systems

Presented by: Liore Shai - Solutions Architect at Eon

May 28th, 2026, 1 PM EDT

Shipping Faster, Breaking More: Rethinking Delivery Systems in the Age of AI

Presented by: Eric Minick - Sr. Director of DevOps Solutions at Harness, and Aaron Newcomb - Senior Product Marketing Manager at Harness

Transcript

Participant 1: Because that means that most likely you are ignoring security incidents.

Myth 1: Guardrails Will Save Us

Myth 2: Better Performance will Save Us

Myth 3: A New Risk Taxonomy Is All We Need

Myth 4: We Did Red Teaming Once So We're Fine Now

Myth 5: The Next Model Version Will Fix This

What Can You Take On?

Resources

Questions and Answers

See more presentations with transcripts

この記事をシェア

TechCrunch AI2026年7月4日 03:43

ブラウザ戦争はもはや検索が主役ではない — Chrome や Safari に代わる最良の代替案

InfoQ重要度42026年4月24日 00:36

React Navigation 8.0アルファ版：ネイティブ下部タブの標準化、TypeScript推論と履歴機能

InfoQ2026年4月24日 00:00

Google、Room 3.0を発表：Kotlinファーストの非同期マルチプラットフォーム永続化ライブラリ

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

キーポイント

影響分析

編集コメント

概要

バイオグラフィー

コンファレンスについて

INFOQ イベント

エージェント AI のためのデータレイヤー設計：スケールにおける状態、メモリ、調整のパターン

デザインによるポータビリティ：マルチクラウドシステムのためのデータ移動性と回復のパターン

AI の時代における配送システムの再考：より迅速な出荷と、より多くの破壊

議事録

神話1：ガードレールが私たちを救ってくれる

神話 2：性能向上が私たちを救う

神話 3: 新しいリスク分類体系さえあれば十分である

神話 5：次のモデルバージョンがこの問題を解決する

What Can You Take On?

Resources

Questions and Answers

Summary

Bio

About the conference

INFOQ EVENTS

Designing Data Layers for Agentic AI: Patterns for State, Memory, and Coordination at Scale

Portable by Design: Data Mobility & Recovery Patterns for Multi-Cloud Systems

Shipping Faster, Breaking More: Rethinking Delivery Systems in the Age of AI

Transcript

Myth 1: Guardrails Will Save Us

Myth 2: Better Performance will Save Us

Myth 3: A New Risk Taxonomy Is All We Need

Myth 4: We Did Red Teaming Once So We're Fine Now

Myth 5: The Next Model Version Will Fix This

What Can You Take On?

Resources

Questions and Answers

関連記事

キーポイント

影響分析

編集コメント

概要

バイオグラフィー

コンファレンスについて

INFOQ イベント

エージェント AI のためのデータレイヤー設計：スケールにおける状態、メモリ、調整のパターン

デザインによるポータビリティ：マルチクラウドシステムのためのデータ移動性と回復のパターン

AI の時代における配送システムの再考：より迅速な出荷と、より多くの破壊

議事録

神話1：ガードレールが私たちを救ってくれる

神話 2：性能向上が私たちを救う

神話 3: 新しいリスク分類体系さえあれば十分である

神話 5：次のモデルバージョンがこの問題を解決する

What Can You Take On?

Resources

Questions and Answers

Summary

Bio

About the conference

INFOQ EVENTS

Designing Data Layers for Agentic AI: Patterns for State, Memory, and Coordination at Scale

Portable by Design: Data Mobility & Recovery Patterns for Multi-Cloud Systems

Shipping Faster, Breaking More: Rethinking Delivery Systems in the Age of AI

Transcript

Myth 1: Guardrails Will Save Us

Myth 2: Better Performance will Save Us

Myth 3: A New Risk Taxonomy Is All We Need

Myth 4: We Did Red Teaming Once So We're Fine Now

Myth 5: The Next Model Version Will Fix This

What Can You Take On?

Resources

Questions and Answers

関連記事