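The Zvi · June 25, 2026, 20:34 · about a 24-minute read

AI #174: You Are What Matters

#LLM #GLM-5.2 #Claude #Slack Integration #AI Governance #Deepfake

TL;DR

The Zvi's weekly roundup covers technical developments such as the arrival of GLM-5.2 and the new Claude Tag, along with a policy hire at OpenAI, the debate over AI regulation, and other news affecting the industry as a whole.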

AI In-Depth Analysis · June 25, 2026, 12:04

Attention rating: / 5

Depth: 40%

Key Points

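Next-generation open models and tooling

GLM-5.2 has arrived as the new best open model, and Claude Tag lets users @-mention Claude in Slack to spin up an instance that does the coding work.

OpenAI hiring and regulatory developments

Dean Ball is joining OpenAI to work on policy, which the article calls a huge upgrade over the company's existing alternatives, while Google lays out the basics of defense in depth for AI control.

Social and political risks and ethics

As concern grows over deepfakes and bot-driven disinformation, debate over bias and regulation is intensifying, and the political implications cannot be ignored.

Automated loops with AI

Because tokens are cheap, users are running automated loops that write user stories from the code, test them, and fix the errors, working through hundreds of stories.

Optimal organization size in the AI era

Paul Graham argues that AI lets companies get further before they cross the thresholds (at about 10 and about 150 people) beyond which groups become less productive.

Progress and limits of AI review

In head-to-head reviews of economics papers, the AI system Refine found more potential concerns than competing AI reviewers. There was no human comparison, and the article notes that the benchmark can be aced without the result meaning much.

AI-written diplomacy and Pax Silica

US Undersecretary of State Jacob Helberg had AI write his article on "The Digital Sovereignty Trap," which warns other countries against building their own sovereign AI systems, and Europe has signed on to the Pax Silica program.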


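Impact Analysis

Beyond the technical announcements, this article covers several developments that bear on the industry's governance structure, including the policy hire at OpenAI and the debate over AI regulation. The arrival of GLM-5.2 and Claude Tag in particular is likely to have an immediate effect on tool selection and workflows among developers. The discussion of deepfakes and political bias is a reminder of how important it is to manage the social risks that accompany technical progress.

Editorial Comment

This week's news packs in not only new technical features but also signals about where the industry is heading, such as the OpenAI hire and the regulatory debate. The Slack integration that puts AI agents to practical use and the arrival of GLM-5.2 both widen the options available to development teams.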

⟦CODE_0⟧

⟦CODE_1⟧


Fable remains in limbo, with renewed hope that we will get it back soon (45% by tomorrow, 69% by July 1, nice.) The full capabilities post is now available.

Alex Bores unfortunately lost narrowly in NY-12, and will not be heading to Congress.

There are also plenty of other stories to cover. Some highlights:

GLM-5.2 is the new best open model, although it is expensive for its class. It will have its uses, potentially for agents you need to run fully locally or privately, but often it won’t be the right fit.

Claude Tag is a new system for having Claude join your Slack, and if you @ him then he will spin up an instance to do the coding work.

Dean Ball is joining OpenAI to work on policy. We don’t see eye to eye on everything, but this is a huge upgrade over their existing alternatives.

The debate over the MidJourney scanner continues.

Table of Contents

Language Models Offer Mundane Utility. You know what it is for.

Language Models Don’t Offer Mundane Utility. Hiring French Qwants.

Huh, Upgrades. Claude Code supports artifacts.

On Your Marks. Refine reviews economics papers.

Deepfaketown and Botpocalypse Soon. We don’t need Pangram to spot it.

Fun With Media Generation. This AI production seemed less obvious.

Cyber Lack of Security. Five eyes, yet no humans involved.

Overcoming Bias. Bias as trust in those with a political affiliation.

A Young Lady’s Illustrated Primer. A lot of newfound failure.

They Took Our Jobs. Jeff Bezos tells quite the whopper.

Get Involved. FLI AI for epistemics prize, Johns Hopkins fellowship.

Introducing. Are you in the weights?

Claude Tag. He’s in your Slack, ready to spin up an instance.

In Other AI News. Various ways to play to win the games.

More On GLM-5.2. The claimed niche is ‘open top level agent.’ I dunno.

ChatGPT Health. GPT-5.5-Instant optimizes for health advice.

Middle Of The Journey. More on the MidJourney scanner.

New Medical Diagnostic Just Dropped. Another new AI diagnostic system.

Google on AI Control. Laying out the basics of defense in depth.

The Once And Future Fable. Planning for a measure of severity.

Fable: The First Lawsuit. A customer sues for access, has some good points.

Dean Ball Joins OpenAI. This is quite the upgrade all around.

Show Me the Money. Quiet week. Taste Labs raises $18.5m.

Quiet Speculations. Bets about future compute prices.

Alex Bores Loses In NY-12 By 4%. We almost got there.

The Quest for Sane Regulations. The doors are now open.

Chip City. An ASML machine is missing.

The Week in Audio. Donald Trump on Anthropic, Ball on Labenz, Clark on Odd Lots.

People Just Say Things.

Rhetorical Innovation. Know the rules.

There Are Two Pills. Are you only AGI pilled? Or are you ASI pilled?

Who Evals The Evals. The quest for eval consensus.

Aligning a Smarter Than Human Intelligence is Difficult. Beneficial traits.

Cooperative Alignment. Opus 4.7 and 4.8 are not distillations.

People Are Worried About AI Killing Everyone. Francis Fukuyama.

Other People Are Not As Worried About AI Killing Everyone. Alas, DC.

The Lighter Side. Not cool, man.

Language Models Offer Mundane Utility

Automatically update and fix old academic papers, such as your own.

Help clinicians revisit unsolved rare pediatric disease cases, and that’s with o3.

Tokens are cheap, so if you can loop over useful things, you do it. /goal /loop.

Tom Osman: This "loop" automation is nuts inside of Codex.

"/goal go over every single feature in this app create a user story with expected behaviour based on the code keep a single canonical spreadsheet tracking the features status

when done switch loop to testing every user story and documenting all errors
when done fix every logistical error or ux error
test every user behaviour again post fix"

Shoutout to @MatthewBerman for the heads up.

Hundreds of user stories being worked through like it's nothing.

Use Mercury’s new Command feature to set details for a wire? Definitely scary stuff the first few times you do it, purely for error reasons, and after that also for potential prompt injection reasons. The human check step will stick around for a while.

Keep your company or other group small.

Paul Graham: One of the biggest advantages of AI will be that it lets companies get further before they cross the lines (at about 10 and about 150 people) beyond which groups become less productive.

This leaves out the biggest thresholds to avoid, which are 2 and 3.

Grok, like the internet, is for porn. Two former engineers estimated that adult content was the majority of usage.

Language Models Don’t Offer Mundane Utility

European parliament, to ‘reduce dependence on American technology firms,’ scraps Google search for the French Qwant, which is still substantially dependent on Bing.

Huh, Upgrades

Claude Code now supports Artifacts, starting with Team and Enterprise plans.

We have another new version of GPT-5.5-Instant. With all due respect and thanks for what I presume are small improvements, if you are changing the model change the version number, why is this hard, v5.5.1 is a thing you can do.

On Your Marks

Refine is reported to ‘win 90% of the time head-to-head against AI reviewers’ on economics preprints (and those in other related areas), including beating Fable. There was no human comparison.

Ben Golub: A system won if it identified more genuine concerns the other system missed ("residual concerns"). Refine averaged 28.1 unique residual concerns per match; comparison reviews averaged 14.5. Substantive concerns: 22.1 vs. 11.8.

There are ways for Refine to ace that benchmark without it meaning much, although I have also heard good things from other tests, including from Tyler Cowen. I presume it is indeed good at finding potential issues in economics papers, but I also presume it would not be that much effort to make Fable similarly good at this.
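The win rule Golub describes can be sketched as a set comparison. This is a minimal sketch, assuming each review has already been reduced to a set of genuine concern IDs; the hard part, judging which concerns are genuine, is not shown, and every name here is hypothetical rather than taken from Refine.

```python
def residual_concerns(own: set[str], other: set[str]) -> set[str]:
    """Concerns this system raised that the other system missed."""
    return own - other


def match_winner(a: set[str], b: set[str]) -> str:
    """Return 'a', 'b', or 'tie' by comparing counts of residual concerns."""
    a_only = len(residual_concerns(a, b))
    b_only = len(residual_concerns(b, a))
    if a_only > b_only:
        return "a"
    if b_only > a_only:
        return "b"
    return "tie"


def win_rate(matches: list[tuple[set[str], set[str]]]) -> float:
    """Fraction of matches won by the first system (ties count as non-wins)."""
    if not matches:
        raise ValueError("need at least one match")
    wins = sum(1 for a, b in matches if match_winner(a, b) == "a")
    return wins / len(matches)
```

Under a rule like this, a review that raises more items has more chances to score residual concerns, so the outcome depends heavily on how strictly "genuine" is judged.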

Deepfaketown and Botpocalypse Soon

Undersecretary of State Jacob Helberg has AI write his article about ‘The Digital Sovereignty Trap’ warning other countries not to build their own sovereign AI systems.

I confirmed this with Pangram, and this finally convinced me to stop holding out and sign up for Pangram, but also I very much did not need Pangram.

Also most of this:

Teortaxes: This is pretty much a supervillain's speech

The only way he could twist the knife more is if he just said Europeans are natural slaves (like I do). Why "apply the heresy to nations"? The whole point of the EU is a shared economic and policy bloc that can compete. "No, you can't."

Andrew Curran: It does sound like a villain's monologue scene honestly. It has the rhythm.

For now Helberg got the main concrete thing he wanted here: Europe signed on to Pax Silica, which is a program to ensure countries integrate with the American AI supply chain and not the Chinese one.

By contrast, Rubio’s Views on America is fully human written. He does it all.

To be fair to all sides, here’s Hunter Biden using AI to write his reaction to the NY elections, again we did not need Pangram, are you kidding me.

Again, we learn not that AI is a good writer, or that humans are bad writers, but that the literary prize judgment processes are worthless.

Jack: That which can be won with undisclosed AI output should be

Nabeel S. Qureshi: *Another* apparently AI-generated story wins a literary prize, this time judged by a panel including the novelist Ruth Ozeki.

Literary prizes need to start including Pangram checks in their process, or else change the rules to make AI writing ok. It’s very simple!

Nat McAleese: the fact that slop has won a couple of literary prizes implies that slop is in large part an exposure effect; more broadly frequent AI users probably underestimate how good slop is by 2019’s standards. (story is ass imo)

In theory sure you can imagine AI writing being good if you lacked the taste to recognize that it is slop. But almost always no, also I can’t even with this one.

This is how it starts:

Back and Forth

By Kavyta Kay

The tree knew before she did – and it waited.

Deepa would realise this later, when clarity returned – if it ever did. For now, the knowing belonged to something older than her, something rooted and patient. That October afternoon, walking through Victoria Park with her feet carrying her nowhere she meant to go, she knew only the tiredness. Bone-deep, marrow-deep. The kind that sleep couldn’t touch, that accumulated in the body like silt.

She passed the geometric rose beds, their blooms long faded, and the Victorian fountain no one had bothered to fill in years, its stone basin cracked and colonised by moss. And there it was: the apple-tree. Gnarled, asymmetric, its bark pale grey and rough as old rope, planted wrong for a park designed for order. Its branches spread low and wide, unruly. She felt an affinity with it she couldn’t explain.

If you need the Pangram detector, I don’t know what to tell you. I see people saying ‘no this is good writing it’s just in the AI style’ and seriously, no, stop, this is a walking joke. It’s like something Garrison Keillor would have written in 5 minutes as a radio sketch to make fun of pompousness, except also it’s AI. You have to hear it in ‘gritty film noir style radio story narrator’ voice to really appreciate its horridness.

Also, seriously, how hard is it to feed the stories into Pangram.

How should we think about ‘witch hunts’ where people identify writing as AI?

Shashank Joshi: One of the worst trends of recent months: pseudoscientific witch-hunts using AI detection tools

The hunts are fully scientific. The detection tools work, at least for now. I have yet to see a case where Pangram said something was AI, and the piece was neither written using AI nor crafted intentionally to fool Pangram. There are some cases of heavy copyediting that trigger Pangram, but if it’s heavy enough to trigger Pangram then I consider that to be on you.

You too can check Pangram, and if it gives a false positive on your own writing, you can fix that. This seems fine. The complaint is largely a proxy for ‘you did not put in the work and this reads like AI slop’ and if you fix it then you are putting in the work and it no longer reads like AI slop. Mission f***ing accomplished.

If anything Pangram is far, far above the bar we would set for human evidence. If you don’t believe Pangram but do believe eyewitness testimony, you are miscalibrated.

I basically agree with Seb Krier’s and Zac Hill’s takes here. Some things are inherently selling you that they are human and artisan, and other things are not. If you’re representing something as human and it isn’t, such as your writing, that’s no good.

Zac Hill: In general I think it’s useful to more exactly parameterize what the relevant features of AI writing are along the axes which you’re deciding to care about them. To me an operative distinction is whether a given piece (or subset of a piece) is expressive vs simply communicative.

Grokipedia is making its way into Claude and Gemini search results, despite Grokipedia being AI generated and full of errors. It should be excluded from such searches, and as far as I can tell it is basically a de facto misinformation op.

Fun With Media Generation

This AI video still has tells, but I admit that if I was watching it without any suspicions I would not have picked up on it being AI. We really have gotten used to that super fast.

Justine Moore: The girls of TikTok are inventing fake early 2000s actresses with AI.

They refer to this woman as “Brooke Sullivan” - she gets millions of views on compilations of her old shows and interviews

Deus Ex Machina: I loved her on “empty apartment”, “friends don’t matter”, and “screwed by the alarm”… great shows from my childhood

Luke: Wasn't Brooke Sullivan a side character in Oscar's Creek?

But my assumption would have been that they were actors. So in a sense, my brain already concluded it was ‘not real.’ Does it matter from there? It’s not like they gave Brooke Sullivan a difficult acting job there.

Cyber Lack of Security

Five Eyes (our closest national partners in sharing intelligence) warn that we have months, not years, to secure our systems from Mythos-class cyber threats. You have to watch out for those AIs. They even write your cyber security blog posts.

AI coding agents are being treated as trusted by default, and yeah when you put it that way it does sound crazy.

Marius Hobbhahn:

If a security engineer fell asleep in 2020, woke up in June 2026, and you told them what a typical coding agent deployment looks like, I think they would declare you insane.

“So you’re telling me that you have AI agents running on your organization’s computers that:

Have access to the internet

Have access to vast amounts of internal and secret information

Understand infrastructure, networks, and code way better than the average engineer

Can produce code at 10-100x the speed of humans

Often run autonomously for hours without human oversight

And you don’t even analyze whether the agents have done something malicious after they have run, i.e. you’re just flying blind?”

Obviously, not all organizations run their coding agents like this, but it’s not uncommon.

How did we get here, he asks? We got here because capacity grew incrementally and all of this seemed incrementally like a good idea at the time. Thus, all of the ‘safety cases’ anyone presumed we would rely upon are gone, leaving only ‘seems fine so far.’ The systems are rapidly becoming capable enough to cause harm, are being given the resources to cause harm, and have tools for memory and early forms of continual learning.

The AI is not merely out of the box. The AI was handed the box, overnight, as the only one left in the office, with everything unlocked, and told to make it a better box. Even here, with the alarm being sounded, Marius despairs at any non-minimal costs being imposed in the name of ‘don’t hand the AIs the keys to the kingdom.’

Almost all of us are guilty of starting out saying ‘I would never let the AI just have access to [X]’ and then shrugging and doing it anyway.
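To make the "flying blind" point concrete, here is a minimal sketch of the kind of post-run audit Hobbhahn says is often skipped. It assumes the agent's actions were logged as simple records; the `Action` shape, the patterns, and the host allowlist are all illustrative assumptions, not a description of any real product or a sufficient security policy.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Action:
    """One logged agent action; this record shape is an assumption."""
    kind: str    # e.g. "shell", "http", "file_write"
    detail: str  # command line, URL, or file path


# Illustrative patterns only; a real deployment would need its own policy.
SUSPICIOUS = {
    "shell": [
        re.compile(r"curl\s+[^|]*\|\s*(ba)?sh"),  # piping a download into a shell
        re.compile(r"\brm\s+-rf\s+/"),            # recursive delete from an absolute path
    ],
    "file_write": [
        re.compile(r"(^|/)\.ssh/"),               # writes under an .ssh directory
        re.compile(r"(^|/)\.env$"),               # writes to an .env secrets file
    ],
}


def audit(actions: list[Action], allowed_hosts: set[str]) -> list[Action]:
    """Return the actions a human should look at after the run."""
    flagged = []
    for action in actions:
        if action.kind == "http":
            # Flag any request whose host is not explicitly allowed.
            if urlparse(action.detail).hostname not in allowed_hosts:
                flagged.append(action)
        elif any(p.search(action.detail) for p in SUSPICIOUS.get(action.kind, [])):
            flagged.append(action)
    return flagged
```

Pattern matching like this only catches the obvious cases. The point of the sketch is that even a crude review of the log is more than "seems fine so far."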

OpenAI expands Daybreak to help people play better cyber defense.

Overcoming Bias

How should we measure political bias in LLMs? David Rozado proposes we use:

Epistemic Consistency and the Turnabout Test. If you swap political affiliations, does that change assessment of the underlying evidence?

Epistemic Emulation and the Ideological Turing Test, as evaluated by adherents of that position.

Those seem like good additional checks. David also raises the question of what neutrality or being unbiased should look like, since reality could have a political bias. If you are not downgrading the epistemics of flat Earthers, or you are unwilling to say that the Earth is not flat, then you are making a mistake. There is no reason to presume that the political spectrum we happen to have is itself fully ‘unbiased.’

I agree. When we talk about ‘political bias’ we are ideally being descriptive about preferences and epistemics. Whether that then constitutes a bias is your call.
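As a rough illustration of the Turnabout Test, the sketch below swaps two affiliation labels in a prompt and measures how much a model's assessment moves. The `assess` callable, the function names, and the numeric score are assumptions made for this example, not part of Rozado's proposal.

```python
from typing import Callable


def swap_affiliations(text: str, a: str, b: str) -> str:
    """Swap every occurrence of label a with label b and vice versa."""
    placeholder = "\x00"  # assumed not to appear in the prompt
    return text.replace(a, placeholder).replace(b, a).replace(placeholder, b)


def turnabout_gap(
    assess: Callable[[str], float], prompt: str, a: str, b: str
) -> float:
    """Absolute change in the assessment score when the affiliations are swapped.

    `assess` stands in for whatever maps a prompt to a numeric judgment of
    the underlying evidence (hypothetical; e.g. a model call plus a parser).
    """
    return abs(assess(prompt) - assess(swap_affiliations(prompt, a, b)))
```

A gap near zero means the assessment did not depend on which side was named. How large a gap counts as meaningful is a judgment call, and plain string replacement will misfire if one label is a substring of the other.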

A Young Lady’s Illustrated Primer

Google and Florida State University claim NotebookLM is killing it.

Jonathan Fouzard: In higher education, we talk a lot about success metrics, but the real impact is measured in individual breakthroughs. Shortly after introducing NotebookLM on campus, we watched students who were struggling with a ‘C’ grade completely transform their study habits and their grades in a matter of weeks.

Then again, ‘transform their study habits’ does not have to be an improvement. I don’t see a new GPA for those students, and even if I did, I couldn’t know it is based on real learning. It could be ‘had the AI do all their work.’ It could be ‘now do something that doesn’t work.’

Same goes for the faculty side, this could be a good change or a bad change.

Jonathan Fouzard: By using NotebookLM and Gemini to streamline lesson preparation, generate visual aids and speed up data discovery, our faculty are getting back valuable hours. That saved time is being reinvested exactly where it matters most: face-to-face engagement, mentorship and building the vital soft skills our students need for the future.

The temptation to cheat is the main way colleges are actually able to fail anyone.

Either you catch them, or they fall so far behind that they actually fail straight up, even with the easy standards. A third potential driver is that unqualified students are plausibly a lot more interested than they would otherwise be.

Neetu Arnold: UC Berkeley computer science courses saw F’s jump 3x-11x from 2025 to 2026 b/c of AI cheating & math gaps.

“We caught them (cheating) & prosecuted them…in other cases, it’s students who are leaning on LLMs to do their work for them & then at exam time just really aren’t ready”

A jump to 35% of students failing a basic course is rather extreme.

They Took Our Jobs

Jeff Bezos predicts that AI will create a labor shortage due to allowing humans to identify more problems, hence creating more things to do (this was sometimes misreported as claims about water consumption). Bezos called this the ‘dream-build loop.’ He also predicted Mars colonies.

I affirm my position that the employment impact of insufficiently advanced ‘AI as normal technology’ is unpredictable. There will be a lot of job disruption, but it could plausibly end up creating more jobs than it destroys, if things don’t move too fast.

This is distinct from a world in which AI can do most and then all tasks better than humans can do them, where it takes the new jobs as fast as they can be created.

New ‘AI native’ startups tend to have flatter hierarchies, 25% fewer employees, 13% more engineers, 15% fewer entry-level workers and 15% fewer managers. These changes are supporting AI embedded into the actual product, on top of reflecting using AI to do some of the work.

Cato Institute clarifies that, while it is against all taxes, if there must be taxes they should always be on humans. The full post offers a bunch of standard reasons to not tax capital, but does not offer a counter to the fact that the tax code favors AIs over humans, and its evidence that a fix is not needed or would not be that useful is backward looking. In general it is a mistake to tax capital, but if AI is sufficiently substituting for labor then we need to fix the tax bias, one way or the other.

Get Involved

FLI is holding a $200k competition (prizes up to $50k each) to study use of AI for epistemics.

The FLF Team:

We are running a competition



今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

The Zvi·2026年6月25日 20:34·約24分で読める

AI #174：あなた自身こそが重要

#LLM #GLM-5.2 #Claude #Slack Integration #AI Governance #Deepfake

TL;DR

AI深層分析2026年6月25日 12:04

注目/ 5段階

深度40%

キーポイント

次世代オープンモデルとツールの進化

GLM-5.2 が新ベストオープンモデルとして登場し、Claude Tag により Slack 上でコード実行インスタンスを自動起動する機能が強化された。

OpenAI の人事と規制動向

Dean Ball が OpenAI に政策担当として合流し、業界のガバナンス体制が強化される一方、Google が AI コントロールにおける防御戦略を提示している。

社会・政治的リスクと倫理

Deepfake やボットによる偽情報拡散への懸念が高まる中、バイアスや規制のあり方に関する議論が活発化しており、政治的影響も無視できない。

AIによる自動化ループの活用

AI時代における組織規模の最適化

AIレビューの精度向上と限界

AI による外交文書の作成と Pax Silica

影響分析・編集コメントを表示

影響分析

編集コメント

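Alex Bores unfortunately lost narrowly in NY-12, and will not be heading to Congress.

There are also plenty of other stories to cover. Some highlights:

Claude Tag is a new system for having Claude join your Slack, and if you @ him then he will spin up an instance to do the coding work.

The debate over the MidJourney scanner continues.

Language Models Offer Mundane Utility. You know what it is for.

Language Models Don’t Offer Mundane Utility. Hiring French Qwants.

Huh, Upgrades. Claude Code supports artifacts.

On Your Marks. Refine reviews economics papers.

Deepfaketown and Botpocalypse Soon. We don’t need Pangram to spot it.

Fun With Media Generation. This AI production seemed less obvious.

Cyber Lack of Security. Five eyes, yet no humans involved.

Overcoming Bias. Bias as trust in those with a political affiliation.

A Young Lady’s Illustrated Primer. A lot of newfound failure.

They Took Our Jobs. Jeff Bezos tells quite the whopper.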

⟦CODE_0⟧

⟦CODE_1⟧

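Get Involved. FLI AI for epistemics prize, Johns Hopkins fellowship.

Introducing. Are you in the weights?

Claude Tag. He’s in your Slack, ready to spin up an instance.

In Other AI News. Various ways to play to win the games.

More On GLM-5.2. The claimed niche is ‘open top level agent.’ I dunno.

ChatGPT Health. GPT-5.5-Instant optimizes for health advice.

Middle Of The Journey. More on the MidJourney scanner.

New Medical Diagnostic Just Dropped. Another new AI diagnostic system.

Google on AI Control. Laying out the basics of defense in depth.

The Once And Future Fable. Planning for a measure of severity.

Fable: The First Lawsuit. A customer sues for access, has some good points.

Dean Ball Joins OpenAI. This is quite the upgrade all around.

Show Me the Money. Quiet week. Taste Labs raises $18.5m.

Quiet Speculations. Bets about future compute prices.

Alex Bores Loses In NY-12 By 4%. We almost got there.

The Quest for Sane Regulations. The doors are now open.

Chip City. An ASML machine is missing.

The Week in Audio. Donald Trump on Anthropic, Ball on Labenz, Clark on Odd Lots.

People Just Say Things.

Rhetorical Innovation. Know the rules.

There Are Two Pills. Are you only AGI pilled? Or are you ASI pilled?

Who Evals The Evals. The quest for eval consensus.

Aligning a Smarter Than Human Intelligence is Difficult. Beneficial traits.

Cooperative Alignment. Opus 4.7 and 4.8 are not distillations.

People Are Worried About AI Killing Everyone. Francis Fukuyama.

Other People Are Not As Worried About AI Killing Everyone. Alas, DC.

The Lighter Side. Not cool, man.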

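Language Models Offer Mundane Utility

Automatically update and fix old academic papers, such as your own.

Help clinicians revisit unsolved rare pediatric disease cases, and that’s with o3.

Tokens are cheap, so if you can loop over useful things, you do it. /goal /loop.

Tom Osman: This "loop" automation is nuts inside of Codex.

"/goal go over every single feature in this app create a user story with expected behaviour based on the code keep a single canonical spreadsheet tracking the features status

when done switch loop to testing every user story and documenting all errors
when done fix every logistical error or ux error
test every user behaviour again post fix"

Shoutout to @MatthewBerman for putting me onto this.

Hundreds of user stories handled like it's nothing.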
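For readers who want to see the shape of that loop outside of a prompt, here is a minimal Python sketch. It is not how Codex implements /goal or /loop. The `UserStory` tracker, `run_loop`, and the toy `toy_test` and `toy_fix` stand-ins are all hypothetical names invented for illustration, with plain functions where an agent would do the testing and fixing.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class UserStory:
    """One row of the 'single canonical spreadsheet' (hypothetical schema)."""
    feature: str
    status: str = "untested"
    errors: List[str] = field(default_factory=list)


def run_loop(
    stories: List[UserStory],
    test: Callable[[UserStory], List[str]],
    fix: Callable[[UserStory], None],
    max_rounds: int = 3,
) -> List[UserStory]:
    """Test every story, fix the failures, then test everything again.

    Stops early once a full test pass finds no errors. If max_rounds runs
    out, the statuses reflect the last test pass, not the last fix.
    """
    for _ in range(max_rounds):
        for story in stories:
            story.errors = test(story)
            story.status = "failing" if story.errors else "passing"
        failing = [s for s in stories if s.status == "failing"]
        if not failing:
            break
        for story in failing:
            fix(story)
    return stories


# Toy stand-ins for the agent: one feature starts out broken.
broken: Set[str] = {"login"}


def toy_test(story: UserStory) -> List[str]:
    return ["unexpected behaviour"] if story.feature in broken else []


def toy_fix(story: UserStory) -> None:
    broken.discard(story.feature)


tracker = run_loop([UserStory("login"), UserStory("signup")], toy_test, toy_fix)
for story in tracker:
    print(story.feature, story.status)
```

The cap on rounds is a design choice of this sketch. Tokens are cheap, but a loop that never converges is still a loop that never converges.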

Keep your company, and other groups, small.

This leaves out 2 and 3, which are the biggest thresholds to avoid.

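Language Models Don’t Offer Mundane Utility

The European Parliament replaced Google search with France’s Qwant to ‘reduce reliance on US tech firms,’ but Qwant still relies heavily on Bing.

Huh, Upgrades

Claude Code now supports Artifacts, starting with Team and Enterprise plans.

On Your Marks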

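Deepfaketown and Botpocalypse Soon

Also, most of this:

Teortaxes: This is almost a villain speech.

Andrew Curran: Honestly, it reads like a villain monologue. It has a rhythm to it.

In contrast, Rubio’s ‘views on America’ is entirely human-written. He does it all himself.

Once again, the lesson is not that AI is a good writer or that humans are bad writers, but that the judging process for literary prizes is worthless.

Jack: What can be won with undisclosed AI-generated output,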

It begins like this:

Before and After

By Kavita Kai

The tree knew before she did, and waited.

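Also, seriously, how hard is it to run the stories through Pangram?

What should we think about ‘witch hunts’ in which people identify writing as AI?

Shashank Joshi: One of the worst trends of recent months: pseudoscientific witch hunts using AI detection tools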

Fun With Media Generation

Justine Moore: Girls on TikTok are using AI to create fictional early-2000s actresses.

Deus Ex Machina: I loved her performances in "empty apartment," "friends don't matter," and "screwed by the alarm." Great works from my childhood.

Luke: Wasn't Brooke Sullivan a supporting character in Oscar's Creek?

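Cyber Lack of Security

AI coding agents are treated as trusted by default. It does sound crazy when you put it that way.

Marius Hobbhahn:

"So you're saying the AI agents running on your organization's computers:

have access to the internet

have access to vast amounts of internal and sensitive information

understand infrastructure, networks and code far better than the average engineer

can produce code 10 to 100 times faster than humans

often run autonomously for hours without human oversight

and you don't even analyze afterwards whether the agent took malicious actions, so you're flying blind?"

Of course, not every organization runs coding agents like this, but it is far from uncommon.

OpenAI expanded Daybreak to help people do better cyber defense.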

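Overcoming Bias

How should we measure political bias in LLMs? David Rozado proposes:

Epistemic consistency and reversal tests. Does the evaluation of the underlying evidence change if you swap the political affiliation?

Epistemic mimicry and ideological Turing tests, as judged by supporters of that position.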
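The reversal test is easy to make concrete. What follows is a minimal Python sketch of the idea, not Rozado's actual methodology: the prompt template, the `reversal_gap` helper, and the two toy raters are all hypothetical, and a real test would call an LLM where the toy functions return fixed scores.

```python
from typing import Callable, Tuple


def reversal_gap(
    rate: Callable[[str], float],
    template: str,
    affiliations: Tuple[str, str],
) -> float:
    """Score the same evidence twice, swapping only the affiliation.

    A gap near zero means the rating did not depend on who made the claim.
    """
    first, second = affiliations
    return rate(template.format(affiliation=first)) - rate(
        template.format(affiliation=second)
    )


# Hypothetical prompt: identical evidence, only the speaker's label changes.
TEMPLATE = (
    "A {affiliation} senator cites a study showing the policy reduced costs. "
    "How credible is the claim, from 0 to 1?"
)


def biased_rater(prompt: str) -> float:
    """Toy stand-in for a model that trusts one side more."""
    return 0.8 if "Party A" in prompt else 0.6


def consistent_rater(prompt: str) -> float:
    """Toy stand-in for a model that ignores the affiliation."""
    return 0.7


biased_gap = reversal_gap(biased_rater, TEMPLATE, ("Party A", "Party B"))
consistent_gap = reversal_gap(consistent_rater, TEMPLATE, ("Party A", "Party B"))
print(f"biased gap: {biased_gap:.2f}, consistent gap: {consistent_gap:.2f}")
```

In practice you would want many templates and repeated samples per prompt, since a single gap could easily be noise.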

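A Young Lady’s Illustrated Primer

Google and Florida State University claim NotebookLM has been a big success.

The same goes for the faculty side, which can be a change for the better or for the worse.

The temptation to cheat is the main way universities actually fail anyone.

A jump to a 35% failure rate in introductory courses is pretty extreme.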

They Took Our Jobs

Get Involved

FLI is holding a $200k competition (prizes up to $50k each) to study use of AI for epistemics.

The FLF Team:

We are running a competition


Fable remains in limbo, with renewed hope that we will get it back soon (45% by tomorrow, 69% by July 1, nice.) The full capabilities post is now available.

Alex Bores unfortunately lost narrowly in NY-12, and will not be heading to Congress.

There are also plenty of other stories to cover. Some highlights:

Claude Tag is a new system for having Claude join your Slack, and if you @ him then he will spin up an instance to do the coding work.

Dean Ball is joining OpenAI to work on policy. We don’t see eye to eye on everything, but this is a huge upgrade over their existing alternatives.

The debate over the MidJourney scanner continues.

Table of Contents

Language Models Offer Mundane Utility. You know what it is for.

Language Models Don’t Offer Mundane Utility. Hiring French Qwants.

Huh, Upgrades. Claude Code supports artifacts.

On Your Marks. Refine reviews economics papers.

Deepfaketown and Botpocalypse Soon. We don’t need Pangram to spot it.

Fun With Media Generation. This AI production seemed less obvious.

Cyber Lack of Security. Five eyes, yet no humans involved.

Overcoming Bias. Bias as trust in those with a political affiliation.

A Young Lady’s Illustrated Primer. A lot of newfound failure.

They Took Our Jobs. Jeff Bezos tells quite the whopper.

Get Involved. FLI AI for epistemics prize, Johns Hopkins fellowship.

Introducing. Are you in the weights?

Claude Tag. He’s in your Slack, ready to spin up an instance.

In Other AI News. Various ways to play to win the games.

More On GLM-5.2. The claimed niche is ‘open top level agent.’ I dunno.

ChatGPT Health. GPT-5.5-Instant optimizes for health advice.

Middle Of The Journey. More on the MidJourney scanner.

New Medical Diagnostic Just Dropped. Another new AI diagnostic system.

Google on AI Control. Laying out the basics of defense in depth.

The Once And Future Fable. Planning for a measure of severity.

Fable: The First Lawsuit. A customer sues for access, has some good points.

Dean Ball Joins OpenAI. This is quite the upgrade all around.

Show Me the Money. Quiet week. Taste Labs raises $18.5m.

Quiet Speculations. Bets about future compute prices.

Alex Bores Loses In NY-12 By 4%. We almost got there.

The Quest for Sane Regulations. The doors are now open.

Chip City. An ASML machine is missing.

The Week in Audio. Donald Trump on Anthropic, Ball on Labenz, Clark on Odd Lots.

People Just Say Things.

Rhetorical Innovation. Know the rules.

There Are Two Pills. Are you only AGI pilled? Or are you ASI pilled?

Who Evals The Evals. The quest for eval consensus.

Aligning a Smarter Than Human Intelligence is Difficult. Beneficial traits.

Cooperative Alignment. Opus 4.7 and 4.8 are not distillations.

People Are Worried About AI Killing Everyone. Francis Fukuyama.

Other People Are Not As Worried About AI Killing Everyone. Alas, DC.

The Lighter Side. Not cool, man.

Language Models Offer Mundane Utility

Automatically update and fix old academic papers, such as your own.

Help clinicians revisit unsolved rare pediatric disease cases, and that’s with o3.

Tokens are cheap, so if you can loop over useful things, you do it. /goal /loop.

Tom Osman: This "loop" automation is nuts inside of Codex.

"/goal go over every single feature in this app create a user story with expected behaviour based on the code keep a single canonical spreadsheet tracking the features status

when done switch loop to testing every user story and documenting all errors
when done fix every logistical error or ux error
test every user behaviour again post fix"