TLDR AI · May 11, 2026, 09:00 · approx. 21 min

Meta-Meta-Prompting: The Secret to Making AI Agents Work (16-minute read)

#Meta-Meta-Prompting #AI Agents #Open Source #System Design

TL;DR

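This article proposes "Meta-Meta-Prompting," an approach that treats AI as an operating system rather than a mere chat tool, and shows how to build a serious autonomous AI agent that can be implemented with open-source components.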

AI Deep Analysis · May 12, 2026, 00:06

Importance: / 5-point scale

Depth: 40%

Key Points

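AI's paradigm shift: from chat to OS

Argues for an approach that does not depend on conventional conversational interfaces and instead makes AI function as the "operating system" of the whole system.

The concept of Meta-Meta-Prompting

Introduces a technical framework for building agents that can carry out complex tasks autonomously by designing and optimizing the prompts themselves at the meta level.

Open-source implementability

Emphasizes that all the concepts and code proposed in the article are published on GitHub, so anyone can start building a serious AI system for free.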


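Impact Analysis

This article points to a paradigm shift in AI agent development and highlights the need to raise one's perspective from mere prompt engineering to the design philosophy of the whole system. In particular, the fact that all of the technology is released as open source is likely to be an important turning point that dramatically lowers the implementation barrier for the developer community and accelerates the practical adoption of autonomous AI systems.

Editor's Comment

A valuable article that lays out a concrete roadmap for going beyond the limits of the "chat box" and building AI agents that are autonomous in the true sense. Developers should check the GitHub repository and try an implementation.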




Meta-Meta-Prompting: The Secret to Making AI Agents Work

People keep asking me why I spend my nights coding until 2am. I have a job, and a big one, as CEO of Y Combinator. We help thousands of builders a year realize their dream of building real startups with real revenue that grow fast.

In the last 5 months, AI made me a builder again. Late last year, the tools got good enough that I went back to building. Not toy projects. Real systems that compound. I want to show you, with specific examples, what personal AI actually looks like when you stop treating it as a chat window and start treating it as an operating system. And I give it away as open source and in articles like this because I want you to speed up with me.

This is part of a series:

introduced the core architecture.

covered the routing table for intelligence.

was about how every technical person just multiplied themselves by 100x to 1000x.

argued that the model is the engine, not the car. And

explained why LangChain raised $160M and gave you a squat rack and dumbbell set without a workout plan, and then gave you that workout plan you needed.

Last month I was reading Pema Chödrön's When Things Fall Apart. It's 162 pages, 22 chapters on Buddhist approaches to suffering, groundlessness, and letting go. A friend recommended it during a hard period.

I asked my AI to do a book mirror.

What that means concretely: The system extracted all 22 chapters of the book, and then, for each chapter, ran a sub-agent that did two things simultaneously: summarized the author's ideas, and then mapped every idea to my actual life. Not generic "this applies to leaders" pablum. Specific mapping. It knows my family history (immigrant parents, dad from Hong Kong and Singapore, mom from Burma). It knows my professional context (running YC, building open-source tools, mentoring thousands of founders). It knows what I've been reading, what I've been thinking about at 2am, what my therapists and I are working on.
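The per-chapter fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: `mirror_chapter`, `book_mirror`, and the `context` dictionary are hypothetical names, and the model call is replaced by a stub so the example runs offline.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ChapterMirror:
    chapter: int
    summary: str        # left column: what the author says
    life_mapping: str   # right column: how it maps to the reader's context


async def mirror_chapter(number: int, text: str, context: dict[str, str]) -> ChapterMirror:
    """Hypothetical sub-agent; a real one would make a model call per chapter."""
    await asyncio.sleep(0)  # placeholder for the slow model call
    summary = text[:60]
    mapping = f"relates to: {context.get('current_focus', 'unknown')}"
    return ChapterMirror(number, summary, mapping)


async def book_mirror(chapters: list[str], context: dict[str, str]) -> list[ChapterMirror]:
    # One sub-agent per chapter, run concurrently; gather keeps chapter order.
    tasks = [mirror_chapter(i + 1, text, context) for i, text in enumerate(chapters)]
    return await asyncio.gather(*tasks)
```

Calling `asyncio.run(book_mirror(chapters, context))` with 22 chapter strings would return 22 two-column entries in chapter order.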

The output was a 30,000-word brain page. Each chapter rendered as two columns: what Pema says, and how it maps to what I'm actually living through. The chapter on groundlessness connected to a specific founder conversation I'd had the week before. The chapter on fear mapped to patterns my therapist had identified. The chapter on letting go referenced a late-night session where I'd written about the creative freedom I'd found this year.

The whole thing took about 40 minutes. A $300/hour therapist reading this book and applying it to my life couldn't do this in 40 hours, because they don't have the full graph of my professional context, my reading history, my meeting notes, and my founder relationships all loaded and cross-referenceable.

I've done this with over 20 books now: Amplified (Dion Lim), Autobiography of Bertrand Russell, Designing Your Life, Drama of the Gifted Child, Finite and Infinite Games, Gift from the Sea (Lindbergh), Siddhartha (Hesse), Steppenwolf (Hesse), The Art of Doing Science and Engineering (Hamming), The Dream Machine, The Book on the Taboo Against Knowing Who You Are (Alan Watts), What Do You Care What Other People Think (Feynman), When Things Fall Apart (Pema Chödrön), A Brief History of Everything (Ken Wilber), and more. Each one gets richer because the brain gets richer. The second mirror knew about the first. The twentieth knew about all nineteen.

The first book mirror I did was terrible. Version 1 had three factual errors about my family. It said my parents were divorced when they weren't. Said I grew up in Hong Kong when I was born in Canada. Basic stuff that could have damaged trust if I'd shared it.

So I added a mandatory fact-check step. Every mirror now runs cross-modal evaluation against known facts in the brain before it ships. Opus 4.7 1M catches precision errors. GPT-5.5 catches missing context. DeepSeek V4-Pro catches when something reads as generic.
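One way to picture a mandatory fact-check gate is a set of independent reviewers that each inspect the draft against known facts, with the draft shipping only when none of them reports a problem. The sketch below is an assumption-laden toy: the reviewers are plain functions using naive substring checks in place of real model calls, and every name in it is hypothetical.

```python
from typing import Callable

# A reviewer takes (draft, known_facts) and returns the problems it found.
Reviewer = Callable[[str, dict[str, str]], list[str]]


def precision_reviewer(draft: str, facts: dict[str, str]) -> list[str]:
    """Stand-in for a model that checks claims against known facts."""
    problems = []
    for topic, truth in facts.items():
        # Flag a draft that mentions a topic without the recorded fact.
        if topic in draft and truth not in draft:
            problems.append(f"'{topic}' does not match known fact: {truth}")
    return problems


def generic_reviewer(draft: str, facts: dict[str, str]) -> list[str]:
    """Stand-in for a model that flags text that reads as generic."""
    stock_phrases = ["this applies to leaders", "in today's world"]
    return [f"generic phrase: {p}" for p in stock_phrases if p in draft.lower()]


def fact_check(draft: str, facts: dict[str, str], reviewers: list[Reviewer]) -> list[str]:
    # Collect problems from every reviewer; an empty list means "safe to ship".
    problems: list[str] = []
    for review in reviewers:
        problems.extend(review(draft, facts))
    return problems
```

Using several reviewers with different strengths means an error one of them misses can still be caught by another before anything is shared.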

Then I upgraded to deep retrieval with GBrain tool use. The original version was good at synthesis but weak on specificity. Version 3 does per-section brain searches. Every right-column entry cites actual brain pages. When the book talks about dealing with difficult conversations, it doesn't just synthesize general principles. It pulls from my actual meeting notes with specific founders who were having tough conversations with co-founders. Or that idea I had on a Thursday hanging out with my brother James. Or the IM chat I had with my college roommate when I was 19. It's uncanny.

This is what skillification (using /skillify in GBrain) means in practice. I took the first manual attempt, extracted the repeatable pattern, wrote a tested skill file with triggers and edge cases, and every fix compounded across all future book mirrors.

Here's where it gets recursive, and where I think the biggest insight is.

The system that runs my life didn't exist as a monolith. It was assembled from skills. And those skills were themselves created by a skill.

Skillify is a meta-skill that creates new skills. When I encounter a workflow I'm going to repeat, I say "skillify this" and it examines what just happened, extracts the repeatable pattern, writes a tested skill file with triggers and edge cases, and registers it in the resolver. The book-mirror pipeline was skillified from the first time I did it manually. The meeting-prep workflow was skillified after I noticed I was doing the same steps before every call.
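To make the idea concrete, here is a rough sketch of what "extract the pattern, write a skill file, register it" might look like in code. It does not reproduce the real Skillify or the GBrain resolver; the `Skill` class, the `skillify` function, the markdown layout, and the dictionary used as a resolver are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    triggers: list[str]
    steps: list[str]
    edge_cases: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the skill as a self-contained markdown file (assumed layout)."""
        lines = [f"# {self.name}", "", "## Triggers"]
        lines += [f"- {t}" for t in self.triggers]
        lines += ["", "## Steps"]
        lines += [f"{i}. {s}" for i, s in enumerate(self.steps, start=1)]
        lines += ["", "## Edge cases"]
        lines += [f"- {e}" for e in self.edge_cases]
        return "\n".join(lines)


def skillify(
    name: str,
    triggers: list[str],
    observed_steps: list[str],
    edge_cases: list[str],
    resolver: dict[str, Skill],
) -> Skill:
    # Keep each distinct step once, in the order it was first observed.
    steps = list(dict.fromkeys(observed_steps))
    skill = Skill(name, triggers, steps, edge_cases)
    for trigger in triggers:
        resolver[trigger.lower()] = skill  # register so a router can find it
    return skill
```

Because the edge cases live in the skill file rather than in a one-off prompt, a fix recorded once is reused on every later run.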

Skills compose. Book-mirror calls brain-ops for storage, enrich for context, cross-modal-eval for quality, and pdf-generation for output. Each skill is focused on one thing. They chain together to create complex workflows. When I improve one skill, every workflow that uses it gets better automatically. No more "forgot to mention this edge case in my prompt." The skill remembers.

Demis Hassabis came to YC for a fireside chat. Sebastian Mallaby's biography of him had just come out.

I asked the system to prep me.

In under two minutes it pulled: Demis's full brain page (which had been accumulating for months from articles, podcast transcripts, and my own notes). His published beliefs about AGI timelines ("50% scaling, 50% innovation," thinks AGI is 5-10 years away). The Mallaby biography highlights. His stated research priorities (continual learning, world models, long-term memory). Cross-references to things I've said publicly about AI. Three demo scripts for showing the brain's multi-hop reasoning capability during the conversation. And a set of conversation hooks based on where our worldviews overlap and diverge.

This wasn't just a better Google search. This was preparation that used my accumulated context about Demis, my own positions, and the strategic goals for the conversation. The system prepped not just facts, but angles.

I maintain a structured knowledge base with about 100,000 pages. Every person I meet gets a page with a timeline, a state section (what's currently true), open threads, and a score. Every meeting gets a transcript, a structured summary, and something I call entity propagation: after every meeting, the system walks through every person and company mentioned and updates their brain pages with what was discussed. Every book I read gets a chapter-by-chapter mirror. Every article, podcast, and video I engage with gets ingested, tagged, and cross-referenced.

The schema is simple. Each page has: compiled truth at the top (the current best understanding), an append-only timeline below (events in chronological order), and raw data sidecars for source material. Think of it as a personal Wikipedia where every page is continuously updated by an AI that was at the meeting, read the email, watched the talk, and ingested the PDF.
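The three-part page layout maps naturally onto a small data structure. The following is a minimal sketch under stated assumptions: the class and field names (`BrainPage`, `TimelineEvent`, `sidecars`) are invented here, and the real pages are files in a git repo rather than Python objects.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimelineEvent:
    when: date
    note: str
    source: str  # key of the raw sidecar (transcript, email, PDF) behind the event


@dataclass
class BrainPage:
    title: str
    compiled_truth: str = ""  # current best understanding, shown at the top
    _timeline: list[TimelineEvent] = field(default_factory=list)
    sidecars: dict[str, str] = field(default_factory=dict)  # raw source material

    @property
    def timeline(self) -> tuple[TimelineEvent, ...]:
        # Read-only view in chronological order.
        return tuple(sorted(self._timeline, key=lambda e: e.when))

    def append(self, event: TimelineEvent, raw: str, new_truth: Optional[str] = None) -> None:
        # Events are only ever added; the compiled truth may be rewritten.
        self._timeline.append(event)
        self.sidecars[event.source] = raw
        if new_truth is not None:
            self.compiled_truth = new_truth
```

Separating the rewritable summary from the append-only history keeps the page easy to read while preserving how the current understanding was reached.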

Here's an example of how this compounds. I meet a founder at office hours. The system creates or updates their person page, their company page, cross-references the meeting notes, checks if I've met them before (and surfaces what we discussed last time), checks their application data, pulls their latest metrics, and identifies if any of my portfolio companies or contacts are relevant to their problem. By the time I walk into the next meeting with them, the system has a full context pack ready.

This is the difference between having a filing cabinet and having a nervous system. The filing cabinet stores things. The nervous system connects them, flags what's changed, and surfaces what's relevant to right now.

Here's how it works. I think this is the right way to build personal AI, and I open-sourced the whole thing so you can build it yourself.

The harness is thin. OpenClaw is the runtime. It receives my messages, figures out which skill applies, and dispatches. A few thousand lines of routing logic. It doesn't know anything about books or meetings or founders. It just routes.
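A thin harness in this sense can be approximated by a router that knows only trigger phrases and handlers. The sketch below is not OpenClaw's routing logic; `Harness`, `register`, and `dispatch` are hypothetical names, and substring matching stands in for whatever the real runtime uses to pick a skill.

```python
from typing import Callable

Handler = Callable[[str], str]


class Harness:
    """Thin router: holds trigger phrases, knows nothing about books or meetings."""

    def __init__(self) -> None:
        self._routes: list[tuple[str, Handler]] = []

    def register(self, trigger: str, handler: Handler) -> None:
        self._routes.append((trigger.lower(), handler))

    def dispatch(self, message: str) -> str:
        text = message.lower()
        matches = [(t, h) for t, h in self._routes if t in text]
        if not matches:
            return "no skill matched"
        # Prefer the longest matching trigger so more specific skills win.
        _, handler = max(matches, key=lambda pair: len(pair[0]))
        return handler(message)
```

All domain knowledge sits behind the handlers, so adding a skill means registering one more route rather than changing the router.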

The skills are fat. Over 100 of them now, each a self-contained markdown file with detailed instructions for one specific task. You've already seen book-mirror and meeting-prep above. Here are a few more that ship with

meeting-ingestion: After every meeting, it pulls the transcript, creates a structured summary, and then walks through every person and company mentioned and updates their brain pages with what was discussed. The meeting page is not the end product. The entity propagation back to every person and company page is the real value.
enrich: Give it a person's name. It pulls from five different sources, merges everything into a single brain page with career arc, contact info, meeting history, and relationship context. Cited sources on every claim.
media-ingest: Handles video, audio, PDF, screenshots, GitHub repos. Transcribes, extracts entities, files to the right brain location. I use this constantly for YouTube videos, podcasts, and voice memos.
perplexity-research: Brain-augmented web research. Searches the web via Perplexity, but before synthesizing, checks what the brain already knows so it can tell you what's actually new vs. what you've already captured.
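The entity propagation step in meeting-ingestion can be sketched as a loop over every person and company mentioned, appending what was discussed to each page. This is an illustrative assumption, not the shipped skill: `propagate_entities` is a hypothetical function, pages are simplified to lists of strings, and the example data is made up.

```python
from collections import defaultdict


def propagate_entities(
    meeting_title: str,
    discussed: dict[str, str],
    pages: dict[str, list[str]],
) -> list[str]:
    """Append what was discussed to the page of every entity mentioned."""
    updated = []
    for entity, note in discussed.items():
        # Append-only: earlier entries on a page are never rewritten.
        pages[entity].append(f"{meeting_title}: {note}")
        updated.append(entity)
    return updated


# Hypothetical example data; a page is created on first mention.
pages: dict[str, list[str]] = defaultdict(list)
propagate_entities(
    "Office hours, week 1",
    {"Example Founder": "asked about pricing", "ExampleCo": "launched a beta"},
    pages,
)
propagate_entities("Office hours, week 2", {"Example Founder": "closed first customer"}, pages)
```

After the second call, the founder's page holds both meetings in order, which is the context that a later meeting-prep run would draw on.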

I have dozens more I've built for my own work that I'll probably open source: email-triage, investor-update-ingest that detects portfolio updates in my email and extracts metrics into company pages, calendar-check for conflict detection and travel impossibility, and a whole journalistic research stack I use for civic work. Each skill encodes operational knowledge that would take a new human assistant months to learn. When someone asks how I "prompt" my AI, the answer is: I don't. The skills are the prompts.

The data is fat. 100,000 pages of structured knowledge in the brain repo. Every person, company, meeting, book, article, and idea I've engaged with, all linked, all searchable, all growing every day.

The code is fat. The code that feeds it (scripts for transcription, OCR, social media archival, calendar sync, API integrations) matters too, but the data is where the compound value lives. I run more than 100 cron jobs a day that check all the things: social media, Slack, email. Whatever I pay attention to, my OpenClaw/Hermes agents look at for me too.

The models are interchangeable. I run Opus 4.7 1M for precision. GPT-5.5 for recall and exhaustive extraction. DeepSeek V4-Pro for creative work and third perspectives. Groq with Llama for speed. The skill decides which model to call for which task. The harness doesn't care. When someone asks "which AI model is best," the answer is: wrong question. The model is just the engine. Everything else is the car.
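"The skill decides, the harness doesn't care" can be shown with a skill whose steps each name a model, and a runner that merely looks the name up. Everything here is a hypothetical sketch: the `Step` class, `run_skill`, and the model labels `precise` and `fast` are placeholders, and the models are stub functions rather than real API clients.

```python
from dataclasses import dataclass
from typing import Callable

ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class Step:
    task: str
    model: str  # the skill, not the harness, names the model for each step


def run_skill(steps: list[Step], models: dict[str, ModelFn], payload: str) -> list[str]:
    # The runner only looks up the named model and calls it.
    return [models[step.model](f"{step.task}: {payload}") for step in steps]


# Stub "models" so the sketch runs offline; real ones would be API clients.
models: dict[str, ModelFn] = {
    "precise": lambda prompt: f"[precise] {prompt}",
    "fast": lambda prompt: f"[fast] {prompt}",
}
steps = [Step("extract entities", "fast"), Step("verify facts", "precise")]
outputs = run_skill(steps, models, "meeting transcript")
```

Swapping a model then means changing one entry in the `models` mapping or one field in a step, with no change to the runner.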

People ask me about productivity. I don't think about it that way. What I think about is compounding.

Every meeting I take adds to the brain. Every book I read enriches the context for the next book. Every skill I build makes the next workflow faster. Every person page I update makes the next meeting prep sharper. The system today is 10x what it was two months ago, and two months from now it'll be 10x again.

When I'm still up at 2am coding (and I am, regularly, because AI gave me back the joy of building), I'm not just writing software. I'm adding to a system that gets better every hour. 100 cron jobs run 24/7. The meeting ingestion runs automatically. The email triage runs every 10 minutes. The knowledge graph enriches itself from every conversation. The system processes daily transcripts and extracts patterns I missed in real time.

This is not a writing tool. It's not a search engine. It's not a chatbot. It's a second brain that actually works, not as a metaphor, but as a running system with 100,000 pages, 100+ skills, 15 cron jobs, and the accumulated context of every professional relationship, meeting, book, and idea I've engaged with in the last year.

I open-sourced the whole stack.

is the coding skill framework (87,000+ stars) that I used to build it. I still use it as a skill inside OpenClaw/Hermes Agent when the agent needs to code. There's a great programmable browser (both headed and headless) in there.

is the knowledge infrastructure.

and

are the harnesses. You should choose one, but I usually use both. The data repos are on GitHub.

The thesis is simple: the future belongs to individuals who build compounding AI systems, not to individuals who use corporate-owned centralized AI tools. The difference is the difference between keeping a journal and having a nervous system.

If you want to build this:

Pick a harness. , , or build your own from scratch with . Keep it thin. The harness is just the router. Host it on your spare computer at home with Tailscale, or use Render or Railway in the cloud.
Start a brain with . I got , implemented it in OpenClaw, and extended it into GBrain. It's the (97.6% recall on LongMemEval, beating MemPalace with no LLM in the retrieval loop) and it ships 39 installable skills including everything described in this article. One command to install. A git repo where every person, meeting, article, and idea gets a page.
Do something interesting. Don't start by planning your skill architecture. Start by doing a thing. Write a report. Research a person. Download a season of NBA scores and build a prediction model for your sports bets. Analyze your portfolio. Whatever you actually care about. Do it with your agent, iterate until it's good, and then run Skillify (the meta-skill from earlier) to extract the pattern into a reusable skill. Then run check_resolvable to verify the new skill is wired into the resolver. That loop turns one-off work into compounding infrastructure.
Keep using it and look at the output. The skill will be mediocre at first. That's the point. Use it, read what it produces, and when something is off, run cross-modal eval: send the output through multiple models and have them score each other on the dimensions you care about. That's how I caught the factual errors in book-mirror. The fix got baked into the skill, and every mirror since has been clean. In six months you'll have something no chatbot can replicate, because the value isn't in the model. It's in what you've taught the system about your specific life, work, and judgment.

The first thing I built with this system was terrible. The hundredth was something I'd trust with my calendar, my inbox, my meeting prep, and my reading list. The system learned. I learned. The compound curve is real.

Fat skills. Fat code. Thin harness. The LLM on its own is just an engine. You can build your own car.

Everything I described here, all the skills, the book mirror pipeline, the cross-modal eval framework, the skillify loop, the resolver architecture, plus 30+ installable skillpacks, is open source and free on GitHub:

. Go build.
