TLDR AI·2026年5月8日 09:00·約16分

5時間の一時停止を乗り越えた6時間のコード実行記録（10分読了）

#Codex #LLM #Developer Tools #Context Management

TL;DR

Codex に導入された新機能「persisted goals」により、開発者は端末の再起動やスリープ後もプロンプトを再入力せずに作業を継続できるようになった。

AI深層分析2026年5月9日 22:04

注目/ 5段階

深度40%

キーポイント

状態の永続化機能の実装

ターミナルの再起動、ラップトップのスリープ、長時間の放置といった状況でもゴールの状態が保持される「persisted goals」機能が Codex に実装された。

自動復元によるユーザー体験の向上

システムが自動的に開発者メッセージを再開時に注入するため、ユーザーは時間を計測したり再プロンプトを入力したりする必要がなくなる。

長時間実行タスクへの対応

5 時間の一時停止を含む 6 時間にわたる実行（Codex Run）のような長期間の処理でも、中断から正確に再開できることが実証された。

影響分析・編集コメントを表示

影響分析

この機能は AI コーディングアシスタントの信頼性を高める重要な一歩であり、長期間にわたる複雑なコーディングタスクを中断なく実行できる環境を提供します。これにより、開発者は AI との対話フローを維持しやすくなり、生産性の向上が期待されます。

編集コメント

AI エージェントの信頼性を支える「状態管理」の実現は、単なる機能追加ではなく、開発ワークフローそのものを再定義する重要な転換点です。

TL;DR

/goal は 2026 年 4 月 30 日に Codex CLI v0.128.0 で、名目上の主要機能としてリリースされました。

これは、ターミナルの再起動やラップトップのスリープ、数時間にわたる一時停止を乗り越えても再プロンプトなしで維持される「永続化されたゴール（persisted goals）」を導入するものです。

ランタイム継続機能により、Codex は再開時に開発者からの入力を待つのではなく、自動的に開発者メッセージを注入します。

私は TypeScript のモノレポで実際のセッションを実行しました。経過時間：約 6 時間 44 分。モデルの計算に要した実質的な時間：約 41 分。最終ステータス：TASK_COMPLETE（タスク完了）。

このセッションでは、累積入力トークン数が約 680 万トークン消費されましたが、キャッシュヒット率は約 94% でした。自動コンテキスト圧縮（auto-context-compaction）は一度発火し、これは model_auto_compact_token_limit を経由して設定可能です。

私は Codex を一晩中稼働させるつもりはありませんでした。ベルリン時間 2026 年 4 月 30 日午後 9 時 19 分にセッションを開始し、最初のターンが 57 秒間実行されるのを見届けた後、ラップトップを閉じて就寝しました。5 時間半後に戻ってみると、/goal はすでに再稼働していました。中断した場所から正確に続きを実行していたのです。私は何も再プロンプトしていませんでした。

これが、変更履歴のエントリからは伝わらない /goal の本質です。それは単なる新機能のコマンドではありません。あなたとエージェントとの間の契約そのものが異なるものになったのです。

4 月 30 日にリリースされた内容

Codex CLI v0.128.0（タグ名：rust-v0.128.0）は、2026 年 4 月 30 日にリリースされました。リリースノートにおける主要な見出しは以下の通りです。「アプリサーバー API、モデルツール、ランタイム継続機能、および作成・一時停止・再開・クリアのための TUI コントロールを備えた、永続化された /goal ワークフローを追加しました。」

その一文には多くの情報が凝縮されているので、分解して説明しましょう。

永続化されたゴール（Persisted goals）が核となる概念です。以前の Codex セッションは一時的なものでした。ターミナルを閉じれば、進捗の糸口も失われていました。しかし、/goal コマンドはアクティブなゴールをアプリサーバーの状態に保存するため、プロセスが終了してもその状態は維持されます。

アプリサーバー API（App-server APIs）は、この永続化を支える基盤です。Codex は теперь、ゴールの状態を追跡するローカルサーバー層と通信します。

モデルツール（Model tools）とは、モデル自身がゴールのライフサイクルに対処するためのツールを備えていることを意味します。モデルは、完了のシグナルを送ったり、継続を要求したり、推論の一部としてゴールの状態を検査したりできます。

ランタイム継続（Runtime continuation）は、私がその夜に目にした挙動です。セッションが再開されたとき（または Codex がセッションが再び生存していることを検知したとき）、モデルに作業を続けるよう促す開発者メッセージが自動的に挿入されます。ユーザーが何かを入力する必要はありません。

ターミナル UI 制御（TUI controls）は、この機能の表面積を完成させます。ターミナル UI には、ゴール管理のための明示的な作成、一時停止、再開、クリアのアクションが用意されています。蓋を閉じるだけでなく、意図的に進行中のゴールを一時停止することも可能です。

v0.128.0 の残りの部分についても簡単に触れておきます。スクロールバックのリフロー機能が、テキストが壊れる代わりにターミナルのサイズ変更時に動作するようになりました。新しい codex 更新コマンドにより、CLI の自己更新が可能になりました。タスクが計画に適した候補である場合、composer が plan-mode（計画モード）の提案を表示します。TUI キーマップが設定可能になりました。権限プロファイルも拡張されました。--full-auto フラグは明示的な承認プロファイルに置き換えられるため非推奨となりました。デスクトップアプリも同週に polish 改善を受けましたが、本稿の焦点は CLI にあります。plan-mode 自体は、2026 年 4 月 20 日の v0.122.0 で先に導入されています。/goal はこの基盤の上に構築された機能です。

/goal の実際の動作

基本的なメカニズムは単純明快です。/goal に続けてプロンプトを入力するだけで、Codex がその目標を保存して作業を開始します。セッションが中断された場合（ネットワークのひっかかり、ラップトップの閉じ方、意図的な一時停止）、目標は保持されます。セッションが再開されると、ランタイム継続機能により Codex は自動的に再開されます。

モデルは TASK_COMPLETE または task_complete ツールによって完了をシグナルします。それが発生するまで、目標はアクティブな状態のままです。

この仕組みが、長時間実行される --continue セッションと実際に異なる点は、永続化レイヤーにあります。/goal 以前では、ターミナルを閉じるとセッションは死んでいました。文脈ファイルを慎重に管理しプロンプトを再注入することで連続性を近似することは可能でしたが、それは基本的に Ralph Wiggum Loop がより荒っぽい方法で行っていることに過ぎません。/goal は、この連続性を第一級の機能として扱います。

ここで重要となる設定項目がいくつかあります。~/.codex/config.toml ファイル内の model_auto_compact_token_limit キーは、コンテキストの自動圧縮（compaction）の閾値を設定します。[features] ブロックには機能フラグ（feature flags）が配置されます。model_reasoning_effort キーは、セッションにおける推論の努力レベルを指定します。手放しで自律的に実行したい場合は、approval_policy と sandbox_mode も正しく設定する必要があります。これらについては後ほど説明します。

ターミナルユーザーインターフェース（TUI）も変化しています。実行中のゴールの状態が視覚的に確認できるようになり、プロセスを終了させることなく意図的に実行中のゴールを一時停止できます。再開時にはランタイムの継続状態から自動的に引き継がれます。

実際のセッションがどのようなものかを示します。

対象プロジェクトは私が現在取り組んでいる TypeScript のモノレポです。いくつかのエンドツーエンドシナリオを持つ音声インタビューシステムで、特定の条件の下で正しく動作する必要があります。

自律的な /goal セッションでは、Codex を approval_policy = "never" および sandbox_mode = "danger-full-access" で実行します。この 2 つの設定は、手放しでの長時間実行のための前提条件です。モデルが許可を求めて停止することはなく、作業を行うためにフルファイルシステムアクセス権限を持っています。これは、信頼できるプロジェクトディレクトリで、クリーンな git ステート（状態）から開始する場合にのみ健全な選択となります。

/goal プロンプトは約 600 語でした。私は構造化アプローチを用いて作成しました：XML スタイルのブロックによる目標の整理、モデルが最初に参照すべき 10 以上のファイルからなる明示的な読書リスト、作業ルール（編集前に git status を確認する、grep より rg を優先する、apply_patch を使用する）、4 つの具体的な成功基準を明記した done_when 契約、そして明示的なアンチパターンフェンスです。そのフェンスの一つに「1 つのトランスクリプトを通過させるために文字列一致パッチを追加しない」というものがあります。音声システムに取り組んだ経験があれば、なぜこのフェンスが必要なのかお分かりいただけるでしょう。

そのようなプロンプトを作成すること自体が一つのタスクです。このような作業に対するプロンプト設計のアプローチをご覧になりたい場合は、The Interview Method でそのワークフローを解説しています。

モデル：gpt-5.5。推論エフォート：高。

セッションのタイムライン：

午後 9:19 - /goal を送信。

午後 9:20 - 最初のターン実行開始。57 秒間観察した後、中断しました（turn_aborted）。

5.5 時間後 - ラップトップを閉じました。再プロンプトは行いませんでした。

午前約 2:50 - 戻ってきた際、/goal はすでに開発者メッセージ（「アクティブなスレッド目標に向かって作業を継続する」）を注入し、実行中でした。自律的に動作していました。

コンテキスト圧縮が 1 回発火しました。累積入力トークン数は約 6.7M の時点です。

累積トークン数：入力約 6.8M、出力約 10K、推論トークン約 2.6K。キャッシュヒット率：約 94%。

経過時間：6 時間 44 分。モデルの実際の計算時間：ターン全体で約 41 分。

最終ステータス：TASK_COMPLETE。4 つのターゲットとなるエンドツーエンド音声シナリオすべてが検証を通過しました。

マニュアルによるトランスクリプトレビューでは、プロンプトライプループも、生体確認のスパイラル（liveness spirals）も、早期クローズも発見されませんでした。モデルはシナリオを体系的に処理し、基準を満たした時点で完了と判断しました。

注目に値する現実的な限界が一つあります。私がキャプチャしようとした TTS 最初のバイトタイミングフィールド（TTS first-byte timing field）は、上位レイヤーのライブラリが関連するランタイムイベントを発行しないため、測定できませんでした。モデルはこの事実を正直に文書化しました。アーティファクト内には明示的な null が記載され、そのフィールドが欠落している理由についての注釈も添えられています。このギャップをごまかすことはしませんでした。/goal は自律的な実行を提供できますが、外部環境が実際に露出させているものを迂回させることはできません。

約 94% のキャッシュヒット率は、経済性を成立させるための数値です。680 万トークンの入力量は驚くべき数字に聞こえますが、そのキャッシュ率における実際の増分コストは、名目上の数のごく一部に過ぎないことに気づけば、その脅威は和らぎます。

/goal とラルフ・ウィグムループの比較

以前、ラルフ・ウィグムループについて記事を書きました。この用語を考案したのはジェフリー・ハンツリー氏で、彼のオリジナル投稿が依然として標準的な参照となっています：この手法は本質的に while :; do cat PROMPT.md | claude-code; done（git の履歴をメモリとして利用する）という構造です。これは /goal が解決するのと同じ核心的な問題、つまり「AI エージェントに単一のコンテキストウィンドウよりも長く作業させ続けるにはどうすればよいか」という課題に対する解決策を提供します。

両者のアプローチは性質が異なります。

次元	ラルフ・ウィグムループ	/goal

SetupShell スクリプトまたはプラグイン、外部オーケストレーション

Codex CLI に組み込み済み

State persistenceGit の履歴、ディスク上のファイル

App サーバー API、ネイティブゴール状態

Resume behavior手動再呼び出し

自動ランタイム継続

Context managementイテレーションごとに新鮮なコンテキスト（設計上）

セッション内での圧縮

Reasoning continuityイテレーション間でステートレス

セッション内で連続的

ModelClaude CodeCodex with gpt-5.5

Good for各パスで新鮮な視点が必要なタスク

蓄積されるコンテキストを要する長期的タスク

ラルフ・ウィグムループは実際に有用です。設計上ステートレスであるという特性は、場合によっては利点となります。各イテレーションでは、誤った中間結論を引き継がずに問題にアプローチします。モデルが混乱した場合、次のイテレーションはクリーンな状態から始まります。

/goal は連続性を賭けます。モデルはターンを跨いでコードベースの全体像を構築し、各パスで最初からすべてを読み直す必要がありません。推論が蓄積されるタスク（微妙な相互作用のデバッグや複雑な状態機械のナビゲーションなど）では、連続性が勝ります。自然に反復的で収束的なタスク（テスト追加、リンティング修正など）では、ラルフの新鮮コンテキストモデルも同様にうまく機能することが多いです。

どちらかが正しいデフォルトというわけではありません。これらは異なる形状の問題に対するツールです。

/goal が不適切な選択となる場合

私が/goal を使わないいくつかの状況があります。

未定義の成功基準。 done_when 契約は任意ではありません。開始前に具体的な成功基準を4つ記述できない場合、モデルがいつ完了したかを判断する手段がありません。その結果、タスク完了を早期に宣言するか、無限ループに陥るかのどちらかになります。まずは契約を書きましょう。

探索的作業。 「このコードベースが何をしているのかを把握する」といった初期段階の作業には、人間が関与するプロセス（human-in-the-loop）が有効です。モデルが情報を提示するにつれて、あなたも新たな知見を得ることができます。しかし、/goal の目的は実行であって探索ではありません。

セキュリティ上重要なパス。 私は approval_policy = "never" および sandbox_mode = "danger-full-access" で動作しています。この設定は、完全に信頼できるプロジェクトディレクトリ内でのみ適切です。認証システムや決済フローなど、機密データにアクセスするあらゆる箇所では、承認プロセスをループ内に含める必要があります。

不明確な外部依存関係。 タスクが、あなたが確信を持っていない外部システムに依存している場合、まずはその実態を確認してください。前述の TTS タイミングフィールドは軽微な事例です。よりコストのかかるケースとしては、5 時間目に壁にぶつかり、6 時間の実行が失敗する例があります。これは、外部 API があなたが想定した機能をサポートしていないことが原因です。

短時間のタスク。 /goal にはオーバーヘッドが発生します。10 分程度の対話式 Codex で完了できるタスクを、永続的なゴールにラップしても改善されません。ある閾値以下の複雑さであれば、その労力は見合いません。私の大まかなヒューリスティック（経験則）としては、旧モデルでは2回以上のセッションにまたがって行わないと快適に完了できないタスクでない限り、/goal は不要である可能性が高いです。

マインドセットの転換

必ず JSON 形式で返してください。translation フィールドのみ。他のフィールド (technical_terms 等)は一切追加しないこと — 余計なフィールドを書こうとして本文翻訳がトークン上限で打ち切られる事故を防ぐため:

{"translation": "翻訳全文"}

旧: 自律型 AI の実行は、何かがうまくいかなくなった際に介入する準備ができているセッションを監視することです。

新: 自律型 AI の実行とは、事前に契約書を作成し、あとは邪魔をしないことです。

この変化は、監督者から設計者への転換です。/goal セッションの品質は、最初のターンが実行される前にほぼ完全に決定されます。プロンプトの質、成功基準、アンチパターンによる境界線、読書リストなどです。一度開始されれば、あなたの仕事はほとんど完了します。契約書をうまく作成していれば、モデルは実行します。そうでなければ、どれだけ監視しても救うことはできません。

これは対話型プロンプティングとは異なるスキルであり、会話をするよりも仕様書を書くことに近いです。

結論

/goal は、Codex がプランモード以来出荷した中で最も重要な機能です。永続化層とランタイム継続機能が、実際には長い --continue セッションとの違いを生み出しています。壁時計で 6 時間 44 分、実際の計算時間は 41 分という実行が可能なのは、モデルがコンテキストを維持し、キャッシュが保持され、私が何も手を加えなくても 5 時間のギャップを goal が生き延びたからです。

経済性が成立するのはキャッシュヒット率によるものであり、品質が保たれるのは事前のプロンプトの規律によるものです。これら二つの要素はどちらも自動的には実現しません。

これは 2 部構成シリーズの第 1 報です。続編ではワークフロー側、つまり /goal に到達する前に仕様がプロンプトとしてどのように準備されるかについて取り上げます。SPEC.md から /goal へ：私の Codex + GPT-5.5 ワークフロー。

ソース

Codex v0.128.0 リリースノート

Codex チェンジログ

Codex CLI 機能リファレンス

ジョフリー・ハンリー：ラルフ・ウィグム、伝説の山羊

原文を表示

TL;DR

/goal shipped in Codex CLI v0.128.0 on April 30, 2026 as a named headline feature.

It introduces persisted goals: a goal state that survives terminal restarts, laptop sleeps, and multi-hour pauses without re-prompting.

Runtime continuation means Codex injects a developer message on resume rather than waiting for you to type anything.

I ran a real session on a TypeScript monorepo. Wall time: about 6h 44min. Actual model compute: about 41 minutes. Final status: TASK_COMPLETE.

The session burned roughly 6.8M cumulative input tokens at a ~94% cache hit rate. Auto-context-compaction fired once, configurable via model_auto_compact_token_limit.

I did not plan to run Codex overnight. I started a session at 9:19 PM Berlin time on April 30, watched one turn run for 57 seconds, then closed the laptop and went to bed. When I came back five and a half hours later, /goal was already running again. It had picked up exactly where it left off. I had not re-prompted anything.

That is the thing about /goal that does not come through in a changelog entry. It is not just a new command. It is a different contract between you and the agent.

What Shipped on April 30

Codex CLI v0.128.0 (tagged rust-v0.128.0) dropped on April 30, 2026. The headline from the release notes: “Added persisted /goal workflows with app-server APIs, model tools, runtime continuation, and TUI controls for create, pause, resume, and clear.”

That one sentence packs a lot in, so let me pull it apart.

Persisted goals are the core idea. Previous Codex sessions were ephemeral. Close the terminal, lose the thread. /goal stores the active goal in app-server state, so it outlives the process.

App-server APIs is the plumbing behind that persistence. Codex now talks to a local server layer that tracks goal state.

Model tools means the model itself gets tools for interacting with the goal lifecycle. It can signal completion, request continuation, and inspect goal state as part of its reasoning.

Runtime continuation is the behavior I saw that night. When you resume (or when Codex detects the session is alive again), it injects a developer message prompting the model to continue working. You do not have to type anything.

TUI controls rounds out the surface area. The terminal UI gets explicit create, pause, resume, and clear actions for goal management. You can pause a running goal intentionally, not just by closing the lid.

The rest of v0.128.0 is worth a quick mention. Scrollback reflow now works on terminal resize instead of the text getting mangled. A new codex update command handles CLI self-updates. The composer shows plan-mode nudges when a task seems like a good candidate for planning. TUI keymaps are now configurable. Permission profiles are expanded. The --full-auto flag is deprecated in favor of explicit approval profiles. The desktop app also got polish improvements the same week, though the focus of this post is the CLI. Plan mode itself landed earlier, in v0.122.0 on April 20, 2026. /goal builds on top of that foundation.

What /goal Actually Does

The basic mechanic is straightforward. You type /goal followed by your prompt. Codex stores the goal and starts working. If the session is interrupted (network hiccup, closed laptop, deliberate pause), the goal persists. When the session comes back, Codex resumes automatically via runtime continuation.

The model signals completion with TASK_COMPLETE or the task_complete tool. Until that happens, the goal stays active.

What actually makes this different from a long-running --continue session is the persistence layer. Before /goal, a closed terminal meant a dead session. You could approximate continuity by carefully managing context files and re-injecting prompts, which is basically what the Ralph Wiggum Loop does in a scrappier way. /goal makes continuity a first-class feature.

A few config knobs matter here. In ~/.codex/config.toml, the model_auto_compact_token_limit key sets the threshold for automatic context compaction. The [features] block is where feature flags live. The model_reasoning_effort key sets reasoning effort for the session. If you want hands-off autonomous runs, you will also need approval_policy and sandbox_mode configured correctly. I will get to that.

The TUI also changes. You get visible goal state. You can pause a running goal intentionally without killing the process. Resume picks it back up with runtime continuation.

Here is what a real session actually looked like.

The project was a TypeScript monorepo I am working on. A voice interview system with several end-to-end scenarios that needed to work correctly under a set of defined conditions.

I run Codex with approval_policy = "never" and sandbox_mode = "danger-full-access" for autonomous /goal sessions. These two settings are the precondition for hands-off long runs: the model does not stop to ask permission, and it has full filesystem access to do its work. This is only sane in a trusted project directory with clean git state going in.

The /goal prompt was around 600 words. I wrote it using a structured approach: XML-style blocks organizing the goal, an explicit reading list of ten or more files the model should consult first, working rules (check git status before edits, prefer rg over grep, use apply_patch), a done_when contract spelling out four concrete success criteria, and explicit anti-pattern fences. One of those fences: “do not add string-matching patches to pass one transcript.” If you have worked on voice systems, you know why that fence needs to exist.

Writing a prompt like that is itself a task. If you want to see how I approach prompt design for this kind of work, The Interview Method covers the workflow.

Model: gpt-5.5. Reasoning effort: high.

Session timeline:

9:19 PM - /goal submitted.

9:20 PM - First turn running. I watched it for 57 seconds, then interrupted (turn_aborted).

5.5 hours - I closed the laptop. No re-prompting.

~2:50 AM - When I came back, /goal had already injected a developer message (“Continue working toward the active thread goal”) and was running. Autonomous.

Context compaction fired once, at approximately 6.7M cumulative input tokens.

Cumulative tokens: ~6.8M input, ~10K output, ~2.6K reasoning tokens. Cache hit rate: ~94%.

Wall time: 6h 44min. Actual model compute: ~41 minutes across turns.

Final status: TASK_COMPLETE. All four target end-to-end voice scenarios passed verification.

Manual transcript review found no prompt loops, no liveness spirals, no premature closes. The model worked through the scenarios methodically and called it done when the criteria were met.

One real-world ceiling worth noting. A TTS first-byte timing field I wanted captured could not be measured, because the upstream library does not emit the relevant runtime event. The model documented this honestly. Explicit nulls in the artifact, with a note explaining why the field was missing. It did not paper over the gap. /goal can give you an autonomous run, but it cannot bypass what the external environment actually exposes.

The ~94% cache hit rate is the number that makes the economics work. 6.8M input tokens sounds alarming until you realize that the actual incremental cost at that cache rate is a fraction of the nominal number.

/goal vs the Ralph Wiggum Loop

I wrote about the Ralph Wiggum Loop a while back. Geoffrey Huntley coined it, and his original post is still the canonical reference: the technique is essentially while :; do cat PROMPT.md | claude-code; done with git history as memory. It solves the same core problem /goal solves: how do you keep an AI agent working on something longer than a single context window?

The approaches are different in character.

DimensionRalph Wiggum Loop/goal

SetupShell script or plugin, external orchestrationBuilt into Codex CLI

State persistenceGit history, files on diskApp-server APIs, native goal state

Resume behaviorManual re-invocationAutomatic runtime continuation

Context managementFresh context per iteration (by design)Compaction within session

Reasoning continuityStateless between iterationsContinuous within session

ModelClaude CodeCodex with gpt-5.5

Good forTasks that benefit from fresh eyes each passLong-horizon tasks with accumulating context

The Ralph Wiggum Loop is genuinely useful. The stateless-by-design property is sometimes an advantage: each iteration approaches the problem without carrying forward incorrect intermediate conclusions. If the model gets confused, the next iteration starts clean.

/goal bets on continuity instead. The model builds up a picture of the codebase across turns and does not have to re-read everything from scratch on each pass. For tasks where reasoning accumulates (debugging a subtle interaction, navigating a complex state machine), continuity wins. For tasks that are naturally iterative and convergent (adding tests, fixing lint), Ralph’s fresh-context model often works just as well.

Neither is the right default. They are tools for different shapes of problem.

When /goal Is the Wrong Choice

A few situations where I would not reach for /goal.

Undefined success criteria. The done_when contract is not optional. If you cannot write four concrete success criteria before you start, the model has no way to know when it is done. It will either declare TASK_COMPLETE prematurely or loop indefinitely. Write the contract first.

Exploratory work. Early-stage “figure out what this codebase is doing” work benefits from human-in-the-loop. You learn things as the model surfaces them. /goal is for execution, not exploration.

Security-critical paths. I run with approval_policy = "never" and sandbox_mode = "danger-full-access". That setup is only appropriate in project directories I trust completely. Authentication systems, payment flows, anything touching sensitive data: keep approval in the loop.

Unclear external dependencies. If your task depends on an external system you are not sure about, find out first. The TTS timing field I mentioned above is the mild version. The more expensive version is a six-hour run that hits a wall at hour five because the external API does not support what you assumed it would.

Short tasks. /goal has overhead. A task you can finish in ten minutes of interactive Codex is not improved by wrapping it in a persisted goal. The complexity is not worth it below some threshold. My rough heuristic: if the task would not comfortably span two or more separate sessions in the old model, it probably does not need /goal.

The Mindset Shift

Old: Autonomous AI runs are sessions you monitor, ready to intervene when things go sideways.
New: Autonomous AI runs are contracts you write upfront, then get out of the way.

The shift is from supervisor to architect. The quality of the /goal session is determined almost entirely before the first turn runs. The prompt quality, the success criteria, the anti-pattern fences, the reading list. Once it starts, your job is mostly done. If you wrote the contract well, the model executes. If you did not, no amount of monitoring will save it.

That is a different skill than interactive prompting. It is closer to writing a spec than having a conversation.

Conclusion

/goal is the most significant thing Codex has shipped since plan mode. The persistence layer and runtime continuation are what make it different from a long --continue session in practice. Six hours and forty-four minutes of wall time with forty-one minutes of actual compute is only possible because the model kept its context, the cache held, and the goal survived a five-hour gap without me touching anything.

The economics work out because of cache hit rates. The quality works out because of upfront prompt discipline. Neither of those things is automatic.

This is the first post in a two-part series. The companion post covers the workflow side: how I prep specs and prompts before they reach /goal. From SPEC.md to /goal: My Codex + GPT-5.5 Workflow.

Sources

Codex v0.128.0 release notes

Codex changelog

Codex CLI features reference

Geoffrey Huntley: Ralph Wiggum, the goat

この記事をシェア

KDnuggets重要度42026年6月27日 00:00

Apple Silicon で MLX を用いた言語モデルのファインチューニング

The Zvi重要度42026年6月26日 23:51

ホワイトハウスが個別に GPT-5.6 のアクセス権をその場しのぎで決定する方針へ

AWS Machine Learning Blog重要度42026年6月26日 23:42

AWS を活用した保険仲介向けドメイン特化型 AI の先駆者、Cara の取り組み

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

TLDR AI·2026年5月8日 09:00·約16分

5時間の一時停止を乗り越えた6時間のコード実行記録（10分読了）

#Codex #LLM #Developer Tools #Context Management

TL;DR

Codex に導入された新機能「persisted goals」により、開発者は端末の再起動やスリープ後もプロンプトを再入力せずに作業を継続できるようになった。

AI深層分析2026年5月9日 22:04

注目/ 5段階

深度40%

キーポイント

状態の永続化機能の実装

自動復元によるユーザー体験の向上

システムが自動的に開発者メッセージを再開時に注入するため、ユーザーは時間を計測したり再プロンプトを入力したりする必要がなくなる。

長時間実行タスクへの対応

5 時間の一時停止を含む 6 時間にわたる実行（Codex Run）のような長期間の処理でも、中断から正確に再開できることが実証された。

影響分析・編集コメントを表示

影響分析

編集コメント

AI エージェントの信頼性を支える「状態管理」の実現は、単なる機能追加ではなく、開発ワークフローそのものを再定義する重要な転換点です。

TL;DR

/goal は 2026 年 4 月 30 日に Codex CLI v0.128.0 で、名目上の主要機能としてリリースされました。

これは、ターミナルの再起動やラップトップのスリープ、数時間にわたる一時停止を乗り越えても再プロンプトなしで維持される「永続化されたゴール（persisted goals）」を導入するものです。

ランタイム継続機能により、Codex は再開時に開発者からの入力を待つのではなく、自動的に開発者メッセージを注入します。

私は TypeScript のモノレポで実際のセッションを実行しました。経過時間：約 6 時間 44 分。モデルの計算に要した実質的な時間：約 41 分。最終ステータス：TASK_COMPLETE（タスク完了）。

このセッションでは、累積入力トークン数が約 680 万トークン消費されましたが、キャッシュヒット率は約 94% でした。自動コンテキスト圧縮（auto-context-compaction）は一度発火し、これは model_auto_compact_token_limit を経由して設定可能です。

4 月 30 日にリリースされた内容

その一文には多くの情報が凝縮されているので、分解して説明しましょう。

/goal の実際の動作

モデルは TASK_COMPLETE または task_complete ツールによって完了をシグナルします。それが発生するまで、目標はアクティブな状態のままです。

実際のセッションがどのようなものかを示します。

モデル：gpt-5.5。推論エフォート：高。

セッションのタイムライン：

午後 9:19 - /goal を送信。

午後 9:20 - 最初のターン実行開始。57 秒間観察した後、中断しました（turn_aborted）。

5.5 時間後 - ラップトップを閉じました。再プロンプトは行いませんでした。

午前約 2:50 - 戻ってきた際、/goal はすでに開発者メッセージ（「アクティブなスレッド目標に向かって作業を継続する」）を注入し、実行中でした。自律的に動作していました。

コンテキスト圧縮が 1 回発火しました。累積入力トークン数は約 6.7M の時点です。

累積トークン数：入力約 6.8M、出力約 10K、推論トークン約 2.6K。キャッシュヒット率：約 94%。

経過時間：6 時間 44 分。モデルの実際の計算時間：ターン全体で約 41 分。

最終ステータス：TASK_COMPLETE。4 つのターゲットとなるエンドツーエンド音声シナリオすべてが検証を通過しました。

/goal とラルフ・ウィグムループの比較

両者のアプローチは性質が異なります。

次元	ラルフ・ウィグムループ	/goal

SetupShell スクリプトまたはプラグイン、外部オーケストレーション

Codex CLI に組み込み済み

State persistenceGit の履歴、ディスク上のファイル

App サーバー API、ネイティブゴール状態

Resume behavior手動再呼び出し

自動ランタイム継続

Context managementイテレーションごとに新鮮なコンテキスト（設計上）

セッション内での圧縮

Reasoning continuityイテレーション間でステートレス

セッション内で連続的

ModelClaude CodeCodex with gpt-5.5

Good for各パスで新鮮な視点が必要なタスク

蓄積されるコンテキストを要する長期的タスク

どちらかが正しいデフォルトというわけではありません。これらは異なる形状の問題に対するツールです。

/goal が不適切な選択となる場合

私が/goal を使わないいくつかの状況があります。

マインドセットの転換

{"translation": "翻訳全文"}

旧: 自律型 AI の実行は、何かがうまくいかなくなった際に介入する準備ができているセッションを監視することです。

新: 自律型 AI の実行とは、事前に契約書を作成し、あとは邪魔をしないことです。

これは対話型プロンプティングとは異なるスキルであり、会話をするよりも仕様書を書くことに近いです。

結論

ソース

Codex v0.128.0 リリースノート

Codex チェンジログ

Codex CLI 機能リファレンス

ジョフリー・ハンリー：ラルフ・ウィグム、伝説の山羊

原文を表示

TL;DR

/goal shipped in Codex CLI v0.128.0 on April 30, 2026 as a named headline feature.

It introduces persisted goals: a goal state that survives terminal restarts, laptop sleeps, and multi-hour pauses without re-prompting.

Runtime continuation means Codex injects a developer message on resume rather than waiting for you to type anything.

I ran a real session on a TypeScript monorepo. Wall time: about 6h 44min. Actual model compute: about 41 minutes. Final status: TASK_COMPLETE.

The session burned roughly 6.8M cumulative input tokens at a ~94% cache hit rate. Auto-context-compaction fired once, configurable via model_auto_compact_token_limit.

That is the thing about /goal that does not come through in a changelog entry. It is not just a new command. It is a different contract between you and the agent.

What Shipped on April 30

That one sentence packs a lot in, so let me pull it apart.

Persisted goals are the core idea. Previous Codex sessions were ephemeral. Close the terminal, lose the thread. /goal stores the active goal in app-server state, so it outlives the process.

App-server APIs is the plumbing behind that persistence. Codex now talks to a local server layer that tracks goal state.

Model tools means the model itself gets tools for interacting with the goal lifecycle. It can signal completion, request continuation, and inspect goal state as part of its reasoning.

What /goal Actually Does

The model signals completion with TASK_COMPLETE or the task_complete tool. Until that happens, the goal stays active.

The TUI also changes. You get visible goal state. You can pause a running goal intentionally without killing the process. Resume picks it back up with runtime continuation.

Here is what a real session actually looked like.

The project was a TypeScript monorepo I am working on. A voice interview system with several end-to-end scenarios that needed to work correctly under a set of defined conditions.

Writing a prompt like that is itself a task. If you want to see how I approach prompt design for this kind of work, The Interview Method covers the workflow.

Model: gpt-5.5. Reasoning effort: high.

Session timeline:

9:19 PM - /goal submitted.

9:20 PM - First turn running. I watched it for 57 seconds, then interrupted (turn_aborted).

5.5 hours - I closed the laptop. No re-prompting.

~2:50 AM - When I came back, /goal had already injected a developer message (“Continue working toward the active thread goal”) and was running. Autonomous.

Context compaction fired once, at approximately 6.7M cumulative input tokens.

Cumulative tokens: ~6.8M input, ~10K output, ~2.6K reasoning tokens. Cache hit rate: ~94%.

Wall time: 6h 44min. Actual model compute: ~41 minutes across turns.

Final status: TASK_COMPLETE. All four target end-to-end voice scenarios passed verification.

Manual transcript review found no prompt loops, no liveness spirals, no premature closes. The model worked through the scenarios methodically and called it done when the criteria were met.

/goal vs the Ralph Wiggum Loop

The approaches are different in character.

DimensionRalph Wiggum Loop/goal

SetupShell script or plugin, external orchestrationBuilt into Codex CLI

State persistenceGit history, files on diskApp-server APIs, native goal state

Resume behaviorManual re-invocationAutomatic runtime continuation

Context managementFresh context per iteration (by design)Compaction within session

Reasoning continuityStateless between iterationsContinuous within session

ModelClaude CodeCodex with gpt-5.5

Good forTasks that benefit from fresh eyes each passLong-horizon tasks with accumulating context

Neither is the right default. They are tools for different shapes of problem.

When /goal Is the Wrong Choice

A few situations where I would not reach for /goal.

The Mindset Shift

Old: Autonomous AI runs are sessions you monitor, ready to intervene when things go sideways.
New: Autonomous AI runs are contracts you write upfront, then get out of the way.

That is a different skill than interactive prompting. It is closer to writing a spec than having a conversation.

Conclusion

The economics work out because of cache hit rates. The quality works out because of upfront prompt discipline. Neither of those things is automatic.

Sources

Codex v0.128.0 release notes

Codex changelog

Codex CLI features reference

Geoffrey Huntley: Ralph Wiggum, the goat

この記事をシェア

KDnuggets重要度42026年6月27日 00:00

Apple Silicon で MLX を用いた言語モデルのファインチューニング

The Zvi重要度42026年6月26日 23:51

ホワイトハウスが個別に GPT-5.6 のアクセス権をその場しのぎで決定する方針へ

AWS Machine Learning Blog重要度42026年6月26日 23:42

AWS を活用した保険仲介向けドメイン特化型 AI の先駆者、Cara の取り組み

今日のまとめ

AI日報で今日の重要ニュースをまとめ読み

ニュース一覧に戻る元記事を読む

キーポイント

影響分析

編集コメント

TL;DR

4 月 30 日にリリースされた内容

/goal の実際の動作

/goal とラルフ・ウィグムループの比較

/goal が不適切な選択となる場合

マインドセットの転換

結論

ソース

TL;DR

What Shipped on April 30

What /goal Actually Does

/goal vs the Ralph Wiggum Loop

When /goal Is the Wrong Choice

The Mindset Shift

Conclusion

Sources

関連記事

キーポイント

影響分析

編集コメント

TL;DR

4 月 30 日にリリースされた内容

/goal の実際の動作

/goal とラルフ・ウィグムループの比較

/goal が不適切な選択となる場合

マインドセットの転換

結論

ソース

TL;DR

What Shipped on April 30

What /goal Actually Does

/goal vs the Ralph Wiggum Loop

When /goal Is the Wrong Choice

The Mindset Shift

Conclusion

Sources

関連記事