Simon Willison Blog · April 24, 2026, 10:31 · about 2 min read

Latest Update on Claude Code Quality Reports

#LLM #coding-agents #Claude Code #Anthropic #AI harness design

TL;DR

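Simon Willison points out that the reported drop in Claude Code quality was caused by bugs in the Claude Code harness, such as session handling, rather than by the models themselves, and he stresses how complex the non-model factors in agent building can be.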

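AI Deep Analysis · April 24, 2026, 10:45

Importance: / 5

Depth: 40%

Key Points

The real cause of the quality drop was harness bugs, not the model

Anthropic's postmortem showed that the drop in Claude Code's output quality stemmed not from the models themselves but from bugs in the harness, including session management and state-clearing logic.

A serious implementation error in the idle-session clearing feature

A change introduced to reduce latency, which cleared older thinking from sessions that had been idle for over an hour, contained a bug: the clearing ran on every turn for the rest of the session instead of just once, making Claude seem forgetful and repetitive.

Why harness management matters in agent development

Even setting aside the non-determinism of the models, bugs in the harness code that handles session state and the surrounding infrastructure directly affect system quality, so agent builders should test that layer thoroughly.

Key Quotes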

The models themselves were not to blame, but three separate issues in the Claude Code harness caused complex but material problems which directly affected users.

A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive.

If you're building agentic systems it's worth reading this article in detail - the kinds of bugs that affect harnesses are deeply complicated, even if you put aside the inherent non-deterministic nature of the models themselves.

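Impact Analysis

This post suggests that in generative AI tooling, user experience is determined not only by model performance but also by the quality of the operational and control layer (the harness). When building agents, developers should prioritize robust session-state management and memory-clearing logic, and build comprehensive tests that reflect real usage, such as sessions resumed after long idle periods. This is a useful lesson for maturing best practices in AI software development.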
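One way to make that kind of testing concrete is to check a per-turn invariant over harness logs: retained thinking should shrink only on a turn that follows a long idle gap. The sketch below is illustrative only. `TurnRecord`, its field names, and `unexpected_clears` are hypothetical and do not come from Claude Code or the postmortem; the one-hour threshold is taken from the quoted change description.

```python
from typing import NamedTuple


class TurnRecord(NamedTuple):
    """Hypothetical per-turn log entry; the field names are assumptions."""
    idle_seconds: float  # time elapsed since the previous turn
    items_before: int    # thinking items held when the turn arrived
    items_after: int     # thinking items left after harness preprocessing


def unexpected_clears(turns, idle_threshold=3600.0):
    """Return indices of turns that lost history without a long idle gap."""
    return [
        i
        for i, turn in enumerate(turns)
        if turn.items_after < turn.items_before
        and turn.idle_seconds <= idle_threshold
    ]


# A log shaped like the reported symptom: one legitimate clear after a long
# idle gap (index 1), then further clears on ordinary turns (indices 2 and 3).
suspicious_log = [
    TurnRecord(idle_seconds=10, items_before=1, items_after=1),
    TurnRecord(idle_seconds=7190, items_before=2, items_after=0),
    TurnRecord(idle_seconds=10, items_before=1, items_after=0),
    TurnRecord(idle_seconds=10, items_before=1, items_after=0),
]
print(unexpected_clears(suspicious_log))
```

A check like this would only catch this one class of regression; it is a sketch of the idea of asserting on session state across turns, not a complete QA strategy.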

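Editor's Comment

Attention tends to focus on improvements in model performance, but in production the quality of the infrastructure layer, such as session management and state synchronization, shapes user satisfaction. Agent developers should not underestimate non-model bugs and should build a comprehensive QA process.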

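An update on recent Claude Code quality reports

It turns out the high volume of complaints that Claude Code was providing worse quality results over the past two months was grounded in real problems.

The models themselves were not to blame, but three separate issues in the Claude Code harness caused complex but material problems which directly affected users.

Anthropic's postmortem describes these in detail. This one in particular stood out to me: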

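On March 26, we shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive.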
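The quoted passage does not say how the bug was implemented, so the following is only a minimal sketch of one plausible way a "clear once after an idle gap" rule can turn into "clear every turn". Everything here is hypothetical (`Session`, the `stale` flag, `start_turn_buggy`, `start_turn_fixed`, `simulate`); it is not Anthropic's code, and only the one-hour threshold comes from the quote.

```python
from dataclasses import dataclass, field

IDLE_THRESHOLD_SECONDS = 60 * 60  # "idle for over an hour", per the quote


@dataclass
class Session:
    """Hypothetical session state; not Anthropic's actual data model."""
    last_active: float
    thinking: list = field(default_factory=list)
    stale: bool = False  # assumed flag: the session came back from idle


def start_turn_buggy(session, now):
    # The flag is set when an idle gap is seen but never reset, so every
    # later turn in the session also clears the thinking history.
    if now - session.last_active > IDLE_THRESHOLD_SECONDS:
        session.stale = True
    if session.stale:
        session.thinking.clear()
    session.last_active = now


def start_turn_fixed(session, now):
    # Decide from this turn's idle gap only, so the clear happens once
    # per resume rather than on every following turn.
    if now - session.last_active > IDLE_THRESHOLD_SECONDS:
        session.thinking.clear()
    session.last_active = now


def simulate(start_turn, turn_times):
    """Run turns at the given timestamps and return how many thinking
    items survived preprocessing at the start of each turn."""
    session = Session(last_active=0.0, thinking=["t0"])
    seen = []
    for i, now in enumerate(turn_times, start=1):
        start_turn(session, now)
        seen.append(len(session.thinking))
        session.thinking.append(f"t{i}")  # the turn produces new thinking
    return seen


turn_times = [10, 7200, 7210, 7220]  # the second turn follows a ~2 hour gap
print(simulate(start_turn_buggy, turn_times))  # [1, 0, 0, 0]
print(simulate(start_turn_fixed, turn_times))  # [1, 0, 1, 2]
```

In the buggy variant the history never recovers after the first resume, which matches the "forgetful and repetitive" symptom described in the quote; in the fixed variant it starts accumulating again on the next turn.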

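I *frequently* have Claude Code sessions which I leave for an hour (or often a day or longer) before returning to them. Right now I have 11 of those (according to ps aux | grep 'claude ') and that's after closing down dozens more the other day.

I estimate I spend more time prompting in these "stale" sessions than sessions that I've recently started!

If you're building agentic systems it's worth reading this article in detail - the kinds of bugs that affect harnesses are deeply complicated, even if you put aside the inherent non-deterministic nature of the models themselves.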

Via Hacker News

Tags: ai, prompt-engineering, generative-ai, llms, anthropic, coding-agents, claude-code
