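On What It Feels Like to Work with Mythos
The author of One Useful Thing reports, based on early access to Claude 5 Fable (a Mythos-class model), on its overwhelming performance and the changing relationship between humans and AI, and analyzes its ability to carry out complex tasks along with its ethical and psychological impact.
Key points
Claude 5 Fable's overwhelming performance
It substantially outperformed other public models and produced startling results, such as a sophisticated academic paper from a single prompt and one piece of feedback, and a complex poem.
Creative game development through mathematical generation
Despite having no image-generation capability, it built several playable games, including their images and 3D objects, using math alone, exceeding the user's expectations.
A changing relationship between humans and AI
The article presents a new experience of using AI, one that swings between the "delight" of getting results simply by asking and the "unnerving" feeling that comes from that very efficiency.
Restrictions on security uses, and practicality
While the security implications of Mythos-class models are being debated, Fable's guardrails make it unusable for cybersecurity work, but it is an extremely powerful tool in other domains.
Use of autonomous agentic workflows
Rather than working alone, the AI ran a sophisticated autonomous process: it launched multiple sub-agents to conduct research, executed code, and set up adversarial groups of agents to verify the results.
Complex data collection and revision
It gathered detailed data, including more than 2,200 flights, rail schedules from the TGV to the Shinkansen, and road speeds by country, and it revised and extended its workflow in response to instructions in order to obtain accurate travel times to remote places such as Greenland and Pitcairn Island.
AI autonomy and the black box
The user's role is reduced to a minimum, and because the AI makes hundreds of judgement calls on its own, its decision-making process becomes an opaque black box.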
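Impact analysis
This article matters because it goes beyond a simple performance comparison and digs into the psychological effect AI has on human creative processes and patterns of thought (delight and unease). The arrival of Claude 5 Fable marks a turning point at which AI shifts from "tool" to "co-creator," and it offers important material for ethical and practical debates in future AI development.
Editor's comment
The discussion of the "unnervingly high efficiency" of Mythos-class AI goes beyond technical performance evaluation to offer a fresh perspective on the effect on human psychology, highlighting a challenge the whole industry faces.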
I had early access to the first Mythos-class AI model being released to the public, Claude 5 Fable. Much of the discussion of Mythos has centered on its impact on software security, but I tested it on everything except that (the guardrails around Fable essentially prevent it from being used for cybersecurity at all). My conclusion is that it represents a very real leap over every model I have used before, and, maybe more important, suggests our relationship with AI is changing in drastic ways.
First, how good is Fable? In experiment after experiment I conducted, it outperformed basically every other public model I have used by a considerable margin. It was capable across many problems and produced some startling results; it would work up to a dozen hours executing on multi-page specifications. I’ll walk you through a couple of more complex, and serious, use cases shortly, but you could see the general improvement across the board on every task. The problem with communicating this in a post is that many of the most impressive results are going to be interesting to only small portions of my readers. For example, it made the most sophisticated academic social science paper I have yet seen from an AI from a single prompt and one piece of feedback. It also created a 10-page epic rhyming poem about a haircut where every word starts with the letter s.
So, as a more accessible and entertaining example, I also had it create a bunch of games you can try. All of these are one initial prompt in Claude Code where Fable had to take my vague prompts and generate something workable, followed by a couple of additional prompts with minor encouragement (“make it better”) or feedback. What makes these especially impressive is that Claude cannot generate images, so every piece of art or 3D object was made with math alone, not using any external assets. You can try any of them: a game about flipping coins (prompt: “Balatro, but for the game of coin flips”) that is quite fun; a snake game where the snake is self-aware and crazy things happen; or a game about descending into the depths to see what is there.
So the output is impressive. But, especially as I turned to more serious projects, I often felt using the tool was somewhere between delightful and unnerving. Delightful because I just asked for something and it happened. And also unnerving because I just asked for something and it happened.
Maps and Methods
To see why, it helps to understand the way in which Fable gets work done, and for that I want to turn to an example I have tested on many previous AI models: building an isochrone map. This is a map that shows the distance you can travel in a given length of time, and the first one was created in 1881 showing travel times from London.

The original map
No previous model did an even halfway useful job of creating a map like this because it involves researching thousands of potential trip distances and a lot of small judgement calls and decisions. I decided to try it on Fable using Claude Code with this prompt: “i want you to build a fully researched and beautiful isochronic map that lets me pick various cities and see real isochronic lines based on real data. I want the design to be unique. You should take into account airports (and travel time to and from airports) trains, walking, driving. The data does not need to be live but should be real based on your research and data. You can start with a few cities but more general is better, this should be an entirely new project.” It then suggested that it do this in the style of the original map. I agreed, and it got to work.
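For a concrete sense of the computation behind an isochrone map, here is a minimal sketch in Python. It is not the code Fable wrote. It assumes travel is modeled as a graph whose edges carry travel minutes, runs Dijkstra's algorithm from one origin, and groups destinations into time bands. The place names, the `GRAPH` table, and every number in it are made up for illustration.

```python
import heapq


def travel_times(graph, origin):
    """Shortest travel time in minutes from origin to each reachable place (Dijkstra)."""
    best = {origin: 0}
    queue = [(0, origin)]
    while queue:
        elapsed, place = heapq.heappop(queue)
        if elapsed > best.get(place, float("inf")):
            continue  # stale queue entry, a faster route was already found
        for neighbor, minutes in graph.get(place, []):
            candidate = elapsed + minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


def isochrone_bands(times, band_minutes):
    """Group places by time band; band 0 holds everything under band_minutes."""
    bands = {}
    for place, minutes in times.items():
        bands.setdefault(minutes // band_minutes, []).append(place)
    return {band: sorted(places) for band, places in bands.items()}


# Hypothetical network: each edge is (destination, minutes). All values invented.
GRAPH = {
    "home": [("station", 15), ("airport", 45)],
    "station": [("airport", 20), ("city_b", 120)],
    "airport": [("city_b", 90), ("island", 300)],
    "city_b": [("island", 400)],
}

times = travel_times(GRAPH, "home")
bands = isochrone_bands(times, 60)
print(times)
print(bands)
```

In this toy network the quickest way to the airport is via the station (15 + 20 = 35 minutes) rather than the direct 45-minute edge. A real map has to resolve that kind of multimodal trade-off thousands of times, with researched data in place of invented numbers.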
It is worth taking a second to look at the transcript of the multi-hour building session the AI went through on its own, because you can see some unusual things. First, the AI launched multiple other AIs (I believe mostly the cheaper Claude Sonnet) to help it conduct research on travel times, ultimately retrieving over 2,200 specific flights, the rail schedules for trains from the TGV to the Shinkansen, and road speeds per country from multiple academic papers. And while those agents were running, it started coding. Then it launched yet more agents and tests to verify its code, all the while taking notes about its progress.

The result was a fully functioning map of impressive sophistication that looked a lot like the 1881 original, but that doesn’t mean it was perfect. I noticed that a lot of remote locations (like Greenland) just contained estimates of travel time, not exact numbers, so I told Fable to fix it, including the instructions: “actually get travel times to remote airports and locations.” This time the AI launched a workflow: adversarial groups of agents that did research and tested each other’s results. It figured out how often ships sail to Pitcairn Island in the Pacific and how to get to Grise Fjord from Ottawa. And it used a tremendous number of tokens in a very short period of time (more on this soon).
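The cross-checking idea can be sketched in a few lines of Python. This is a simplified illustration, not Fable's actual workflow: the `team_a` and `team_b` functions are hypothetical stand-ins for independent groups of agents, the route labels and hour values are invented, and the 10% tolerance is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def cross_check(question, researchers, tolerance=0.10):
    """Ask each researcher independently; accept only if their estimates agree."""
    with ThreadPoolExecutor() as pool:
        estimates = list(pool.map(lambda research: research(question), researchers))
    middle = median(estimates)
    # Relative spread between the most extreme answers.
    spread = (max(estimates) - min(estimates)) / middle if middle else float("inf")
    return {
        "question": question,
        "estimate": middle,
        "agreed": spread <= tolerance,
        "estimates": estimates,
    }


# Hypothetical stand-ins for two independent research groups (hours, invented).
def team_a(question):
    return {"route-1": 10.0, "route-2": 48.0}[question]


def team_b(question):
    return {"route-1": 10.5, "route-2": 72.0}[question]


settled = cross_check("route-1", [team_a, team_b])
disputed = cross_check("route-2", [team_a, team_b])
print(settled["agreed"], settled["estimate"])
print(disputed["agreed"], disputed["estimates"])
```

A disagreement such as the one on `route-2` would be the signal to send the question back for more research rather than to average the answers.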

The results were impressive. I pushed a few more times in directions that interested me (including asking for other visualization approaches, etc.). I would recommend spending a couple minutes clicking around the results, and you can read its methods and sources at the bottom of the graph.

What the AI generated. Click on the map to go to the interactive version
This is probably not a useful project for you unless you really like travel and maps, but it is indicative of AI solving a hard problem involving research, math, visual development, taste, judgement, complex coding, and more. And, the unnerving part was how little I did. I gave a really ambitious instruction, the AI followed it. I gave a couple of minor pieces of feedback, and the AI figured it out. My role was extremely limited.
Importantly, it was not just limited in how much work I did relative to the model; it was also limited in how much control I had over how the model did things, why the model chose particular approaches, or even how in-depth its results would be. The details of the AI’s decision making are not shown to me, and the process would be too long to even be worth following. The map required the AI to make judgement calls about hundreds of little choices, and it just made them, without me understanding the choices or having a chance to weigh in. On one hand, it is miraculous (I can always ask for edits at the end); on the other, it turns AI into the ultimate black box.
Working with a Mythos-class model
The most ambitious project I got from Fable takes a little more explanation. I do a lot of research where humans produce messy answers and doing any sort of analysis requires categorizing those answers properly: how innovative is an idea? Why do people like this book? To figure this out, we have used human researchers to make a judgement call about a piece of information, and statistically compared their answers with others’ to figure out whether we can trust the data. A lot of recent research has shown that AIs might be able to do this important work, but calibrating AI and human judgement has been difficult and expensive. So I asked Fable to solve the problem, first generating a complex 19-page design document and then executing it.
It worked for nine and a half hours.

The result was an extremely sophisticated piece of software the AI called Concord that could take in multiple datasets, calibrate human and AI responses, and then conduct complex data analysis on the results. Again, it wasn’t perfect. As an expert, I was able to spot some errors and omissions (some as a result of the design I had asked for) that I had the AI correct. But the scope of the delivery on this project, and many others, exceeded anything I had seen before. In this case, it was a piece of software that researchers have needed for years but was never profitable to create. You can now just use or modify the code here. I am sure it is not perfect (I only spent an hour working with the results), but a software engineer would iron out the remaining potential bugs that I could not find quickly (which is one reason we may need more, not fewer, coders in the future, to help with the explosion of new uses for software).
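One standard statistic for the kind of rater comparison described above is Cohen's kappa, which measures how much two raters agree beyond what chance alone would produce. The sketch below is an illustration in Python only; whether Concord uses this particular statistic is not stated here, and the `human` and `model` labels are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected if each rater labeled at random with their own label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented ratings of eight ideas as "high" or "low" innovation.
human = ["high", "high", "low", "low", "high", "low", "low", "high"]
model = ["high", "low", "low", "low", "high", "low", "high", "high"]

kappa = cohens_kappa(human, model)
print(round(kappa, 3))  # 0.5: raw agreement is 75%, chance agreement is 50%
```

A kappa near 1 suggests the model's labels could stand in for a human rater's, while a value near 0 means the apparent agreement is no better than chance.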
This power goes hand in hand with strangeness and limits. Among those limits is its token usage. Fable is twice as expensive as Opus, and it burns through tokens at a rate that suggests the answer to how much it costs in production is “a lot,” though its clever delegation to cheaper models may lower the real price considerably. The guardrails for Fable also trip at the faintest hint of a security problem, defaulting to the less powerful Claude 4.8 Opus, and it happens way too often. And the jagged frontier is still there. For example, the AI still writes in the same weird style (in fact the software Fable produces bears traces of Claudisms; so do its progress reports, all that carrying the weight and earning the answer). But the deeper strangeness is how little I had to do, and how little I could see while it was being done.
Last year I called this working with a wizard: you chant the spell and something happens. With Fable the spell has gotten powerful enough that I am no longer sure I am the wizard. I am closer to a patron. I describe what I want, I pay for it, and I judge the result. The conjuring happens somewhere I cannot watch, in hundreds of small choices I never get a vote on. The work has shifted from process to outcome. I no longer steer; I commission.
It is possible the sidelining is temporary, just an artifact of interfaces that haven’t caught up, and that we’ll get better windows into what these models are doing and better ways to steer them midstream. It is also possible that the opposite is true: that the more capable the model, the less there is for a human to meaningfully do, and the black box is the price of the power. I suspect that is more likely to be the real direction. None of this is a loss of control in the obvious sense. I can still steer Fable, and it follows instructions remarkably well: the more ambitious the instruction, the better the result. But steering is no longer the same as doing. I brief the model, it spins up its own agents to research and write and check one another’s work, and what comes back is finished. A patron commissions a single artist. Fable is closer to a whole studio, where I am the client who signs off on the final work without ever setting foot on the floor.