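Simon Willison Blog · April 3, 2026 05:40 · About a 17-minute read

Highlights from my conversation about agentic engineering on Lenny's Podcast

#AgenticEngineering #AIAutomation #SoftwareDevelopment #LaborMarketTransformation #PracticalChallenges #InflectionPoint

TL;DR

On Lenny Rachitsky's podcast, AI expert Simon Willison argued that the automation of software development is already under way following the November 2025 AI inflection point, and discussed the practical challenges and socioeconomic effects of agentic engineering.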

AI deep analysis · April 3, 2026 06:41

Attention: / 5

Depth: 40%

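Key points

The AI inflection point and the start of automation

The piece positions November 2025 as the AI inflection point and notes that automation of information work, including software development, is already under way.

Practical challenges of agentic engineering

While introducing practices such as using coding agents and "responsible vibe coding", it gets specific about problems on the ground, such as testing becoming the bottleneck and the difficulty of evaluating software.

Socioeconomic impact and changes to the working environment

It analyzes the broad socioeconomic changes brought by AI automation, including the predicted arrival of "dark factories", the impact on mid-career workers, and the falling cost of interruptions.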


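Impact analysis

This discussion gives an early view of the practical challenges and social effects at a stage where AI agents are starting to be integrated into real business processes. As software development automation advances, it suggests that labor markets will be restructured and that business processes will need fundamental rethinking, which makes it relevant to anyone working in the industry.

Editorial comment

A valuable piece in which a practitioner speaks candidly about the concrete changes on the ground since the AI inflection point. It deserves credit for covering not only optimism about automation but also real-world challenges such as the difficulty of testing and evaluation.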

I was a guest on Lenny Rachitsky's podcast, in a new episode titled "An AI state of the union: We've passed the inflection point, dark factories are coming, and automation timelines". It's available on YouTube, Spotify, and Apple Podcasts. Here are my highlights from our conversation, with relevant links.

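The November inflection point
Software engineers as bellwethers for other information workers
Writing code on my phone
Responsible vibe coding
Dark Factories and StrongDM
The bottleneck has moved to testing
This stuff is exhausting
Interruptions cost a lot less now
My ability to estimate software is broken
It's tough for people in the middle
It's harder to evaluate software
The misconception that AI tools are easy
Coding agents are useful for security research now
OpenClaw
Journalists are good at dealing with unreliable sources
The pelican benchmark
And finally, some good news about parrots
YouTube chapters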

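The November inflection point

4:19 - The end result of these two labs throwing everything they had at making their models better at code is that in November we had what I call the inflection point where GPT 5.1 and Claude Opus 4.5 came along.

They were both incrementally better than the previous models, but in a way that crossed a threshold where previously the code would mostly work, but you had to pay very close attention to it. And suddenly we went from that to... almost all of the time it does what you told it to do, which makes all of the difference in the world.

Now you can spin up a coding agent and say, build me a Mac application that does this thing, and you'll get something back which won't just be a buggy pile of rubbish that doesn't do anything.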

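Software engineers as bellwethers for other information workers

5:49 - I can churn out 10,000 lines of code in a day. And most of it works. Is that good? Like, how do we get from most of it works to all of it works? There are so many new questions that we're facing, which I think makes us a bellwether for other information workers.

Code is easier than almost every other problem that you pose these agents because code is obviously right or wrong - either it works or it doesn't work. There might be a few subtle hidden bugs, but generally you can tell if the thing actually works.

If it writes you an essay, if it prepares a lawsuit for you, it's so much harder to derive if it's actually done a good job, and to figure out if it got things right or wrong. But it's happening to us as software engineers. It came for us first.

And we're figuring out, OK, what do our careers look like? How do we work as teams when part of what we did that used to take most of the time doesn't take most of the time anymore? What does that look like? And it's going to be very interesting seeing how this rolls out to other information work in the future.

Lawyers are falling for this really badly. The AI hallucination cases database is up to 1,228 cases now!

Plus this bit from the cold open at the start:

It used to be you'd ask ChatGPT for some code, and it would spit out some code, and you'd have to run it and test it. The coding agents take that step for you now. And an open question for me is how many other knowledge work fields are actually prone to these agent loops?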
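
The "agent loop" in the passage above (generate code, run it, test it, feed any failure back in) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any particular coding agent is implemented: `run_candidate`, `fake_model`, and `agent_loop` are hypothetical names, and `fake_model` is a canned stand-in for a real model call.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_candidate(source: str, tests: str) -> Tuple[bool, str]:
    """Execute candidate code and its tests in a fresh namespace.

    Returns (passed, error_output). Illustration only: exec() offers no isolation.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)  # define the candidate's functions
        exec(tests, namespace)   # run the assertions against them
    except Exception:
        return False, traceback.format_exc()
    return True, ""


def fake_model(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for a model call.

    Returns a buggy draft first, then a corrected one once it receives feedback.
    """
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def agent_loop(
    task: str,
    tests: str,
    generate: Callable[[str, Optional[str]], str] = fake_model,
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Generate code, run the tests, and retry with the failure output as feedback."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        passed, output = run_candidate(code, tests)
        if passed:
            return code, attempt
        feedback = output  # the step a human used to do by hand
    raise RuntimeError(f"no passing candidate after {max_attempts} attempts")


if __name__ == "__main__":
    final_code, attempts = agent_loop("write add(a, b)", "assert add(2, 3) == 5")
    print(f"passed after {attempts} attempt(s)")
    print(final_code)
```

The loop only works because the tests give an unambiguous pass or fail signal, which is the property the passage attributes to code. Running untrusted generated code with `exec` is unsafe; a real setup would isolate execution in a sandbox.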

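Writing code on my phone

8:19 - I write so much of my code on my phone. It's wild. I can get good work done walking the dog along the beach, which is delightful.

I mainly use the Claude iPhone app for this, both with a regular Claude chat session (which can execute code now) or using it to control Claude Code for web.

Responsible vibe coding

9:55 If you're vibe coding something for yourself, where the only person who gets hurt if it has bugs is you, go wild. That's completely fine. The moment you ship your vibe coding code for other people to use, where your bugs might actually harm somebody else, that's when you need to take a step back.

See also: When is it OK to vibe code?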

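Dark Factories and StrongDM

12:49 The reason it's called the dark factory is there's this idea in factory automation that if your factory is so automated that you don't need any people there, you can turn the lights off. Like the machines can operate in complete darkness if you don't need people on the factory floor. What does that look like for software? [...]

So there's this policy that nobody writes any code: you cannot type code into a computer. And honestly, six months ago, I thought that was crazy. And today, probably 95% of the code that I produce, I didn't type myself. That world is practical already because the latest models are good enough that you can tell them to rename that variable and refactor and add this line there... and they'll just do it - it's faster than you typing on the keyboard yourself.

The next rule though, is nobody reads the code. And this is the thing which StrongDM started doing last year.

I wrote a lot more about StrongDM's dark factory explorations back in February.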

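The bottleneck has moved to testing

21:27 - It used to be, you'd come up with a spec and you hand it to your engineering team. And three weeks later, if you're lucky, they'd come back with an implementation. And now that maybe takes three hours, depending on how well the coding agents are established for that kind of thing. So now what, right? Now, where else are the bottlenecks?

Anyone who's done any product work knows that your initial ideas are always wrong. What matters is proving them, and testing them.

We can test things so much faster now because we can build workable prototypes so much quicker. So there's an interesting thing I've been doing in my own work where any feature that I want to design, I'll often prototype three different ways it could work because that takes very little time.

I've always loved prototyping things, and prototyping is even more valuable now.

22:40 - A UI prototype is free now. ChatGPT and Claude will just build you a very convincing UI for anything that you describe. And that's how you should be working. I think anyone who's doing product design and isn't vibe coding little prototypes is missing out on the most powerful boost that we get in that step.

But then what do you do? Given your three options that you have instead of one option, how do you prove to yourself which one of those is the best? I don't have a confident answer to that. I expect this is where the good old fashioned usability testing comes in.

More on prototyping later on:

46:35 - Throughout my entire career, my superpower has been prototyping. I've been very good at knocking out working prototypes of things very quickly. I was the person who could show up at a meeting and say, look, here's how it could work. That was my unique selling point. And that's gone. Anybody can do what I could do now.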

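This stuff is exhausting

26:25 - I'm finding that using coding agents well is taking every inch of my 25 years of experience as a software engineer, and it is mentally exhausting. I can fire up four agents in parallel and have them work on four different problems. And by like 11 AM, I am wiped out for the day. [...]

There's a personal skill we have to learn in finding our new limits - what's a responsible way for us not to burn out.

I've talked to a lot of people who are losing sleep because they're like, my coding agents could be doing work for me. I'm just going to stay up an extra half hour and set off a bunch of extra things... and then waking up at four in the morning. That's obviously unsustainable. [...]

There's an element of sort of gambling and addiction to how we're using some of these tools.

Interruptions cost a lot less now

45:16 - People talk about how important it is not to interrupt your coders. Your coders need to have solid two to four hour blocks of uninterrupted work so they can spin up their mental model and churn out the code. That's changed completely. My programming work, I need two minutes every now and then to prompt my agent about what to do next. And then I can do the other stuff and I can go back. I'm much more interruptible than I used to be.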

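My ability to estimate software is broken

28:19 - I've got 25 years of experience in how long it takes to build something. And that's all completely gone - it doesn't work anymore because I can look at a problem and say that this is going to take two weeks, so it's not worth it. And now it's like... maybe it's going to take 20 minutes because the reason it would have taken two weeks was all of the sort of crufty coding things that the AI is now covering for us.

I constantly throw tasks at AI that I don't think it'll be able to do because every now and then it does it. And when it doesn't do it, you learn, right? But when it does do something, especially something that the previous models couldn't do, that's actually cutting edge AI research.

And a related anecdote:

36:56 - A lot of my friends have been talking about how they have this backlog of side projects, right? For the last 10, 15 years, they've got projects they never quite finished. And some of them are like, well, I've done them all now. Last couple of months, I just went through and every evening I'm like, let's take that project and finish it. And they almost feel a sort of sense of loss at the end where they're like, well, okay, my backlog's gone. Now what am I going to build?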

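It's tough for people in the middle

29:29 - So ThoughtWorks, the big IT consultancy, did an offsite about a month ago, and they got a whole bunch of engineering VPs in from different companies to talk about this stuff. And one of the interesting theories they came up with is they think this stuff is really good for experienced engineers, like it amplifies their skills. It's really good for new engineers because it solves so many of those onboarding problems. The problem is the people in the middle. If you're mid-career, if you haven't made it to sort of super senior engineer yet, but you're not sort of new either, that's the group which is probably in the most trouble right now.

I mentioned Cloudflare hiring 1,000 interns, and Shopify too.

Lenny asked for my advice for people stuck in that middle:

31:21 - That's a big responsibility you're putting on me there! I think the way forward is to lean into this stuff and figure out how do I help this make me better?

A lot of people worry about skill atrophy: if the AI is doing it for you, you're not learning anything. I think if you're worried about that, you push back at it. You have to be mindful about how you're applying the technology and think, okay, I've been given this thing that can answer any question and *often* gets it right. How can I use this to amplify my own skills, to learn new things, to take on much more ambitious projects? [...]

33:05 - Everything is changing so fast right now. The only universal skill is being able to roll with the changes. That's the thing that we all need.

The term that comes up most in these conversations about how you can be great with AI is *agency*. I think agents have no agency at all. I would argue that the one thing AI can never have is agency because it doesn't have human motivations.

So I'd say that's the thing is to invest in your own agency and invest in how to use this technology to get better at what you do and to do new things.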

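It's harder to evaluate software

The fact that it's so easy to create software with detailed documentation and robust tests means it's harder to figure out what's a credible project.

37:47 Sometimes I'll have an idea for a piece of software, Python library or whatever, and I can knock it out in like an hour and get to a point where it's got documentation and tests and all of those things, and it looks like the kind of software that previously I'd have spent several weeks on - and I can stick it up on GitHub.

And yet... I don't believe in it. And the reason I don't believe in it is that I got to rush through all of those things... I think the quality is probably good, but I haven't spent enough time with it to feel confident in that quality. Most importantly, I *haven't used it yet*.

It turns out when I'm using somebody else's software, the thing I care most about is I want them to have used it for months.

I've got some very cool software that I built that I've *never used*. It was quicker to build it than to actually try and use it!

The misconception that AI tools are easy

41:31 - Everyone's like, oh, it must be easy. It's just a chat bot. It's not easy. That's one of the great misconceptions in AI is that using these tools effectively is easy. It takes a lot of practice and it takes a lot of trying things that didn't work and trying things that did work.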

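Coding agents are useful for security research now

19:04 - In the past sort of three to six months, they've started being credible as security researchers, which is sending shockwaves through the security research industry.

See Thomas Ptacek: Vulnerability Research Is Cooked.

At the same time, open source projects are being bombarded with junk security reports:

20:05 - There are these people who don't know what they're doing, who are asking ChatGPT to find a security hole and then reporting it to the maintainer. And the report looks good. ChatGPT can produce a very well formatted report of a vulnerability. It's a total waste of time. It's not actually verified as being a real problem.

A good example of the right way to do this is Anthropic's collaboration with Firefox, where Anthropic's security team *verified* every security problem before passing them to Mozilla.

OpenClaw

Of course we had to talk about this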



37:47 - Sometimes I'll have an idea for a piece of software, Python library or whatever, and I can knock it out in like an hour and get to a point where it's got documentation and tests and all of those things, and it looks like the kind of software that previously I'd have spent several weeks on - and I can stick it up on GitHub.
And yet... I don't believe in it. And the reason I don't believe in it is that I got to rush through all of those things... I think the quality is probably good, but I haven't spent enough time with it to feel confident in that quality. Most importantly, I haven't used it yet.
It turns out when I'm using somebody else's software, the thing I care most about is I want them to have used it for months.
I've got some very cool software that I built that I've never used. It was quicker to build it than to actually try and use it!

The misconception that AI tools are easy

41:31 - Everyone's like, oh, it must be easy. It's just a chat bot. It's not easy. That's one of the great misconceptions in AI is that using these tools effectively is easy. It takes a lot of practice and it takes a lot of trying things that didn't work and trying things that did work.

Coding agents are useful for security research now

19:04 - In the past sort of three to six months, they've started being credible as security researchers, which is sending shockwaves through the security research industry.

See Thomas Ptacek: Vulnerability Research Is Cooked.

At the same time, open source projects are being bombarded with junk security reports:

20:05 - There are these people who don't know what they're doing, who are asking ChatGPT to find a security hole and then reporting it to the maintainer. And the report looks good. ChatGPT can produce a very well formatted report of a vulnerability. It's a total waste of time. It's not actually verified as being a real problem.

A good example of the right way to do this is Anthropic's collaboration with Firefox, where Anthropic's security team verified every security problem before passing them to Mozilla.

OpenClaw

Of course we had to talk abou
