The Register AI/ML · April 29, 2026, 21:51 · about 6 min read

AWS keynote hails AI as magic, but the company's engineers describe a different reality

#Agentic AI #LLM Hallucination #Kiro #Human-in-the-loop #Amazon AWS

TL;DR

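A director responsible for internal development at Amazon stresses that even when fast code generation by AI agents looks like "magic," hallucinations and security risks mean human review remains indispensable for final quality assurance.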

AI deep analysis · May 8, 2026, 00:11

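Key points

Human verification is non-negotiable

Whenever AI agents generate code or make changes, a human must check and approve the output before it ships. Tarcza describes this as a non-negotiable condition for security and reliability.

The limits of spec-driven development

Spec-driven development of the kind offered by Kiro does not eliminate hallucination or prompt injection. It only reduces the risk, and the AI can still go beyond the specification.

Talent development and organizational continuity

Cutting engineers as AI automation advances is dangerous. Tarcza argues strongly that companies must keep hiring and growing junior engineers who can maintain these systems.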

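Impact analysis

This article highlights the practical challenges of using AI agents at a large tech company. It suggests the industry is being pushed back from "full automation by AI" toward collaboration between AI and humans (human-in-the-loop). In particular, the stance of putting security and reliability first is a useful guide for other companies and developers, and may prompt them to temper excessive expectations and revisit their adoption strategies.

Editorial comment

The tension between a keynote touting "magic"-like efficiency and the practical risk management that engineers face on the ground is clearly drawn. The case shows how much the adoption of AI tools depends not only on technical potential but also on rebuilding organizational culture and talent strategy.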

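INTERVIEW Steve Tarcza, director of Amazon Stores, says his team, StoreGen, exists to help the retail giant's developers move faster and cut friction. But despite the AI mandate, one principle is non-negotiable: nothing ships without a human checking it first. The unit focuses not on AWS customers but on Amazon's internal development teams for its mammoth retail site and operations.

Tarcza spoke to us at the AWS London Summit last week. We met him immediately after the keynote, at which Alison Kay, VP and managing director for UK and Ireland, told attendees that AI technology feels "like magic."

Kay cited as an example how the inference engine behind Bedrock, a generative AI service, was rebuilt in 76 days by six engineers using the Kiro agentic coding service. "While the engineers slept, the agents kept building," she said, describing how they "wrote code, tested it, found bugs, fixed them, and deployed it around the clock."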

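The Register suggested to Tarcza that there is some trepidation about this, given the well-known security and reliability issues with AI. What issues has his team encountered? "It's the things everybody knows about," he tells us. "It's the hallucinations, it's keeping it within the guardrails."

There are cases, he said, where the AI is "even doing work that you didn't ask it to, going further than you wanted to."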

He is an enthusiast for spec-driven development, which was the key feature of Kiro when it was first previewed in July 2025. The idea is that the AI generates a set of tasks for refinement and approval before writing any code.

Does spec-driven development solve problems like hallucination and prompt injection? "No," says Tarcza. "It reduces it at best. And even then, there are cases where it still does go beyond the specification."
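As a rough illustration of the spec-driven idea described above (tasks are drafted first, and no code is generated until they are approved), a minimal sketch might look like the following. This is not Kiro's API: `Task`, `Spec`, and `generate_code` are hypothetical names invented for this example, and the "generation" step is a placeholder. As Tarcza notes, a gate like this reduces risk at best; it does not stop a model from going beyond the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One item in a hypothetical spec; a human must approve it."""
    description: str
    approved: bool = False


@dataclass
class Spec:
    """A hypothetical spec: a goal plus the tasks drafted for it."""
    goal: str
    tasks: list[Task] = field(default_factory=list)

    def approve_all(self) -> None:
        # Stands in for a human reviewer signing off on every task.
        for task in self.tasks:
            task.approved = True

    def ready_for_codegen(self) -> bool:
        # An empty spec is never ready; otherwise every task must be approved.
        return bool(self.tasks) and all(t.approved for t in self.tasks)


def generate_code(spec: Spec) -> list[str]:
    """Placeholder for code generation; refuses to run on an unapproved spec."""
    if not spec.ready_for_codegen():
        raise PermissionError("spec has unapproved tasks; refusing to generate code")
    return [f"# TODO: implement: {t.description}" for t in spec.tasks]
```

The design choice here is simply that approval is checked at the point of generation rather than trusted from an earlier step.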

Kiro was possibly involved in a service outage last year, though this was officially denied, with the incident blamed on an employee error.

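How, then, can agentic AI be made secure and reliable? "We've taken the position that engineers always have to be looking at the output. Nothing ships without someone looking at it and validating it. Spec-driven development helps reduce how much time that takes, because it is then in roughly the form that folks want it to be in," Tarcza tells us.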

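If engineers have to review the code, that implies they have the skills to do so. But with companies, AWS included, busy laying off engineers in the light of AI advances, is it not increasingly likely that code no human has reviewed will go into production? "I take a very strong stance on this," Tarcza says. "If left on its own, that's a natural conclusion to what you may see happen. I think that's a wrong outcome … we can't get to the point where we don't have more junior engineers coming in. We have to continue to grow the talent. We can't end up in a spot where there are not folks to maintain these systems."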

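With all the manual review required, is it right to call AI development magic? "It can be," says Tarcza. "Our engineers spend less than 30 percent of their time working on core engineering, writing code, doing software designs. They spend a lot of time on other aspects of the process. What we've done is remove the friction so they're not writing status reports all the time. It is a magic box in that you can get through these phases faster.

"But the idea of it being a magic box that gets you from step one to the final step, it's not there. And I don't think that's the world we want to have."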

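The theme of the keynote was the age of agents, but Tarcza is not keen on the term "agentic AI." "I think we should be focusing on taking human-driven processes and re-architecting them with AI at the center," he tells us.

Nevertheless, he takes the same view of agentic actions such as deployment as he does of AI-generated code. "Right now, every mutating step that an AI might do requires a human to approve it," he says. "That is all the way down to publishing a document for someone to read."

He adds that "at least in Stores, we aren't using AI to assist with the deployment. We have great mechanisms that AWS has provided to do automated deployment that is deterministic. If we can have a deterministic system and it accomplishes the outcome that we want, that's preferred."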
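The rule Tarcza describes, that every mutating step needs human approval, can be pictured as a small gate in front of an agent's actions. The sketch below is an assumption-laden illustration, not how Amazon or StoreGen implements it: `Action`, `execute`, and the `approver` callback are hypothetical names, and the approver stands in for a real human decision.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    """A hypothetical agent action; `mutating` marks steps that change state."""
    name: str
    mutating: bool
    run: Callable[[], str]


def execute(action: Action, approver: Callable[[Action], bool]) -> str:
    """Run read-only actions directly; mutating ones only after approval."""
    if action.mutating and not approver(action):
        # The human declined, so the change is never applied.
        return f"blocked: {action.name}"
    return action.run()
```

For example, a read-only `Action("read_metrics", mutating=False, ...)` runs without asking anyone, while `Action("publish_doc", mutating=True, ...)` runs only if the approver returns `True`.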

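What Tarcza describes is a measured approach that seems in contrast to the breathless agentic AI hype we heard in the preceding keynote.

With token costs rising, is AI still worth it? "The highest level thinking about this is, what's the cost of missing a big innovation? The cost of not doing it is almost guaranteed to be higher than the token cost," he says. ®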
