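Vercel Blog · June 25, 2026, 16:00 · About a 13-minute read

Teaching agents product design at Vercel

TL;DR

Vercel has introduced "product-design", a structured system and agent skill that gives AI agents the context behind design decisions that is not present in the codebase, with the aim of improving the quality and consistency of AI-generated UI.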

AI deep analysis · June 25, 2026, 21:02

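Key points

Contextualizing design decisions

Coding agents can imitate existing patterns, but they cannot understand why those patterns exist. Vercel therefore treats product decisions like code, managing and sharing them inside the repository.

A system with three parts

The approach consists of three parts: an agent skill that supplies context when a decision needs judgment, linters that enforce clear rules automatically, and a review loop that gathers evidence from Slack, Figma, and GitHub and prepares guideline updates for human review.

Structured implementation inside the repository

Files and directories such as AGENTS.md, SKILL.md, references/, and exemplars/ live in the repository, and routing logic loads the appropriate context for each request mode (shape, implement, review, copy, or harden) and surface.

A template other teams can adapt

By describing its own implementation, Vercel shows how other teams can build the same structure around their own standards.

Traceable findings and a validation process

Copy rules carry stable IDs that point to their canonical sources, so each finding can be traced back to its rationale. When Vercel Agent proposes a patch, it validates the change in a secure Vercel Sandbox with the repository's builds, tests, and linters before posting the suggestion.

Deterministic checks versus contextual judgment

Patterns that code can verify, such as two or three static options that should be radio buttons, go to fast linters. Decisions that need product context, such as naming the object and consequence of a destructive action, go to the skill.

Evals built from shipped examples

Evals test the agent on interfaces it has not seen before. Rule correctness is scored separately from similarity to the shipped result, because shipped code can contain flaws that the agent should improve rather than reproduce.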

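Impact analysis

This article matters because it shows a concrete implementation pattern for moving generative AI from a code-generation tool toward a collaborator that can apply design reasoning. Missing context is a limiting factor for AI agents today, and the practice of structuring organizational knowledge and supplying it to agents is expected to spread as a standard practice among AI development teams.

Editorial comment

The piece highlights that improving agent capability depends less on prompt engineering alone and more on "infrastructure design": structuring organizational knowledge and embedding it in the system.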

Coding agents can produce working UI fast, but what's harder is a different shape. They can copy your product's style, match its patterns, and try to follow its conventions. What they cannot do is understand why those patterns exist. Code shows agents what shipped, not why one component, phrase, or interaction became your standard. That reasoning lives in design reviews, PR comments, Slack threads, and with the people who were in the room. For an agent, context that isn't in the codebase doesn't exist.

Vercel is an agent-native team. We treat accepted product decisions like code, keeping them in the repository, reviewing changes against them, and making them available to every agent working there.

The way we do this is through product-design. It's a system with three parts:

An agent skill that gives coding agents the context behind decisions that require product or codebase judgment.

Linters that enforce clear rules automatically.

A review loop that gathers evidence from Slack, Figma, and GitHub, then prepares guideline updates for review.

Any team can build the same structure around their own standards.

Inside the product-design skill

The skill lives inside the repository alongside the code it governs. Here's a simplified view of its structure:

The repository AGENTS.md tells coding agents when to load the skill. The skill-local AGENTS.md defines load order, validation, and governance. SKILL.md owns the runtime workflow.

references/ stores product-judgment, interface-quality, resilience, copy, canonical product names, interaction patterns, and surface-specific decisions.

exemplars/ documents decisions worth repeating from shipped pull requests, along with mistakes to avoid. coverage-gaps.md lists areas where we do not have a standard yet.

copywriting-eval/ tests copy and interface-language behavior. It does not evaluate the broader product-design workflow.

How the skill routes

SKILL.md resolves the request mode first: shape, implement, review, copy, or harden. This keeps audits from becoming edits and copy passes from expanding into redesigns. It skips backend-only work, telemetry, console errors, generated files, and tests with no shipped UI impact.

The skill routes to canonical sources instead of duplicating them. Component APIs, design-system rules, accessibility criteria, and interaction guidance stay with their owners.

Routing is specific to both task and surface. Material changes load product-judgment and interface-quality first. Copy, component, layout, interaction, accessibility, and resilience work each route to focused references. A modal loads destructive-action patterns and canonical verbs. A settings form loads labels, validation, progressive disclosure, and accessible-name guidance.

You can use this simplified structure as a starting point and replace the paths and standards with your own:
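The sketch below is not Vercel's skill file. It is a hypothetical TypeScript rendering of the routing described above, written only to make the order of decisions concrete: resolve the request mode, skip out-of-scope work, then load baseline and surface-specific references. The mode names come from the article; every path, skip pattern, and function name is an invented placeholder, and treating a copy pass as a non-material change is an assumption.

```typescript
// Hypothetical sketch. The mode names come from the article; every path and
// skip pattern below is an invented placeholder, not Vercel's real layout.
type RequestMode = "shape" | "implement" | "review" | "copy" | "harden";
type UiSurface = "modal" | "settings-form";

interface RoutingPlan {
  mode: RequestMode;
  mayEditFiles: boolean; // a review is an audit, so it must not become an edit
  references: string[]; // files to load, in order
}

// Loaded first for material changes.
const BASELINE_REFERENCES: string[] = [
  "references/product-judgment.md",
  "references/interface-quality.md",
];

// Focused references per surface.
const SURFACE_REFERENCES: Record<UiSurface, string[]> = {
  modal: ["references/destructive-actions.md", "references/canonical-verbs.md"],
  "settings-form": [
    "references/labels.md",
    "references/validation.md",
    "references/progressive-disclosure.md",
    "references/accessible-names.md",
  ],
};

// Work the skill skips: backend-only code, telemetry, generated files.
const SKIP_PATTERNS: RegExp[] = [/^api\//, /^telemetry\//, /\.generated\./];

function routeRequest(
  mode: RequestMode,
  surface: UiSurface,
  changedFiles: string[],
): RoutingPlan | null {
  // Return null when no changed file is in scope for the skill.
  const inScope = changedFiles.some(
    (file) => !SKIP_PATTERNS.some((pattern) => pattern.test(file)),
  );
  if (!inScope) return null;

  // Assumption: a copy pass is not a material change, so it loads only the
  // surface references and cannot grow into a redesign.
  const baseline = mode === "copy" ? [] : BASELINE_REFERENCES;
  return {
    mode,
    mayEditFiles: mode !== "review",
    references: [...baseline, ...SURFACE_REFERENCES[surface]],
  };
}
```

Calling `routeRequest("review", "modal", ["app/delete-dialog.tsx"])` would yield a plan that loads the two baseline references before the modal references and leaves `mayEditFiles` false, while a change touching only `api/` files returns `null`.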

Routing is only part of what makes the skill useful. The other part is how findings stay traceable once the skill produces them.

Make findings traceable

Copy rules have stable IDs and point to their canonical sources:
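As an illustration only (the rule IDs, summaries, and source paths here are invented, not Vercel's), stable IDs can be sketched as a small registry that every finding must resolve against. A finding whose ID does not resolve is surfaced rather than silently accepted, which is what keeps findings traceable.

```typescript
// Hypothetical registry: IDs, summaries, and paths are placeholders.
interface CopyRule {
  id: string; // stable: never renumbered or reused
  summary: string;
  source: string; // canonical source the rule points to
}

interface SkillFinding {
  ruleId: string;
  file: string;
  message: string;
}

interface TracedFinding extends SkillFinding {
  source: string;
}

const COPY_RULES: CopyRule[] = [
  {
    id: "COPY-001",
    summary: "Destructive actions use Verb + Noun",
    source: "references/copy.md#destructive-actions",
  },
  {
    id: "COPY-002",
    summary: "Use canonical product names",
    source: "references/product-names.md",
  },
];

// Attach the canonical source to each finding; keep unknown IDs separate so a
// reviewer can see which findings have no rule behind them.
function traceFindings(
  findings: SkillFinding[],
  rules: CopyRule[],
): { traced: TracedFinding[]; untraceable: SkillFinding[] } {
  const sourceById = new Map<string, string>();
  rules.forEach((rule) => sourceById.set(rule.id, rule.source));

  const traced: TracedFinding[] = [];
  const untraceable: SkillFinding[] = [];
  for (const finding of findings) {
    const source = sourceById.get(finding.ruleId);
    if (source === undefined) {
      untraceable.push(finding);
    } else {
      traced.push({ ...finding, source });
    }
  }
  return { traced, untraceable };
}
```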

When Vercel Agent proposes a patch, it validates the change in a secure Vercel Sandbox with the repository's builds, tests, and linters before posting the suggestion.

Use linters for faster feedback

We prefer deterministic checks when a linter can enforce a rule reliably. Linters are fast and cheap to run, so developers and coding agents get feedback while they work instead of waiting for a later review.

Code can count two or three static options, so a linter can recommend radio buttons. Naming the right object and consequence for a destructive action requires product context, so the skill handles it.

Examples in the codebase include rules that:

Prevent nested modals, which break focus management, keyboard navigation, and layering.

Recommend radio buttons instead of a select for two or three static options, so every choice stays visible.

Require accessible names for icon buttons and form controls, and reject custom focus rings that bypass shared focus tokens.

Prevent className from overriding a design-system component's color, radius, or shadow while still allowing layout classes.

Require Modal.Body so long content scrolls correctly and headers and footers can remain sticky.

Replace raw shadows with theme-aware Material classes and reject borders that duplicate a Material's built-in treatment.

Flag arbitrary spacing that falls off the 4px grid and suggest a standard utility when one exists.

Each rule explains why the pattern is a problem and suggests a concrete fix. Some rules autofix safe migrations, such as replacing deprecated Tailwind utility names.
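To make the spacing rule concrete, here is a minimal sketch of an off-grid check, written as a plain function over a class string rather than as a real linter plugin. It assumes Tailwind-style arbitrary values such as `p-[13px]` and Tailwind's default spacing scale, where one unit is 4px and half-steps such as `1.5` (6px) exist. Which prefixes to cover and whether half-steps count as acceptable suggestions are assumptions, not details from the article.

```typescript
// Hypothetical check, not Vercel's rule. Covers a small set of spacing prefixes.
interface SpacingIssue {
  className: string;
  pixels: number;
  reason: string; // why the pattern is a problem
  suggestion: string | null; // standard utility, when one exists
}

// Matches arbitrary pixel spacing such as "p-[13px]" or "gap-[6px]".
const ARBITRARY_SPACING =
  /^(p|px|py|pt|pr|pb|pl|m|mx|my|mt|mr|mb|ml|gap)-\[(\d+(?:\.\d+)?)px\]$/;

// Off-grid pixel values that still have a utility in Tailwind's default scale.
const HALF_STEP_UTILITIES = new Map<number, string>([
  [2, "0.5"],
  [6, "1.5"],
  [10, "2.5"],
  [14, "3.5"],
]);

function checkSpacing(classNames: string): SpacingIssue[] {
  const issues: SpacingIssue[] = [];
  for (const className of classNames.split(/\s+/).filter(Boolean)) {
    const match = ARBITRARY_SPACING.exec(className);
    if (!match) continue; // not arbitrary pixel spacing

    const prefix = match[1];
    const pixels = Number(match[2]);
    if (pixels % 4 === 0) continue; // on the 4px grid: nothing to flag here

    const step = HALF_STEP_UTILITIES.get(pixels);
    issues.push({
      className,
      pixels,
      reason: `${pixels}px is off the 4px spacing grid`,
      suggestion: step === undefined ? null : `${prefix}-${step}`,
    });
  }
  return issues;
}
```

For the class string `"flex p-[13px] gap-[6px] mt-[16px]"`, this sketch flags `p-[13px]` with no suggestion and `gap-[6px]` with the suggestion `gap-1.5`, and leaves `mt-[16px]` alone because it sits on the grid.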

Accepted decisions can take several forms:

Human-readable guidance next to the relevant Geist component, such as Checkbox best practices.

Agent guidance in the product-design skill.

A lint rule when code can check it reliably.

The lint rule below shows how one product guideline is encoded as a deterministic check:

Each of these catches a class of mistake automatically, freeing code review for the decisions that actually require judgment.
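The radio-versus-select guideline is a useful example of something code can count. The sketch below is an illustrative stand-in rather than Vercel's lint rule: it works on a hypothetical, simplified description of a select's options instead of a real AST, and the rule ID and message text are invented. The point it shows is the boundary of a deterministic check: when options come from runtime data the rule stays silent instead of guessing.

```typescript
// Hypothetical, simplified input: a real rule would read this from the AST.
interface SelectUsage {
  file: string;
  // Literal option labels when known statically; null when options are
  // computed at runtime and therefore cannot be counted.
  staticOptions: string[] | null;
}

interface LintMessage {
  file: string;
  ruleId: string;
  message: string;
}

function preferRadioForFewOptions(usage: SelectUsage): LintMessage | null {
  // Dynamic options: a deterministic check has nothing reliable to count.
  if (usage.staticOptions === null) return null;

  const count = usage.staticOptions.length;
  if (count < 2 || count > 3) return null;

  return {
    file: usage.file,
    ruleId: "prefer-radio-for-few-options", // placeholder ID
    message:
      `Select has ${count} static options. ` +
      "Use radio buttons so every choice stays visible.",
  };
}
```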

How we test the guidance with evals

Lint rules are deterministic, but agent behavior can vary, so we test the skill on interfaces it has not seen before.

An agent edits a before state, then a judge checks the results against a rubric.

Evals come from shipped examples documented in the skill. Holdouts hide their expected edits, testing whether the guidance generalizes. We also run fixtures without the skill to measure whether it changed the agent's behavior.

We score rule correctness separately from similarity to the shipped result. Shipped code can contain a flaw that the agent should improve instead of reproduce.
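One way to keep those two scores apart, sketched under assumptions: each fixture run records how many rubric checks passed and, separately, a similarity value reported by the judge. The field names and the simple averaging are hypothetical. The summary compares runs with and without the skill on rule correctness only, so a run can score well while diverging from a flawed shipped result.

```typescript
// Hypothetical record of one eval run; field names are placeholders.
interface FixtureRun {
  fixtureId: string;
  holdout: boolean; // expected edits are hidden from the skill
  withSkill: boolean;
  ruleChecksPassed: number;
  ruleChecksTotal: number;
  similarityToShipped: number; // 0..1, reported separately by the judge
}

function ruleCorrectness(run: FixtureRun): number {
  return run.ruleChecksTotal === 0 ? 0 : run.ruleChecksPassed / run.ruleChecksTotal;
}

function mean(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Similarity is deliberately left out of this summary: it never raises or
// lowers the rule-correctness score.
function summarizeSkillEffect(runs: FixtureRun[]): {
  withSkill: number;
  withoutSkill: number;
  lift: number;
} {
  const withSkill = mean(runs.filter((run) => run.withSkill).map(ruleCorrectness));
  const withoutSkill = mean(runs.filter((run) => !run.withSkill).map(ruleCorrectness));
  return { withSkill, withoutSkill, lift: withSkill - withoutSkill };
}
```

A positive `lift` suggests the skill changed agent behavior on those fixtures. Reporting it for holdout fixtures alone would be the closer test of whether the guidance generalizes.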

Keep the guidance current

Product standards change as components, names, workflows, and failure states change, and every update needs evidence and human review.

Our weekly evidence-intake workflow collects design feedback that may improve product-design. It searches Slack conversations and preserves links to Figma files, pull requests, review comments, and previews as evidence. When evidence is incomplete, it records the code or commit needed for verification.

The workflow separates collection from judgment:

A collector gathers messages, links, and nearby context without proposing rules.

A separate judge groups the evidence, verifies sources, and records open questions.

The job creates a review packet with candidates, rejected topics, follow-up requests, and coverage gaps.

Every candidate links to its source and remains pending. A comment from an experienced reviewer can raise its priority, but every candidate still needs evidence.

Automation ends with the review packet. A human decides whether a candidate becomes agent guidance, a lint rule, an example, an eval, or no change. Accepted changes go into the narrowest relevant file and pass the relevant checks before merging.
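The separation between collection, judgment, and the human decision can be sketched with a few types. Everything here is hypothetical: the type names, the grouping by thread, and the follow-up wording are assumptions, and rejected topics and coverage gaps are omitted to keep the sketch short. What it illustrates is that collected items carry no proposed rule, every candidate stays `pending`, and only a named human reviewer can attach an outcome.

```typescript
// Hypothetical types; not Vercel's workflow code.
type EvidenceKind = "slack" | "figma" | "pull-request" | "review-comment" | "preview";

// What the collector gathers: context only, never a proposed rule.
interface CollectedItem {
  thread: string;
  kind: EvidenceKind;
  url: string | null; // null when no verifiable link was preserved
  text: string;
}

interface GuidelineCandidate {
  thread: string;
  status: "pending"; // automation never approves anything
  evidence: CollectedItem[];
}

interface ReviewPacket {
  candidates: GuidelineCandidate[];
  followUpRequests: string[];
}

type HumanDecision = "agent-guidance" | "lint-rule" | "example" | "eval" | "no-change";

// The judge step: group by thread and keep only evidence with a source link.
function buildReviewPacket(items: CollectedItem[]): ReviewPacket {
  const byThread = new Map<string, CollectedItem[]>();
  for (const item of items) {
    const group = byThread.get(item.thread) ?? [];
    group.push(item);
    byThread.set(item.thread, group);
  }

  const packet: ReviewPacket = { candidates: [], followUpRequests: [] };
  byThread.forEach((group, thread) => {
    const linked = group.filter((item) => item.url !== null);
    if (linked.length === 0) {
      packet.followUpRequests.push(
        `${thread}: no linked source; record the code or commit needed for verification`,
      );
      return;
    }
    packet.candidates.push({ thread, status: "pending", evidence: linked });
  });
  return packet;
}

// The only place a decision is attached, and it requires a named reviewer.
function recordDecision(
  candidate: GuidelineCandidate,
  decision: HumanDecision,
  reviewer: string,
): { thread: string; decision: HumanDecision; reviewer: string } {
  if (reviewer.trim() === "") throw new Error("A human reviewer is required");
  return { thread: candidate.thread, decision, reviewer };
}
```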

How to build product-design into your codebase

Our setup reflects Vercel's product, components, and review history, but other teams can adapt the structure to their own standards.

Start with repeated decisions

Choose one product surface where the same review comments keep appearing: destructive actions, error states, settings forms, empty states, or navigation. Collect examples from shipped code and real reviews, and write down the decision, why it matters, exceptions, and the source.

Avoid starting with broad adjectives like clear, polished, or intuitive. Agents need observable decisions. "Destructive actions use Verb + Noun" is usable. "Buttons should be clear" is not.

Fill in the fields specific to your surface before expanding to others.
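As a rough illustration of such fields, the sketch below types one decision record using the four items named above (the decision, why it matters, exceptions, and the source) and adds a crude check for vague adjectives. The field names, the ID format, and the word list are assumptions, and the adjective check is a simple word match that will misfire on legitimate uses such as the verb in "Clear cache".

```typescript
// Hypothetical record shape; field names are placeholders.
interface DecisionRecord {
  id: string;
  surface: string;
  decision: string; // an observable rule an agent can apply
  whyItMatters: string;
  exceptions: string[];
  source: string; // shipped code or review where the decision was made
}

const VAGUE_ADJECTIVES = ["clear", "polished", "intuitive"];
const VAGUE_PATTERN = new RegExp(`\\b(${VAGUE_ADJECTIVES.join("|")})\\b`, "i");

// Returns a list of problems; an empty list means the record is usable.
function reviewDecisionRecord(record: DecisionRecord): string[] {
  const problems: string[] = [];
  if (record.decision.trim() === "") problems.push("decision is empty");
  if (record.whyItMatters.trim() === "") problems.push("whyItMatters is empty");
  if (record.source.trim() === "") problems.push("source is missing");

  const vague = VAGUE_PATTERN.exec(record.decision);
  if (vague) {
    problems.push(`decision leans on a broad adjective: "${vague[1]}"`);
  }
  return problems;
}
```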

Add an explicit trigger and firm boundaries

Tell agents when to load the skill in persistent repository instructions, and define the files and surfaces it covers along with the areas it must skip. In separate Next.js evals, agents failed to invoke an available skill in 56% of cases. Test the trigger separately from the guidance, because failing to load the skill and failing to follow a rule are different problems.

Ask the agent to report which surfaces and references it loaded, then verify that its findings cite those sources.
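A small classifier can keep those failure types apart when reading eval output. This is a sketch under assumptions: the report shape is hypothetical and presumes the agent is asked to list the references it loaded and to cite one per finding. The order of checks is the point: a run where the skill never loaded is a trigger failure and is not counted against the guidance.

```typescript
// Hypothetical report the agent is asked to produce after a run.
interface AgentReport {
  loadedSkill: boolean;
  loadedReferences: string[];
  findings: { ruleId: string; citedReference: string }[];
}

type RunOutcome =
  | "skill-not-loaded" // trigger problem
  | "uncited-finding" // finding cites a reference that was never loaded
  | "rule-not-followed" // guidance problem
  | "pass";

function classifyRun(report: AgentReport, expectedRuleIds: string[]): RunOutcome {
  // Check the trigger first: without the skill, rule results say nothing
  // about the quality of the guidance.
  if (!report.loadedSkill) return "skill-not-loaded";

  const loaded = new Set(report.loadedReferences);
  if (report.findings.some((finding) => !loaded.has(finding.citedReference))) {
    return "uncited-finding";
  }

  const reported = new Set(report.findings.map((finding) => finding.ruleId));
  if (expectedRuleIds.some((ruleId) => !reported.has(ruleId))) {
    return "rule-not-followed";
  }
  return "pass";
}
```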

Separate routing, rules, and evidence

Use a short entry point to identify the surface and load focused references. Organize the details around surfaces and decisions reviewers already discuss: forms, modals, navigation, product vocabulary, workflow states, and cross-surface patterns.

Give rules stable IDs and link them to examples and sources. Record shipped examples with both useful decisions and known flaws, and keep missing guidance visible in a coverage-gap list.

A coverage-gap list makes missing guidance explicit.

Use code for clear rules

If a linter can identify a problem reliably, enforce the rule there. Use agent guidance when the decision needs product or codebase context. Keep new standards, policy choices, and unresolved product decisions with people.

Build training fixtures from documented examples and holdouts from interfaces whose expected edits do not appear in the skill. Test retrieval and application separately, because whether the agent loaded the skill and whether it followed the rule are different questions.

If a rule cannot stay reliable without many exceptions, move it back to agent guidance.

Assign ownership and an update loop

Review new evidence regularly, but require human approval before changing the guidance or checks. Keep a decision log that records what changed, why, and which source supported it. Treat new rules as product changes, reviewing and testing each one, and removing those that stop helping.

Start with one surface and the decisions your team already repeats. Put those decisions where code is written and reviewed, and keep people responsible for what becomes a standard.

Build your own

The hardest part is picking the first surface. Every team has decisions worth encoding. The question is whether they live in someone's head or somewhere agents can find them. If you build something using this pattern or have questions about how we set it up, let us know.
