Open SWE:社内コーディングエージェントのためのオープンソースフレームワーク
LangChainはStripeやRampなどの実務例から抽出した設計パターンを統合し、内部向けコーディングエージェント構築のためのオープンソースフレームワーク「Open SWE」を公開した。
キーポイント
生産環境での設計パターン収束
Stripe, Ramp, Coinbaseの事例から、サンドボックス実行・ツール厳選・Slack統合・サブエージェント委譲などの共通アーキテクチャが抽出された。
Open SWEの技術基盤と実装
Deep AgentsとLangGraphを基盤とし、実証済みの設計パターンを実装可能な形で提供している。
運用上の最適化とリスク管理
実行環境の隔離、約500個に厳選されたツールセット、Rich Contextの事前取得により、実務でのオーバーヘッドとリスクを最小化している。
Deep Agentsフレームワークとの構成(Composition)
既存エージェントのフォークやゼロからの構築ではなく、Deep Agentsを基盤に構成することで、アップグレードパスの確保と組織固有のカスタマイズ(ツール・プロンプト)を設定レベルで管理可能にしている。
分離型クラウドサンドボックス
各タスクを独立したLinux環境で実行し、スレッド単位での永続化・並列実行・自動再生成をサポートする。アクセス制御は「まず隔離し、境界内で権限を付与」の原則に従う。
厳選されたツールセット設計
テスト・保守・推論を容易にするため、必要最小限のツールに絞り込む。組織固有のシステムやAPIが必要な場合は明示的に追加する設計哲学を採用している。
コンテキストの自動注入と二層アプローチ
`AGENTS.md`やLinear/Slackの完全なスレッド履歴をシステムプロンプトに直接注入し、リポジトリ全体の規約とタスク固有の情報を追加ツール呼び出しなしで提供します。
影響分析・編集コメントを表示
影響分析
LangChainが実務で検証されたエージェント設計パターンをオープンソース化することで、企業内のAIコーディングエージェント構築の参入障壁が下がる。これにより、独自開発から標準フレームワークへの移行が進み、業界全体のセキュリティと運用効率の基準が統一される可能性がある。
編集コメント
LangChainが実務事例を抽象化して公開した点は評価できるが、フレームワークの採用には既存セキュリティポリシーとの整合性確認が必須となる。実証データ付きのベンチマーク公開を期待したい。
組織向けのカスタマイズ
Open SWEは完成品ではなく、カスタマイズ可能な基盤として設計されています。すべての主要コンポーネントは差し替え可能です:
サンドボックスプロバイダー: Modal、Daytona、Runloop、LangSmithの間で切り替えられます。内部インフラ要件がある場合は、独自のサンドボックスバックエンドを実装できます。
モデル: 任意のLLMプロバイダーを使用できます。デフォルトはClaude Opus 4ですが、異なるサブタスクに対して異なるモデルを設定できます。
ツール: 内部API、デプロイメントシステム、テストフレームワーク、監視プラットフォーム用のツールを追加できます。不要なツールは削除できます。
トリガー: Slack、Linear、GitHub統合ロジックを変更できます。メール、ウェブフック、カスタムUIなどの新しいトリガーインターフェースを追加できます。
システムプロンプト: 基本プロンプトとAGENTS.mdファイルを組み込むロジックをカスタマイズできます。組織固有の指示、制約、規約を追加できます。
ミドルウェア: 検証、承認ゲート、ロギング、安全性チェックのための独自のミドルウェアフックを追加できます。
カスタマイズガイドでは、これらの拡張ポイントそれぞれについて例を交えて説明しています。
内部実装との比較
公開情報に基づく、Open SWEとStripe、Ramp、Coinbaseの内部システムとの比較です:
| 決定事項 | Open SWE | Stripe (Minions) | Ramp (Inspect) | Coinbase (Cloudbot) |
|---|---|---|---|---|
| ハーネス | 構成型 (Deep Agents/LangGraph) | フォーク型 (Goose) | 構成型 (OpenCode) | スクラッチ構築 |
| サンドボックス | 差し替え可能 (Modal、Daytonaなど) | AWS EC2 devboxes (事前ウォーム済み) | Modalコンテナ (事前ウォーム済み) | 社内製 |
| ツール | 約15、厳選 | 約500、エージェントごとに厳選 | OpenCode SDK + 拡張 | MCPs + カスタムスキル |
| コンテキスト | AGENTS.md + 課題/スレッド | ルールファイル + 事前ハイドレーション | OpenCode組み込み | Linear優先 + MCPs |
| オーケストレーション | サブエージェント + ミドルウェア | ブループリント (決定論的 + エージェント的) | セッション + 子セッション | 3モード |
| 起動 | Slack、Linear、GitHub | Slack + 埋め込みボタン | Slack + Web + Chrome拡張機能 | Slackネイティブ |
| 検証 | プロンプト駆動 + PR安全網 | 3層 (ローカル + CI + 1回再試行) | 視覚的DOM検証 | エージェント評議会 + 自動マージ |
コアパターンは類似しています。違いは実装の詳細、内部統合、組織固有のツールにあり、これはフレームワークを異なる環境に適応させる際にまさに予想されることです。
始め方
Open SWEはGitHubで現在利用可能です。
インストールガイド: GitHubアプリ作成、LangSmith設定、Linear/Slack/GitHubトリガー、本番デプロイメントについて説明します。
カスタマイズガイド: 組織向けにサンドボックス、モデル、ツール、トリガー、システムプロンプト、ミドルウェアを交換する方法を示します。
フレームワークはMITライセンスです。フォークしてカスタマイズし、内部でデプロイできます。これに基づいて興味深いものを構築した場合は、ぜひお知らせください。
いくつかのエンジニアリング組織は、内部コーディングエージェントを本番環境に正常にデプロイしています。Open SWEは、異なるコードベースとワークフロー向けにカスタマイズされるように設計された、類似のアーキテクチャパターンのオープンソース実装を提供します。異なるコンテキストで何が機能するかはまだ学習中ですが、このフレームワークはこのアプローチを探求するチームの出発点となります。
Open SWEを試す: github.com/langchain-ai/open-swe
Deep Agentsについて学ぶ: docs.langchain.com/oss/python/deepagents
LangSmithサンドボックスウェイトリストに登録: https://www.langchain.com/langsmith-sandboxes-waitlist
ドキュメントを読む: Open SWEドキュメント
原文を表示
imageOver the past year, we've observed several engineering organizations building internal coding agents that operate alongside their development teams. Stripe developed Minions, Ramp built Inspect, and Coinbase created Cloudbot. These systems integrate into existing workflows (accessible through Slack, Linear, and GitHub) rather than requiring engineers to adopt new interfaces.
While these systems were developed independently, they've converged on similar architectural patterns: isolated cloud sandboxes, curated toolsets, subagent orchestration, and integration with developer workflows. This convergence suggests some common requirements for deploying AI agents in production engineering environments.
Today, we're releasing Open SWE, an open-source framework that captures these patterns in a customizable form. Built on Deep Agents and LangGraph, Open SWE provides the core architectural components we've observed across these implementations. If your organization is exploring internal coding agents, this can serve as a starting point.
Patterns from Production Deployments
When we looked at how Stripe, Ramp, and Coinbase built their coding agents, we noticed they made similar architectural decisions. Here's what these systems have in common:
Isolated execution environments: Tasks run in dedicated cloud sandboxes with full permissions inside strict boundaries. This isolates the blast radius of any mistake from production systems while allowing agents to execute commands without approval prompts for each action.
Curated toolsets: According to Stripe's engineering team, their agents have access to around 500 tools, but these are carefully selected and maintained rather than accumulated over time. Tool curation appears to matter more than tool quantity.
Slack-first invocation: All three systems integrate with Slack as a primary interface, meeting developers in their existing communication workflows rather than requiring context switches to new applications.
Rich context at startup: These agents pull full context from Linear issues, Slack threads, or GitHub PRs before beginning work, reducing the overhead of discovering requirements through tool calls.
Subagent orchestration: Complex tasks get decomposed and delegated to specialized child agents, each with isolated context and focused responsibilities.
These architectural choices have proven effective across multiple production deployments, though organizations will likely need to adapt specific components to their own environments and requirements.
Open SWE's Architecture
Open SWE provides an open-source implementation of similar architectural patterns. Here's how the framework maps to what we've observed:
- Agent Harness: Composed on Deep Agents
Rather than forking an existing agent or building from scratch, Open SWE composes on the Deep Agents framework. This approach is similar to how Ramp's team built Inspect on top of OpenCode.
Composition provides two advantages:
Upgrade path: When Deep Agents improves (better context management, more efficient planning, optimized token usage), you can incorporate those improvements without rebuilding your customizations.
Customization without forking: You can maintain org-specific tools, prompts, and workflows as configuration rather than as modifications to core agent logic.
create_deep_agent(
model="anthropic:claude-opus-4-6",
system_prompt=construct_system_prompt(repo_dir, ...),
tools=[
http_request,
fetch_url,
commit_and_open_pr,
linear_comment,
slack_thread_reply
],
backend=sandbox_backend,
middleware=[
ToolErrorMiddleware(),
check_message_queue_before_model,
...
],
)
Deep Agents provides infrastructure that can support these patterns: built-in planning via write_todos, file-based context management, native subagent spawning via the task tool, and middleware hooks for deterministic orchestration.
- Sandbox: Isolated Cloud Environments
Each task runs in its own isolated cloud sandbox, a remote Linux environment with full shell access. The repository is cloned in, the agent receives complete permissions, and any errors are contained within that environment.
Open SWE supports multiple sandbox providers out of the box:
Modal
Daytona
Runloop
LangSmith
You can also implement your own sandbox backend.
This follows a pattern we've observed: isolate first, then grant full permissions inside the boundary.
Key behaviors:
Each conversation thread gets a persistent sandbox, reused across follow-up messages
Sandboxes automatically recreate if they become unreachable
Multiple tasks run in parallel, each in its own sandbox
- Tools: Curated, Not Accumulated
Open SWE ships with a focused toolset:
Tool
Purpose
execute
Shell commands in the sandbox
fetch_url
Fetch web pages as markdown
http_request
API calls (GET, POST, etc.)
commit_and_open_pr
Git commit and open a GitHub draft PR
linear_comment
Post updates to Linear tickets
slack_thread_reply
Reply in Slack threads
Plus the built-in Deep Agents tools: read_file, write_file, edit_file, ls, glob, grep, write_todos, and task (subagent spawning).
A smaller, curated toolset can be easier to test, maintain, and reason about. When you need additional tools for your organization (internal APIs, custom deployment systems, specialized testing frameworks), you can add them explicitly.
- Context Engineering: AGENTS.md + Source Context
Open SWE gathers context from two sources:
AGENTS.md file: If your repository contains an AGENTS.md file at the root, it's read from the sandbox and injected into the system prompt. This file can encode conventions, testing requirements, architectural decisions, and team-specific patterns that every agent run should follow.
Source context: The full Linear issue (title, description, comments) or Slack thread history is assembled and passed to the agent before it starts, providing task-specific context without additional tool calls.
This two-layer approach balances repository-wide knowledge with task-specific information.
- Orchestration: Subagents + Middleware
Open SWE's orchestration combines two mechanisms:
Subagents: The Deep Agents framework supports spawning child agents via the task tool. The main agent can delegate independent subtasks to isolated subagents, each with its own middleware stack, todo list, and file operations.
Middleware: Deterministic middleware hooks run around the agent loop:
check_message_queue_before_model: Injects follow-up messages (Linear comments or Slack messages that arrive mid-run) before the next model call. This allows users to provide additional input while the agent is working.
open_pr_if_needed: Acts as a safety net that commits and opens a PR if the agent didn't complete this step. This ensures critical steps happen reliably.
ToolErrorMiddleware: Catches and handles tool errors gracefully.
This separation between agentic (model-driven) and deterministic (middleware-driven) orchestration can help balance reliability with flexibility.
- Invocation: Slack, Linear, and GitHub
We've observed that many teams converge on Slack as a primary invocation surface. Open SWE follows a similar pattern:
Slack: Mention the bot in any thread. Supports repo:owner/name syntax to specify which repository to work on. The agent replies in-thread with status updates and PR links.
Linear: Comment @openswe on any issue. The agent reads the full issue context, reacts with to acknowledge, and posts results back as comments.
GitHub: Tag @openswe in PR comments on agent-created PRs to have it address review feedback and push fixes to the same branch.
Each invocation creates a deterministic thread ID, so follow-up messages on the same issue or thread route to the same running agent.
- Validation: Prompt-Driven + Safety Nets
The agent is instructed to run linters, formatters, and tests before committing. The open_pr_if_needed middleware acts as a backstop—if the agent finishes without opening a PR, the middleware handles it automatically.
You can extend this validation layer by adding deterministic CI checks, visual verification, or review gates as additional middleware.
Why Deep Agents
Deep Agents provides the foundation that makes this architecture composable and maintainable.
Context management: Long-running coding tasks can produce large amounts of intermediate data (file contents, command outputs, search results). Deep Agents handles this through file-based memory, offloading large results instead of keeping everything in the conversation history. This can help prevent context overflow when working on larger codebases.
Planning primitives: The built-in write_todos tool provides a structured way to break down complex work, track progress, and adapt plans as new information emerges. We've found this particularly helpful for multi-step tasks that span extended periods.
Subagent isolation: When the main agent spawns a child agent via the task tool, that subagent gets its own isolated context. Different subtasks don't pollute each other's conversation history, which can lead to clearer reasoning on complex, multi-faceted work.
Middleware hooks: Deep Agents' middleware system allows you to inject deterministic logic at specific points in the agent loop. This is how Open SWE implements message injection and automatic PR creation—behaviors that need to happen reliably.
Upgrade path: Because Deep Agents is actively developed as a standalone library, improvements to context compression, prompt caching, planning efficiency, and subagent orchestration can flow to Open SWE without requiring you to rebuild your customizations.
This composability offers similar advantages to what Ramp's team described when building on OpenCode: you get the benefits of a maintained, improving foundation while retaining control over your org-specific layer.
Customization for Your Organization
Open SWE is intended as a customizable foundation rather than a finished product. Every major component is pluggable:
Sandbox provider: Swap between Modal, Daytona, Runloop, or LangSmith. Implement your own sandbox backend if you have internal infrastructure requirements.
Model: Use any LLM provider. The default is Claude Opus 4, but you can configure different models for different subtasks.
Tools: Add tools for your internal APIs, deployment systems, testing frameworks, or monitoring platforms. Remove tools you don't need.
Triggers: Modify the Slack, Linear, and GitHub integration logic. Add new trigger surfaces like email, webhooks, or custom UIs.
System prompt: Customize the base prompt and the logic for incorporating AGENTS.md files. Add org-specific instructions, constraints, or conventions.
Middleware: Add your own middleware hooks for validation, approval gates, logging, or safety checks.
The Customization Guide walks through each of these extension points with examples.
Comparison to Internal Implementations
Here's how Open SWE compares to the internal systems at Stripe, Ramp, and Coinbase based on publicly available information:
Decision
Open SWE
Stripe (Minions)
Ramp (Inspect)
Coinbase (Cloudbot)
Harness
Composed (Deep Agents/LangGraph)
Forked (Goose)
Composed (OpenCode)
Built from scratch
Sandbox
Pluggable (Modal, Daytona, Runloop, etc.)
AWS EC2 devboxes (pre-warmed)
Modal containers (pre-warmed)
In-house
Tools
~15, curated
~500, curated per-agent
OpenCode SDK + extensions
MCPs + custom Skills
Context
AGENTS.md + issue/thread
Rule files + pre-hydration
OpenCode built-in
Linear-first + MCPs
Orchestration
Subagents + middleware
Blueprints (deterministic + agentic)
Sessions + child sessions
Three modes
Invocation
Slack, Linear, GitHub
Slack + embedded buttons
Slack + web + Chrome extension
Slack-native
Validation
Prompt-driven + PR safety net
3-layer (local + CI + 1 retry)
Visual DOM verification
Agent councils + auto-merge
The core patterns are similar. The differences lie in implementation details, internal integrations, and org-specific tooling—which is exactly what you'd expect when adapting a framework to different environments.
Getting Started
Open SWE is available now on GitHub.
Installation Guide: Walks through GitHub App creation, LangSmith setup, Linear/Slack/GitHub triggers, and production deployment.
Customization Guide: Shows how to swap the sandbox, model, tools, triggers, system prompt, and middleware for your organization.
The framework is MIT-licensed. You can fork it, customize it, and deploy it internally. If you build something interesting on top of it, we'd be interested to hear about it.
Several engineering organizations have successfully deployed internal coding agents in production. Open SWE provides an open-source implementation of similar architectural patterns, designed to be customized for different codebases and workflows. While we're still learning what works across different contexts, this framework offers a starting point for teams exploring this approach.
Try Open SWE: github.com/langchain-ai/open-swe
Learn about Deep Agents: docs.langchain.com/oss/python/deepagents
Sign up for the LangSmith Sandboxes Waitlist: https://www.langchain.com/langsmith-sandboxes-waitlist
Read the docs: Open SWE Documentation
関連記事
今日のまとめ
AI日報で今日の重要ニュースをまとめ読み