Introducing WebBrain: An Open-Source, Local-First AI Browser Agent for Chrome and Firefox
WebBrain, an open-source, local-first AI browser agent developed by Emre Sokullu, runs in Chrome and Firefox and supports data extraction and task automation.
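Key Points
Local-first and privacy-preserving
When run on a local model, the core features complete on your machine and no page data is sent out, so the agent can work safely inside authenticated sessions and around sensitive information.
Advanced automation (Act Mode)
Uses the Chrome DevTools Protocol to drive the page directly, reaching cross-origin iframes and Shadow DOM that content scripts cannot see.
Security and prompt-injection defenses
Starts in "Ask Mode" by default, asks the user to confirm before consequential actions, and forces mutations through the visible UI to blunt malicious instructions.
Multilingual support and extensibility
Supports five languages, including English, Spanish, and French, can connect to cloud APIs when needed, and is published on GitHub under the MIT license.
Cost reduction and model configuration
Image compression and context trimming hold down token costs, and a cheap text model for planning can be paired with a separate vision model for visual processing.
Broad local and cloud model support
Supports everything from local runtimes such as llama.cpp and Ollama to cloud APIs such as OpenAI and Claude, with Qwen 3.6 35B named as the recommended model.
Where it sits between browser extensions and developer frameworks
WebBrain is a chat-panel extension for end users, a different category from OpenClaw and Browser-Use, which are SDKs for headless pipelines.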
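Impact Analysis
The tool shows one way for AI agents to carry out tasks in the browser both safely and autonomously, and it is an important option for companies and individuals with strong data-privacy concerns. Its architecture combines local models and cloud APIs to balance versatility with security, and it may influence how future browser agents are designed.
Editor's Comment
For privacy-minded users, WebBrain is a strong alternative to existing cloud-dependent plugins. Its use of CDP in "Act Mode" in particular is a notable technical step for automating complex websites.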
WebBrain is a free, open-source browser agent for Chrome and Firefox. It reads pages, extracts data, and automates multi-step tasks. Unlike most browser AI plugins, it can also run entirely on a local model.
It is built by Emre Sokullu and licensed under MIT. The full source lives on GitHub.
Run the agent against a local model, and no page data leaves your machine. Connect a cloud API when you want more capability.
What is WebBrain?
WebBrain lives in your browser’s side panel. In Chrome it uses Manifest V3 and the sidePanel API. In Firefox it uses Manifest V2 and sidebar_action. Each tab keeps its own conversation history.
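Per-tab history can be pictured as a map from tab ID to a message list. The sketch below is illustrative only: the names ChatMessage and TabHistories are hypothetical and are not taken from the WebBrain source.

```typescript
// Hypothetical sketch of per-tab conversation history keyed by tab ID.
// ChatMessage and TabHistories are illustrative names, not WebBrain's own.
type Role = "system" | "user" | "assistant" | "tool";

interface ChatMessage {
  role: Role;
  content: string;
}

class TabHistories {
  private readonly byTab = new Map<number, ChatMessage[]>();

  // Append a message to one tab's history, creating the list on first use.
  append(tabId: number, message: ChatMessage): void {
    const history = this.byTab.get(tabId) ?? [];
    history.push(message);
    this.byTab.set(tabId, history);
  }

  // Return a copy so callers cannot mutate stored history by accident.
  get(tabId: number): ChatMessage[] {
    return [...(this.byTab.get(tabId) ?? [])];
  }

  // Drop a tab's history, for example when the tab closes.
  clear(tabId: number): void {
    this.byTab.delete(tabId);
  }
}
```

In an extension, clear would plausibly be wired to the chrome.tabs.onRemoved event so that closed tabs do not leave history behind.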
The extension operates inside your existing authenticated session. It sees your logged-in accounts exactly as you do. It stores no data externally and adds no telemetry or accounts.
The plugin ships in English, Español, Français, Türkçe, and 中文. It auto-detects your browser language on first launch.
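Language auto-detection of this kind usually reduces to matching the browser's language tag against a supported list. The following is a minimal sketch under assumptions: the function name pickLocale is hypothetical, and the fallback to English for unsupported languages is assumed rather than documented.

```typescript
// The five UI languages the article lists, as primary language subtags.
const SUPPORTED = ["en", "es", "fr", "tr", "zh"] as const;
type Locale = (typeof SUPPORTED)[number];

// Map a browser language tag such as "fr-CA" or "zh-Hans-CN" to a supported
// locale. Falling back to English is an assumption, not documented behavior.
function pickLocale(browserLanguage: string): Locale {
  const primary = browserLanguage.toLowerCase().split("-")[0];
  const match = SUPPORTED.find((code) => code === primary);
  return match ?? "en";
}
```

In a real extension the input would typically come from navigator.language or chrome.i18n.getUILanguage().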
Ask Mode, Act Mode, and How Actions Actually Fire
WebBrain has two modes: Ask mode is read-only and cannot change the page. Act mode can click, type, scroll, navigate, and run workflows.
Ask mode reads pages through ordinary content scripts. Act mode is different. It drives the page through the Chrome DevTools Protocol via the chrome.debugger API. That produces trusted input events that modern sites actually honor. It also reaches cross-origin iframes and shadow DOM that content scripts cannot see.
That power is scoped deliberately. WebBrain attaches the debugger only when an action needs it, per tab. Chrome surfaces its standard ‘WebBrain started debugging this browser’ banner while attached. Firefox has no CDP equivalent, so its Act mode is meaningfully weaker.
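To make the trusted-input idea concrete, here is a rough sketch of the CDP commands a single left click might translate into. Input.dispatchMouseEvent is a real CDP method, but the helper clickCommands and the exact sequence are assumptions for illustration, not WebBrain's actual code.

```typescript
// One CDP command: a method name plus its parameters.
interface CdpCommand {
  method: string;
  params: Record<string, string | number>;
}

// Build a move/press/release sequence for a left click at viewport point (x, y).
// The helper is hypothetical; only the CDP method name is taken from the protocol.
function clickCommands(x: number, y: number): CdpCommand[] {
  const button = { x, y, button: "left", clickCount: 1 };
  return [
    { method: "Input.dispatchMouseEvent", params: { type: "mouseMoved", x, y } },
    { method: "Input.dispatchMouseEvent", params: { type: "mousePressed", ...button } },
    { method: "Input.dispatchMouseEvent", params: { type: "mouseReleased", ...button } },
  ];
}

// In an extension with the "debugger" permission, the sequence would be sent
// roughly like this (not executed here), attaching only for the action:
//   await chrome.debugger.attach({ tabId }, "1.3");
//   for (const c of clickCommands(120, 48)) {
//     await chrome.debugger.sendCommand({ tabId }, c.method, c.params);
//   }
//   await chrome.debugger.detach({ tabId });
```

Keeping the command list as plain data separates what gets sent from the attach and detach lifecycle, which matches the article's point that the debugger is attached per tab and only when needed.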
Temperatures are fixed for predictability. Act mode uses temperature 0.15. Ask mode uses 0.3. Dedicated vision screenshot descriptions use 0.
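As a small illustration, the fixed temperatures could be expressed as a lookup applied when building an OpenAI-style chat request body. The type and function names here are hypothetical; only the three temperature values come from the article.

```typescript
type Mode = "ask" | "act" | "vision";

// Fixed sampling temperatures as reported in the article.
const TEMPERATURE: Record<Mode, number> = { ask: 0.3, act: 0.15, vision: 0 };

interface ChatRequestBody {
  model: string;
  messages: { role: string; content: string }[];
  temperature: number;
}

// Build an OpenAI-style chat completion body using the mode's fixed temperature.
// The function and its parameters are illustrative, not WebBrain's API.
function buildBody(mode: Mode, model: string, userText: string): ChatRequestBody {
  return {
    model,
    messages: [{ role: "user", content: userText }],
    temperature: TEMPERATURE[mode],
  };
}
```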
The Security Model
Browser agents run on an adversarial surface. Web pages can hide prompt injections that hijack an agent’s behavior. WebBrain’s design addresses this directly.
The agent starts in read-only Ask mode. It asks before consequential actions. You can disable those prompts in the Permissions settings. They are on by default.
There is also a UI-first rule for mutations. For anything that creates, sends, submits, or buys, WebBrain uses the visible UI. It refuses to call REST or GraphQL endpoints directly for mutations. A per-conversation /allow-api override exists when the UI genuinely fails.
Reading is treated separately. Fetching a README or comparing prices uses background HTTP through the fetch_url and research_url tools. Reading changes nothing remotely, so the strict rules do not apply.
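The rules in this section can be summarized as a small decision function. This is a sketch of the policy as the article describes it, not the project's implementation; every name below (ActionKind, Settings, decide) is hypothetical.

```typescript
type ActionKind = "remote_read" | "mutation";
type Route = "background_http" | "visible_ui";

interface Settings {
  confirmConsequential: boolean; // on by default, per the article
  allowApi: boolean;             // true after a per-conversation /allow-api override
}

interface Decision {
  route: Route;
  needsConfirmation: boolean;
  apiFallbackAllowed: boolean;
}

// Reads go over background HTTP with no prompt. Mutations go through the
// visible UI, prompt when confirmations are on, and may fall back to a direct
// API call only if the override was given.
function decide(kind: ActionKind, settings: Settings): Decision {
  if (kind === "remote_read") {
    return { route: "background_http", needsConfirmation: false, apiFallbackAllowed: false };
  }
  return {
    route: "visible_ui",
    needsConfirmation: settings.confirmConsequential,
    apiFallbackAllowed: settings.allowApi,
  };
}
```

Treating the override as permission for a fallback, rather than as a change of default route, reflects the article's wording that it exists for cases where the UI genuinely fails.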
Use Cases, With Concrete Examples
Data extraction is the obvious one. Open a catalog and ask: ‘Extract all product names and prices from this page.’ The agent reads the structure and returns rows. It also works with PDFs.
Research summaries are another. Ask ‘Summarize this article,’ then follow up with a specific question. WebBrain detects paywalls honestly and does not try to bypass them. It also dismisses common cookie-consent banners before reading.
Form filling suits repetitive signups. An optional Profile auto-fill stores a short bio in local plaintext. That text is sent to your configured LLM to complete low-stakes forms. Keep important passwords out of it.
Automation spans multiple steps. Try ‘Navigate to github.com and find trending repositories.’ In Act mode, the agent chains navigation, reads, and clicks.
Keeping Token Costs Down
Cloud tokens add up on long sessions. WebBrain bounds the cost in three ways.
Screenshots are resized and iteratively JPEG-compressed before they leave your machine. That keeps image tokens small.
Conversation history and tool outputs are trimmed oldest-first as the context window fills.
You can also pair a cheap text model for planning with a separate vision model for screenshots.
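The first two techniques can be sketched as follows. This is a minimal illustration under stated assumptions: the token estimate of about four characters per token, the choice to keep system messages, the quality step of 0.1, and the floor of 0.3 are all invented for the example, and encode stands in for a real JPEG encoder.

```typescript
interface Msg {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough token estimate (about 4 characters per token); an assumption for illustration.
const estimateTokens = (m: Msg): number => Math.ceil(m.content.length / 4);

// Drop the oldest non-system messages until the history fits the budget.
// Keeping system messages is an assumption, not documented behavior.
function trimOldestFirst(history: Msg[], budget: number): Msg[] {
  const kept = [...history];
  let total = kept.reduce((sum, m) => sum + estimateTokens(m), 0);
  while (total > budget) {
    const idx = kept.findIndex((m) => m.role !== "system");
    if (idx === -1) break; // only system messages remain
    const [removed] = kept.splice(idx, 1);
    if (removed) total -= estimateTokens(removed);
  }
  return kept;
}

// Lower JPEG quality step by step until the encoded size fits, or the floor
// is reached. `encode` is a stand-in that returns the encoded size in bytes.
function compressToFit(encode: (quality: number) => number, maxBytes: number): number {
  let quality = 0.9;
  while (quality > 0.3 && encode(quality) > maxBytes) {
    quality = Math.round((quality - 0.1) * 10) / 10;
  }
  return quality;
}
```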
How It Compares
WebBrain sits between browser AI plugins and full agent frameworks. Here is the plugin comparison, drawn from the project’s own documentation.
| Feature | WebBrain | Claude in Chrome |
| --- | --- | --- |
| Open source | MIT License | Proprietary |
| Price | Free forever | Requires Claude Pro ($20/mo) |
| Local LLM support | llama.cpp, Ollama | No (Claude only) |
| Multi-provider | All OpenAI-compatible endpoints | Claude only |
| Chrome | Yes (MV3) | Yes |
| Firefox | Yes (MV2) | No |
| Side panel UI | Yes | Yes |
| Ask / Act modes | Yes | Similar |
| Fully offline | Yes (with local LLM) | No (cloud required) |
| Self-hostable | Yes | No |
Frameworks like OpenClaw or Browser-Use are a different category. Those are developer SDKs for headless pipelines. WebBrain is an end-user extension you drive from a chat panel. You can use both.
Running It: Providers and Setup
WebBrain supports local and cloud models through one interface. Local options include llama.cpp, Ollama, LM Studio, Jan, vLLM, and SGLang. Cloud options include OpenAI, Anthropic Claude, Gemini, Mistral, DeepSeek, and xAI Grok. It also supports Groq, MiniMax, Alibaba Cloud (Qwen), Nvidia NIM, and OpenRouter.
A built-in managed option, WebBrain Cloud, needs no local setup. It costs $5 per month per device profile under a fair-use policy. For local use, llama.cpp needs no API key.
Starting a local server takes one command:
# llama.cpp: load at least a 16k-token context window
llama-server -m your-model.gguf -c 16384 --port 8080

# Ollama (OpenAI-compatible): set the extension-origin env var
OLLAMA_ORIGINS="*" ollama serve
# then set the base URL to http://localhost:11434/v1 in settings
Point WebBrain at the endpoint in settings. For a cross-machine vLLM server, enable CORS with --allowed-origins '["*"]'.
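For readers curious what "pointing at an endpoint" amounts to, the sketch below shows how a client of an OpenAI-compatible server might derive the request URL and headers from the configured base URL. The function planRequest is hypothetical; the /chat/completions path follows the OpenAI-compatible convention, and omitting the Authorization header when no key is set mirrors the note that llama.cpp needs no API key.

```typescript
interface RequestPlan {
  url: string;
  headers: Record<string, string>;
}

// Join a base URL such as http://localhost:11434/v1 with the chat path,
// and add a bearer token only when a key is configured.
// planRequest is an illustrative helper, not part of WebBrain.
function planRequest(baseUrl: string, apiKey?: string): RequestPlan {
  const url = baseUrl.replace(/\/+$/, "") + "/chat/completions";
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return { url, headers };
}
```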
The recommended model is Qwen 3.6 35B (Qwen3.6-35B-A3B). It beat Gemma 4 on the project’s screenshot benchmark. An RTX 5090 is ideal; an RTX 4090 works with INT4 AutoRound quantization.
Each provider is a class that extends BaseLLMProvider. It normalizes to one response shape:
{ content: string, toolCalls: Array|null, usage: Object|null }
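A provider that normalizes to this shape might look roughly like the sketch below. Only the name BaseLLMProvider and the three response fields come from the article; the normalize method, the OpenAICompatibleProvider class, and the input interface are assumptions made for illustration.

```typescript
// The normalized shape described in the article.
interface LLMResponse {
  content: string;
  toolCalls: unknown[] | null;
  usage: Record<string, unknown> | null;
}

// Minimal slice of an OpenAI-style chat completion. Fields are optional
// because providers differ in what they return.
interface OpenAIStyleCompletion {
  choices?: { message?: { content?: string | null; tool_calls?: unknown[] } }[];
  usage?: Record<string, unknown>;
}

// Hypothetical base class: only its name is taken from the article.
abstract class BaseLLMProvider {
  abstract normalize(raw: unknown): LLMResponse;
}

class OpenAICompatibleProvider extends BaseLLMProvider {
  // Map a raw completion to the shared shape, using null for missing parts.
  normalize(raw: unknown): LLMResponse {
    const completion = raw as OpenAIStyleCompletion;
    const message = completion.choices?.[0]?.message;
    const calls = message?.tool_calls;
    return {
      content: message?.content ?? "",
      toolCalls: calls && calls.length > 0 ? calls : null,
      usage: completion.usage ?? null,
    };
  }
}
```

With one shape for every provider, the agent loop can branch on toolCalls being null without caring which backend produced the reply.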
Key Takeaways
WebBrain is a free, MIT-licensed AI browser agent for Chrome and Firefox, built by Emre Sokullu.
It runs on local models (llama.cpp, Ollama; Qwen 3.6 35B recommended) or any cloud API — no page data leaves your machine when local.
Ask mode reads pages read-only; Act mode clicks and types via the Chrome DevTools Protocol for trusted input events.
Security-first by design: starts read-only, approves consequential actions, and uses the UI instead of direct API calls for mutations.
Free forever self-hosted, or $5/month per device profile for the managed WebBrain Cloud under fair use.
WebBrain is available on the Chrome Web Store, Firefox Add-ons, and GitHub. Product details are on the WebBrain website.
The post Meet WebBrain: An Open-Source, Local-First AI Browser Agent That Reads Pages and Automates Tasks in Chrome and Firefox appeared first on MarkTechPost.