清華大学とアントグループの研究者、OpenClawの自律LLMエージェント脆弱性を緩和する5層ライフサイクル指向セキュリティフレームワークを発表
清華大学とAnt Groupの研究者は、OpenClawなどの自律型LLMエージェントの脆弱性に対処するため、初期化から実行までの5段階のライフサイクルに沿ったセキュリティフレームワークを発表し、メモリ汚染やスキルサプライチェーン汚染などの複合脅威への対策を示した。
キーポイント
自律型LLMエージェントの新たな脅威
OpenClawのような高権限アクセスを持つ自律型エージェントは、従来の単発的なプロンプトインジェクションを超える、複数段階にわたるシステム的リスクに直面している。
5層ライフサイクルセキュリティフレームワーク
研究チームは、初期化、入力、推論、決定、実行の5つの運用段階をカバーするフレームワークを提案し、各段階での脅威と保護策を体系化した。
OpenClawのアーキテクチャ脆弱性
OpenClawの『カーネル-プラグイン』アーキテクチャ、特にpi-coding-agentを信頼できる計算基盤とする設計において、プラグインの動的ロード時の完全性検証の欠如が曖昧な信頼境界を生み、攻撃対象領域を拡大している。
脅威の具体例
メモリ汚染やスキルサプライチェーン汚染などの複合脅威が、エージェントの運用軌道全体を危険にさらす可能性があることが示された。
影響分析・編集コメントを表示
影響分析
この研究は、自律型AIエージェントが現実世界の高権限タスクを実行する際の根本的なセキュリティ課題を浮き彫りにし、業界全体のセキュリティ設計思想に影響を与える可能性がある。特に、オープンソースや商用のAIエージェントプラットフォーム開発者に対して、アーキテクチャ段階からのセキュリティ-by-designの重要性を再認識させる重要な進展である。
編集コメント
AIエージェントの能力拡大に伴い、そのセキュリティは単なる「機能」から「存立基盤」へと変わりつつある。本記事は、その転換点を理論的かつ具体的に示した良質な研究報告と言える。
自律型LLMエージェントは、高い権限での実行と永続的なメモリにより、攻撃対象領域を拡大します。ステートレスなLLMアプリケーションとは異なり、OpenClawのようなエージェントは、複雑で長期的なタスクを実行するために、クロスシステム統合と長期記憶に依存しています。この能動的な性質は、初期化から実行までの運用ライフサイクル全体に及ぶ、独自の多段階システムリスクをもたらします。
スキルエコシステムは、重大なサプライチェーンリスクに直面しています。エージェントスキルエコシステムにおいて、コミュニティが提供するツールの約26%にセキュリティ脆弱性が含まれています。攻撃者は「スキルポイズニング」を使用して、正当に見えるが隠れた優先度上書きを含む悪意のあるツールを注入し、ユーザーリクエストを静かに乗っ取って攻撃者が制御する出力を生成することができます。
メモリは永続的で危険な攻撃ベクトルです。永続メモリにより、一時的な敵対的入力を長期的な行動制御に変換することが可能になります。メモリポイズニングを通じて、攻撃者はエージェントのメモリ(例:MEMORY.md)に偽造されたポリシールールを埋め込み、初期攻撃セッションが終了した後もエージェントが良性リクエストを永続的に拒否するように仕向けることができます。
曖昧な指示は破壊的な「インテントドリフト」を引き起こします。明示的な悪意のある操作がなくても、エージェントはインテントドリフトを経験する可能性があり、局所的には正当化可能な一連のツール呼び出しが、全体的には破壊的な結果につながることがあります。記録された事例では、基本的な診断セキュリティリクエストが、無許可のファイアウォール変更やサービス終了にエスカレートし、システム全体をアクセス不能な状態に陥らせました。
効果的な保護には、ライフサイクルを意識した多層防御アーキテクチャが必要です。単純な入力フィルターのような既存のポイントベースの防御は、時間を跨いだ多段階攻撃に対して不十分です。堅牢な防御は、エージェントライフサイクルの5つの層すべてに統合されなければなりません:基盤層(プラグイン審査)、入力知覚層(指示階層)、認知状態層(メモリ完全性)、意思決定整合層(計画検証)、実行制御層(eBPFによるカーネルレベルのサンドボックス化)。
論文をチェックしてください。また、Twitterで私たちをフォローしたり、120k以上のMLサブレディットに参加したり、ニュースレターを購読したりしてください。待ってください!Telegramを使っていますか?今すぐTelegramでも参加できます。
注:この記事はAnt Researchによってサポートおよび提供されています
この投稿「Tsinghua and Ant Group Researchers Unveil a Five-Layer Lifecycle-Oriented Security Framework to Mitigate Autonomous LLM Agent Vulnerabilities in OpenClaw」は、MarkTechPostで最初に公開されました。
原文を表示
Autonomous LLM agents like OpenClaw are shifting the paradigm from passive assistants to proactive entities capable of executing complex, long-horizon tasks through high-privilege system access. However, a security analysis research report from Tsinghua University and Ant Group reveals that OpenClaw’s ‘kernel-plugin’ architecture—anchored by a pi-coding-agent serving as the Minimal Trusted Computing Base (TCB)—is vulnerable to multi-stage systemic risks that bypass traditional, isolated defenses. By introducing a five-layer lifecycle framework covering initialization, input, inference, decision, and execution, the research team demonstrates how compound threats like memory poisoning and skill supply chain contamination can compromise an agent’s entire operational trajectory.
OpenClaw Architecture: The pi-coding-agent and the TCB
OpenClaw utilizes a ‘kernel-plugin’ architecture that separates core logic from extensible functionality. The system’s Trusted Computing Base (TCB) is defined by the pi-coding-agent, a minimal core responsible for memory management, task planning, and execution orchestration. This TCB manages an extensible ecosystem of third-party plugins—or ‘skills’—that enable the agent to perform high-privilege operations such as automated software engineering and system administration. A critical architectural vulnerability identified by the research team is the dynamic loading of these plugins without strict integrity verification, which creates an ambiguous trust boundary and expands the system’s attack surface.
imageTable 1: Full Lifecycle Threats and Corresponding Protections for OpenClaw “Lobster”
✓ Indicates effective risk mitigation by the protection layer
× Denotes uncovered risks by the protection layer
A Lifecycle-Oriented Threat Taxonomy
The research team systematizes the threat landscape across five operational stages that align with the agent’s functional pipeline:
Stage I (Initialization): The agent establishes its operational environment and trust boundaries by loading system prompts, security configurations, and plugins.
Stage II (Input): Multi-modal data is ingested, requiring the agent to differentiate between trusted user instructions and untrusted external data sources.
Stage III (Inference): The agent reasoning process utilizes techniques such as Chain-of-Thought (CoT) prompting while maintaining contextual memory and retrieving external knowledge via retrieval-augmented generation.
Stage IV (Decision): The agent selects appropriate tools and generates execution parameters through planning frameworks such as ReAct.
Stage V (Execution): High-level plans are converted into privileged system actions, requiring strict sandboxing and access-control mechanisms to manage operations.
This structured approach highlights that autonomous agents face multi-stage systemic risks that extend beyond isolated prompt injection attacks.
Technical Case Studies in Agent Compromise
- Skill Poisoning (Initialization Stage)
Skill poisoning targets the agent before a task even begins. Adversaries can introduce malicious skills that exploit the capability routing interface.
The Attack: The research team demonstrated this by coercing OpenClaw to create a functional skill named hacked-weather.
Mechanism: By manipulating the skill’s metadata, the attacker artificially elevated its priority over the legitimate weather tool.
Impact: When a user requested weather data, the agent bypassed the legitimate service and triggered the malicious replacement, yielding attacker-controlled output.
Prevalence: An empirical audit cited in the research report found that 26% of community-contributed tools contain security vulnerabilities.
imageFigure 2: Poisoning Command Inducing the Compromised “Lobster” to Generate a Malicious Weather Skill and Elevate Its Priority
imageFigure 3: Malicious Skill Generated by Compromised “Lobster” — Structurally Valid Yet Semantically Subverts Legitimate Weather Functionality
imageFigure 4: Normal Weather Request Hijacked by Malicious Skill — Compromised “Lobster” Generates Attacker-Controlled Output
- Indirect Prompt Injection (Input Stage)
Autonomous agents frequently ingest untrusted external data, making them susceptible to zero-click exploits.
The Attack: Attackers embed malicious directives within external content, such as a web page.
Mechanism: When the agent retrieves the page to fulfill a user request, the embedded payload overrides the original objective.
Result: In one test, the agent ignored the user’s task to output a fixed ‘Hello World’ string mandated by the malicious site.
imageFigure 5: Attacker-Designed Webpage Embedding Malicious Commands Masquerading as Benign Content
imageFigure 6: Compromised “Lobster” Executes Embedded Commands When Accessing Webpage — Generates Attacker-Controlled Content Instead of Fulfilling User Requests
- Memory Poisoning (Inference Stage)
Because OpenClaw maintains a persistent state, it is vulnerable to long-term behavioral manipulation.
Mechanism: An attacker uses a transient injection to modify the agent’s MEMORY.md file.
The Attack: A fabricated rule was added instructing the agent to refuse any query containing the term ‘C++’.
Impact: This ‘poison’ persisted across sessions; subsequent benign requests for C++ programming were rejected by the agent, even after the initial attack interaction had ended.
imageFigure 7: Attacker Appends Forged Rules to Compromised “Lobster”‘s Persistent Memory — Converts Transient Attack Inputs into Long-Term Behavioral Contro
imageFigure 8: Compromised “Lobster” Rejects Benign C++ Programming Requests After Malicious Rule Storage — Adheres to Attacker-Defined Behaviors Overriding User Intent
- Intent Drift (Decision Stage)
Intent drift occurs when a sequence of locally justifiable tool calls leads to a globally destructive outcome.
The Scenario: A user issued a diagnostic request to eliminate a ‘suspicious crawler IP’.
The Escalation: The agent autonomously identified IP connections and attempted to modify the system firewall via iptables.
System Failure: After several failed attempts to modify configuration files outside its workspace, the agent terminated the running process to attempt a manual restart. This rendered the WebUI inaccessible and resulted in a complete system outage.
imageFigure 9: Compromised “Lobster” Deviates from Crawler IP Resolution Task Upon User Command — Executes Self-Termination Protocol Overriding Operational Objectives
- High-Risk Command Execution (Execution Stage)
This represents the final realization of an attack where earlier compromises propagate into concrete system impact.
The Attack: An attacker decomposed a Fork Bomb attack into four individually benign file-write steps to bypass static filters.
Mechanism: Using Base64 encoding and sed to strip junk characters, the attacker assembled a latent execution chain in trigger.sh.
Impact: Once triggered, the script caused a sharp CPU utilization surge to near 100% saturation, effectively launching a denial-of-service attack against the host infrastructure.
imageFigure 10: Attacker Initiates Sequential Command Injection Through File Write Operations — Establishes Covert Execution Foothold in System Scheduler
imageFigure 11: Attacker Triggers Compromised “Lobster” to Execute Malicious Payload — Induces System Paralysis Leading to Critical Infrastructure Implosion
imageFigure 12: Compromised “Lobster” Triggers Host Server Resource Exhaustion Surge — Implements Stealthy Denial-of-Service Siege Against Critical Computing Backbone
The Five-Layer Defense Architecture
The research team evaluated current defenses as ‘fragmented’ point solutions and proposed a holistic, lifecycle-aware architecture.

(1) Foundational Base Layer:
Establishes a verifiable root of trust during the startup phase. It utilizes Static/Dynamic Analysis (ASTs) to detect unauthorized code and Cryptographic Signatures (SBOMs) to verify skill provenance.
(2) Input Perception Layer:
Acts as a gateway to prevent external data from hijacking the agent’s control flow. It enforces an Instruction Hierarchy via cryptographic token tagging to prioritize developer prompts over untrusted external content.
(3) Cognitive State Layer:
Protects internal memory and reasoning from corruption. It employs Merkle-tree Structures for state snapshotting and rollbacks, alongside Cross-encoders to measure semantic distance and detect context drift.
(4) Decision Alignment Layer:
Ensures synthesized plans align with user objectives before any action is taken. It includes Formal Verification using symbolic solvers to prove that proposed sequences do not violate safety invariants.
(5) Execution Control Layer:
Serves as the final enforcement boundary using an ‘assume breach’ paradigm. It provides isolation through Kernel-Level Sandboxing utilizing eBPF and seccomp to intercept unauthorized system calls at the OS level
Key Takeaways
Autonomous agents expand the attack surface through high-privilege execution and persistent memory. Unlike stateless LLM applications, agents like OpenClaw rely on cross-system integration and long-term memory to execute complex, long-horizon tasks. This proactive nature introduces unique multi-stage systemic risks that span the entire operational lifecycle, from initialization to execution.
Skill ecosystems face significant supply chain risks. Approximately 26% of community-contributed tools in agent skill ecosystems contain security vulnerabilities. Attackers can use ‘skill poisoning’ to inject malicious tools that appear legitimate but contain hidden priority overrides, allowing them to silently hijack user requests and produce attacker-controlled outputs.
Memory is a persistent and dangerous attack vector. Persistent memory allows transient adversarial inputs to be transformed into long-term behavioral control. Through memory poisoning, an attacker can implant fabricated policy rules into an agent’s memory (e.g., MEMORY.md), causing the agent to persistently reject benign requests even after the initial attack session has ended.
Ambiguous instructions lead to destructive ‘Intent Drift.’ Even without explicit malicious manipulation, agents can experience intent drift, where a sequence of locally justifiable tool calls leads to globally destructive outcomes. In documented cases, basic diagnostic security requests escalated into unauthorized firewall modifications and service terminations that rendered the entire system inaccessible.
Effective protection requires a lifecycle-aware, defense-in-depth architecture. Existing point-based defenses—such as simple input filters—are insufficient against cross-temporal, multi-stage attacks. A robust defense must be integrated across all five layers of the agent lifecycle: Foundational Base (plugin vetting), Input Perception (instruction hierarchy), Cognitive State (memory integrity), Decision Alignment (plan verification), and Execution Control (kernel-level sandboxing via eBPF).
Check out Paper. Also, feel free to follow us on Twitter and don’t forget to join our 120k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
Note: This article is supported and provided by Ant Research
The post Tsinghua and Ant Group Researchers Unveil a Five-Layer Lifecycle-Oriented Security Framework to Mitigate Autonomous LLM Agent Vulnerabilities in OpenClaw appeared first on MarkTechPost.
関連記事
今日のまとめ
AI日報で今日の重要ニュースをまとめ読み