Threat Model

This document describes the threat model for agent extensions (skills, MCP servers, plugins, connectors).

Assets to protect

| Asset | Risk |
| --- | --- |
| User secrets (API keys, OAuth tokens) | Exfiltration via over-privileged extensions |
| Confidential documents and emails | Unauthorized read access |
| Account integrity (email, calendar, ticketing) | Unauthorized actions via tool access |
| Host machine integrity (files, processes) | Arbitrary code execution, persistence |

Threat actors

| Actor | Attack vector |
| --- | --- |
| Malicious publishers | Typosquats, impersonation of legitimate extensions |
| Compromised maintainers / CI | Supply-chain injection through trusted update channels |
| Registry compromise | Serving malicious artifacts from a trusted source |
| Social engineering | Prompting users to install unverified extensions |

Attack classes

| Attack | Description |
| --- | --- |
| Exfiltration via tool servers | Over-privileged MCP servers or skills leak data to external endpoints |
| Instruction malware | Malicious commands embedded in SKILL.md content |
| Dependency attacks | Malicious npm/pip packages bundled inside extensions |
| Update channel compromise | Serving a malicious "latest" version through a legitimate update path |
| Archive attacks | ZipSlip, symlink traversal, decompression bombs in .aext files |
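
To make the archive attack class concrete, here is a minimal sketch of pre-extraction checks for path traversal (ZipSlip), symlink entries, total size, and compression ratio. It assumes a `.aext` file is a ZIP container, which this document does not state. `check_archive`, `UnsafeArchive`, and both limits are hypothetical names and values chosen for illustration, not the scaffold's actual implementation.

```python
import io
import stat
import zipfile
from pathlib import PurePosixPath

# Hypothetical limits; real values would come from install policy.
MAX_TOTAL_BYTES = 50 * 1024 * 1024
MAX_RATIO = 100


class UnsafeArchive(Exception):
    """Raised when an archive member fails a hardening check."""


def check_archive(data: bytes) -> list[str]:
    """Return member names if every check passes, else raise UnsafeArchive.

    Assumes the archive is a ZIP container (an assumption of this sketch).
    """
    total = 0
    names = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            # ZipSlip: reject absolute paths and any ".." component.
            path = PurePosixPath(info.filename.replace("\\", "/"))
            if path.is_absolute() or ".." in path.parts:
                raise UnsafeArchive(f"path traversal: {info.filename}")

            # Symlinks: the Unix mode lives in the high 16 bits.
            if stat.S_ISLNK(info.external_attr >> 16):
                raise UnsafeArchive(f"symlink entry: {info.filename}")

            # Size limit on the declared uncompressed total.
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise UnsafeArchive("archive exceeds size limit")

            # Ratio check: a crude decompression-bomb signal.
            if info.compress_size > 0:
                if info.file_size / info.compress_size > MAX_RATIO:
                    raise UnsafeArchive(f"suspicious ratio: {info.filename}")

            names.append(info.filename)
    return names
```

The sizes checked here are the ones declared in the archive's own headers, which an attacker controls. A hardened extractor would likely also count bytes while actually decompressing and abort once a limit is crossed.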

Mitigations

| Mitigation | Status |
| --- | --- |
| Signature verification with trusted keys (--pub) | Implemented |
| Install-time policy enforcement (fail closed) | Implemented |
| Least-privilege manifest defaults | Implemented |
| Strict JSON parsing (unknown fields rejected) | Implemented |
| Archive hardening (symlink blocking, size limits, ratio checks) | Implemented |
| Heuristic scanning of skill content and scripts | Implemented |
| Sigstore/Cosign keyless signing with identity binding | Planned |
| SLSA/in-toto provenance with verification | Planned |
| Real SBOM generation and vulnerability scanning | Planned |
| Runtime permission enforcement (not just install-time) | Planned |
| Secure update metadata (TUF) | Planned |
| Revocation/quarantine mechanism | Planned |
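
Three of the implemented mitigations (strict JSON parsing, least-privilege defaults, and fail-closed policy enforcement) fit together naturally, and the sketch below shows one way they might combine. It is not the scaffold's actual code. Only `allow_shell` appears in this document; `allow_network`, `Permissions`, `parse_permissions`, and `install_allowed` are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass, fields


class ManifestError(ValueError):
    """Raised when a manifest is malformed or contains unexpected data."""


@dataclass(frozen=True)
class Permissions:
    # Least-privilege defaults: every permission is off unless declared.
    allow_shell: bool = False
    allow_network: bool = False  # hypothetical field, for illustration


def _no_duplicates(pairs):
    """Reject duplicate keys, which json.loads would silently overwrite."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ManifestError(f"duplicate key: {key}")
        seen[key] = value
    return seen


def parse_permissions(text: str) -> Permissions:
    """Strictly parse a permissions object; unknown fields are an error."""
    try:
        raw = json.loads(text, object_pairs_hook=_no_duplicates)
    except json.JSONDecodeError as exc:
        raise ManifestError(f"invalid JSON: {exc}") from exc
    if not isinstance(raw, dict):
        raise ManifestError("manifest must be a JSON object")
    known = {f.name for f in fields(Permissions)}
    unknown = set(raw) - known
    if unknown:
        raise ManifestError(f"unknown fields: {sorted(unknown)}")
    for name, value in raw.items():
        if not isinstance(value, bool):
            raise ManifestError(f"{name} must be a boolean")
    return Permissions(**raw)


def install_allowed(text: str, allowed: frozenset = frozenset()) -> bool:
    """Fail closed: any parse error or unapproved permission denies install."""
    try:
        perms = parse_permissions(text)
    except ManifestError:
        return False
    requested = {f.name for f in fields(Permissions) if getattr(perms, f.name)}
    return requested <= allowed
```

Rejecting unknown fields means a misspelled key such as `allow_shel` blocks installation instead of silently falling back to a default, so the author's intent and the enforced policy cannot drift apart unnoticed.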

What this scaffold does not protect against

  • Runtime enforcement: Permissions are checked at install time only. A skill that declares allow_shell=false is not sandboxed at runtime — the manifest is a declaration, not an enforcement boundary.
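  • Identity verification: Current Ed25519 dev keys prove integrity (the artifact was not modified after signing) but not authenticity: nothing binds a key to a verified publisher identity, so a valid signature does not tell you who signed. Sigstore integration will add identity binding.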
  • Dependency analysis: The scanner checks skill content and shell scripts, not transitive dependencies (npm, pip, etc.).
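
To show what heuristic scanning covers and what it misses, here is a rough sketch of a line-based scanner for skill content and shell scripts. The patterns and the names `SUSPICIOUS` and `scan_text` are hypothetical, invented for this example, and do not describe the scaffold's real rule set.

```python
import re

# Hypothetical patterns, for illustration only; a real rule set would be
# broader and would still miss obfuscated or indirect variants.
SUSPICIOUS = {
    "pipe-to-shell": re.compile(
        r"\b(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba|z)?sh\b"
    ),
    "base64-decode-exec": re.compile(
        r"base64\s+(-d|--decode)[^\n|]*\|\s*(ba|z)?sh\b"
    ),
    "reads-ssh-keys": re.compile(r"~/\.ssh/|\bid_(rsa|ed25519)\b"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for each suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SUSPICIOUS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A scanner of this shape only sees the text it is given. A malicious package pulled in through npm or pip at install time never passes through it, which is the dependency-analysis gap described above.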

Next steps