NVIDIA /

SkillSpector

AI Agent · Trial available · Python · Apache-2.0

Stars: 4,016 (+2,616 this week)
Forks: 301
Contributors: 4
Open issues: 32

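What it does

Security auditing and vulnerability detection for skill modules during AI agent development.

Key features

1. Automatically scans AI agent skills for security vulnerabilities
2. Identifies malicious patterns and potentially risky behavior
3. Implemented in Python and easy to integrate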

Original README

SkillSpector

Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks before installing agent skills.

Overview

AI agent skills (used by Claude Code, Codex CLI, Gemini CLI, etc.) execute with implicit trust and minimal vetting. Research shows that 26.1% of skills contain vulnerabilities and 5.2% show likely malicious intent.

SkillSpector helps you answer: "Is this skill safe to install?"

Documentation

Development guide — Architecture, package layout, and how to extend the analyzer pipeline.

Features

Multi-format input: Scan Git repos, URLs, zip files, directories, or single files
64 vulnerability patterns across 16 categories: prompt injection, data exfiltration, privilege escalation, supply chain, excessive agency, output handling, system prompt leakage, memory poisoning, tool misuse, rogue agent, trigger abuse, dangerous code (AST), taint tracking, YARA signatures, MCP least privilege, and MCP tool poisoning
Two-stage analysis: Fast static analysis + optional LLM semantic evaluation
Live vulnerability lookups: SC4 queries OSV.dev for real-time CVE data with automatic offline fallback
Multiple output formats: Terminal, JSON, Markdown, and SARIF reports
Risk scoring: 0-100 score with severity labels and clear recommendations

Quick Start

Installation

Create and activate a virtual environment first (all make targets assume the venv is active). Use uv or pip; the Makefile uses uv if available, otherwise pip.

# Clone the repository
git clone https://github.com/NVIDIA/skillspector.git
cd skillspector

# Create and activate virtual environment
uv venv .venv && source .venv/bin/activate
# or: python3 -m venv .venv && source .venv/bin/activate

# Install for production use
make install

# Or install with development dependencies
make install-dev

Basic Usage

# Scan a local skill directory
skillspector scan ./my-skill/

# Scan a single SKILL.md file
skillspector scan ./SKILL.md

# Scan a Git repository
skillspector scan https://github.com/user/my-skill

# Scan a zip file
skillspector scan ./my-skill.zip

Output Formats

# Terminal output (default) - pretty formatted
skillspector scan ./my-skill/

# JSON output - machine readable
skillspector scan ./my-skill/ --format json --output report.json

# Markdown output - for documentation
skillspector scan ./my-skill/ --format markdown --output report.md

# SARIF output - for CI/CD integration and IDE tooling
skillspector scan ./my-skill/ --format sarif --output report.sarif
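A SARIF report can be consumed by a small script when no SARIF-aware tooling is available. The sketch below relies only on the standard SARIF 2.1.0 layout (`runs[].results[]` with `ruleId`, `level`, and `message.text`). How SkillSpector maps its severities onto SARIF levels is not stated here, so the default `levels=("error",)` is an assumption, and `blocking_results` and `load_report` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_report(path: str) -> dict:
    """Read a SARIF file such as the report.sarif written by the command above."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def blocking_results(sarif: dict, levels=("error",)) -> list:
    """Return (ruleId, message) pairs for results whose SARIF level is in `levels`."""
    hits = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # SARIF treats a missing level as "warning".
            level = result.get("level", "warning")
            if level in levels:
                text = result.get("message", {}).get("text", "")
                hits.append((result.get("ruleId"), text))
    return hits
```

In a CI job, calling `blocking_results(load_report("report.sarif"))` and exiting non-zero when the list is non-empty would fail the build on blocking findings.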

LLM Analysis

For the best results, configure an LLM provider for semantic analysis. Select one with SKILLSPECTOR_PROVIDER; each provider ships with its own bundled default model. SkillSpector also works with local OpenAI-compatible servers (Ollama, vLLM, llama.cpp) and managed inference gateways.

Provider (`SKILLSPECTOR_PROVIDER`)	Credential env var	Endpoint	Default model
`openai`	`OPENAI_API_KEY` (+ optional `OPENAI_BASE_URL`)	api.openai.com (or any OpenAI-compatible URL)	`gpt-5.4`
`anthropic`	`ANTHROPIC_API_KEY`	api.anthropic.com	`claude-opus-4-6`
`nv_build`	`NVIDIA_INFERENCE_KEY`	build.nvidia.com	`deepseek-ai/deepseek-v4-flash`

# Stock OpenAI
export SKILLSPECTOR_PROVIDER=openai
export OPENAI_API_KEY=sk-...
skillspector scan ./my-skill/

# Anthropic
export SKILLSPECTOR_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
skillspector scan ./my-skill/

# NVIDIA build.nvidia.com
export SKILLSPECTOR_PROVIDER=nv_build
export NVIDIA_INFERENCE_KEY=nvapi-...
skillspector scan ./my-skill/

# Local Ollama or any OpenAI-compatible endpoint
export SKILLSPECTOR_PROVIDER=openai
export OPENAI_API_KEY=ollama
export OPENAI_BASE_URL=http://localhost:11434/v1
export SKILLSPECTOR_MODEL=llama3.1:8b
skillspector scan ./my-skill/

# Override the provider's default model
export SKILLSPECTOR_MODEL=gpt-5.2
skillspector scan ./my-skill/

# Skip LLM analysis (faster, static analysis only)
skillspector scan ./my-skill/ --no-llm
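Before launching a scan with LLM analysis, a wrapper script can check that the credential for the selected provider is set. This is a minimal sketch built only from the provider table above; `missing_credential` is a hypothetical helper, not part of SkillSpector, and it deliberately ignores the credential fallback behaviour described under Configuration. The `nv_build` default follows the `SKILLSPECTOR_PROVIDER` description in the Environment Variables table.

```python
import os
from typing import Mapping, Optional

# Provider name -> credential environment variable, copied from the table above.
CREDENTIAL_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "nv_build": "NVIDIA_INFERENCE_KEY",
}


def missing_credential(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset credential variable, or None if it is set."""
    env = os.environ if env is None else env
    provider = env.get("SKILLSPECTOR_PROVIDER", "nv_build")
    if provider not in CREDENTIAL_VARS:
        raise ValueError(f"unknown provider: {provider!r}")
    variable = CREDENTIAL_VARS[provider]
    return None if env.get(variable) else variable
```

If the function returns a variable name, the wrapper could either stop with an error or fall back to `--no-llm`.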

Vulnerability Patterns

SkillSpector detects 64 vulnerability patterns across 16 categories:

Prompt Injection (5 patterns)

ID	Pattern	Severity	Description
P1	Instruction Override	HIGH	Commands to ignore safety constraints
P2	Hidden Instructions	HIGH	Malicious directives in comments/invisible text
P3	Exfiltration Commands	HIGH	Instructions to transmit context externally
P4	Behavior Manipulation	MEDIUM	Subtle instructions altering agent decisions
P5	Harmful Content	CRITICAL	Instructions that could cause physical harm

Data Exfiltration (4 patterns)

ID	Pattern	Severity	Description
E1	External Transmission	MEDIUM	Sending data to external URLs
E2	Env Variable Harvesting	HIGH	Collecting API keys and secrets
E3	File System Enumeration	MEDIUM	Scanning directories for sensitive files
E4	Context Leakage	HIGH	Transmitting conversation context externally

Privilege Escalation (3 patterns)

ID	Pattern	Severity	Description
PE1	Excessive Permissions	LOW	Requesting access beyond stated functionality
PE2	Sudo/Root Execution	MEDIUM	Invoking elevated system privileges
PE3	Credential Access	HIGH	Reading SSH keys, tokens, passwords

Supply Chain (6 patterns)

ID	Pattern	Severity	Description
SC1	Unpinned Dependencies	LOW	No version constraints on packages
SC2	External Script Fetching	HIGH	curl | bash and remote code execution
SC3	Obfuscated Code	HIGH	Base64/hex encoded execution
SC4	Known Vulnerable Dependencies	HIGH	Dependencies with known CVEs (live OSV.dev lookup)
SC5	Abandoned Dependencies	MEDIUM	Unmaintained packages without security updates
SC6	Typosquatting	HIGH	Package names similar to popular packages
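To illustrate the idea behind SC6, a typosquatting check can compare each dependency name against a list of well-known packages and flag near misses. The sketch below uses `difflib` from the standard library; the `POPULAR` list, the 0.8 cutoff, and the function name are assumptions chosen for illustration and are not taken from SkillSpector.

```python
import difflib
from typing import Optional, Sequence

# Illustrative list only; a real check would use a much larger package index.
POPULAR = ["requests", "numpy", "pandas", "urllib3", "cryptography"]


def possible_typosquat(
    name: str, popular: Sequence[str] = POPULAR, cutoff: float = 0.8
) -> Optional[str]:
    """Return the popular package that `name` closely resembles, if any."""
    name = name.lower()
    if name in popular:
        return None  # An exact match is the real package, not a lookalike.
    matches = difflib.get_close_matches(name, popular, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A lower cutoff catches more lookalikes at the cost of flagging legitimately similar names, so the threshold needs tuning against real package data.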

Excessive Agency (4 patterns)

ID	Pattern	Severity	Description
EA1	Unrestricted Tool Access	HIGH	Unfettered tool access without constraints
EA2	Autonomous Decision Making	HIGH	High-impact decisions without human-in-the-loop
EA3	Scope Creep	MEDIUM	Capabilities extending beyond stated purpose
EA4	Unbounded Resource Access	MEDIUM	No rate limits or quotas on resource consumption

Output Handling (3 patterns)

ID	Pattern	Severity	Description
OH1	Unvalidated Output Injection	HIGH	Model output used without sanitization
OH2	Cross-Context Output	MEDIUM	Output flows across trust boundaries without validation
OH3	Unbounded Output	MEDIUM	No limits on output size or generation rate

System Prompt Leakage (3 patterns)

ID	Pattern	Severity	Description
P6	Direct Leakage	HIGH	Instructions that expose system prompts or internal rules
P7	Indirect Extraction	MEDIUM	Extraction via rephrasing, translation, or side-channels
P8	Tool-Based Exfiltration	HIGH	System prompts exfiltrated via file writes or network requests

Memory Poisoning (3 patterns)

ID	Pattern	Severity	Description
MP1	Persistent Context Injection	HIGH	Content designed to persist across interactions
MP2	Context Window Stuffing	MEDIUM	Filler content displacing safety constraints
MP3	Memory Manipulation	HIGH	Tampering with agent memory or stored state

Tool Misuse (3 patterns)

ID	Pattern	Severity	Description
TM1	Tool Parameter Abuse	HIGH	Crafted parameters for unintended behavior (shell=True, --force)
TM2	Chaining Abuse	HIGH	Tool chains that bypass individual safety checks
TM3	Unsafe Defaults	MEDIUM	Overly permissive defaults (disabled TLS, no auth)

Rogue Agent (2 patterns)

ID	Pattern	Severity	Description
RA1	Self-Modification	CRITICAL	Modifying own code or configuration at runtime
RA2	Session Persistence	HIGH	Unauthorized persistence via cron jobs or startup scripts

Trigger Abuse (3 patterns)

ID	Pattern	Severity	Description
TR1	Overly Broad Trigger	MEDIUM	Trigger patterns matching common words
TR2	Shadow Command Trigger	HIGH	Triggers that shadow built-in commands or other skills
TR3	Keyword Baiting Trigger	MEDIUM	Generic triggers designed to maximize activation

Behavioral AST (8 patterns)

ID	Pattern	Severity	Description
AST1	exec() Call	CRITICAL	Direct exec() enabling arbitrary code execution
AST2	eval() Call	HIGH	Direct eval() evaluating arbitrary expressions
AST3	Dynamic Import	HIGH	__import__() loading arbitrary modules at runtime
AST4	subprocess Call	HIGH	External command execution via subprocess
AST5	os.system / exec-family	HIGH	Shell commands via os module
AST6	compile() Call	MEDIUM	Code object creation from strings
AST7	Dynamic getattr()	MEDIUM	Arbitrary attribute access with non-literal names
AST8	Dangerous Execution Chain	CRITICAL	exec/eval combined with dynamic source (network, encoded data)
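The simplest of these patterns can be detected with Python's `ast` module by walking the syntax tree and looking for calls to specific built-ins. The following is a minimal sketch of that technique for AST1, AST2, AST3, and AST6 only; it is not SkillSpector's analyzer, and `find_dangerous_calls` is a hypothetical name.

```python
import ast

# Built-in name -> pattern ID, taken from the table above.
DANGEROUS_BUILTINS = {
    "exec": "AST1",
    "eval": "AST2",
    "__import__": "AST3",
    "compile": "AST6",
}


def find_dangerous_calls(source: str) -> list:
    """Return sorted (line, pattern_id, name) tuples for direct calls to risky built-ins."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only direct calls by bare name are matched, e.g. eval(x), not obj.eval(x).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            pattern_id = DANGEROUS_BUILTINS.get(node.func.id)
            if pattern_id:
                findings.append((node.lineno, pattern_id, node.func.id))
    return sorted(findings)
```

Because the tree is parsed rather than executed, the scanned code never runs. Aliased calls such as `e = eval; e(x)` are missed by this sketch.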

Taint Tracking (5 patterns)

ID	Pattern	Severity	Description
TT1	Direct Taint Flow	HIGH	Data flows directly from a source to a sink without sanitization
TT2	Variable-Mediated Taint Flow	MEDIUM	Data flows from source to sink through intermediate variables
TT3	Credential Exfiltration Chain	CRITICAL	Credentials (env vars, secrets) flow to network output sinks
TT4	File Read to Network Exfiltration	HIGH	File contents flow to network output sinks
TT5	External Input to Code Execution	CRITICAL	Network or user input flows to exec/eval/subprocess sinks
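The source-to-sink idea behind TT1 through TT3 can be sketched for straight-line, module-level code: variables assigned from a source become tainted, taint propagates through later assignments, and a finding is raised when a tainted value reaches a sink call. The source and sink sets below are illustrative assumptions, and the sketch ignores functions, branches, and sanitizers that a real taint tracker has to handle.

```python
import ast

SOURCES = {"os.getenv", "os.environ.get"}   # assumed credential sources
SINKS = {"requests.post", "requests.get"}   # assumed network sinks


def dotted_name(node):
    """Turn an attribute chain such as os.environ.get into 'os.environ.get'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def is_tainted(expr, tainted):
    """True if the expression reads a source or mentions a tainted variable."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Call) and dotted_name(node.func) in SOURCES:
            return True
        if isinstance(node, ast.Attribute) and dotted_name(node) == "os.environ":
            return True
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
    return False


def find_taint_flows(source):
    """Return (line, sink) pairs where tainted data reaches a sink call."""
    tainted, findings = set(), []
    for stmt in ast.parse(source).body:  # top-level statements in source order
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and dotted_name(node.func) in SINKS:
                values = list(node.args) + [kw.value for kw in node.keywords]
                if any(is_tainted(value, tainted) for value in values):
                    findings.append((node.lineno, dotted_name(node.func)))
        if isinstance(stmt, ast.Assign) and is_tainted(stmt.value, tainted):
            tainted.update(t.id for t in stmt.targets if isinstance(t, ast.Name))
    return findings
```

A sink argument that reads a source directly corresponds to a TT1-style flow, while a flow through intermediate variables corresponds to TT2.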

YARA Signatures (4 patterns)

ID	Pattern	Severity	Description
YR1	Malware Match	CRITICAL	YARA rule match for known malware signatures
YR2	Webshell Match	CRITICAL	YARA rule match for webshell patterns
YR3	Cryptominer Match	HIGH	YARA rule match for crypto mining indicators
YR4	Hack Tool / Exploit Match	HIGH	YARA rule match for hack tools or exploit code

MCP Least Privilege (4 patterns)

ID	Pattern	Severity	Description
LP1	Underdeclared Capability	HIGH	Code uses capabilities not listed in declared permissions
LP2	Wildcard Permission	MEDIUM	Permission list contains wildcards (*, all, full, any)
LP3	Missing Permission Declaration	MEDIUM	No permissions field but code has detectable capabilities
LP4	Overdeclared Permission	LOW	Permission declared but no corresponding code capability found
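The four least-privilege patterns amount to comparing two sets: the permissions a skill declares and the capabilities detected in its code. The sketch below shows that comparison under simplifying assumptions: capability names are plain strings, a wildcard is treated as covering every capability (so LP1 is not reported alongside LP2), and `check_permissions` is a hypothetical helper rather than SkillSpector's API.

```python
from typing import Iterable, List, Optional, Set

# Wildcard values listed for LP2 in the table above.
WILDCARDS = {"*", "all", "full", "any"}


def check_permissions(declared: Optional[Iterable[str]], detected: Set[str]) -> List[str]:
    """Return the least-privilege pattern IDs that apply."""
    if declared is None:
        # LP3: no permissions field at all, yet the code uses capabilities.
        return ["LP3"] if detected else []

    declared_set = {permission.lower() for permission in declared}
    specific = declared_set - WILDCARDS
    has_wildcard = bool(declared_set & WILDCARDS)

    findings = []
    if detected - specific and not has_wildcard:
        findings.append("LP1")  # capability used but not declared
    if has_wildcard:
        findings.append("LP2")  # wildcard in the permission list
    if specific - detected:
        findings.append("LP4")  # permission declared but never used
    return findings
```

For example, a skill that declares only `filesystem` while its code also opens network connections would yield `["LP1"]`.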

MCP Tool Poisoning (4 patterns)

ID	Pattern	Severity	Description
TP1	Hidden Instructions	HIGH	Hidden directives in metadata (HTML comments, zero-width chars, base64, data URIs)
TP2	Unicode Deception	HIGH	Homoglyphs, RTL overrides, mixed-script identifiers in tool metadata
TP3	Parameter Description Injection	MEDIUM	Injection patterns in parameter definitions (overrides, system tokens, malicious defaults)
TP4	Description-Behavior Mismatch	MEDIUM	Declared tool description does not match actual code behavior (LLM-powered)
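Some of the TP1 and TP2 signals can be checked with the standard library alone. The sketch below flags zero-width characters, bidirectional control characters, and words that mix scripts (for example a Cyrillic letter inside a Latin identifier). The character sets and the script heuristic, which takes the first word of each character's Unicode name, are assumptions for illustration; they are far from a complete homoglyph detector.

```python
import re
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
# Embedding/override controls U+202A..U+202E and isolate controls U+2066..U+2069.
BIDI_CONTROLS = {chr(c) for c in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))}


def word_scripts(word: str) -> set:
    """Approximate the scripts in a word from Unicode character names."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if ch.isalpha()
    }


def inspect_metadata(text: str) -> list:
    """Return the pattern IDs (TP1, TP2) suggested by suspicious characters."""
    findings = []
    if any(ch in ZERO_WIDTH for ch in text):
        findings.append("TP1")  # invisible characters can hide directives
    has_bidi = any(ch in BIDI_CONTROLS for ch in text)
    mixed = any(len(word_scripts(w)) > 1 for w in re.findall(r"\w+", text))
    if has_bidi or mixed:
        findings.append("TP2")  # direction overrides or mixed-script words
    return findings
```

Scripts are compared per word rather than across the whole text, so a description written in two languages is not flagged just for being multilingual.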

All detected patterns are listed in the tables above.

Risk Scoring

Score Calculation

CRITICAL issues: +50 points
HIGH issues: +25 points
MEDIUM issues: +10 points
LOW issues: +5 points
Executable scripts: 1.3x multiplier

Severity Levels

Score	Severity	Recommendation
0-20	LOW	SAFE
21-50	MEDIUM	CAUTION
51-80	HIGH	DO NOT INSTALL
81-100	CRITICAL	DO NOT INSTALL
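The weights and bands above can be combined into a short calculation. The README does not state how the weights are aggregated, when the multiplier applies, or how the score is capped, so this sketch assumes a plain sum, a single 1.3x multiplier when the skill contains any executable script, and a cap at 100. The sample report later in this README shows 78/100 for two HIGH findings, which this simple sum does not reproduce, so the real calculation evidently includes factors not listed here.

```python
WEIGHTS = {"CRITICAL": 50, "HIGH": 25, "MEDIUM": 10, "LOW": 5}

# (upper bound, severity, recommendation), from the Severity Levels table.
BANDS = [
    (20, "LOW", "SAFE"),
    (50, "MEDIUM", "CAUTION"),
    (80, "HIGH", "DO NOT INSTALL"),
    (100, "CRITICAL", "DO NOT INSTALL"),
]


def risk_score(severities, has_executable: bool) -> int:
    """Sum the per-issue weights, apply the multiplier, and cap at 100 (assumed)."""
    raw = sum(WEIGHTS[severity] for severity in severities)
    if has_executable:
        raw *= 1.3
    return min(100, round(raw))


def classify(score: int):
    """Map a 0-100 score to its (severity, recommendation) band."""
    for upper, severity, recommendation in BANDS:
        if score <= upper:
            return severity, recommendation
    raise ValueError(f"score out of range: {score}")
```

Under these assumptions, one HIGH and one MEDIUM finding in a skill without executable scripts scores 35, which falls in the MEDIUM band with a CAUTION recommendation.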

Example Output

Terminal Output

 SkillSpector Security Report  v2.0.0

Skill: suspicious-skill
Source: ./suspicious-skill/
Scanned: 2026-01-29 10:30:00 UTC

        Risk Assessment
 Metric          Value
 Score           78/100
 Severity        HIGH
 Recommendation  DO NOT INSTALL

        Components (3)
 File              Type      Lines  Executable
 SKILL.md          markdown    142  No
 scripts/sync.py   python       87  Yes
 requirements.txt  text          3  No

Issues (2)

  HIGH: Env Variable Harvesting (E2)
    Location: scripts/sync.py:23
    Finding: for key, val in os.environ.items():...
    Confidence: 94%
    Explanation: This code collects environment variables containing
    API keys and secrets, then sends them to an external server.

  HIGH: External Transmission (E1)
    Location: scripts/sync.py:45
    Finding: requests.post("https://api.skill.io/env"...
    Confidence: 89%
    Explanation: Data is being sent to an external server. Combined
    with env harvesting above, this indicates credential exfiltration.

Configuration

Environment Variables

Variable	Description	Required
`SKILLSPECTOR_PROVIDER`	Active LLM provider: `openai`, `anthropic`, or `nv_build`. Each provider has its own bundled `model_registry.yaml` and default model (see the LLM Analysis table above). Defaults to `nv_build`.	Optional
`NVIDIA_INFERENCE_KEY`	Credential for the `nv_build` provider (build.nvidia.com).	Required for LLM analysis when `SKILLSPECTOR_PROVIDER=nv_build`
`OPENAI_API_KEY`	Credential for the OpenAI provider (`SKILLSPECTOR_PROVIDER=openai`). Also serves as the tier-2 fallback in the credential waterfall when the active provider returns no credentials.	Required for LLM analysis when `SKILLSPECTOR_PROVIDER=openai`
`OPENAI_BASE_URL`	Override the OpenAI endpoint (e.g. point at Ollama).	Optional
`ANTHROPIC_API_KEY`	Credential for the Anthropic provider (`SKILLSPECTOR_PROVIDER=anthropic`).	Required for LLM analysis when `SKILLSPECTOR_PROVIDER=anthropic`
`SKILLSPECTOR_MODEL`	Override the active provider's default model. See the LLM Analysis table for each provider's default.	Optional
`SKILLSPECTOR_MODEL_REGISTRY`	Override the bundled per-provider YAML registry (`src/skillspector/providers/<provider>.yaml`) with a custom path.	Optional
`SKILLSPECTOR_LOG_LEVEL`	Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR` (default: `WARNING`).	Optional

CLI Options

skillspector scan --help

Options:
  -f, --format [terminal|json|markdown|sarif]  Output format [default: terminal]
  -o, --output PATH                            Output file path
  --no-llm                                     Skip LLM analysis (static only)
  -V, --verbose                                Show detailed progress
  --help                                       Show this message and exit

Development

Setup

All make targets assume a virtual environment is already created and activated. The Makefile uses uv if available, else pip.

# Clone, create venv, activate, install dev dependencies
git clone https://github.com/NVIDIA/skillspector.git
cd skillspector
uv venv .venv && source .venv/bin/activate
# or: python3 -m venv .venv && source .venv/bin/activate
make install-dev

# Run tests
make test

# Run tests with coverage
make test-cov

# Run linting
make lint

# Format code
make format

How It Works

SkillSpector uses a two-stage detection pipeline:

Stage 1: Static Analysis

Fast regex-based pattern matching across 11 static analyzers
AST-based behavioral analysis detecting dangerous calls (exec, eval, subprocess, etc.)
Live vulnerability lookups via OSV.dev for known CVEs in dependencies
Scans all files in the skill
High recall (catches most issues)
Moderate precision (some false positives)
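Regex-based matching of the kind described above can be sketched in a few lines: each rule is a compiled pattern, and every line of every file is tested against every rule. The two expressions below are illustrative guesses at what rules for SC2 and PE2 might look like; they are not SkillSpector's actual rules.

```python
import re

# Hypothetical rules keyed by pattern ID.
RULES = {
    # A download command piped into a shell, e.g. "curl ... | bash".
    "SC2": re.compile(r"\b(?:curl|wget)\b[^\n|]*\|\s*(?:ba|z)?sh\b"),
    # Any command run through sudo.
    "PE2": re.compile(r"\bsudo\s+\S+"),
}


def scan_text(text: str) -> list:
    """Return (line_number, pattern_id) for every rule that matches a line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for pattern_id, pattern in RULES.items():
            if pattern.search(line):
                findings.append((line_number, pattern_id))
    return findings
```

Rules like these match on surface text, so they also fire on documentation that merely mentions `sudo`. That is the high-recall, moderate-precision trade-off noted above.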

Stage 2: LLM Semantic Analysis (Optional)

Evaluates context and intent
Filters false positives
Provides human-readable explanations
Improves precision to ~87%

The LLM prompt includes anti-jailbreak protections to prevent malicious skills from manipulating the analysis.

Live Vulnerability Lookups (SC4)

SC4 uses the OSV.dev API to check dependencies against the full Open Source Vulnerabilities database — covering tens of thousands of advisories across PyPI and npm.

No API key required — OSV.dev is free and unauthenticated.
Batch queries — all dependencies are checked in a single HTTP call.
Automatic fallback — if OSV.dev is unreachable (air-gapped/offline), a small built-in fallback list is used.
Caching — results are cached in-memory for 1 hour to avoid redundant API calls during a session.

The tool requires outbound HTTPS access to api.osv.dev for live vulnerability data. When that is not available, findings are limited to the static fallback list.
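The behaviour described above (one batch request, a one-hour in-memory cache, and an offline fallback) can be sketched against OSV.dev's public `POST /v1/querybatch` endpoint. The function names, the cache key, and the empty-result fallback are assumptions for illustration; SkillSpector's built-in fallback list is not reproduced here.

```python
import json
import time
import urllib.request

OSV_BATCH_URL = "https://api.osv.dev/v1/querybatch"
CACHE_TTL_SECONDS = 3600  # one hour, as described above
_cache = {}


def build_batch_query(deps):
    """deps is a list of (name, version, ecosystem) tuples, e.g. ("requests", "2.19.0", "PyPI")."""
    return {
        "queries": [
            {"package": {"name": name, "ecosystem": ecosystem}, "version": version}
            for name, version, ecosystem in deps
        ]
    }


def post_to_osv(payload, timeout=10.0):
    """Send one batch request to OSV.dev and return the decoded JSON response."""
    request = urllib.request.Request(
        OSV_BATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def lookup_vulnerabilities(deps, fetch=post_to_osv, now=time.time):
    """Return {package name: [advisory IDs]}, using the cache when it is fresh."""
    key = tuple(deps)
    cached = _cache.get(key)
    if cached and now() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]
    try:
        response = fetch(build_batch_query(deps))
    except OSError:
        # Offline: a real tool would consult its static fallback list here.
        return {name: [] for name, _, _ in deps}
    # OSV returns one result per query, in the same order as the queries.
    ids = {
        dep[0]: [vuln["id"] for vuln in result.get("vulns", [])]
        for dep, result in zip(deps, response.get("results", []))
    }
    _cache[key] = (now(), ids)
    return ids
```

Passing `fetch` as a parameter keeps the network call replaceable, which makes the caching and fallback logic testable without contacting api.osv.dev.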

Limitations

Non-English content: May miss patterns in other languages
Image-based attacks: Cannot analyze text in images
Encrypted/binary code: Cannot analyze compiled or encrypted content
Runtime behavior: Static analysis only, no dynamic execution
Offline SC4: Without network access to api.osv.dev, SC4 uses a small static fallback list

Research Background

Based on research from "Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale" (Liu et al., 2026):

Dataset: 42,447 skills from major marketplaces
Vulnerable: 26.1% contain at least one vulnerability
High-severity: 5.2% show likely malicious intent
Key finding: Skills with executable scripts are 2.12x more likely to be vulnerable

Python API Integration

from skillspector import graph

# Invoke the LangGraph workflow
result = graph.invoke({
    "input_path": "/path/to/skill",
    "output_format": "json",   # terminal, json, markdown, or sarif
    "use_llm": True,           # False for static-only analysis
})

# Access results
print(f"Risk Score: {result['risk_score']}/100")
print(f"Severity: {result['risk_severity']}")
print(f"Recommendation: {result['risk_recommendation']}")

for finding in result["filtered_findings"]:
    print(f"[{finding['severity']}] {finding['rule_id']}: {finding['message']}")