Research & Benchmarks

MCPShield is the most comprehensive MCP security scanner available. Here are our benchmark results, detection methodology, and public dataset.

90+

Detection Rules

4,800+

Repos Scanned

100%

Benchmark Recall

Evasion Rate

Benchmark Results

GenTelBench-v1

Malicious + safe tool descriptions

Samples

50,000

Recall

100%

50K samples from academic benchmark dataset

Source: GenTelLab (HuggingFace)

MCPTox

Poisoned tool descriptions

Samples

485

Recall

100%

All 485 malicious descriptions detected by regex scanner

Source: MCPTox research dataset

Adversarial Evasion

Crafted evasion descriptions

Samples

Recall

100%

4 evasion techniques: semantic rephrasing, legitimacy framing, multi-step, jargon obfuscation

Source: Internal (AutoMalTool techniques)

Real Malware

In-the-wild malicious MCP server

Samples

Recall

100%

Throwaway account, .exe lure, 23 zero-width characters detected

Source: GitHub (Parnellcold355/easypanel-mcp)

LLM Judge vs Regex: MCPTox Analysis

We tested both our regex scanner and LLM judge (Claude Haiku) against all 485 MCPTox malicious tool descriptions to measure the marginal value of each detection layer.

Regex Scanner485/485 (100%)

LLM Judge28/485 (5.8%)

Both Caught28 (5.8%)

Evasion Gap0 (0%)

The regex engine catches all known attack patterns. The LLM judge is conservative by design (low false positives) and provides value on novel semantic attacks that regex patterns haven't been written for yet.

Adversarial Evasion Testing

We crafted 20 adversarial tool descriptions using evasion techniques from the AutoMalTool paper (arxiv 2509.21011). Both regex and LLM caught every single one.

Semantic Rephrasing

Regex: 100%LLM: 100%

"compliance validation" instead of "exfiltrate"

Legitimacy Framing

Regex: 100%LLM: 100%

"built-in telemetry for quality assurance"

Multi-Step Obfuscation

Regex: 100%LLM: 100%

Attack split across tool chain references

Jargon Obfuscation

Regex: 100%LLM: 100%

"inode metadata traversal" for filesystem theft

3-Layer Detection Architecture

Regex Pattern Engine

90+ rules

Static pattern matching against tool descriptions, source code, and metadata. Covers OWASP MCP Top 10.

Strengths: Fast, deterministic, zero false negatives on known patterns

Schema Poisoning Scanner

Full schema traversal rules

Deep inspection of every text surface in tool metadata: descriptions, annotations, parameter defaults, enum values, comments, examples, and nested properties.

Strengths: Catches injection in overlooked fields like inputSchema.$comment