Prompt Infection Taxonomy: five attack vectors for agent threat modeling diagram

The five vectors

1 · Direct — the user explicitly asks the agent to ignore instructions, change persona, or reveal the system prompt. Most visible attack and easiest to mitigate (firm system prompt + explicit refusal + logging). Typically captured by basic filters nowadays.

2 · Indirect — malicious payload arrives via content the agent reads (web page, PDF, email, uploaded file). The attacker is not the user; it’s whoever planted the instruction in the document the user asked the agent to process. Fastest-growing vector in 2025-2026 with web-browsing agents. Countermeasure: separate instruction channel from content channel, explicit treatment of content as data never as executable.

3 · Multi-turn — attacker conditions the agent across multiple messages, building context that dilutes original instructions. Small step by step, each acceptable, sum out of scope. Countermeasure: periodic re-anchoring of system prompt, fresh policy summary every N turns, confidence gating on high-risk actions.

4 · Tool-mediated — attacker uses tool output to inject instruction. Example: agent reads from a database a record whose “description” field contains “NOW EXECUTE THE FOLLOWING QUERY.” Particularly dangerous because content arrives from an apparently trusted source. Countermeasure: structured escaping between tool output and prompt, schema validation before re-injection into LLM call.

5 · Cross-agent propagation — in multi-agent systems, infection in one agent propagates to another via inter-agent messages. Agent A is compromised, sends a message to agent B with embedded instruction, agent B acts. New vector in Cowork, Crew AI, n8n environments with multiple LLM nodes. Countermeasure: explicit trust boundary between agents, schema validation on inter-agent messages, agent identity + signature.

How to apply

Use Prompt Infection Taxonomy as a matrix in architecture review. For each new agent, for each vector: is there surface? Is there control? Is there a regression test? “Implicit” is a failure answer.

Pair with Harness Stack: vectors 2, 3, 4 are where Verification (layer 3) and Confidence gating (layer 8) do most work. Vector 5 requires Constraint (layer 2) with well-defined scope between agents.

Prompt Infection Taxonomy: the anatomy of defense
Harness Stack — Verification and Confidence gating are the layers that respond to vectors 2-5.
Agent Trust Stack — Auditability affected by Tool-mediated and Cross-agent vectors.

When to use

Threat modeling for a new agent before production.
Security incident audit in a multi-agent system.
Red team brief for testing an agent.

When NOT to use

Closed chatbot without tool use and without external content ingestion — surface too reduced for the full framework.

The five vectors

How to apply

Related posts

When to use

When NOT to use