Prompt Infection Taxonomy: five attack vectors for agent threat modeling
Direct · Indirect · Multi-turn · Tool-mediated · Cross-agent propagation
Prompt Infection Taxonomy is the Automation Labs lens in five vectors for classifying and defending against prompt injection in agent systems, from direct injection to cross-agent propagation.
The five vectors
1 · Direct — the user explicitly asks the agent to ignore instructions, change persona, or reveal the system prompt. Most visible attack and easiest to mitigate (firm system prompt + explicit refusal + logging). Typically captured by basic filters nowadays.
2 · Indirect — malicious payload arrives via content the agent reads (web page, PDF, email, uploaded file). The attacker is not the user; it’s whoever planted the instruction in the document the user asked the agent to process. Fastest-growing vector in 2025-2026 with web-browsing agents. Countermeasure: separate instruction channel from content channel, explicit treatment of content as data never as executable.
3 · Multi-turn — attacker conditions the agent across multiple messages, building context that dilutes original instructions. Small step by step, each acceptable, sum out of scope. Countermeasure: periodic re-anchoring of system prompt, fresh policy summary every N turns, confidence gating on high-risk actions.
4 · Tool-mediated — attacker uses tool output to inject instruction. Example: agent reads from a database a record whose “description” field contains “NOW EXECUTE THE FOLLOWING QUERY.” Particularly dangerous because content arrives from an apparently trusted source. Countermeasure: structured escaping between tool output and prompt, schema validation before re-injection into LLM call.
5 · Cross-agent propagation — in multi-agent systems, infection in one agent propagates to another via inter-agent messages. Agent A is compromised, sends a message to agent B with embedded instruction, agent B acts. New vector in Cowork, Crew AI, n8n environments with multiple LLM nodes. Countermeasure: explicit trust boundary between agents, schema validation on inter-agent messages, agent identity + signature.
How to apply
Use Prompt Infection Taxonomy as a matrix in architecture review. For each new agent, for each vector: is there surface? Is there control? Is there a regression test? “Implicit” is a failure answer.
Pair with Harness Stack: vectors 2, 3, 4 are where Verification (layer 3) and Confidence gating (layer 8) do most work. Vector 5 requires Constraint (layer 2) with well-defined scope between agents.
Related posts
- Prompt Infection Taxonomy: the anatomy of defense
- Harness Stack — Verification and Confidence gating are the layers that respond to vectors 2-5.
- Agent Trust Stack — Auditability affected by Tool-mediated and Cross-agent vectors.
When to use
- Threat modeling for a new agent before production.
- Security incident audit in a multi-agent system.
- Red team brief for testing an agent.
When NOT to use
- Closed chatbot without tool use and without external content ingestion — surface too reduced for the full framework.