Multi-agent orchestration patterns: 6 patterns with tradeoffs
Six multi-agent orchestration patterns with explicit tradeoffs. When it's worth it, when it's not, and how to avoid the 'crew' anti-pattern where 5 agents do the work of 1.
Multi-agent orchestration is oversold. In 80% of the cases we see, a well-designed single agent would beat a three-agent “crew.” But there are genuine cases where multi-agent pays off. Here are six patterns, each with its tradeoffs.
Pattern 1 · Sequential pipeline
Agent A → output → Agent B → output → Agent C.
When it’s worth it: tasks with distinct phases, where each phase benefits from a different model. For example: researcher (Sonnet) → writer (Opus) → reviewer (Haiku for a quick check).
Tradeoff:
- ✅ Each phase is handled by the model best suited to it.
- ✅ High auditability (each output is a checkpoint).
- ❌ Latency adds up. 3 sequential agents means roughly 3× the latency of 1.
- ❌ Errors propagate. A failure in A becomes garbage input for B and C.
Mitigation: validate each stage’s output before passing it on, and retry the stage when validation fails.
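The mitigation above can be sketched as follows. This is a minimal illustration, not a reference implementation: the agents are plain Python functions standing in for model calls, and the names `run_pipeline`, `StageFailed`, and `max_retries` are hypothetical.

```python
from typing import Callable

Stage = Callable[[str], str]
Validator = Callable[[str], bool]


class StageFailed(Exception):
    """Raised when a stage never produces output that passes validation."""


def run_pipeline(
    task: str,
    stages: list[tuple[str, Stage, Validator]],
    max_retries: int = 2,
) -> str:
    data = task
    for name, stage, is_valid in stages:
        for _attempt in range(max_retries + 1):
            output = stage(data)
            if is_valid(output):
                data = output  # checkpoint: only validated output moves downstream
                break
        else:
            # for/else: reached only when no attempt passed validation
            raise StageFailed(
                f"{name} failed validation after {max_retries + 1} attempts"
            )
    return data


# Stub agents; in practice each one would call a different model.
def researcher(topic: str) -> str:
    return f"notes on {topic}"


def writer(notes: str) -> str:
    return f"draft based on {notes}"


def reviewer(draft: str) -> str:
    return f"approved: {draft}"


result = run_pipeline(
    "vector databases",
    [
        ("research", researcher, lambda s: s.startswith("notes")),
        ("write", writer, lambda s: s.startswith("draft")),
        ("review", reviewer, lambda s: s.startswith("approved")),
    ],
)
print(result)
```

Stopping at the first stage that cannot pass validation keeps a failure in A from silently becoming garbage in B and C, at the price of extra calls on retry.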
Pattern 2 · Specialist routing (router + experts)
A router agent classifies the request and routes it to the appropriate expert (legal, financial, support). Each expert handles only its own domain.
When it’s worth it: domains where real specialization exists and the router can classify with high confidence.
Tradeoff:
- ✅ Each expert can have a short, focused system prompt.
- ✅ Cost-efficient (a simple expert can run on Haiku).
- ❌ A router that misclassifies breaks the whole system.
- ❌ Cross-domain requests (for example, one touching both legal and financial) have no single correct destination.
Mitigation: have the router report a confidence score; when it falls below a threshold, escalate to multiple experts or to a human.
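A rough sketch of the confidence-threshold mitigation, under stated assumptions: `classify` is a keyword-counting stand-in for a real router model, and `Route`, `EXPERTS`, `handle`, and the 0.8 threshold are all illustrative choices rather than anything prescribed here.

```python
from dataclasses import dataclass


@dataclass
class Route:
    domain: str
    confidence: float  # 0.0 to 1.0, as reported by the router


KEYWORDS = {
    "legal": ["contract", "liability"],
    "financial": ["invoice", "refund"],
    "support": ["password", "login"],
}


def classify(request: str) -> Route:
    """Stand-in for the router agent: counts keyword hits instead of calling a model."""
    text = request.lower()
    hits = {domain: sum(w in text for w in words) for domain, words in KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        return Route("unknown", 0.0)
    best = max(hits, key=hits.get)
    # Confidence is the share of hits belonging to the winning domain,
    # so a request that touches two domains scores low.
    return Route(best, hits[best] / total)


# Stub experts; each would normally have its own short system prompt.
EXPERTS = {
    "legal": lambda r: f"legal: {r}",
    "financial": lambda r: f"financial: {r}",
    "support": lambda r: f"support: {r}",
}


def handle(request: str, threshold: float = 0.8) -> str:
    route = classify(request)
    if route.confidence < threshold or route.domain not in EXPERTS:
        # Low confidence or cross-domain: hand off rather than guess.
        return f"escalated: {request}"
    return EXPERTS[route.domain](request)


print(handle("I forgot my password"))
print(handle("Refund clause in the contract"))
```

The second request touches both legal and financial keywords, so its confidence drops to 0.5 and it is escalated instead of being sent to a single expert.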
Pattern 3 · Debate / dialectical
Two agents argue opposing positions; a third decides.
When it’s worth it: decisions with sensitive tradeoffs (approving or rejecting a complex request, choosing between alternatives).
Tradeoff:
- ✅ Captures nuance a single agent skips.
- ❌ Expensive: roughly 3× the cost, plus higher latency.
- ❌ Risk of “theatrical debate”: agents generate arguments without real adversarial pressure.
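Used in: mostly research settings, such as the “AI safety via debate” line of work; rarely in direct commercial production. (Anthropic’s Constitutional AI training is sometimes cited here, but it relies on a model critiquing and revising its own output rather than on two agents arguing before a judge.)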
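To make the cost concrete, here is a minimal sketch of the debate loop. The three agents are stub functions standing in for model calls, and the names `debate` and `rounds` are hypothetical. One round costs three calls (pro, con, judge), which is where the roughly 3× figure comes from.

```python
from typing import Callable

Agent = Callable[[str], str]


def debate(question: str, pro: Agent, con: Agent, judge: Agent, rounds: int = 1) -> str:
    transcript = [f"QUESTION: {question}"]
    for _ in range(rounds):
        # Each side sees the transcript so far, so "con" can rebut "pro" directly.
        transcript.append("PRO: " + pro("\n".join(transcript)))
        transcript.append("CON: " + con("\n".join(transcript)))
    # The judge reads the full exchange and returns the decision.
    return judge("\n".join(transcript))


# Stub agents with fixed behavior, for illustration only.
def pro_agent(transcript: str) -> str:
    return "approve, the customer has a clean history"


def con_agent(transcript: str) -> str:
    return "reject, the amount exceeds policy limits"


def judge_agent(transcript: str) -> str:
    return "reject" if "exceeds policy" in transcript else "approve"


verdict = debate("Approve this refund request?", pro_agent, con_agent, judge_agent)
print(verdict)
```

Passing the running transcript to each side is one way to push against “theatrical debate”: an agent that never sees the opposing argument cannot respond to it.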
Pattern 4 · Hierarchical (manager + workers)
A manager agent breaks the task into subtasks, distributes them to workers, and aggregates the results.
When it’s worth it: tasks that decompose well (research across N parallel topics, code across N modules).
Tradeoff:
- ✅ Real parallelism.
- ❌ Coordination costs. The manager needs a large context window to hold every worker’s output.
- ❌ When subtasks are interdependent, the manager becomes a bottleneck.
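A minimal fan-out/fan-in sketch of this pattern, assuming the subtasks are independent. `plan`, `worker`, and `manager` are hypothetical stand-ins for model calls; threads are used because calls to a model API are typically I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(task: str) -> list[str]:
    """Stand-in for the manager's decomposition step: split on ';'."""
    return [part.strip() for part in task.split(";") if part.strip()]


def worker(subtask: str) -> str:
    """Stand-in for a worker agent handling one independent subtask."""
    return f"summary of {subtask}"


def manager(task: str, max_workers: int = 4) -> str:
    subtasks = plan(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, even if workers
        # finish out of order.
        results = list(pool.map(worker, subtasks))
    # Aggregation: every worker output lands back in the manager's context.
    return "\n".join(f"- {r}" for r in results)


report = manager("pricing; competitors; regulation")
print(report)
```

The aggregation step is where the coordination cost shows up: the manager has to hold all N results at once, and the parallelism disappears if one subtask needs another’s output.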
Pattern 5 · Adversarial / red team
One agent generates, another tries to break the result. The loop iterates until the output is stable.
When it’s worth it: producing robust output (code with edge cases covered, text without legal ambiguity).
Tradeoff:
- ✅ Final output is typically more robust than a single agent’s.
- ❌ Can iterate indefinitely if the stopping condition is poorly designed. A time-box is mandatory.
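One way to enforce the time-box is to bound the loop by both a round count and a wall-clock deadline. The sketch below uses stub functions in place of the two agents; `REQUIRED`, `generator`, `red_team`, `max_rounds`, and `budget_seconds` are all illustrative names.

```python
import time

# Hypothetical edge cases the attacker knows how to probe for.
REQUIRED = ["empty input", "negative numbers", "unicode"]


def generator(draft: list[str], findings: list[str]) -> list[str]:
    """Stand-in generator: addresses one reported finding per round."""
    return draft + findings[:1]


def red_team(draft: list[str]) -> list[str]:
    """Stand-in attacker: reports the edge cases the draft does not cover."""
    return [case for case in REQUIRED if case not in draft]


def adversarial_loop(
    max_rounds: int = 5, budget_seconds: float = 30.0
) -> tuple[list[str], bool]:
    deadline = time.monotonic() + budget_seconds
    draft: list[str] = []
    for _ in range(max_rounds):
        findings = red_team(draft)
        if not findings:
            return draft, True  # stable: the attacker found nothing
        if time.monotonic() >= deadline:
            break  # time-box hit: stop even though findings remain
        draft = generator(draft, findings)
    # Out of rounds or out of time: report whether the last draft is stable.
    return draft, not red_team(draft)


final_draft, stable = adversarial_loop()
print(final_draft, stable)
```

Returning the stability flag alongside the draft lets the caller tell a converged result from one that merely ran out of budget.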
Pattern 6 · Single agent with advisor escalation
One main agent does the work. An advisor (another model) is consulted only when confidence is low or the stakes are high. The main agent iterates until its output is stable.
When it’s worth it: most cases.
Tradeoff:
- ✅ Simplicity. Capping the advisor at max_uses=1 per interaction controls cost.
- ✅ Covers 80% of multi-agent’s value with a fraction of the overhead.
- ❌ Doesn’t cover tasks needing real parallelism.
This is, quite literally, our default pattern.
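A minimal sketch of advisor escalation with a usage cap. Everything here is an assumption made for illustration: the stub `main_agent` reports its own confidence, `advisor` stands in for a second model, and `max_advisor_uses` mirrors the max_uses=1 cap mentioned above without referring to any specific API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answer:
    text: str
    confidence: float  # self-reported by the main agent, 0.0 to 1.0


def main_agent(question: str, advice: Optional[str] = None) -> Answer:
    """Stand-in for the main model; confidence values are made up."""
    if advice is not None:
        return Answer(f"{question} -> answered with advice ({advice})", 0.9)
    confidence = 0.95 if "simple" in question else 0.4
    return Answer(f"{question} -> answered directly", confidence)


def advisor(question: str, draft: str) -> str:
    """Stand-in for the advisor model."""
    return "check the assumptions"


def answer(
    question: str,
    high_stakes: bool = False,
    min_confidence: float = 0.7,
    max_advisor_uses: int = 1,
) -> tuple[Answer, int]:
    result = main_agent(question)
    uses = 0
    needs_advice = high_stakes or result.confidence < min_confidence
    while needs_advice and uses < max_advisor_uses:
        advice = advisor(question, result.text)
        result = main_agent(question, advice)
        uses += 1
        # After the first consultation, only low confidence justifies another.
        needs_advice = result.confidence < min_confidence
    return result, uses


print(answer("simple lookup"))
print(answer("ambiguous contract question"))
```

With the cap at 1, the worst case is three model calls per interaction (main, advisor, main again), and the common case is one.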
The anti-pattern: “crew” for everything
Frameworks like CrewAI popularize “delegate tasks to a crew of specialized agents.” In real production, we see:
- 5 agents consuming 5× the tokens to deliver what 1 would.
- A bug in one agent cascades to the others (vector 5 of the Prompt Infection Taxonomy).
- Debugging becomes a nightmare (which agent erred?).
Heuristic: start with 1 agent. Add a second only when a specific, observed bottleneck shows that decomposition would solve it.
Cost matrix (relative estimate)
| Pattern | Tokens | Latency | Complexity |
|---|---|---|---|
| Single agent | 1× | 1× | Low |
| Single + advisor (rarely consulted) | 1.5× | 1.5× | Medium |
| Sequential pipeline | 3× | 3× | Medium |
| Specialist routing | 1.5-2× | 1.5× | High |
| Debate (3 agents) | 3× | 2× | High |
| Hierarchical | 2-5× | 2-3× | Very high |
| Adversarial loop | 2-10× | 2-5× | Very high |
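The multipliers in the matrix can be turned into a quick back-of-the-envelope estimate. In the sketch below the multiplier ranges are copied from the table, while the baseline of 4,000 tokens and 6 seconds for a single-agent run is a made-up example, and `COST_MATRIX` and `estimate` are hypothetical names.

```python
# (low, high) multipliers relative to a single agent, taken from the table above.
COST_MATRIX = {
    "single": {"tokens": (1.0, 1.0), "latency": (1.0, 1.0)},
    "single_advisor": {"tokens": (1.5, 1.5), "latency": (1.5, 1.5)},
    "sequential": {"tokens": (3.0, 3.0), "latency": (3.0, 3.0)},
    "routing": {"tokens": (1.5, 2.0), "latency": (1.5, 1.5)},
    "debate": {"tokens": (3.0, 3.0), "latency": (2.0, 2.0)},
    "hierarchical": {"tokens": (2.0, 5.0), "latency": (2.0, 3.0)},
    "adversarial": {"tokens": (2.0, 10.0), "latency": (2.0, 5.0)},
}


def estimate(pattern: str, base_tokens: float, base_latency_s: float) -> dict:
    """Scale a measured single-agent baseline by the pattern's multiplier range."""
    row = COST_MATRIX[pattern]
    return {
        "tokens": tuple(base_tokens * m for m in row["tokens"]),
        "latency_s": tuple(base_latency_s * m for m in row["latency"]),
    }


# Hypothetical baseline: one single-agent run costs 4,000 tokens and takes 6 s.
print(estimate("hierarchical", 4000, 6.0))
print(estimate("adversarial", 4000, 6.0))
```

Because these are relative estimates, the output is only as good as the baseline you measure for your own single-agent run.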
Where to go deeper
For the delegation framework (which task to delegate to which agent), see Agent Trust Stack. For safety in multi-agent systems (cross-agent propagation), see Prompt Infection Taxonomy.