Agent Trust Stack: when to trust which agent with which task
Extended Agent Trust Stack — 5 dimensions applied with practical examples from Claude Code, Cowork and production, plus full decision matrix.
The Agent Trust Stack defines 5 dimensions for deciding what an agent can do autonomously. This post extends with practical examples applied to Claude Code, Cowork, and general production.
The applied matrix
Score 0-3 per dimension, sum 0-15. Bands:
- 0-5: full autonomy.
- 6-10: autonomy with durable pause on specific actions.
- 11-15: block — requires human in the loop.
Example 1 · Claude Code running git status
| Dimension | Score | Why |
|---|---|---|
| Reversibility | 0 | Read-only, nothing changes |
| Blast radius | 0 | Local only |
| Auditability | 0 | Command logged |
| Cost | 0 | Trivial |
| Time | 0 | <1s |
| Sum | 0 | Full autonomy |
Claude Code runs without asking.
Example 2 · Claude Code running git push --force-with-lease
| Dimension | Score | Why |
|---|---|---|
| Reversibility | 2 | Recoverable via reflog, but team may have pulled |
| Blast radius | 2 | Shared branch |
| Auditability | 1 | Command logged, but history may confuse |
| Cost | 0 | Trivial |
| Time | 0 | <1s |
| Sum | 5 | Borderline. In practice, durable pause. |
Claude Code asks first.
Example 3 · Cowork agent sending email to external customer
| Dimension | Score | Why |
|---|---|---|
| Reversibility | 3 | Sent email is sent. No undo |
| Blast radius | 3 | External customer affected |
| Auditability | 1 | Email log exists |
| Cost | 0 | Cents |
| Time | 0 | Instant |
| Sum | 7 | Mandatory durable pause |
Even trusted agent requires human confirmation.
Example 4 · Agent automating bank transfer
| Dimension | Score | Why |
|---|---|---|
| Reversibility | 3 | Bank transfer is irreversible |
| Blast radius | 3 | Corporate account, real vendor |
| Auditability | 2 | Log exists but critical integration |
| Cost | 3 | Direct monetary value |
| Time | 0 | Instant |
| Sum | 11 | Block — human always in the loop |
Doesn’t matter how “good” the agent is. Policy blocks.
The gotcha of equal scores with different causes
Two flows summing to 7 may need different policies. Email (R3, B3) vs creating a public ticket (R2, B3, A2) has the same score, but:
- Email: reversibility is the pain. Durable pause before send.
- Ticket: auditability is the pain. Detailed logging + auditor permission.
Sum is screening; decomposition is design.
How to apply in code
function evaluateTrust(task: Task): TrustDecision {
const score = task.reversibility + task.blastRadius +
task.auditability + task.cost + task.time;
if (score <= 5) return { autonomy: 'full', requireGate: false };
if (score <= 10) return {
autonomy: 'partial',
requireGate: true,
gateType: chooseGate(task), // durable pause, confirmation, escalation
};
return { autonomy: 'blocked', requireHuman: true };
}
In practice, scoring comes from an editable policy file (YAML/JSON), not hardcoded. Allows tuning without redeploy.
The integration with Harness Stack
Trust Stack presumes Harness Stack is present. Without Harness, the dimensions degrade:
- No Verification (layer 3) → Reversibility worsens (any action can go wrong).
- No Failure corpus (layer 9) → Auditability worsens (no usable trace).
- No Durable pause (layer 7) → the 6-10 gate doesn’t exist operationally.
Build Harness first. Apply Trust Stack as decision layer on top.
Where to go deeper
Agent Trust Stack hub for the canonical framework. Harness Stack introduction for the infra that unlocks the decisions.