🟠 Builder

Agent Trust Stack: when to trust which agent with which task

Extended Agent Trust Stack — 5 dimensions applied with practical examples from Claude Code, Cowork and production, plus full decision matrix.

May 15, 2026 · 10 min · ai-agents

The Agent Trust Stack defines 5 dimensions for deciding what an agent can do autonomously. This post extends with practical examples applied to Claude Code, Cowork, and general production.

The applied matrix

Score 0-3 per dimension, sum 0-15. Bands:

0-5: full autonomy.
6-10: autonomy with durable pause on specific actions.
11-15: block — requires human in the loop.

Example 1 · Claude Code running `git status`

Dimension	Score	Why
Reversibility	0	Read-only, nothing changes
Blast radius	0	Local only
Auditability	0	Command logged
Cost	0	Trivial
Time	0	<1s
Sum	0	Full autonomy

Claude Code runs without asking.

Example 2 · Claude Code running `git push --force-with-lease`

Dimension	Score	Why
Reversibility	2	Recoverable via reflog, but team may have pulled
Blast radius	2	Shared branch
Auditability	1	Command logged, but history may confuse
Cost	0	Trivial
Time	0	<1s
Sum	5	Borderline. In practice, durable pause.

Claude Code asks first.

Example 3 · Cowork agent sending email to external customer

Dimension	Score	Why
Reversibility	3	Sent email is sent. No undo
Blast radius	3	External customer affected
Auditability	1	Email log exists
Cost	0	Cents
Time	0	Instant
Sum	7	Mandatory durable pause

Even trusted agent requires human confirmation.

Example 4 · Agent automating bank transfer

Dimension	Score	Why
Reversibility	3	Bank transfer is irreversible
Blast radius	3	Corporate account, real vendor
Auditability	2	Log exists but critical integration
Cost	3	Direct monetary value
Time	0	Instant
Sum	11	Block — human always in the loop

Doesn’t matter how “good” the agent is. Policy blocks.

The gotcha of equal scores with different causes

Two flows summing to 7 may need different policies. Email (R3, B3) vs creating a public ticket (R2, B3, A2) has the same score, but:

Email: reversibility is the pain. Durable pause before send.
Ticket: auditability is the pain. Detailed logging + auditor permission.

Sum is screening; decomposition is design.

How to apply in code

function evaluateTrust(task: Task): TrustDecision {
  const score = task.reversibility + task.blastRadius + 
                task.auditability + task.cost + task.time;
  
  if (score <= 5) return { autonomy: 'full', requireGate: false };
  if (score <= 10) return { 
    autonomy: 'partial', 
    requireGate: true,
    gateType: chooseGate(task), // durable pause, confirmation, escalation
  };
  return { autonomy: 'blocked', requireHuman: true };
}

In practice, scoring comes from an editable policy file (YAML/JSON), not hardcoded. Allows tuning without redeploy.

The integration with Harness Stack

Trust Stack presumes Harness Stack is present. Without Harness, the dimensions degrade:

No Verification (layer 3) → Reversibility worsens (any action can go wrong).
No Failure corpus (layer 9) → Auditability worsens (no usable trace).
No Durable pause (layer 7) → the 6-10 gate doesn’t exist operationally.

Build Harness first. Apply Trust Stack as decision layer on top.

Where to go deeper

Agent Trust Stack hub for the canonical framework. Harness Stack introduction for the infra that unlocks the decisions.