Skip to content
🟠 Builder

Agent Trust Stack: when to trust which agent with which task

Extended Agent Trust Stack — 5 dimensions applied with practical examples from Claude Code, Cowork and production, plus full decision matrix.

The Agent Trust Stack defines 5 dimensions for deciding what an agent can do autonomously. This post extends with practical examples applied to Claude Code, Cowork, and general production.

The applied matrix

Score 0-3 per dimension, sum 0-15. Bands:

  • 0-5: full autonomy.
  • 6-10: autonomy with durable pause on specific actions.
  • 11-15: block — requires human in the loop.

Example 1 · Claude Code running git status

DimensionScoreWhy
Reversibility0Read-only, nothing changes
Blast radius0Local only
Auditability0Command logged
Cost0Trivial
Time0<1s
Sum0Full autonomy

Claude Code runs without asking.

Example 2 · Claude Code running git push --force-with-lease

DimensionScoreWhy
Reversibility2Recoverable via reflog, but team may have pulled
Blast radius2Shared branch
Auditability1Command logged, but history may confuse
Cost0Trivial
Time0<1s
Sum5Borderline. In practice, durable pause.

Claude Code asks first.

Example 3 · Cowork agent sending email to external customer

DimensionScoreWhy
Reversibility3Sent email is sent. No undo
Blast radius3External customer affected
Auditability1Email log exists
Cost0Cents
Time0Instant
Sum7Mandatory durable pause

Even trusted agent requires human confirmation.

Example 4 · Agent automating bank transfer

DimensionScoreWhy
Reversibility3Bank transfer is irreversible
Blast radius3Corporate account, real vendor
Auditability2Log exists but critical integration
Cost3Direct monetary value
Time0Instant
Sum11Block — human always in the loop

Doesn’t matter how “good” the agent is. Policy blocks.

The gotcha of equal scores with different causes

Two flows summing to 7 may need different policies. Email (R3, B3) vs creating a public ticket (R2, B3, A2) has the same score, but:

  • Email: reversibility is the pain. Durable pause before send.
  • Ticket: auditability is the pain. Detailed logging + auditor permission.

Sum is screening; decomposition is design.

How to apply in code

function evaluateTrust(task: Task): TrustDecision {
  const score = task.reversibility + task.blastRadius + 
                task.auditability + task.cost + task.time;
  
  if (score <= 5) return { autonomy: 'full', requireGate: false };
  if (score <= 10) return { 
    autonomy: 'partial', 
    requireGate: true,
    gateType: chooseGate(task), // durable pause, confirmation, escalation
  };
  return { autonomy: 'blocked', requireHuman: true };
}

In practice, scoring comes from an editable policy file (YAML/JSON), not hardcoded. Allows tuning without redeploy.

The integration with Harness Stack

Trust Stack presumes Harness Stack is present. Without Harness, the dimensions degrade:

  • No Verification (layer 3) → Reversibility worsens (any action can go wrong).
  • No Failure corpus (layer 9) → Auditability worsens (no usable trace).
  • No Durable pause (layer 7) → the 6-10 gate doesn’t exist operationally.

Build Harness first. Apply Trust Stack as decision layer on top.

Where to go deeper

Agent Trust Stack hub for the canonical framework. Harness Stack introduction for the infra that unlocks the decisions.