Topic · Advanced

Capability Gating

Tier the actions an agent can take by how destructive they are, with least privilege as the default.

2 hours · 1 prerequisite

A least-privilege agent assigns every tool to one of three tiers:

Tier 1, Read-only: fetches information and doesn't change the world. Allowed by default.
Tier 2, Write (reversible): drafts, prepares emails, creates files. Requires user approval (the Claude Desktop pattern).
Tier 3, Destructive (irreversible): deletes files, sends emails, makes payments. Requires mandatory human-in-the-loop (HITL) review plus a double confirmation.

Implementation: give each tool a capability_tier enum value and add middleware that checks the tier before every call. For tier 2 and tier 3 actions (write and destructive), show the user a structured "approve/deny" prompt.
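The enum-plus-middleware idea can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `CapabilityTier`, `Tool`, `ActionDenied`, `gated_call`, and the shape of the prompt dictionary are all hypothetical names chosen for the example, and the `approve` callback stands in for whatever UI actually shows the approve/deny prompt.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Callable, Mapping


class CapabilityTier(IntEnum):
    READ_ONLY = 1    # fetches info, changes nothing
    WRITE = 2        # reversible: drafts, new files
    DESTRUCTIVE = 3  # irreversible: deletes, sends, payments


@dataclass(frozen=True)
class Tool:
    name: str
    tier: CapabilityTier
    run: Callable[..., Any]


class ActionDenied(Exception):
    """Raised when the user denies a gated tool call."""


# Hypothetical callback: receives a structured prompt, returns True to approve.
Approver = Callable[[dict], bool]

# How many approvals each tier needs before the tool runs (assumed policy).
REQUIRED_CONFIRMATIONS = {
    CapabilityTier.READ_ONLY: 0,    # default-allowed
    CapabilityTier.WRITE: 1,        # single user approval
    CapabilityTier.DESTRUCTIVE: 2,  # double confirmation
}


def gated_call(tool: Tool, args: Mapping[str, Any], approve: Approver) -> Any:
    """Middleware: ask for the approvals the tool's tier requires, then run it."""
    needed = REQUIRED_CONFIRMATIONS[tool.tier]
    for step in range(1, needed + 1):
        prompt = {
            "tool": tool.name,
            "tier": tool.tier.name,
            "args": dict(args),
            "confirmation": f"{step} of {needed}",
        }
        if not approve(prompt):
            raise ActionDenied(f"user denied {tool.name}")
    return tool.run(**args)
```

A caller would register each tool with its tier once and route every invocation through `gated_call`, so a read-only tool runs without prompting while a destructive one cannot run until `approve` has returned True twice. Keeping the confirmation counts in one table makes the policy easy to audit and tighten.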

Anthropic Computer Use and Claude Desktop apply this pattern: a modal popup appears when a destructive tool is called.

Prerequisites