Back to full roadmap
topicadvanced
Capability Gating
Tier the destructive actions the agent can take — default: least privilege.
2 hours1 prereqs
Least-privilege agent: 3 tiers per tool:
- Read-only — fetches info, doesn't change the world. Default-allowed.
- Write (reversible) — drafts, prepares emails, creates files. User approval required (Claude Desktop pattern).
- Destructive (irreversible) — deletes files, sends emails, makes payments. Mandatory HITL + double confirmation.
Implementation: capability_tier enum per tool + middleware. For tier 2/3 actions, show user a structured "approve/deny" prompt.
Anthropic Computer Use and Claude Desktop apply this — modal popup when destructive tool is called.