Back to full roadmap
topicadvanced
Agent Red-Teaming
Systematically try to break your own agent before shipping to production.
3 hours2 resources1 prereqs
Agent red-team scope is wider than single-shot LLM:
- Direct injection ("ignore prev")
- Indirect injection (poisoned web pages, files, emails)
- Capability misuse (does the model call destructive tools unauthorized?)
- Resource exhaustion (loop-based cost attack)
- Privacy leak (does the model leak PII to a tool?)
- Tool result tampering (manipulate model with wrong observations)
- Multi-step attacks (5-turn social engineering)
Tools: PyRIT (Microsoft), Garak, Anthropic datasets. Adversarial dataset → auto-run before every release.