Skip to content
Back to full roadmap
topicadvanced

Agent Red-Teaming

Systematically try to break your own agent before shipping to production.

3 hours2 resources1 prereqs

Agent red-team scope is wider than single-shot LLM:

  • Direct injection ("ignore prev")
  • Indirect injection (poisoned web pages, files, emails)
  • Capability misuse (does the model call destructive tools unauthorized?)
  • Resource exhaustion (loop-based cost attack)
  • Privacy leak (does the model leak PII to a tool?)
  • Tool result tampering (manipulate model with wrong observations)
  • Multi-step attacks (5-turn social engineering)

Tools: PyRIT (Microsoft), Garak, Anthropic datasets. Adversarial dataset → auto-run before every release.

Prerequisites

Resources(2)