Back to full roadmap

topicadvanced

Agent Red-Teaming

Systematically try to break your own agent before shipping to production.

3 hours2 resources1 prereqs

Agent red-team scope is wider than single-shot LLM:

Direct injection ("ignore prev")
Indirect injection (poisoned web pages, files, emails)
Capability misuse (does the model call destructive tools unauthorized?)
Resource exhaustion (loop-based cost attack)
Privacy leak (does the model leak PII to a tool?)
Tool result tampering (manipulate model with wrong observations)
Multi-step attacks (5-turn social engineering)

Tools: PyRIT (Microsoft), Garak, Anthropic datasets. Adversarial dataset → auto-run before every release.

Prerequisites

Indirect Prompt Injection (Agent-Specific)

Malicious instructions hidden in web pages/emails/files the agent reads — the most dangerous attack.

Resources(2)

GGitHub(2)

PyRIT (Microsoft AI Red Team)

Garak (LLM vulnerability scanner)

Rate Limiting & Queue Management

Audit Trail & Compliance

Open the full interactive roadmap