Back to full roadmap

topicadvanced

Red Teaming

Systematically try to break your own model before shipping.

3 hours1 resources1 prereqs

Structure:

Build an adversarial prompt suite (200-500 known + creative)
Run automatically, log bad responses
Human red-teamers try 'creative attacks'
Improve prompt + guardrails based on findings
Re-run every release

Prerequisites

Jailbreak Defense

DAN, role-play, hypothetical-scenario, encoded attacks — bypassing the model's safety training.

Resources(1)

GGitHub(1)

PyRIT (Microsoft AI Red Team)

PII Detection & Redaction

Open the full interactive roadmap