# AI Red Teaming and Adversarial Robustness Engineering Training (MITRE ATLAS + OWASP LLM Top 10 + Garak + PyRIT + Llama Guard)

> Source: https://sukruyusufkaya.com/en/training/ai-red-teaming-adversarial-robustness-muhendisligi-egitimi
> Updated: 2026-07-02T06:32:18.814Z
> Level: advanced
> Topics: ai red teaming, mitre atlas, owasp llm top 10, prompt injection, jailbreak llm, gcg attack, pair tap jailbreak, indirect prompt injection, data poisoning, model extraction, nvidia garak, microsoft pyrit, promptfoo red team, uk aisi inspect, llama guard 4, constitutional classifiers, nemo guardrails, eu ai act red team, kvkk ai güvenlik, mcp security
**TLDR:** A 3-day advanced Turkish red-teaming training that addresses end to end the security testing of LLM and generative-AI systems, defense against prompt injection + jailbreak + data poisoning + multimodal attacks, and EU AI Act + KVKK + ISO 42001 + BDDK compliance audit. Includes MITRE ATLAS, OWASP LLM Top 10 (2025), NVIDIA Garak, Microsoft PyRIT, Promptfoo, UK AISI Inspect, Llama Guard 4, Anthropic Constitutional Classifiers, NeMo Guardrails, agent + browser-agent + MCP security.

## Açıklama

The AI Red Teaming and Adversarial Robustness Engineering Training is a 3-day advanced program designed for AI Security Engineers, Red Team Engineers, ML Engineers, Compliance Officers, and Senior Backend Developers who want to systematically test and harden enterprise LLM and generative-AI products against attack vectors.

## Kazanımlar

- Skillfully distinguish AI red teaming from classical pen testing.
- Prepare threat-modeling worksheets with the MITRE ATLAS framework.
- Convert all OWASP LLM Top 10 (2025) items into a risk inventory.
- Design direct + indirect prompt injection + multi-turn jailbreak scenarios.
- Apply GCG, PAIR, TAP, Crescendo, Skeleton Key, Many-shot automated attacks.
- Design multimodal attacks (image, audio, document, browser agent).
- Use NVIDIA Garak + Microsoft PyRIT + Promptfoo + UK AISI Inspect tools.
- Build a Llama Guard 4 + Constitutional Classifiers + NeMo Guardrails defense stack.
- Develop attack and defense strategies for agent + browser agent + MCP specific scenarios.
- Produce EU AI Act + KVKK + ISO 42001 + BDDK-compliant red-team audit reports.

<p>This training is designed to teach end to end — in Turkish — AI red teaming + adversarial robustness engineering, the discipline of systematically testing and hardening enterprise generative-AI and LLM products against attack vectors. Developments defining the 2024-2026 period: EU AI Act entering into force in May 2024 + Article 15 robustness/cybersecurity mandate + Article 50 transparency, KVKK Generative AI Guide (2024), ISO/IEC 42001:2023 AI Management System certification, NIST AI RMF 1.1 (2024), the publication of Microsoft AI Red Team methodology, the UK AI Safety Institute (AISI) framework, the maturation of NVIDIA Garak and Microsoft PyRIT open-source red-team tools, the OWASP LLM Top 10 v2.0 (2025) update, and the maturation of the MITRE ATLAS framework. In Turkey, a training that addresses this discipline in Turkish + end to end + production-grade is virtually nonexistent — existing content either stays at OWASP slides or freezes at the shallow jailbreak-demo level. This program is designed to fill that gap as Turkey's most comprehensive production-grade AI red teaming reference training.</p>

<p>The program's strategic backbone is the first module, which clarifies how AI red teaming differs from classical penetration testing and maps the 2026 threat landscape. Classical pen testing was designed for deterministic systems; AI systems are non-deterministic + open to semantic attack surface + natural-language jailbreak — modern AI security cannot be built without grasping this difference. Anthropic constitutional AI + ARC Evals + Responsible Scaling Policy, OpenAI Preparedness Framework + system card red-team reports, Microsoft AI Red Team + UK AISI Inspect Framework methodologies are comparatively covered. Compliance mandates: EU AI Act Article 15 (robustness + cybersecurity), KVKK Generative AI Guide (2024), ISO/IEC 42001:2023 audit requirements, banking BDDK + healthcare SBSGM + financial SPK + audit KGK sectoral AI security frameworks are detailed. For Turkish enterprise AI teams in 2026, red teaming has become not optional but mandatory.</p>

<p>The second module covers in detail MITRE's ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) framework, started in 2020 and matured in 2024-2026. ATLAS matrix structure: 14 tactics (Reconnaissance → ML Model Discovery → Initial Access → Execution → Persistence → ML Attack Staging → Exfiltration → Impact) and 80+ techniques. ATLAS Navigator tool + JSON layer files usage is shown practically. Critical tactics deep dive: ML Model Reconnaissance (T1591), ML Supply Chain Compromise, Prompt Injection (LLM-AT0007), Jailbreak (LLM-AT0006), Data Poisoning (AML.T0020), Model Stealing (AML.T0044). Real-world cases (Microsoft Tay 2016, OpenAI ChatGPT plugin attacks 2023, indirect prompt injection cases) are categorized in the ATLAS taxonomy. Threat-modeling worksheet: ATLAS-based risk inventory + impact-likelihood scoring + mitigation roadmap for an enterprise LLM product. The ATLAS + STRIDE + OWASP LLM Top 10 unified framework is ideal for Turkish red-team reports.</p>

<p>The third module covers in detail OWASP LLM Top 10 — started by the OWASP Foundation in 2023 and updated to v2.0 in 2025. LLM01 Prompt Injection (direct + indirect distinction), LLM02 Insecure Output Handling (LLM output being an XSS / SQL injection vector), LLM03 Training Data Poisoning, LLM04 Model Denial of Service (DoS), LLM05 Supply Chain Vulnerabilities (HuggingFace pickle, model lineage), LLM06 Sensitive Information Disclosure (PII + system-prompt leak), LLM07 System Prompt Leakage (Anthropic + OpenAI leaks discovered in 2024-2025), LLM08 Vector and Embedding Weaknesses (RAG poisoning), LLM09 Misinformation (hallucination weaponization), LLM10 Unbounded Consumption (cost + DoS attack). Each item is presented with a real-world example + a mitigation checklist + a Python code example. The OWASP + ATLAS + NIST AI RMF unified mapping significantly eases the work of Turkish enterprise compliance teams.</p>

<p>The fourth module covers prompt injection — LLM security's most critical attack vector — at mathematical and practical levels. Direct Prompt Injection (DPI): the user directly overriding the system prompt with 'ignore previous instructions' or jailbreak templates (DAN, STAN); role-play hijacking; persona-switching attacks. Indirect Prompt Injection (IPI): based on Greshake et al. 2023 paper — malicious content hidden in a RAG document, web page, email, PDF, image OCR, or audio transcript executed by the LLM as injection. Anthropic's 2024 IPI research on Claude Computer Use + Claude, and real-world ChatGPT plugin attack cases, are analyzed in detail. Mitigation layers: Anthropic's spotlight (XML tag), prompt sandwiching + delimiter, input sanitization, LLM-as-judge detection layer, principle of least privilege, output validation, sandboxing. In production, no single defense is sufficient — defense in depth is mandatory.</p>

<p>The fifth module covers the 2023-2026 evolution of LLM jailbreak. Manual: DAN (Do Anything Now), STAN, hypothetical scenario, role-play hijacking, encoding tricks (Base64, ROT13, Pig Latin, Unicode obfuscation), low-resource language jailbreak. Automated: GCG (Greedy Coordinate Gradient suffix attack, Zou et al. 2023), AutoDAN (gradient-free), PAIR (Prompt Automatic Iterative Refinement, Chao 2023), TAP (Tree of Attacks with Pruning) Python implementation. Multi-turn: Crescendo (Microsoft 2024 gradual escalation, starting with small harmless questions and gradually ramping up), Skeleton Key (Microsoft 2024 universal bypass), Many-shot jailbreaking (Anthropic 2024, in-context jailbreak with 256+ examples). Defense: Anthropic Constitutional Classifiers (2025, 95% jailbreak prevention), Llama Guard 4 (Meta 2025), NVIDIA NeMo Guardrails. Multi-turn vs single-turn defense comparison.</p>

<p>The sixth module addresses the attack surface of multimodal LLMs that spread in 2024-2026. Image-based: visual prompt injection (Bagdasaryan 2023), invisible Unicode text-in-image, adversarial image patches, QR code injection, hidden white-on-white text. Audio: TTS jailbreak (Anthropic Claude voice 2024-2025), audio adversarial perturbation, Whisper transcription injection. Document: PDF + DOCX hidden injection, OCR-based attack, white-on-white text trick. Browser Agent / Computer Use specific: Anthropic Computer Use IPI risks (acknowledged in the Anthropic Computer Use security paper), OpenAI Operator + Browser Use screenshot manipulation attacks, DOM-based prompt injection, popup hijacking. Specific attack patterns for GPT-5 Vision, Claude Sonnet 4.6 + Opus 4.7 Vision, Gemini 2.5 Vision are done hands-on.</p>

<p>The seventh module addresses attacks targeting training pipeline and ML supply chain. Data Poisoning: BadNets (Gu 2017 — adding backdoor triggers to the training set), instruction-tuning data poisoning (Wan 2023, Xu 2024), RAG vector-store poisoning (steering retrieval by embedding malicious documents), GraphRAG attack vectors. Model Extraction: Tramer 2016 model stealing via API, knowledge-distillation extraction attack, embedding extraction; Anthropic + OpenAI's watermarking defense approaches. Supply Chain: HuggingFace pickle deserialization vulnerability (2024 GitHub Sleepy Pickle case — arbitrary-code-execution risk via pickle), GGUF model lineage attack, model-card metadata manipulation, malicious LoRA adapter distribution. Mitigation: safetensors enforcement, model lineage verification, signature checking, sandbox loading.</p>

<p>The eighth module covers in detail the leading red-team tools of the 2024-2026 ecosystem. NVIDIA Garak (open-source generative-AI vulnerability scanner): 100+ built-in probes (DAN, GCG, leakage, encoding, malware-gen), modular detector framework, fast LLM scan with the garak --model_type command; writing custom probe + detector + buff; Garak HTML report + CI/CD integration. Microsoft PyRIT (Python Risk Identification Tool for generative AI): orchestrator + target + converter + scorer architecture; Crescendo + RedTeaming orchestrator for multi-turn attacks; Azure Content Safety + Azure OpenAI integration. Promptfoo (open-source eval + red team): red team plugin + CI/CD integration + prompt regression. UK AISI Inspect (2024): government-grade evaluation framework, hands-on dangerous-capability eval. Tool-selection matrix: practical decision guide on which tool is optimal for which scenario.</p>

<p>The ninth module covers in detail the layered defense discipline against attacks. Meta Llama Guard 4 (2025): input + output classification, safety taxonomy (S1 violent crime → S14 elections), Python deployment with writing fine-tuned custom Llama Guard. Anthropic Constitutional Classifiers (2025): jailbreak-robust filtering, the 95% jailbreak-prevention claim, and real-world performance. NVIDIA NeMo Guardrails: Colang DSL syntax + flow + rail definition; topic guardrails (off-topic prevention) + RAG safety + dialogue guardrails; NeMo + LangChain + LlamaIndex integration. Multi-layer defense in depth: input → output → tool call → output validation layers; GuardrailsAI + Outlines (structured output) + Microsoft Guidance integration; cost vs latency vs robustness trade-off decision matrix. In production, no silver bullet — layered approach is mandatory.</p>

<p>The tenth module addresses in detail the new attack surface opened by the agent paradigm. Tool misuse: the agent calling the wrong tool (e.g., using the 'send_email' tool to send spam), excessive privilege scope creep, the confused-deputy problem (mismatch between user trust and LLM action). MCP (Model Context Protocol) attacks: malicious MCP server, tool-description injection, MCP server response manipulation, chain injection. Browser-agent risks: IPI risks acknowledged in the Anthropic Computer Use security paper, OpenAI Operator + Browser Use screenshot manipulation, DOM-based prompt injection, popup hijacking. Defense patterns: principle of least privilege (minimum scope per tool), human-in-the-loop approval (human approval for critical actions), tool sandboxing + ephemeral VM + scope-limited credentials, MCP server signing + verification.</p>

<p>The eleventh module ties red-teaming results to enterprise compliance discipline. EU AI Act (May 2024 entry into force): Article 15 robustness + cybersecurity (red teaming mandatory for high-risk AI), Article 50 transparency (deepfake + generative-AI labeling), high-risk AI classification Annex III, fines of €35M or 7% of global revenue. KVKK Generative AI Guide (2024): risk assessment, PII handling, jailbreak prevention, audit framework. ISO/IEC 42001:2023 AI Management System certification process; NIST AI RMF 1.1 (2024) Govern + Map + Measure + Manage functions; tracking Frontier Model Forum (FMF) + GPAI commitments. Turkey sectoral framework: BDDK banking AI framework + KGK BDS audit; SBSGM healthcare AI; SPK financial AI; KGK audit. Turkish red-team audit report template + remediation roadmap are shown practically.</p>

<p>In the capstone module, each participant builds an end-to-end red-team playbook for their organization's LLM product: target system profile (chatbot, agent, RAG, browser agent, multimodal LLM), ATLAS-based threat-modeling worksheet, OWASP LLM Top 10 risk inventory, attack pipeline (Garak + PyRIT + Promptfoo + custom probes), defense stack (Llama Guard 4 + Constitutional Classifiers + NeMo Guardrails + custom filters), compliance audit (EU AI Act + KVKK + ISO 42001 + sectoral BDDK/SBSGM/SPK), 90-day remediation roadmap. By the end of the training, participants reach a level of technical competence to clearly frame how AI red teaming differs from classical pen testing; skillfully use the MITRE ATLAS + OWASP LLM Top 10 (2025) frameworks; design direct + indirect prompt injection + multi-turn jailbreak + multimodal attack scenarios; recognize data poisoning + model extraction + supply-chain attacks; use NVIDIA Garak + Microsoft PyRIT + Promptfoo + UK AISI Inspect tools in production; build a Llama Guard 4 + Constitutional Classifiers + NeMo Guardrails defense stack; provide defense against agent + browser-agent + MCP specific attacks; and produce EU AI Act + KVKK + ISO 42001 + BDDK compliance audit reports. The training consists of 3 days, 12 modules, and over 100 hands-on lessons.</p>