A/B + Shadow Traffic: Feature Flag + Canary 1%→5%→25% + Automated Rollback

A safe way to put a new fine-tuned (FT) model into production combines four pieces: shadow traffic (run the old and new models in parallel and compare their responses), canary deployment (gradual ramp-up 1%→5%→25%→100%), a feature flag (LaunchDarkly / GrowthBook / Unleash), and automated rollback (triggered by a P95 latency or error-rate threshold).

Şükrü Yusuf KAYA

28 min read

6/26/2026

Advanced

1. Shadow Traffic Pattern

Request → API Gateway
            ├── (primary 100%) → Production model v1
            └── (shadow 100%) → Candidate model v2  [response logged, not returned]

Compare:
  - Response similarity (semantic + exact)
  - Latency (P50, P95, P99)
  - Token usage
  - Error rate

Advantage: v2 is tested against real user traffic, but users only ever see v1's responses, so there is no user-facing risk.

The Cookbook's rule: run shadow traffic for 24-72 hours, collect numerical evidence, and only then start the canary rollout.
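One way to turn the logged pairs into that numerical evidence is sketched below. The names `ShadowPair`, `summarize`, and `ready_for_canary` are hypothetical, and the `min_similarity=0.8` cutoff is an assumed placeholder, since the lesson only says similarity must be "OK". `difflib.SequenceMatcher` gives a lexical similarity only; a semantic comparison would need an embedding model or a judge model instead.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class ShadowPair:
    v1_text: str
    v2_text: str
    v1_latency_s: float
    v2_latency_s: float
    v2_failed: bool = False


def percentile(values, pct):
    """Nearest-rank percentile, e.g. pct=95 for P95."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(pairs):
    """Aggregate shadow pairs into the metrics listed under 'Compare'."""
    ok = [p for p in pairs if not p.v2_failed]
    if not ok:
        raise ValueError("no successful shadow responses to compare")
    # Lexical similarity as a cheap stand-in for semantic similarity.
    sims = [SequenceMatcher(None, p.v1_text, p.v2_text).ratio() for p in ok]
    return {
        "pairs": len(pairs),
        "v2_error_rate": (len(pairs) - len(ok)) / len(pairs),
        "exact_match_rate": sum(p.v1_text == p.v2_text for p in ok) / len(ok),
        "mean_similarity": sum(sims) / len(sims),
        "v1_p95_s": percentile([p.v1_latency_s for p in pairs], 95),
        "v2_p95_s": percentile([p.v2_latency_s for p in ok], 95),
    }


def ready_for_canary(report, min_pairs=1000, min_similarity=0.8):
    # min_similarity is an assumed threshold; tune it per task.
    return report["pairs"] >= min_pairs and report["mean_similarity"] >= min_similarity
```

Token usage can be added to the report in the same way if the model client returns it.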

2. Canary Rollout Recipe

Phase	Traffic %	Duration	Exit criteria
1. Shadow	0% (logged only)	24-72h	1,000+ sampled pairs, output similarity OK
2. Canary 1%	1%	4-12h	Error rate < baseline + 0.5 percentage points
3. Canary 5%	5%	12-24h	P95 latency < baseline × 1.1
4. Canary 25%	25%	24-48h	User feedback (thumbs) >= baseline
5. Full	100%	-	Rollout complete
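The table can be encoded as data so that a scheduler, rather than a person, decides when to advance. The sketch below is one possible encoding under stated assumptions: the `Phase` class, the metric dictionary keys, and `next_phase_index` are all hypothetical names, the minimum hours are the lower bounds from the table, and "baseline + 0.5%" is read as 0.5 percentage points (0.005 as a fraction).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Phase:
    name: str
    traffic_pct: int
    min_hours: float
    # exit_ok(metrics, baseline) -> True when the phase's exit criterion holds.
    exit_ok: Callable[[dict, dict], bool]


PHASES = [
    Phase("shadow", 0, 24, lambda m, b: m["pairs"] >= 1000 and m["similarity_ok"]),
    Phase("canary-1", 1, 4, lambda m, b: m["error_rate"] < b["error_rate"] + 0.005),
    Phase("canary-5", 5, 12, lambda m, b: m["p95_s"] < b["p95_s"] * 1.1),
    Phase("canary-25", 25, 24, lambda m, b: m["thumbs_up"] >= b["thumbs_up"]),
    Phase("full", 100, 0, lambda m, b: False),  # terminal phase, nothing to exit to
]


def next_phase_index(idx: int, hours_in_phase: float, metrics: dict, baseline: dict) -> int:
    """Return idx + 1 when the current phase may advance, otherwise idx."""
    if idx >= len(PHASES) - 1:
        return idx  # already at 100%
    phase = PHASES[idx]
    if hours_in_phase < phase.min_hours:
        return idx  # not enough soak time yet
    return idx + 1 if phase.exit_ok(metrics, baseline) else idx
```

As written, each phase checks only its own criterion from the table; a stricter variant would re-check the earlier criteria at every later phase as well.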

Automated rollback triggers:

Error rate > 2% (baseline 0.5%)
P95 latency > 5s (baseline 2s)
Thumbs-down rate > 10% (baseline 5%)
Critical eval set regression > 3 points
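These four triggers translate directly into a small check that a monitoring job could run on each metrics window. The sketch below uses the thresholds listed above; `CanaryMetrics`, `rollback_reasons`, and `enforce` are hypothetical names, and `set_rollout_percent` is an assumed callback that would wrap whatever feature-flag tool is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryMetrics:
    error_rate: float              # fraction, e.g. 0.02 == 2%
    p95_latency_s: float           # seconds
    thumbs_down_rate: float        # fraction of rated responses
    eval_regression_points: float  # baseline score minus candidate score


# Thresholds taken from the trigger list above.
MAX_ERROR_RATE = 0.02
MAX_P95_LATENCY_S = 5.0
MAX_THUMBS_DOWN_RATE = 0.10
MAX_EVAL_REGRESSION_POINTS = 3.0


def rollback_reasons(m: CanaryMetrics) -> list[str]:
    """Return a human-readable reason for every tripped trigger."""
    checks = [
        (m.error_rate > MAX_ERROR_RATE, f"error rate {m.error_rate:.1%} > 2%"),
        (m.p95_latency_s > MAX_P95_LATENCY_S, f"P95 latency {m.p95_latency_s:.1f}s > 5s"),
        (m.thumbs_down_rate > MAX_THUMBS_DOWN_RATE,
         f"thumbs-down rate {m.thumbs_down_rate:.1%} > 10%"),
        (m.eval_regression_points > MAX_EVAL_REGRESSION_POINTS,
         f"eval regression {m.eval_regression_points:.1f} points > 3"),
    ]
    return [reason for tripped, reason in checks if tripped]


def enforce(m: CanaryMetrics, set_rollout_percent) -> list[str]:
    """Send all traffic back to v1 if any trigger fires; return the reasons."""
    reasons = rollback_reasons(m)
    if reasons:
        set_rollout_percent(0)  # assumed hook into the feature-flag system
    return reasons
```

Returning the reasons rather than a bare boolean makes the rollback event easy to log and alert on.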

# Feature flag (pseudocode): route 5% of users to v2, everyone else to v1
if feature_flag("llm_model_v2", user_id, percent_rollout=5):
    response = call_model_v2(prompt)
else:
    response = call_model_v1(prompt)
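For the pseudocode above to behave well during a ramp-up, the flag has to assign each user to the same side on every request. A common way to get that is to hash the user ID into a stable bucket; the sketch below shows the idea with `hashlib`. It is a stand-in for illustration only: LaunchDarkly, GrowthBook, and Unleash each ship their own SDKs and bucketing logic, and neither `bucket` nor this `feature_flag` is any of those libraries' APIs.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> float:
    """Map (flag, user) to a stable bucket in [0.00, 99.99]."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    # Use the first 32 bits of the hash; mod 10_000 gives 0.01% resolution.
    return (int(digest[:8], 16) % 10_000) / 100


def feature_flag(flag_key: str, user_id: str, percent_rollout: float) -> bool:
    """True if this user falls inside the first percent_rollout% of buckets."""
    return bucket(flag_key, user_id) < percent_rollout


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for pct in (1, 5, 25, 100):
        enabled = sum(feature_flag("llm_model_v2", u, pct) for u in users)
        print(f"{pct:>3}% rollout -> {enabled} of {len(users)} users on v2")
```

Because the bucket depends only on the flag key and user ID, a user who is on v2 at 5% stays on v2 at 25%, so raising the percentage only ever adds users. Including the flag key in the hash keeps different experiments from always selecting the same users.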

✅ Deliverables

1) Install GrowthBook or Unleash (open-source feature flags). 2) Write a shadow traffic logging pipeline. 3) Next lesson: 16.3, Online Eval.
