
A/B + Shadow Traffic: Feature Flag + Canary 1%→5%→25% + Automated Rollback

A safe way to put a new fine-tuned (FT) model into production: shadow traffic (old and new models run in parallel and their responses are compared), canary deployment (gradual ramp-up 1%→5%→25%→100%), feature flags (LaunchDarkly / GrowthBook / Unleash), and automated rollback (triggered by a P95 latency or error-rate threshold).

Şükrü Yusuf KAYA
28 min read
Advanced
1. Shadow Traffic Pattern

Request → API Gateway
  ├── (primary 100%) → Production model v1
  └── (shadow 100%) → Candidate model v2 [response logged, not returned]

Compare:
  - Response similarity (semantic + exact)
  - Latency (P50, P95, P99)
  - Token usage
  - Error rate

Advantage: v2 is tested against real user traffic, but users only ever see v1's responses, so there is no user-facing risk.
The cookbook's rule: run shadow traffic for 24-72 hours, collect quantitative evidence, and only then start the canary rollout.
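One way to turn the logged pairs into numbers is a small report function like the sketch below. The record fields (`v1_response`, `v2_latency_s`, and so on) are assumed names, not a fixed schema. Note also that `difflib.SequenceMatcher` measures lexical overlap only; it is used here as a cheap stand-in, and a real semantic-similarity check would need an embedding model or an LLM judge.

```python
import math
from difflib import SequenceMatcher

def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for P95."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def shadow_report(records: list[dict]) -> dict:
    """Summarise logged v1/v2 pairs (field names are assumptions)."""
    if not records:
        raise ValueError("no shadow records to report on")
    ok = [r for r in records if r["v2_error"] is None]
    report = {
        "pairs": len(records),
        "v2_error_rate": 1 - len(ok) / len(records),
        "v1_p95_s": percentile([r["v1_latency_s"] for r in records], 95),
    }
    if ok:
        # Lexical similarity only; a proxy for the semantic comparison.
        sims = [
            SequenceMatcher(None, r["v1_response"], r["v2_response"]).ratio()
            for r in ok
        ]
        exact = sum(r["v1_response"] == r["v2_response"] for r in ok)
        report["exact_match_rate"] = exact / len(ok)
        report["mean_similarity"] = sum(sims) / len(sims)
        report["v2_p95_s"] = percentile([r["v2_latency_s"] for r in ok], 95)
    return report
```

Token usage is left out because it depends on what the model API returns; if the responses carry token counts, they can be aggregated the same way as latency.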

2. Canary Rollout Recipe

| Phase | Traffic % | Duration | Exit criteria |
|---|---|---|---|
| 1. Shadow | 0% (logged only) | 24-72h | 1000+ sampled pairs, output similarity OK |
| 2. Canary 1% | 1% | 4-12h | Error rate < baseline + 0.5% |
| 3. Canary 5% | 5% | 12-24h | P95 latency < baseline × 1.1 |
| 4. Canary 25% | 25% | 24-48h | User feedback (thumbs) >= baseline |
| 5. Full | 100% | - | Rollout complete |
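The canary phases of the table can be encoded as a small promotion function. The sketch below makes two assumptions that the table does not state: "baseline + 0.5%" is read as 0.5 percentage points (0.005 as a fraction), and "user feedback" is read as the thumbs-up rate. The names `Metrics`, `exit_criterion_met` and `next_percent` are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metrics:
    error_rate: float        # fraction of failed requests, 0.005 == 0.5%
    p95_latency_s: float     # seconds
    thumbs_up_rate: float    # fraction of rated responses with a thumbs-up

CANARY_STEPS = [1, 5, 25, 100]   # traffic percentages from the table

def exit_criterion_met(percent: int, current: Metrics, baseline: Metrics) -> bool:
    """Check the exit criterion of the phase serving `percent` of traffic."""
    if percent == 1:
        # Assumption: "+ 0.5%" means 0.5 percentage points.
        return current.error_rate < baseline.error_rate + 0.005
    if percent == 5:
        return current.p95_latency_s < baseline.p95_latency_s * 1.1
    if percent == 25:
        return current.thumbs_up_rate >= baseline.thumbs_up_rate
    raise ValueError(f"no exit criterion defined for {percent}%")

def next_percent(percent: int, current: Metrics, baseline: Metrics) -> int:
    """Return the next traffic share, or the same one if the gate fails."""
    if percent == 100:
        return 100                                   # rollout complete
    if not exit_criterion_met(percent, current, baseline):
        return percent                               # hold at this phase
    return CANARY_STEPS[CANARY_STEPS.index(percent) + 1]
```

Each phase checks only its own criterion here, mirroring the table row by row. A stricter variant would re-check all earlier criteria at every phase, which is probably the safer choice in practice.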
Automated rollback triggers:
  • Error rate > 2% (baseline 0.5%)
  • P95 latency > 5s (baseline 2s)
  • Thumbs-down rate > 10% (baseline 5%)
  • Critical eval set regression > 3 points
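The four triggers above translate directly into a threshold check. This is a sketch under stated assumptions: the `CanaryMetrics` fields and the `rollback_reasons` function are illustrative names, rates are expressed as fractions, and how the metrics are collected (and over which time window) is left open.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CanaryMetrics:
    error_rate: float              # fraction, 0.02 == 2%
    p95_latency_s: float           # seconds
    thumbs_down_rate: float        # fraction, 0.10 == 10%
    eval_regression_points: float  # drop on the critical eval set vs. v1

# Thresholds taken from the trigger list.
ERROR_RATE_MAX = 0.02
P95_LATENCY_MAX_S = 5.0
THUMBS_DOWN_MAX = 0.10
EVAL_REGRESSION_MAX = 3.0

def rollback_reasons(m: CanaryMetrics) -> list[str]:
    """Return the names of all triggers that fired; empty means keep going."""
    reasons = []
    if m.error_rate > ERROR_RATE_MAX:
        reasons.append("error_rate")
    if m.p95_latency_s > P95_LATENCY_MAX_S:
        reasons.append("p95_latency")
    if m.thumbs_down_rate > THUMBS_DOWN_MAX:
        reasons.append("thumbs_down_rate")
    if m.eval_regression_points > EVAL_REGRESSION_MAX:
        reasons.append("eval_regression")
    return reasons
```

A monitoring loop would call this on each metrics window and, if the list is non-empty, set the v2 rollout percentage back to 0 in the feature flag system. Returning the reasons rather than a bare boolean makes the rollback easier to explain in an incident log.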
# Feature flag (pseudocode)
if feature_flag("llm_model_v2", user_id, percent_rollout=5):
    response = call_model_v2(prompt)
else:
    response = call_model_v1(prompt)
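For a percentage rollout to be stable, the same user should get the same answer on every request. A common way to achieve that is to hash the flag key together with the user ID into a fixed bucket. The sketch below shows the idea using only the standard library; it is not the API or the hashing scheme of LaunchDarkly, GrowthBook or Unleash, whose SDKs handle bucketing internally, and `in_rollout` is a hypothetical helper name.

```python
import hashlib

def in_rollout(flag_key: str, user_id: str, percent_rollout: float) -> bool:
    """Deterministically place a user in or out of a percentage rollout."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000   # 0..9999
    # percent_rollout=5 admits buckets 0..499, i.e. about 5% of users.
    return bucket < percent_rollout * 100

# Usage: the same user always lands on the same side of the flag.
use_v2 = in_rollout("llm_model_v2", "user-42", percent_rollout=5)
```

Because a user's bucket never changes, raising the percentage from 1 to 5 to 25 only adds users to v2; nobody who was already on v2 is moved back to v1 during the ramp-up. Including the flag key in the hash keeps different experiments from always selecting the same users.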
✅ Deliverables
  1. Set up GrowthBook or Unleash (open-source feature flag tools).
  2. Write a shadow traffic logging pipeline.
  3. Next lesson: 16.3, Online Eval.
