LiteLLM Virtual Keys: Production-Grade Multi-Tenant Cost Attribution Infrastructure
Creating virtual keys in LiteLLM Proxy, per-key budgets, rate limits, model whitelists, and the full admin API. Each tenant gets their own key = automatic attribution + automatic control.
Şükrü Yusuf KAYA
20 min read
Advanced 🔑 One master key, unlimited virtual keys
The skeleton of production-ready LLM infrastructure: one master OpenAI/Anthropic key, hundreds of virtual keys issued through LiteLLM Proxy. Each tenant calls with its own virtual key — attribution is automatic, and budgets are enforced.
Architecture Overview#
```
[Tenant 1 App] ──┐
                 │ sk-virtual-1
[Tenant 2 App] ──┤
                 │ sk-virtual-2
[Tenant N App] ──┤
                 ↓
          [LiteLLM Proxy] ── sk-master ──→ [OpenAI / Anthropic / Gemini]
                 │
                 ↓
  [Postgres: keys, budgets, usage]
                 ↓
  [Langfuse / ClickHouse — telemetry]
```
Each tenant sends requests to LiteLLM Proxy with its own virtual key. The proxy:
- Validates the virtual key
- Checks the budget
- Applies the rate limit
- Checks the model whitelist
- Forwards the call to the real provider with the master key
- Returns the response to the tenant
- Writes usage to Postgres and Langfuse
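The admission checks above can be sketched as pure logic. This is a hypothetical in-memory model for illustration — `VirtualKey` and `admit` are not LiteLLM APIs; the real proxy runs these checks in its request middleware against Postgres:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class VirtualKey:
    """Hypothetical in-memory stand-in for a row in the proxy's key table."""
    key: str
    models: List[str]      # whitelist; ["*"] means every model
    max_budget: float      # USD per budget window
    spend: float = 0.0
    rpm_limit: int = 60
    rpm_used: int = 0

def admit(vk: Optional[VirtualKey], model: str) -> Tuple[bool, str]:
    """Mirror the per-request pipeline: key → budget → rate limit → whitelist."""
    if vk is None:
        return False, "invalid key"
    if vk.spend >= vk.max_budget:
        return False, "budget exceeded"
    if vk.rpm_used >= vk.rpm_limit:
        return False, "rate limited"
    if "*" not in vk.models and model not in vk.models:
        return False, f"model {model} not on whitelist"
    return True, "ok"

starter = VirtualKey(key="sk-virtual-1", models=["gpt-5-mini"], max_budget=100.0)
print(admit(starter, "gpt-5-mini"))  # (True, 'ok')
print(admit(starter, "gpt-5"))       # (False, 'model gpt-5 not on whitelist')
```

The ordering matters: cheap checks (key lookup, budget counter) run before anything that touches the upstream provider, so rejected requests cost nothing.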
LiteLLM Proxy Setup#
Self-hosting with Docker Compose:
```yaml
# docker-compose.yml
version: "3.9"

services:
  litellm-db:
    image: postgres:16
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: litellm
      POSTGRES_PASSWORD: ${LITELLM_DB_PASSWORD}
    volumes:
      - litellm_db_data:/var/lib/postgresql/data

  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    environment:
      DATABASE_URL: postgresql://litellm:${LITELLM_DB_PASSWORD}@litellm-db:5432/litellm
      LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY}
      LITELLM_SALT_KEY: ${LITELLM_SALT_KEY}
      # Real provider keys
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      GOOGLE_API_KEY: ${GOOGLE_API_KEY}
      # Langfuse callback
      LANGFUSE_PUBLIC_KEY: ${LANGFUSE_PUBLIC_KEY}
      LANGFUSE_SECRET_KEY: ${LANGFUSE_SECRET_KEY}
    volumes:
      - ./config.yaml:/app/config.yaml
    command: --config /app/config.yaml --port 4000 --num_workers 8

volumes:
  litellm_db_data:
```

Production LiteLLM proxy Docker setup.
```yaml
# config.yaml
model_list:
  - model_name: gpt-5
    litellm_params:
      model: gpt-5
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gemini-2.5-flash
    litellm_params:
      model: gemini/gemini-2.5-flash
      api_key: os.environ/GOOGLE_API_KEY

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  alerting: ["slack"]
  alerting_threshold: 300  # alert on >300s requests

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  cache: true
  cache_params:
    type: "redis"
    host: "redis"
  drop_params: true  # silently drop provider-specific unsupported params

router_settings:
  routing_strategy: "least-busy"
  num_retries: 3
  timeout: 60
```

config.yaml — model catalog + general settings + callbacks.
Creating Virtual Keys — Admin API#
```python
import os
from datetime import datetime

import requests

LITELLM_HOST = "https://litellm-proxy.your-saas.com"
MASTER_KEY = os.environ["LITELLM_MASTER_KEY"]

def create_tenant_key(tenant_id: str, plan: str):
    """Create a virtual key for this tenant."""
    plan_config = {
        "starter": {
            "models": ["gpt-5-mini", "claude-haiku-4-5"],
            "max_budget": 100,  # $100/month
            "tpm_limit": 50_000,
            "rpm_limit": 300,
        },
        "pro": {
            "models": ["gpt-5", "gpt-5-mini", "claude-sonnet-4-6", "claude-haiku-4-5"],
            "max_budget": 1000,
            "tpm_limit": 200_000,
            "rpm_limit": 1500,
        },
        "enterprise": {
            "models": ["*"],  # all models
            "max_budget": 10_000,
            "tpm_limit": 1_000_000,
            "rpm_limit": 10_000,
        },
    }
    config = plan_config[plan]

    resp = requests.post(
        f"{LITELLM_HOST}/key/generate",
        headers={"Authorization": f"Bearer {MASTER_KEY}"},
        json={
            "key_alias": f"tenant-{tenant_id}",
            "models": config["models"],
            "max_budget": config["max_budget"],
            "budget_duration": "1mo",  # renews monthly
            "tpm_limit": config["tpm_limit"],
            "rpm_limit": config["rpm_limit"],
            "metadata": {
                "tenant_id": tenant_id,
                "plan": plan,
                "created_at": datetime.utcnow().isoformat(),
            },
            "permissions": {"allow_chat_completions_endpoint": True},
            "soft_budget_cooldown": True,  # warn at 80%, stop at 100%
        },
    )
    return resp.json()  # {"key": "sk-litellm-...", "expires": "..."}
```
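When a tenant changes plans, the same plan table can drive a key update: LiteLLM also exposes a `/key/update` admin endpoint that accepts the same limit fields. A sketch of building the payload — `upgrade_payload` and the limit values are illustrative assumptions, and the HTTP call itself is elided:

```python
# Limits mirror the plan table in create_tenant_key above (illustrative values).
PLAN_LIMITS = {
    "starter": {"max_budget": 100, "tpm_limit": 50_000, "rpm_limit": 300},
    "pro": {"max_budget": 1000, "tpm_limit": 200_000, "rpm_limit": 1500},
}

def upgrade_payload(virtual_key: str, new_plan: str) -> dict:
    """Hypothetical helper: build the JSON body for POST /key/update."""
    limits = PLAN_LIMITS[new_plan]
    return {"key": virtual_key, **limits, "metadata": {"plan": new_plan}}

payload = upgrade_payload("sk-litellm-acme-...", "pro")
print(payload["max_budget"])  # 1000
# then: requests.post(f"{LITELLM_HOST}/key/update", headers={...}, json=payload)
```

Updating the existing key (rather than issuing a new one) means the tenant's app keeps working through the plan change without a credential rotation.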
Tenant-Side Usage#
The tenant's app (Next.js, a Python service, etc.) calls with the virtual key:
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-litellm-acme-corp-...",  # ← virtual key
    base_url="https://litellm-proxy.your-saas.com",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "..."}],
    # metadata is not required — the virtual key already handles attribution
    extra_headers={"X-User-ID": "user_42"},  # optional user-level granularity
)
```
This request passes through LiteLLM Proxy:
- Is the key valid? ✅
- Budget exceeded? No → continue
- Rate limit OK? Yes → continue
- Is the model on the whitelist? claude-sonnet-4-6 ∈ pro plan models → ✅
- Forwarded to Anthropic with the master key
- Response returned, telemetry written, budget updated
Budget Enforcement Behavior#
LiteLLM offers three enforcement levels:
1. Hard Stop#
When the budget is exceeded, the request is rejected with a 403.
"soft_budget_cooldown": false # default
2. Soft Cooldown#
Warning at 80% of the budget, retry-after at 100%.
"soft_budget_cooldown": true
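Under soft cooldown the client should treat a 429 as "back off and retry", honoring a `Retry-After` header when the proxy sets one. A client-side sketch — `call_with_budget_backoff` and the `(status, headers, body)` shape are illustrative assumptions, not LiteLLM APIs:

```python
import time

def call_with_budget_backoff(do_request, max_attempts: int = 3):
    """Sketch: retry a proxied call while the gateway answers 429.
    do_request() is any callable returning (status_code, headers, body)."""
    status, headers, body = do_request()
    for attempt in range(max_attempts - 1):
        if status != 429:
            break
        # honor the proxy's hint if present, else exponential backoff
        time.sleep(float(headers.get("Retry-After", 2 ** attempt)))
        status, headers, body = do_request()
    return status, body
```

Wrapping the SDK call this way keeps tenant apps from hammering the proxy during a cooldown window.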
3. Warning Only#
Requests still pass even when the budget is exceeded, but a Slack alert fires. Meant for enterprise SLAs.
Showing Usage to the Tenant#
To show the user "X% of your AI usage this month is used" in the frontend:
```python
def get_tenant_usage(tenant_key):
    # LITELLM_HOST and MASTER_KEY as defined in the admin-API section above
    resp = requests.get(
        f"{LITELLM_HOST}/key/info?key={tenant_key}",
        headers={"Authorization": f"Bearer {MASTER_KEY}"},
    )
    info = resp.json()
    return {
        "spent": info["spend"],
        "budget": info["max_budget"],
        "remaining": info["max_budget"] - info["spend"],
        "pct_used": info["spend"] / info["max_budget"] * 100,
    }
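From the spend and budget numbers, the frontend string can be derived with a warning tier at the same 80% threshold the soft cooldown uses — a small sketch (`usage_banner` and the wording are illustrative):

```python
def usage_banner(spent: float, budget: float) -> str:
    """Sketch: tenant-facing usage line with a warning tier at 80%."""
    pct = spent / budget * 100
    tier = "ok" if pct < 80 else ("warning" if pct < 100 else "exceeded")
    return f"{pct:.0f}% of your monthly AI budget used ({tier})"

print(usage_banner(85.0, 100.0))  # 85% of your monthly AI budget used (warning)
```

Surfacing the warning tier before the hard stop gives tenants a chance to upgrade rather than hit a 403 mid-workflow.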
Advanced Patterns#
Per-user budgets via child virtual keys#
If a tenant wants per-user budgets of its own, use child virtual keys:
```python
# The tenant's parent key already exists
# Create a child key per user (limits below the tenant's own)
create_key(
    parent_key_alias="tenant-acme-corp",
    user_id="user_42",
    max_budget=10,  # this user: $10/month max
)
```
Team-based access#
```python
# Key for a team (e.g. the marketing team)
create_key(
    team_id="marketing-team",
    models=["gpt-5-mini", "gemini-2.5-flash"],  # cheap models
    max_budget=500,
)
```
Internal vs external#
Internal team keys get higher TPM; external customer keys get lower TPM. All on the same proxy.
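The split can be expressed as two payload shapes against the same `/key/generate` endpoint — the tier numbers and `tiered_key_payload` helper below are illustrative assumptions:

```python
# Illustrative TPM/RPM tiers on one proxy: generous for internal teams,
# conservative for external customers.
KEY_TIERS = {
    "internal": {"tpm_limit": 500_000, "rpm_limit": 5_000},
    "external": {"tpm_limit": 100_000, "rpm_limit": 1_000},
}

def tiered_key_payload(alias: str, audience: str) -> dict:
    """Hypothetical helper: partial payload for POST /key/generate."""
    return {"key_alias": alias, **KEY_TIERS[audience]}

print(tiered_key_payload("eng-playground", "internal")["tpm_limit"])  # 500000
```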
💡 Production pattern
Run LiteLLM Proxy stateless (its state lives in Postgres). Three replicas behind a load balancer. With Redis cache hits there is no extra latency. Critical infrastructure for the SaaS stack — HA setup comes in module 15.
▶️ Next lesson
4.4 — Chargeback Reporting. Billing LLM usage back to internal teams (engineering, marketing, customer success) or to enterprise customers as invoices. CSV export, monthly reports, and invoice-generation patterns.
Frequently Asked Questions
If you run a single instance, yes. In production, minimum 2 replicas + HA Postgres (e.g., AWS RDS Multi-AZ). LiteLLM's cloud-managed version ($99/mo+) is also available but self-host is generally preferred.