
LiteLLM Virtual Keys: Production-Grade Multi-Tenant Cost Attribution Infrastructure

Creating virtual keys in LiteLLM Proxy, per-key budgets, rate limits, model whitelists, and the full admin API. Each tenant gets their own key = automatic attribution + automatic control.

Şükrü Yusuf KAYA
20 min read
Advanced
🔑 One master key, infinite virtual keys
The backbone of production-ready LLM infrastructure: one master OpenAI/Anthropic key, hundreds of virtual keys through LiteLLM Proxy. Each tenant calls with its own virtual key — attribution is automatic, budgets are enforced.

Architecture Overview#

```
[Tenant 1 App] ──┐
                 │ sk-virtual-1
[Tenant 2 App] ──┤
                 │ sk-virtual-2
[Tenant N App] ──┤
                 ↓
          [LiteLLM Proxy] ── sk-master ──→ [OpenAI / Anthropic / Gemini]
                 │
                 ↓
  [Postgres: keys, budgets, usage]
                 ↓
  [Langfuse / ClickHouse — telemetry]
```
Each tenant sends requests to LiteLLM Proxy with its own virtual key. The proxy:
  1. Validates the virtual key
  2. Checks the budget
  3. Applies the rate limit
  4. Checks the model whitelist
  5. Forwards to the real provider with the master key
  6. Returns the response to the tenant
  7. Writes usage to Postgres and Langfuse
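The pre-forward checks (steps 1–4) can be sketched as a pure function. This is an illustrative simplification, not LiteLLM's actual implementation; `KeyRecord` and `check_request` are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class KeyRecord:
    """Illustrative stand-in for a row in the proxy's key table."""
    valid: bool
    spend: float       # USD spent this period
    max_budget: float  # USD budget per period
    rpm_used: int
    rpm_limit: int
    models: list       # whitelist; ["*"] means all models

def check_request(key: KeyRecord, model: str) -> str:
    """Mirror of the proxy's checks before forwarding to the provider."""
    if not key.valid:
        return "401 invalid key"
    if key.spend >= key.max_budget:
        return "403 budget exceeded"
    if key.rpm_used >= key.rpm_limit:
        return "429 rate limited"
    if "*" not in key.models and model not in key.models:
        return "403 model not allowed"
    return "forward"

pro = KeyRecord(True, 420.0, 1000.0, 10, 1500, ["gpt-5", "claude-sonnet-4-6"])
print(check_request(pro, "claude-sonnet-4-6"))  # forward
print(check_request(pro, "o3-deep-research"))   # 403 model not allowed
```

Only when every check passes does the request cost a provider call; rejections are answered locally by the proxy.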

LiteLLM Proxy Setup#

Self-hosted with Docker Compose:
```yaml
# docker-compose.yml
version: "3.9"

services:
  litellm-db:
    image: postgres:16
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: litellm
      POSTGRES_PASSWORD: ${LITELLM_DB_PASSWORD}
    volumes:
      - litellm_db_data:/var/lib/postgresql/data

  redis:
    # Required by the cache config (cache_params.host: "redis" in config.yaml)
    image: redis:7-alpine

  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    environment:
      DATABASE_URL: postgresql://litellm:${LITELLM_DB_PASSWORD}@litellm-db:5432/litellm
      LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY}
      LITELLM_SALT_KEY: ${LITELLM_SALT_KEY}

      # Real provider keys
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      GOOGLE_API_KEY: ${GOOGLE_API_KEY}

      # Langfuse callback
      LANGFUSE_PUBLIC_KEY: ${LANGFUSE_PUBLIC_KEY}
      LANGFUSE_SECRET_KEY: ${LANGFUSE_SECRET_KEY}
    volumes:
      - ./config.yaml:/app/config.yaml
    command: --config /app/config.yaml --port 4000 --num_workers 8

volumes:
  litellm_db_data:
```
Production LiteLLM Proxy Docker setup.
```yaml
# config.yaml
model_list:
  - model_name: gpt-5
    litellm_params:
      model: gpt-5
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude-sonnet-4-6
    litellm_params:
      model: claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: gemini-2.5-flash
    litellm_params:
      model: gemini/gemini-2.5-flash
      api_key: os.environ/GOOGLE_API_KEY

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  alerting: ["slack"]
  alerting_threshold: 300  # alert on >300s requests

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  cache: true
  cache_params:
    type: "redis"
    host: "redis"
  drop_params: true  # silently drop provider-specific unsupported params

router_settings:
  routing_strategy: "least-busy"
  num_retries: 3
  timeout: 60
```
config.yaml — model catalog + general settings + callbacks.
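Once the stack is up, it is worth smoke-testing that the proxy answers and that a key sees the expected models. A minimal stdlib-only sketch against the OpenAI-compatible `/v1/models` endpoint (the local URL and env var name are assumptions from the compose file above):

```python
import json
import urllib.request

def models_request(host: str, key: str) -> urllib.request.Request:
    """Build the /v1/models request (pure, so it can be inspected offline)."""
    return urllib.request.Request(
        f"{host}/v1/models",
        headers={"Authorization": f"Bearer {key}"},
    )

def smoke_test(host: str, key: str) -> list[str]:
    """Ask the proxy which models this key can see."""
    with urllib.request.urlopen(models_request(host, key), timeout=10) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

# Usage against a running compose stack:
#   smoke_test("http://localhost:4000", os.environ["LITELLM_MASTER_KEY"])
```

With the master key this lists the full catalog; with a virtual key it should only list that key's whitelist, which doubles as a quick check that scoping works.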

Creating Virtual Keys — the Admin API#

```python
import os
from datetime import datetime

import requests

LITELLM_HOST = "https://litellm-proxy.your-saas.com"
MASTER_KEY = os.environ["LITELLM_MASTER_KEY"]

def create_tenant_key(tenant_id: str, plan: str):
    """Create a virtual key for a tenant."""
    plan_config = {
        "starter": {
            "models": ["gpt-5-mini", "claude-haiku-4-5"],
            "max_budget": 100,  # $100/month
            "tpm_limit": 50_000,
            "rpm_limit": 300,
        },
        "pro": {
            "models": ["gpt-5", "gpt-5-mini", "claude-sonnet-4-6", "claude-haiku-4-5"],
            "max_budget": 1000,
            "tpm_limit": 200_000,
            "rpm_limit": 1500,
        },
        "enterprise": {
            "models": ["*"],  # all
            "max_budget": 10_000,
            "tpm_limit": 1_000_000,
            "rpm_limit": 10_000,
        },
    }
    config = plan_config[plan]

    resp = requests.post(
        f"{LITELLM_HOST}/key/generate",
        headers={"Authorization": f"Bearer {MASTER_KEY}"},
        json={
            "key_alias": f"tenant-{tenant_id}",
            "models": config["models"],
            "max_budget": config["max_budget"],
            "budget_duration": "1mo",  # renews monthly
            "tpm_limit": config["tpm_limit"],
            "rpm_limit": config["rpm_limit"],
            "metadata": {
                "tenant_id": tenant_id,
                "plan": plan,
                "created_at": datetime.utcnow().isoformat(),
            },
            "permissions": {"allow_chat_completions_endpoint": True},
            "soft_budget_cooldown": True,  # warn at 80%, stop at 100%
        },
    )
    return resp.json()  # {"key": "sk-litellm-...", "expires": "..."}
```
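Plan changes and cancellations map onto two more admin endpoints. A minimal sketch, assuming LiteLLM's `/key/update` and `/key/delete` routes (verify the exact request shapes against your proxy version); `plan_change_payload` and `revoke_payload` are illustrative helpers, to be POSTed with the same `requests` pattern as `create_tenant_key` above:

```python
def plan_change_payload(key: str, new_plan: dict) -> dict:
    """Body for POST /key/update: the tenant keeps the same key, limits change."""
    return {
        "key": key,
        "models": new_plan["models"],
        "max_budget": new_plan["max_budget"],
        "tpm_limit": new_plan["tpm_limit"],
        "rpm_limit": new_plan["rpm_limit"],
    }

def revoke_payload(key: str) -> dict:
    """Body for POST /key/delete: the key is rejected from the next request on."""
    return {"keys": [key]}

# Example: upgrade a tenant to the pro limits defined earlier
pro = {"models": ["gpt-5", "claude-sonnet-4-6"], "max_budget": 1000,
       "tpm_limit": 200_000, "rpm_limit": 1500}
print(plan_change_payload("sk-litellm-abc", pro)["max_budget"])  # 1000
```

Updating in place matters for upgrades: the tenant never has to rotate credentials, only their ceilings move.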

Tenant-Side Usage#

The tenant's app (Next.js, a Python service, etc.) calls with the virtual key:
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-litellm-acme-corp-...",  # ← virtual key
    base_url="https://litellm-proxy.your-saas.com",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "..."}],
    # metadata is not required — the virtual key already handles attribution
    extra_headers={"X-User-ID": "user_42"},  # optional user-level granularity
)
```
This request passes through LiteLLM Proxy:
  • Is the key valid? ✅
  • Budget exceeded? no → continue
  • Rate limit OK? yes → continue
  • Is the model whitelisted? claude-sonnet-4-6 ∈ pro plan models → ✅
  • Forward to Anthropic
  • Response returned, telemetry written, budget updated

Budget Enforcement Behavior#

LiteLLM offers three enforcement levels:

1. Hard Stop#

When the budget is exceeded, requests are rejected with 403.
"soft_budget_cooldown": false # default

2. Soft Cooldown#

Warning at 80% of the budget, retry-after at 100%.
"soft_budget_cooldown": true

3. Warning Only#

Requests pass even when the budget is exceeded, but a Slack alert fires. Intended for enterprise SLAs.

Showing It to the Tenant#

To show the user "X% of your AI usage this month is used" in the frontend:
```python
def get_tenant_usage(tenant_key):
    resp = requests.get(
        f"{LITELLM_HOST}/key/info?key={tenant_key}",
        headers={"Authorization": f"Bearer {MASTER_KEY}"},
    )
    info = resp.json()
    return {
        "spent": info["spend"],
        "budget": info["max_budget"],
        "remaining": info["max_budget"] - info["spend"],
        "pct_used": info["spend"] / info["max_budget"] * 100,
    }
```
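On top of that, the dashboard usually wants a banner threshold. A small illustrative helper; the 80%/100% cut-offs mirror the soft-cooldown behavior above, and the function itself is not part of LiteLLM:

```python
def usage_banner(spent: float, budget: float) -> str:
    """Map spend to the message shown in the tenant dashboard."""
    if budget <= 0:
        return "No budget configured"
    pct = spent / budget * 100
    if pct >= 100:
        return "Budget exhausted — requests are paused until next month"
    if pct >= 80:
        return f"Warning: {pct:.0f}% of your monthly AI budget used"
    return f"{pct:.0f}% of your monthly AI budget used"

print(usage_banner(850, 1000))  # Warning: 85% of your monthly AI budget used
```

Guarding `budget <= 0` matters in practice: keys created without `max_budget` would otherwise divide by zero in the percentage math.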

Advanced Patterns#

Child virtual keys for per-user budgets#

If a tenant wants per-user budgets internally, use child virtual keys:
```python
# The tenant already has its own master key.
# Create a child key per user, with a limit below the tenant's.
create_key(
    parent_key_alias="tenant-acme-corp",
    user_id="user_42",
    max_budget=10,  # this user: $10/month max
)
```

Team-based access#

```python
# Key for a team (e.g. the marketing team)
create_key(
    team_id="marketing-team",
    models=["gpt-5-mini", "gemini-2.5-flash"],  # cheap models
    max_budget=500,
)
```

Internal vs external#

Internal team keys get higher TPM, external customer keys get lower TPM — all on the same proxy.
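The same `/key/generate` call covers both cases; only the ceilings differ. An illustrative sketch (the aliases and limits here are made up for the example):

```python
def key_payload(alias: str, *, tpm: int, rpm: int, budget: float) -> dict:
    """Body for POST /key/generate, as in create_tenant_key above."""
    return {
        "key_alias": alias,
        "tpm_limit": tpm,
        "rpm_limit": rpm,
        "max_budget": budget,
        "budget_duration": "1mo",
    }

internal = key_payload("team-platform-internal", tpm=500_000, rpm=5_000, budget=2_000)
external = key_payload("customer-acme", tpm=100_000, rpm=600, budget=300)
assert internal["tpm_limit"] > external["tpm_limit"]  # same proxy, different ceilings
```

Keeping both populations on one proxy means one place for telemetry and one Postgres for attribution; the keys themselves encode the policy difference.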
💡 Production pattern
Run LiteLLM Proxy stateless (its state lives in Postgres): three replicas behind a load balancer. Calls that hit the Redis cache add no extra latency. This is critical infrastructure for the SaaS stack — HA setup is covered in module 15.
▶️ Next lesson
4.4 — Chargeback Reporting. Reporting LLM usage as invoices to internal teams (engineering, marketing, customer success) or to enterprise customers. CSV export, monthly reports, and invoice-generation patterns.

Frequently Asked Questions

If you run a single instance, yes. In production: minimum two replicas plus HA Postgres (e.g., AWS RDS Multi-AZ). LiteLLM's cloud-managed version ($99/mo+) is also available, but self-hosting is generally preferred.

