Self-Hosted LLM Observability from Scratch: A $/Request Dashboard with ClickHouse + Grafana

Build your own observability stack instead of relying on a third-party tool: ClickHouse + Grafana + LiteLLM webhook. Step-by-step Docker setup, schema design, dashboard JSON, and Slack alert configuration — production-grade, unlimited scale, KVKK-compliant.

Şükrü Yusuf KAYA
25-minute read
Advanced
🛠 The full path to full control
This is the deepest technical setup in the course. By the end you'll have your own observability stack: KVKK-compliant, scalable without limits, no third-party vendor lock-in. Ready to take to production.

Why self-host?

Three reasons:

1. KVKK / Data Residency

The prompts and responses of LLM calls can contain personal data. Sending them to a (US-based) third-party tool may violate Article 9 of KVKK, Turkey's data protection law. Self-hosting solves this by keeping everything inside your own VPC.

2. Cost at scale

Langfuse Cloud's Team tier: $499/mo for 2M observations. By the time you reach 50M observations, that becomes **$5K-10K/mo**. Self-hosted: a $50/mo server plus engineering hours.
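A quick back-of-envelope check of those figures (all numbers are the illustrative list prices from the paragraph above; $7,500 is simply the midpoint of the $5K-10K range):

```python
def cost_per_million_obs(monthly_fee_usd: float, observations_per_month: int) -> float:
    """Effective monthly cost per 1M observations at a given volume."""
    return monthly_fee_usd / (observations_per_month / 1_000_000)

print(cost_per_million_obs(499, 2_000_000))     # managed, entry tier: 249.5
print(cost_per_million_obs(7_500, 50_000_000))  # managed, at scale:   150.0
print(cost_per_million_obs(50, 50_000_000))     # self-hosted server:  1.0
```

At high volume, the unit cost of self-hosting is roughly two orders of magnitude lower — before counting the engineering hours the comparison mentions.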

3. Customization

Your own metrics, your own dashboards, your own alert rules. You can add things the vendor can't.

Stack Overview

```
[Your App] → [LiteLLM proxy/SDK] → [Callback] → [ClickHouse] → [Grafana] → [Slack alerts]
```

Components

  • LiteLLM: API abstraction + automatic cost tracking
  • ClickHouse: column-store DB, ideal for time series, scales without limits
  • Grafana: dashboards + alerting
  • Slack webhook: alert delivery

Docker Compose Setup

The whole stack in a single file:
```yaml
# docker-compose.yml
version: "3.9"

services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.10
    container_name: ch
    ports:
      - "8123:8123" # HTTP
      - "9000:9000" # native
    volumes:
      - clickhouse_data:/var/lib/clickhouse
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      CLICKHOUSE_USER: telemetry
      CLICKHOUSE_PASSWORD: ${CH_PASSWORD}
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

  grafana:
    image: grafana/grafana-oss:11.3.0
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
      - ./grafana-dashboards.yml:/etc/grafana/provisioning/dashboards/dashboards.yml
      - ./dashboards:/var/lib/grafana/dashboards
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
      GF_INSTALL_PLUGINS: vertamedia-clickhouse-datasource

volumes:
  clickhouse_data:
  grafana_data:
```
A single-file ClickHouse + Grafana stack.

ClickHouse Schema

The `init.sql` file:
```sql
CREATE DATABASE IF NOT EXISTS llm_telemetry;

CREATE TABLE IF NOT EXISTS llm_telemetry.requests (
    ts DateTime64(3) DEFAULT now64(3),
    request_id String,
    trace_id String,
    user_id String,
    tenant_id String,
    feature String,
    team String,
    model String,
    provider String,

    input_tokens UInt32,
    cached_input_tokens UInt32 DEFAULT 0,
    cache_creation_tokens UInt32 DEFAULT 0,
    output_tokens UInt32,
    reasoning_tokens UInt32 DEFAULT 0,
    tool_tokens UInt32 DEFAULT 0,
    image_tokens UInt32 DEFAULT 0,

    cost_input_usd Decimal(10, 6),
    cost_output_usd Decimal(10, 6),
    cost_total_usd Decimal(10, 6),

    latency_ms UInt32,
    ttft_ms UInt32 DEFAULT 0,
    status_code UInt16 DEFAULT 200,
    error_type String DEFAULT '',

    cache_hit Bool DEFAULT false,
    streamed Bool DEFAULT false,
    cancelled Bool DEFAULT false,

    raw_metadata String -- JSON for extra fields
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (ts, model, user_id)
TTL ts + INTERVAL 90 DAY; -- 90-day data retention

CREATE INDEX idx_user ON llm_telemetry.requests (user_id) TYPE bloom_filter GRANULARITY 4;
CREATE INDEX idx_feature ON llm_telemetry.requests (feature) TYPE bloom_filter GRANULARITY 4;
```
ClickHouse schema for time-series LLM telemetry: MergeTree engine + partitioning + TTL.

Writing to ClickHouse with a LiteLLM Callback

LiteLLM invokes a callback function after every request. That's where our ClickHouse insert code goes.
```python
# llm_telemetry.py
import os
import json
import uuid
import litellm
from clickhouse_connect import get_client
from datetime import datetime

ch_client = get_client(
    host="localhost",
    port=8123,
    username="telemetry",
    password=os.environ["CH_PASSWORD"],
    database="llm_telemetry",
)

def infer_provider(model: str) -> str:
    if model.startswith("gpt") or model.startswith("o3"):
        return "openai"
    if model.startswith("claude"):
        return "anthropic"
    if model.startswith("gemini"):
        return "google"
    return "unknown"

def telemetry_callback(kwargs, response, start_time, end_time):
    """Called by LiteLLM after every successful request."""
    try:
        usage = getattr(response, "usage", None)
        if not usage:
            return  # error / cancelled

        metadata = kwargs.get("metadata", {}) or {}
        model = kwargs.get("model", "")
        provider = model.split("/")[0] if "/" in model else infer_provider(model)

        # Cost — LiteLLM computes it automatically
        cost = response._hidden_params.get("response_cost", 0)

        # Cached input tokens (OpenAI / Anthropic / Gemini field names)
        cached = (
            getattr(usage, "prompt_tokens_details", None) and
            getattr(usage.prompt_tokens_details, "cached_tokens", 0)
        ) or getattr(usage, "cache_read_input_tokens", 0) \
          or getattr(usage, "cached_content_token_count", 0) or 0

        # Reasoning tokens
        reasoning = (
            getattr(usage, "completion_tokens_details", None) and
            getattr(usage.completion_tokens_details, "reasoning_tokens", 0)
        ) or getattr(usage, "thoughts_token_count", 0) or 0

        # start_time / end_time arrive as datetimes
        latency_ms = int((end_time - start_time).total_seconds() * 1000)

        ch_client.insert(
            "llm_telemetry.requests",
            [[
                datetime.utcnow(),
                metadata.get("request_id", str(uuid.uuid4())),
                metadata.get("trace_id", ""),
                metadata.get("user_id", ""),
                metadata.get("tenant_id", ""),
                metadata.get("feature", ""),
                metadata.get("team", ""),
                model,
                provider,
                getattr(usage, "prompt_tokens", 0) or getattr(usage, "input_tokens", 0),
                cached,
                getattr(usage, "cache_creation_input_tokens", 0) or 0,
                getattr(usage, "completion_tokens", 0) or getattr(usage, "output_tokens", 0),
                reasoning,
                0, 0,        # tool_tokens, image_tokens (extract from metadata if needed)
                0, 0, cost,  # input/output cost split (if available)
                latency_ms,
                0,           # ttft_ms (stream-only)
                200, "",
                cached > 0,
                kwargs.get("stream", False),
                False,
                json.dumps(metadata),
            ]],
            column_names=[
                "ts", "request_id", "trace_id", "user_id", "tenant_id", "feature", "team", "model", "provider",
                "input_tokens", "cached_input_tokens", "cache_creation_tokens", "output_tokens", "reasoning_tokens",
                "tool_tokens", "image_tokens", "cost_input_usd", "cost_output_usd", "cost_total_usd",
                "latency_ms", "ttft_ms", "status_code", "error_type", "cache_hit", "streamed", "cancelled", "raw_metadata",
            ],
        )
    except Exception as e:
        print(f"telemetry callback failed: {e}")

# Register as a LiteLLM success callback
litellm.success_callback = [telemetry_callback]
```
Every successful LLM call is written to ClickHouse via LiteLLM's success callback.
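The cached-token fallback chain in the callback can be exercised without hitting any provider, using stand-in usage objects. The `SimpleNamespace` fakes below mimic the OpenAI-, Anthropic-, and Gemini-style field names the code probes for; `extract_cached_tokens` is the same expression pulled out as a helper:

```python
from types import SimpleNamespace

def extract_cached_tokens(usage) -> int:
    """Same fallback chain as telemetry_callback: OpenAI-style
    prompt_tokens_details.cached_tokens, then Anthropic-style
    cache_read_input_tokens, then Gemini-style cached_content_token_count."""
    return (
        (getattr(usage, "prompt_tokens_details", None) and
         getattr(usage.prompt_tokens_details, "cached_tokens", 0))
        or getattr(usage, "cache_read_input_tokens", 0)
        or getattr(usage, "cached_content_token_count", 0)
        or 0
    )

openai_usage = SimpleNamespace(prompt_tokens_details=SimpleNamespace(cached_tokens=128))
anthropic_usage = SimpleNamespace(cache_read_input_tokens=256)
bare_usage = SimpleNamespace()  # no caching info at all

print(extract_cached_tokens(openai_usage))     # 128
print(extract_cached_tokens(anthropic_usage))  # 256
print(extract_cached_tokens(bare_usage))       # 0
```

Writing a few fakes like this is the fastest way to confirm the insert row stays correct when you add a new provider.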

Grafana Queries — The 6 Most Useful Panels

Panel 1 — Daily total cost (last 30 days)

```sql
SELECT
    toStartOfDay(ts) AS time,
    sum(cost_total_usd) AS daily_cost
FROM llm_telemetry.requests
WHERE ts >= now() - INTERVAL 30 DAY
GROUP BY time
ORDER BY time
```

Panel 2 — Cost per feature

```sql
SELECT
    feature,
    sum(cost_total_usd) AS total_cost,
    count() AS request_count,
    avg(cost_total_usd) AS avg_cost_per_request
FROM llm_telemetry.requests
WHERE ts >= now() - INTERVAL 24 HOUR
GROUP BY feature
ORDER BY total_cost DESC
```

Panel 3 — Cache hit ratio

```sql
SELECT
    toStartOfHour(ts) AS time,
    sumIf(cached_input_tokens, cached_input_tokens > 0)
        / sum(input_tokens + cached_input_tokens) * 100 AS cache_hit_pct
FROM llm_telemetry.requests
WHERE ts >= now() - INTERVAL 7 DAY
GROUP BY time
ORDER BY time
```

Panel 4 — Top 10 users by cost

```sql
SELECT
    user_id,
    sum(cost_total_usd) AS total_cost,
    count() AS request_count
FROM llm_telemetry.requests
WHERE ts >= now() - INTERVAL 7 DAY
GROUP BY user_id
ORDER BY total_cost DESC
LIMIT 10
```

Panel 5 — Latency percentiles

```sql
SELECT
    toStartOfHour(ts) AS time,
    quantile(0.5)(latency_ms) AS p50,
    quantile(0.95)(latency_ms) AS p95,
    quantile(0.99)(latency_ms) AS p99
FROM llm_telemetry.requests
WHERE ts >= now() - INTERVAL 24 HOUR
GROUP BY time
ORDER BY time
```
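If you want to double-check what those `quantile(...)` calls report, the same percentiles can be computed in plain Python. Note that ClickHouse's `quantile` is an approximate (sampled) estimator, so its values may differ slightly from this exact interpolated version:

```python
import statistics

def latency_percentiles(latencies_ms: list) -> dict:
    """Exact interpolated p50/p95/p99, mirroring the panel's quantile() calls."""
    qs = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

print(latency_percentiles(list(range(1, 101))))
# {'p50': 50.5, 'p95': 95.05, 'p99': 99.01}
```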

Panel 6 — Model cost comparison

```sql
SELECT
    model,
    count() AS requests,
    sum(cost_total_usd) AS total_cost,
    avg(cost_total_usd) AS avg_cost,
    avg(latency_ms) AS avg_latency
FROM llm_telemetry.requests
WHERE ts >= now() - INTERVAL 7 DAY
GROUP BY model
ORDER BY total_cost DESC
```

Grafana Alerts

We'll set up 3 critical alerts:

Alert 1 — Abnormal cost spike

Trigger: hourly total cost is more than 2× the average of the last 24 hours.

```sql
WITH hourly AS (
    SELECT toStartOfHour(ts) AS h, sum(cost_total_usd) AS c
    FROM llm_telemetry.requests
    WHERE ts >= now() - INTERVAL 25 HOUR
    GROUP BY h
)
SELECT
    argMax(c, h) AS current_hour_cost,
    avg(c) AS avg_24h_cost,
    argMax(c, h) / avg(c) AS ratio
FROM hourly
WHERE h < toStartOfHour(now())
```
Alert when `ratio > 2.0`.
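The spike check is easy to prototype offline before wiring it into Grafana. A minimal sketch, assuming `hourly_costs` holds the per-hour sums the query above produces (completed hours only, oldest to newest):

```python
def cost_spike(hourly_costs: list, threshold: float = 2.0):
    """Fire when the most recent completed hour exceeds threshold × the
    window average — the same ratio the ClickHouse alert query computes."""
    current = hourly_costs[-1]
    avg = sum(hourly_costs) / len(hourly_costs)
    ratio = current / avg if avg else 0.0
    return ratio > threshold, ratio

# 23 quiet hours at $1, then a $5 hour: ratio = 5 / (28/24) ≈ 4.29 → fires
fired, ratio = cost_spike([1.0] * 23 + [5.0])
print(fired, round(ratio, 2))  # True 4.29
```

Because the current hour is included in the window average, the ratio slightly understates the spike; with only 24 buckets the effect is small.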

Alert 2 — Cache hit ratio drop

Trigger: the cache hit ratio over the last hour falls below 30%.

Alert 3 — Error rate above 1%

Trigger: in the last 5 minutes, the share of requests with error_type != '' exceeds 1%.
Wire the alerts to your Slack webhook. Module 15 covers the full alert/incident-response strategy.

Production Tips

1. Keep ClickHouse on a separate server

Don't run it on the same box as your app. ClickHouse loves memory; on a shared node the two will contend for resources.

2. Batched inserts

Inserting row by row is slow. Have the LiteLLM callback write to a queue, and let a background worker bulk-insert every 10 seconds. This pattern improves `clickhouse-connect` insert performance by up to 100×.
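A minimal sketch of that queue-plus-worker pattern. The names (`BatchedWriter`), batch size, and flush interval are illustrative, not part of `clickhouse-connect`; `insert_fn` stands in for a function that calls `ch_client.insert(...)`:

```python
import queue
import threading

class BatchedWriter:
    """Buffer rows in memory and flush them in batches, so the LiteLLM
    callback never performs a per-request network insert."""

    def __init__(self, insert_fn, max_batch=500, flush_interval=10.0):
        self.insert_fn = insert_fn
        self.max_batch = max_batch
        self.flush_interval = flush_interval
        self.q = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def write(self, row):
        # Called from the callback: O(1), no I/O on the request path.
        self.q.put(row)

    def _drain(self):
        batch = []
        while len(batch) < self.max_batch:
            try:
                batch.append(self.q.get_nowait())
            except queue.Empty:
                break
        if batch:
            self.insert_fn(batch)  # one bulk insert instead of N small ones

    def _run(self):
        # Wake up every flush_interval seconds until close() is called.
        while not self._stop.wait(self.flush_interval):
            self._drain()

    def close(self):
        # Stop the worker, then flush whatever is still queued.
        self._stop.set()
        self._worker.join()
        self._drain()

# Usage sketch: collect batches into a list instead of a real ClickHouse client.
batches = []
writer = BatchedWriter(batches.append, flush_interval=0.05)
for i in range(3):
    writer.write([i, "gpt-4o", 0.001])
writer.close()
print(sum(len(b) for b in batches))  # 3 rows flushed in total
```

In the real callback you'd replace the direct `ch_client.insert(...)` call with `writer.write(row)`, and pass the bulk insert as `insert_fn`.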

3. Retention policy

The TTL is currently 90 days, which is enough for most use cases. If you need year-over-year trend analysis, create a monthly aggregation table and raise the TTL.

4. Backup

Back up ClickHouse to S3 daily (`BACKUP TABLE ... TO S3(...)`). You don't want to lose this data.

5. Authentication

Don't expose Grafana or ClickHouse to the public internet. Access them over a VPN or SSH tunnel, or put Cloudflare Access in front.
▶️ Next lesson
3.6 — Enterprise APM Integration: LLM Cost with Sentry and Datadog. If you already have an APM stack, how do you plug LLM telemetry into it? LLM-specific features of modern APMs, custom metrics, and cost attribution patterns.

Frequently Asked Questions

A single-node ClickHouse comfortably handles 50M+ rows/day. Beyond 100M, move to a ClickHouse cluster (3+ nodes, ReplicatedMergeTree). For most Turkish SaaS companies, a single node is plenty. A Hetzner CX31 (4 vCPU, 8 GB RAM, 80 GB SSD, €15/mo) will carry you a long way.
