How many user votes are needed for a meaningful TR-LLMArena leaderboard?

Capstone Module 21: TR-LLMArena — Turkish LMSys-Style Community Leaderboard

Module 21 capstone: a Turkish, LMSys-style, community-driven leaderboard with a double-blind A/B voting system, Elo ranking, and a monthly leaderboard. Deployed on HuggingFace Spaces, it pits GPT-4o, Claude, and Llama-3 against the Turkish models from the Module 14-20 capstones. A concrete scientific contribution to the Turkish AI ecosystem and the curriculum's 12th production artifact.

Şükrü Yusuf KAYA

85 min read

6/25/2026

Advanced


🏆 Capstone, the 12th Production Artifact: TR-LLMArena

In the previous 2 lessons of Module 21, we learned the anatomy of LLM benchmarks and how to build a production evaluation framework. Now we turn that knowledge into a lasting contribution to the Turkish AI ecosystem: TR-LLMArena.

Goal: a Turkish counterpart to lmarena.ai. Users ask a question in Turkish, two anonymous models answer, and the user votes for whichever answer is better. The votes feed an Elo ranking that is published as a monthly leaderboard.
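For reference, the standard chess-style Elo model behind the ranking is shown below. S_A is 1 if model A wins, 0 if it loses and 0.5 for a tie (whether the arena offers a tie button is an assumption here), and the K-factor is a tuning choice the capstone does not fix; the worked numbers that follow assume K = 32.

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad E_B = 1 - E_A

R_A' = R_A + K\,(S_A - E_A), \qquad R_B' = R_B + K\,(S_B - E_B), \qquad S_B = 1 - S_A
```

With equal ratings (1000 vs 1000) a win moves each side by 16 points (1016 / 984). When a 1200-rated model meets a 1000-rated one, E_A ≈ 0.760: an expected win gains it only about 7.7 points (1207.7 / 992.3), while an upset costs it about 24.3 (1175.7 / 1024.3). This asymmetry is what lets the ranking correct itself as votes accumulate.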

Models under test (initial set):

GPT-4o (OpenAI)
Claude 3.5 Sonnet (Anthropic)
Gemini 1.5 Pro (Google)
Llama-3.1-405B (Meta)
Mistral Large (Mistral AI)
DeepSeek-V3 (DeepSeek)
Module 14.3 capstone: Turkish SFT Llama
Module 15.6 capstone: Turkish DPO Llama
Module 17.5 capstone: Turkish Reasoning R1-Distill
Module 18.4 capstone: Turkish Mixtral DPO

Tech stack:

HuggingFace Spaces (Gradio) — frontend + hosting
FastAPI backend (vote logic)
SQLite (Elo state, vote history)
KVKK-compliant (Turkey's personal data protection law): anonymous votes, no user data retention
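One possible SQLite layout for the vote history and Elo state, using only Python's built-in `sqlite3` module. The table and column names are hypothetical, and so is the choice not to store prompt text at all; that choice is simply one conservative reading of "no user data retention".

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per vote, one row per model's current Elo state.
SCHEMA = """
CREATE TABLE IF NOT EXISTS votes (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    ts      TEXT NOT NULL,  -- UTC ISO-8601 timestamp
    model_a TEXT NOT NULL,
    model_b TEXT NOT NULL,
    winner  TEXT NOT NULL CHECK (winner IN ('A', 'B', 'tie'))
    -- no user id, IP address, or prompt text is stored
);
CREATE TABLE IF NOT EXISTS ratings (
    model   TEXT PRIMARY KEY,
    elo     REAL NOT NULL DEFAULT 1000.0,
    n_votes INTEGER NOT NULL DEFAULT 0
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)  # safe to rerun thanks to IF NOT EXISTS
    return conn


def record_vote(conn: sqlite3.Connection, model_a: str, model_b: str, winner: str) -> None:
    ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back if e.g. the CHECK constraint fails
        conn.execute(
            "INSERT INTO votes (ts, model_a, model_b, winner) VALUES (?, ?, ?, ?)",
            (ts, model_a, model_b, winner),
        )


def votes_per_model(conn: sqlite3.Connection) -> dict[str, int]:
    """Each vote counts once for both models that took part in it."""
    rows = conn.execute(
        """
        SELECT model, COUNT(*) FROM (
            SELECT model_a AS model FROM votes
            UNION ALL
            SELECT model_b FROM votes
        ) GROUP BY model
        """
    ).fetchall()
    return dict(rows)
```

Letting the `CHECK` constraint reject malformed outcomes keeps invalid votes out of the history even if the API layer has a bug.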

Cost: ~$200/month (model API calls + HF Spaces premium).
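A back-of-the-envelope check on how many votes a monthly budget can buy. Every input passed in below is a hypothetical placeholder, not a real price. The sketch assumes single-turn battles, so each battle costs two model responses and adds one vote to each of its two models; `Decimal` avoids float rounding in the money arithmetic.

```python
from decimal import Decimal


def monthly_capacity(budget: Decimal, fixed_costs: Decimal,
                     cost_per_response: Decimal, n_models: int) -> tuple[int, int]:
    """Return (battles per month, votes per model per month).

    Assumes single-turn battles (two model responses each) and uniform pairing.
    """
    api_budget = budget - fixed_costs  # what is left after hosting and other fixed costs
    if api_budget <= 0:
        return 0, 0
    battles = int(api_budget // (2 * cost_per_response))
    votes_per_model = (2 * battles) // n_models  # each battle involves two models
    return battles, votes_per_model


# Placeholder inputs: $200 budget, $50 fixed hosting, $0.02 average cost per response.
print(monthly_capacity(Decimal("200"), Decimal("50"), Decimal("0.02"), 10))  # (3750, 750)
```

Plugging in real per-model prices (and the serving cost of the self-hosted Turkish models) shows whether the budget matches the vote targets in the FAQ below.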

The curriculum's 12th production artifact: arena.sukruyusufkaya.com. It is also a shared resource for the Turkish AI community. In 85 minutes, you make a concrete scientific contribution to the Turkish AI ecosystem.

Capstone Flow (10 Stages)

System architecture: the arena flow
Elo ranking math: chess-style
Double-blind voting system
Gradio UI: user chat + vote
FastAPI backend: vote logic + Elo update
SQLite state management
HuggingFace Spaces deploy
Adding models: connect all 10 models
Monthly leaderboard publication
KVKK + community management
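The Elo math and the monthly publication stages can be sketched together: replay the stored votes through the chess-style Elo update in timestamp order, then rank the models. The K-factor of 32, the 1000 starting rating, scoring a tie as half a win, and making each monthly leaderboard a cumulative snapshot (all votes up to the end of that month) are assumptions, not choices documented in the capstone; the model names and votes are toy data.

```python
from collections import defaultdict

K = 32.0          # assumed K-factor
INITIAL = 1000.0  # assumed starting rating for every model
SCORE = {"A": 1.0, "B": 0.0, "tie": 0.5}  # model A's score for each outcome


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def monthly_leaderboard(votes, month: str):
    """votes: iterable of (iso_ts, model_a, model_b, winner); month: 'YYYY-MM'.

    Replays every vote cast up to the end of `month`, oldest first, and returns
    (rank, model, rating, n_votes) rows sorted by rating, highest first.
    """
    ratings = defaultdict(lambda: INITIAL)
    counts = defaultdict(int)
    for ts, a, b, winner in sorted(v for v in votes if v[0][:7] <= month):
        delta = K * (SCORE[winner] - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta  # zero-sum: A gains exactly what B loses
        ratings[b] -= delta
        counts[a] += 1
        counts[b] += 1
    table = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, m, round(r, 1), counts[m]) for rank, (m, r) in enumerate(table, start=1)]


votes = [
    ("2026-06-01T10:00:00+00:00", "tr-dpo-llama", "gpt-4o", "B"),
    ("2026-06-02T12:30:00+00:00", "tr-dpo-llama", "tr-sft-llama", "A"),
    ("2026-07-01T09:00:00+00:00", "gpt-4o", "tr-sft-llama", "tie"),  # not in the June snapshot
]
for row in monthly_leaderboard(votes, "2026-06"):
    print(row)
```

Online Elo depends on the order in which votes arrive, so replaying the same votes in a different order can swap closely rated models; fitting a Bradley-Terry model over all votes at once is a common order-independent alternative.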

🎉 Module 21 Complete: The Full Anatomy of LLM Evaluation

Module 21 in its final form (3 lessons, 250 minutes):

21.1: Benchmark Anatomy (MMLU/HumanEval/Arena/GPQA + Turkish TR-MMLU/MUKAYESE)
21.2: Production Eval Framework (your own test set + LLM-as-judge + A/B testing)
21.3: Capstone TR-LLMArena (Turkish community leaderboard, 12th production artifact)

The curriculum's 12th production artifact: arena.sukruyusufkaya.com. A scientific contribution to the Turkish AI ecosystem.

Before: 1 lesson / 70 min → Now: 3 lessons / 250 min. A 3.6× expansion, at expert quality.

Module 21 Inventory (Rewritten)

#	Lesson	Duration
21.1	Benchmark Anatomy: MMLU → Arena	80 min
21.2	Production Eval Framework	85 min
21.3	Capstone TR-LLMArena	85 min
Total	3 lessons	250 min (~4.2 h)


Frequently Asked Questions

**How many user votes are needed for a meaningful TR-LLMArena leaderboard?**

**Minimum reliable**: 1,000 votes per model. For 10 models that is 10,000 model-votes in total; since every vote compares two models, this corresponds to roughly 5,000 battles. **Good convergence**: 5,000-10,000 votes per model.

**LMSys reference**: 2M+ votes by 2024, with top models at 100K+ votes each.

**Realistic Turkish target**:
- Month 1: 500-1,000 votes (early adopters)
- Month 3: 5,000-10,000 votes
- Month 6: 30,000-50,000 votes (meaningful leaderboard)

**Strategy**: announce to AI Türkiye and Turkish NLP communities; reach university NLP groups (Boğaziçi, METU, Bilkent); post an announcement on Twitter/X; issue a press release.

**Realistic outcome**: meaningful within 6-12 months, industry-recognized within 1-2 years.
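A rough way to see why about 1,000 votes is a sensible floor: for two evenly matched models compared head to head, the uncertainty in their Elo gap shrinks like 1/sqrt(n). The sketch uses the normal approximation for the observed win rate p and the delta method on D = 400 * log10(p / (1 - p)). It treats one pair in isolation and ignores ties, whereas a real leaderboard pools all comparisons (and is typically reported with bootstrap confidence intervals), so read the numbers as orders of magnitude only.

```python
import math


def elo_diff_ci_halfwidth(n_votes: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate CI half-width, in Elo points, for the rating gap between two
    models estimated from n_votes head-to-head votes with true win rate p.

    SE(p_hat) = sqrt(p(1-p)/n); dD/dp = 400 / (ln(10) * p * (1-p)).
    z = 1.96 gives a ~95% interval. Ties are ignored.
    """
    if n_votes <= 0 or not 0.0 < p < 1.0:
        raise ValueError("need n_votes > 0 and 0 < p < 1")
    se_p = math.sqrt(p * (1.0 - p) / n_votes)
    d_elo_d_p = 400.0 / (math.log(10) * p * (1.0 - p))
    return z * se_p * d_elo_d_p


for n in (100, 1_000, 5_000, 10_000):
    print(n, round(elo_diff_ci_halfwidth(n), 1))
```

At p = 0.5 this gives about ±68.1 Elo points for 100 votes, ±21.5 for 1,000, ±9.6 for 5,000 and ±6.8 for 10,000. In other words, with 1,000 votes, gaps of less than roughly 20 points are still within the noise, which is why the "good convergence" range above sits several times higher.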
