# Benchmark Anatomy: From MMLU to LMSys Arena — Science and Art of Measuring LLM Quality

> Source: https://sukruyusufkaya.com/en/learn/llm-muhendisligi/benchmark-anatomi-mmlu-arena
> Updated: 2026-05-13T13:00:32.050Z
> Category: LLM Engineering
> Module: Module 21: LLM Evaluation — Benchmarks and Production Eval

**TLDR:** A mathematical and epistemic anatomy of LLM benchmarks: MMLU (Hendrycks 2020; 57 tasks), HumanEval (Chen 2021; code), MT-Bench (Zheng 2023; chat), LMSys Chatbot Arena (community Elo ranking), and GPQA (Rein 2023; graduate-level reasoning). Why isn't a single benchmark enough? For Turkish: TR-MMLU, MUKAYESE, BoğaziçiNLP. A serious analysis of the **benchmark contamination** problem, and the holistic evaluation approach.

