# LLM Evaluation Benchmarks: MMLU, HELM, MT-Bench, LMSys Arena — Anatomy of Quality Measurement

> Source: https://sukruyusufkaya.com/en/learn/llm-muhendisligi/llm-evaluation-mmlu-helm-mt-bench-lmsys-arena
> Updated: 2026-05-13T12:47:42.197Z
> Category: LLM Engineering
> Module: Module 21: LLM Evaluation Frameworks
**TLDR:** An overview of LLM evaluation frameworks: MMLU (Hendrycks et al., 2020) for general knowledge, HELM (Stanford, 2022) for comprehensive multi-metric evaluation, MT-Bench (Zheng et al., 2023) for multi-turn chat quality, LMSys Chatbot Arena for community-driven Elo-style ranking, GPQA (Rein et al., 2023) for graduate-level questions, and HumanEval/MBPP for code generation. Also covers Turkish benchmarks (TR-MMLU, MUKAYESE), benchmark contamination concerns, and the case for holistic evaluation.
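Since the TLDR mentions Arena-style Elo ranking, a minimal sketch of the classic Elo update applied to pairwise model votes may help. This is an illustrative assumption, not LMSys's actual implementation (Chatbot Arena has since moved to a Bradley–Terry model fit); the K-factor and starting rating below are arbitrary choices for demonstration.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """Update both ratings after one head-to-head vote.

    outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k (the K-factor) controls how fast ratings move; 32 is illustrative.
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (outcome - e_a)
    new_b = r_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return new_a, new_b

# Two models start at the same rating; A wins one community vote.
ra, rb = elo_update(1000.0, 1000.0, 1.0)  # A gains what B loses
```

The key property is that ratings are zero-sum per vote and an upset (a low-rated model beating a high-rated one) moves ratings more than an expected result, which is why Arena rankings stabilize as votes accumulate.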

