# Code Eval: HumanEval + MBPP + BigCodeBench + LiveCodeBench + SWE-Bench-Lite

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-code-eval-humaneval-mbpp-swe-bench
> Updated: 2026-05-14T14:42:55.840Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part VIII — Code Models & Repo-Level FT
**TLDR:** The standard benchmark suite for code LLMs: HumanEval (164 Python problems), MBPP (974 Python problems), BigCodeBench (1,140 tasks spanning 139 libraries), LiveCodeBench (resistant to data leakage via rolling problem sets), and SWE-Bench-Lite (300 real GitHub issues). Covers pass@1 vs. pass@10 scoring, sandboxed code execution, and running the benchmarks on an RTX 4090.
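Since the TLDR contrasts pass@1 with pass@10, here is a minimal sketch of the standard unbiased pass@k estimator (from the original HumanEval/Codex evaluation): given n generated samples per problem of which c pass the unit tests, it estimates the probability that at least one of k randomly drawn samples is correct.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated per problem
    c: samples that pass all unit tests
    k: budget of samples we get to draw
    """
    if n - c < k:
        # Fewer than k failing samples exist, so every k-subset
        # must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 30 of them pass.
print(pass_at_k(200, 30, 1))   # pass@1 equals c/n = 0.15 exactly
print(pass_at_k(200, 30, 10))  # pass@10 is much higher than pass@1
```

Averaging this estimator over all problems in a benchmark yields the reported pass@k score; note that pass@1 reduces to the simple fraction c/n.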

