# GGUF K-Quants Block Structure: Q2_K → Q8_K + llama-quantize Perplexity Table

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-gguf-k-quants-block-structure
> Updated: 2026-05-14T14:42:57.193Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part X — Quantization Engineering
**TLDR:** GGUF is llama.cpp's native format, common for CPU/edge inference. This part covers the K-quants block structure (Q2_K → Q8_K) with a separate struct per bit-width, conversion via llama-quantize, and the perplexity-vs-size curve. A bf16 → Q4_K_M conversion takes ~5 min on an RTX 4090, and the resulting ~4.6 GB Q4 GGUF deploys to CPU, Raspberry Pi, or iPhone.
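The "separate struct per bit-width" point can be made concrete by computing the effective bits-per-weight each super-block layout implies. The byte counts below are a sketch based on the K-quant block structs in llama.cpp's ggml (`QK_K = 256` weights per super-block); treat them as illustrative and check the current `ggml-common.h` before relying on exact numbers.

```python
# Bits-per-weight implied by each K-quant super-block layout.
# Byte counts are assumptions drawn from llama.cpp's ggml block structs
# (QK_K = 256 weights per super-block); verify against ggml-common.h.
QK_K = 256

# name -> total bytes per super-block (quants + scales + fp16/fp32 deltas)
BLOCK_BYTES = {
    "Q2_K": 2 + 2 + 16 + 64,    # d, dmin (fp16), 16 scale bytes, 256 x 2-bit quants
    "Q4_K": 2 + 2 + 12 + 128,   # d, dmin, 12 packed 6-bit scales, 256 x 4-bit quants
    "Q6_K": 128 + 64 + 16 + 2,  # low 4 bits, high 2 bits, 16 int8 scales, d
    "Q8_K": 4 + 256 + 32,       # fp32 d, 256 int8 quants, 16 int16 block sums
}

def bits_per_weight(block_bytes: int, weights: int = QK_K) -> float:
    """Storage cost: total bits in one super-block divided by weights stored."""
    return block_bytes * 8 / weights

for name, nbytes in BLOCK_BYTES.items():
    print(f"{name}: {nbytes} B/block -> {bits_per_weight(nbytes):.4f} bpw")
```

This is where the perplexity-vs-size trade-off comes from: each K-quant type is a distinct fixed-layout struct, so its bits-per-weight (and thus the file size for a given parameter count) is a compile-time constant, not a tunable knob.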

