
INT4 Quantization

An aggressive quantization approach that stores model weights in 4-bit precision, cutting memory use by roughly 4x compared with 16-bit formats.

INT4 quantization is especially valuable for running large models on constrained hardware. It dramatically reduces memory footprint, but it also carries a higher risk of quality loss than 8-bit quantization, and the degradation varies with how sensitive the task is. For that reason, calibration and careful benchmarking become especially critical at such low bit widths.
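To make the idea concrete, here is a minimal sketch of symmetric per-tensor INT4 quantization in NumPy. It is an illustration of the basic round-and-scale scheme, not any particular library's implementation; production methods (e.g. GPTQ, AWQ) use per-group scales and calibration data rather than a single scale.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric per-tensor INT4: map floats to integers in [-8, 7]."""
    # One scale for the whole tensor, chosen so the largest weight maps to 7.
    scale = float(np.max(np.abs(weights))) / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from 4-bit integers."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)

q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)

# Rounding error is bounded by half the step size (scale / 2).
print("max abs error:", np.max(np.abs(w - w_hat)))
```

The quality-loss risk mentioned above comes directly from this coarse grid: with only 16 representable levels, outlier weights force a large scale, which in turn enlarges the rounding error on every other weight. That is why per-group scaling and calibration matter so much at 4 bits.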