Technical Glossary: Generative AI and LLM
Post-Training Quantization
A quantization approach that converts a pretrained model's weights (and sometimes activations) to lower-bit precision, reducing memory use and often improving inference speed.
Post-training quantization is one of the most practical ways to make a model more efficient because it requires no retraining. It reduces memory usage and can speed up inference on hardware with low-precision support. However, lower precision may degrade output quality on some tasks, so careful evaluation against the full-precision model is required.
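The idea can be illustrated with a minimal sketch of symmetric per-tensor int8 quantization using NumPy. This is a simplified illustration, not a production scheme; the function names are ours, and real toolkits typically add per-channel scales, calibration data, and activation quantization.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization of a float weight tensor."""
    # One scale for the whole tensor: map the largest absolute weight to 127.
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Approximate reconstruction of the original float weights.
    return q.astype(np.float32) * scale

# Quantize a random weight matrix and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per weight is bounded by half a quantization step.
max_err = float(np.max(np.abs(w - w_hat)))
```

The int8 tensor uses a quarter of the memory of float32, at the cost of a bounded rounding error per weight; this error is what evaluation must confirm is acceptable for the target task.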
