# TensorRT-LLM: NVIDIA Native Engine — INT8 SmoothQuant + FP8 + In-Flight Batching

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-tensorrt-llm-engine-build
> Updated: 2026-05-14T14:43:01.221Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XV — Serving Engineering

**TLDR:** TensorRT-LLM is NVIDIA's LLM-specific TensorRT engine: native CUDA kernels for Hopper and Ada deliver the fastest inference of the engines covered here (roughly +15–30% throughput over vLLM). This recipe walks through the engine build process, INT8 SmoothQuant and FP8 quantization, and multi-LoRA serving, then builds a Llama 3.1 8B TRT-LLM engine (about 1 hour) and runs inference on an RTX 4090.

