# Edge Inference: ONNX + Jetson + MediaTek NPU + Qualcomm AI Engine

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-edge-inference-onnx-jetson-npu
> Updated: 2026-05-14T14:43:01.654Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part XV — Serving Engineering

**TLDR:** Edge LLM inference is practical in 2026 across NVIDIA Jetson Orin, MediaTek NPU (Pixel), the Qualcomm AI Engine (Snapdragon 8 Gen 3 and later), and the Apple Neural Engine. Export to ONNX for cross-platform portability, apply edge-specific quantization (INT8, INT4, or mixed W4A8), and target a latency budget of under 200 ms to first token. Includes a deploy recipe for SmolLM3 1.7B on a Pixel 8 Pro.
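To make the INT8 quantization mentioned above concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization, the simplest of the schemes listed (INT8/INT4/W4A8). This is an illustrative toy, not the article's deploy recipe; the function names are hypothetical, and real toolchains (ONNX Runtime, TensorRT, QNN) handle calibration and per-channel scales for you.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: scale maps max |x| to 127."""
    scale = float(np.max(np.abs(x))) / 127.0  # assumes x is not all zeros
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale

# Quantize a random weight matrix and measure round-trip error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = float(np.max(np.abs(w - w_hat)))  # bounded by scale / 2
```

The weights shrink 4x (float32 to int8), and the worst-case per-element error is half the quantization step; this is the accuracy/footprint trade the TLDR's edge quantization refers to.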

