# Llama 3.2 1B / 3B — Edge & Mobile FT: Tied Embeddings + Distillation + GGUF Q4

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-llama-3.2-1b-3b-edge-mobile-ft
> Updated: 2026-05-14T14:42:51.333Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part III — Small Open Models (1B–8B)

**TLDR:** Llama 3.2 1B/3B are distilled from Llama 3.1 8B, use tied embeddings, and target edge inference. Full fine-tuning fits on a single RTX 4090 (≈2 GB of weights for 1B, ≈6 GB for 3B). Expect 8–15 tok/s on an iPhone or Pixel with GGUF Q4_K_M quantization. Covers TR-MMLU numbers and dataset strategies.
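The weight footprint and the tied-embeddings saving mentioned above can be sanity-checked with quick arithmetic. This is a back-of-envelope sketch, not a measurement: hidden size 2048 and vocabulary 128,256 are the published Llama 3.2 1B dimensions, and the ~1.24B parameter count is approximate.

```python
# Back-of-envelope check of the TLDR's memory claims.
VOCAB = 128_256   # Llama 3 tokenizer vocabulary size
HIDDEN = 2_048    # Llama 3.2 1B hidden size
PARAMS = 1.24e9   # approximate parameter count of the 1B model

def gb(n_bytes: float) -> float:
    """Convert bytes to binary gigabytes (GiB)."""
    return n_bytes / 2**30

# fp16/bf16 weights: 2 bytes per parameter -> roughly the "1B = 2 GB" figure.
weights_gb = gb(PARAMS * 2)

# Tied embeddings: the input embedding and the LM head share a single
# vocab x hidden matrix, so one full copy of it is saved vs. untied models.
tied_saving_gb = gb(VOCAB * HIDDEN * 2)

print(f"1B weights (fp16): ~{weights_gb:.2f} GB")
print(f"saved by tying embeddings: ~{tied_saving_gb:.2f} GB")
```

The second number explains why tying matters disproportionately at this scale: a ~0.5 GB matrix is a large fraction of a ~2.3 GB model, whereas the same trick is negligible for a 70B model.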

