SmolLM3 1.7B: Tiny Tier — Production Model Running on 8GB RAM Devices

SmolLM3 (HuggingFace, Mar 2025) — 1.7B params, hybrid GQA, 256K context (YaRN), 100% open (data, training pipeline, weights). Edge target: 8GB RAM phone / RPi 5 / IoT. Full FT on RTX 4090 in 25 min. Q4_K_M GGUF → 1.0 GB.

Şükrü Yusuf KAYA
26 min read
Intermediate

1. SmolLM3 Architecture & Openness#

  • Layers: 30, hidden: 2048, KV heads: 4
  • Vocab: 49,152 (BPE, multilingual)
  • 11T pre-training tokens (web + math + code)
  • Hybrid GQA: some layers use full attention, others a local sliding window (for efficiency)
Openness: with SmolLM3, HuggingFace opened the ENTIRE training pipeline:
  • Data sources + cleaning recipes
  • Training code (nanotron + nanoseed)
  • Checkpoints (including intermediate ones)
  • Eval scripts
This makes it the model that comes closest to a truly "reproducible LLM". SmolLM3 is ideal for teaching in this cookbook: you can read the code yourself.
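The hybrid GQA choice matters most for memory at long context. A rough sketch of the KV-cache arithmetic, where head_dim = 128 and a bf16 cache are assumptions for illustration, not published figures:

```python
# Rough KV-cache size estimate for a SmolLM3-style config.
# Assumptions (not from the model card): head_dim = 128, bf16 (2-byte) cache.
layers, head_dim, bytes_per = 30, 128, 2

def kv_cache_gb(seq_len: int, kv_heads: int) -> float:
    # 2x for the K and V tensors, per layer, per KV head
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per / 1e9

print(kv_cache_gb(4_096, 4))     # short chat: ~0.25 GB
print(kv_cache_gb(262_144, 4))   # full 256K context: ~16 GB
print(kv_cache_gb(262_144, 16))  # same context with 16 MHA-style KV heads: 4x larger
```

The takeaway: even though the Q4 model file is only 1.0 GB, a full 256K cache would not fit on an 8GB edge device. GQA's 4 KV heads (versus full MHA) is what keeps the cache sane, and the local-window layers reduce it further.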

2. Edge Deployment Scenarios#

| Device | Q4 token/s | RAM |
| --- | --- | --- |
| Raspberry Pi 5 (8GB) | 4-6 | 1.0 GB |
| Pixel 8 Pro | 18-25 | 1.0 GB |
| iPhone 15 Pro | 28-35 | 1.0 GB |
| MacBook M2 Air (8GB) | 50-70 | 1.0 GB |
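The table numbers translate directly into user-facing latency. A quick sanity check, using rough midpoints of the decode speeds above (the 200-token reply length is an arbitrary example):

```python
# Seconds to generate a 200-token reply at each device's Q4 decode speed
# (tok/s values are approximate midpoints of the ranges in the table above)
devices = {
    "Raspberry Pi 5 (8GB)": 5,   # midpoint of 4-6 tok/s
    "Pixel 8 Pro": 21,           # ~midpoint of 18-25
    "iPhone 15 Pro": 31,         # ~midpoint of 28-35
    "MacBook M2 Air (8GB)": 60,  # midpoint of 50-70
}
reply_tokens = 200
for name, tps in devices.items():
    print(f"{name}: {reply_tokens / tps:.0f} s")
```

A 40-second reply on the RPi 5 is fine for a kiosk queue but borderline for interactive chat; the phones land in a usable 6-10 s range.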
Use cases:
  • Offline chatbot (airport kiosk, military terminal)
  • Local NLP on IoT devices (Turkish voice commands for a smart home)
  • Smart wearables (simple Q&A on a watch)
  • Offline fallback for TR translation
```python
# === SmolLM3 1.7B Full FT (RTX 4090) ===
# Full FT fits: 1.7B params x 2 bytes (bf16) = 3.4 GB for weights,
# plus ~3.4 GB for gradients; paged 8-bit AdamW keeps optimizer state small
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-1.7B-Instruct",
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_2",
    device_map="cuda",
)
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-1.7B-Instruct")

dataset = load_dataset("malhajar/alpaca-gpt4-tr", split="train").map(...)

cfg = SFTConfig(
    output_dir="smol-1.7b-tr-fullft",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,  # low LR for full FT
    warmup_ratio=0.05, lr_scheduler_type="cosine",
    weight_decay=0.01,
    bf16=True, optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    max_seq_length=4096, packing=True,
    dataset_text_field="text",
    logging_steps=5, save_steps=200, report_to="wandb",
)

SFTTrainer(model=model, tokenizer=tok, train_dataset=dataset, args=cfg).train()
# 1 epoch in ~25 minutes: making the most of the 4090
```
SmolLM3 1.7B Full FT: RTX 4090, 25 minutes
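After training, the next step is GGUF conversion: llama.cpp's `convert_hf_to_gguf.py` followed by `llama-quantize` with the `Q4_K_M` preset. The 1.0 GB file size quoted above checks out from bits-per-weight arithmetic; the ~4.85 bits/weight average for Q4_K_M is an approximation, not an exact spec:

```python
# Back-of-envelope size check for the Q4_K_M quantized model
params = 1.7e9
bpw = 4.85  # approx. average bits/weight for Q4_K_M (mixed 4/6-bit blocks)
size_gb = params * bpw / 8 / 1e9
print(f"{size_gb:.2f} GB")  # prints "1.03 GB", matching the 1.0 GB GGUF above
```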
✅ Deliverables
  1. Full-fine-tune SmolLM3 1.7B.
  2. Convert it to Q4_K_M.
  3. If you have an RPi 5 or Android device, deploy it there.
  4. Next lesson: 3.9, DeepSeek-R1-Distill (Reasoning Distillation).
