
Qwen3-MoE + Llama-4-MoE Pattern: Generic MoE FT Recipe (8×H100 Baseline)

Qwen3-MoE (30B-A3B, 235B-A22B) and Llama-4-MoE (Scout, Maverick, Behemoth) are 2025's new generation of Mixture-of-Experts models. This lesson distills a generic MoE fine-tuning pattern, applying the same discipline to any MoE: a common chat template, router-aware LoRA, and expert-targeted SFT, packaged as an 8×H100 baseline recipe.

Şükrü Yusuf KAYA
26 min read
Advanced

1. The New MoE Generation (2025-2026)

| Model | Total params | Active | Experts | Top-K | RTX 4090 Lab? |
|---|---|---|---|---|---|
| Mixtral 8×7B | 46.7B | 12.9B | 8 | 2 | QLoRA marginal (~22 GB) |
| Mixtral 8×22B | 141B | 39B | 8 | 2 | Cloud only |
| DeepSeek-V3 | 671B | 37B | 256+1 | 8 | Cloud (16×H100) |
| Qwen3 30B-A3B | 30B | 3B | 128 | 8 | QLoRA marginal (~16 GB) |
| Qwen3 235B-A22B | 235B | 22B | 128 | 8 | Cloud (8×H100) |
| Llama-4 Scout | 109B | 17B | 16 | 1 | Cloud (4×H100) |
| Llama-4 Maverick | 400B | 17B | 128 | 1 | Cloud (16×H100) |
| Llama-4 Behemoth (preview) | 2T | 288B | 16 | 1 | Cloud (64×H100+) |
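The "RTX 4090 Lab?" column can be sanity-checked with back-of-envelope arithmetic: at 4-bit quantization, the base weights alone cost roughly total_params × 0.5 bytes. A minimal sketch (the function name is mine; this is a lower bound that ignores activations, KV cache, and LoRA/optimizer state):

```python
def qlora_weight_gib(total_params_b: float, bits: float = 4.0) -> float:
    """Lower-bound weight memory for a 4-bit-quantized base model, in GiB.
    Ignores higher-precision embeddings/norms, activations, KV cache, and
    LoRA adapter/optimizer state, which add a few GiB on top."""
    return total_params_b * 1e9 * bits / 8 / 2**30

print(qlora_weight_gib(46.7))  # Mixtral 8x7B: ~21.7 GiB of weights
print(qlora_weight_gib(30.0))  # Qwen3 30B-A3B: ~14 GiB of weights
```

With a few GiB of runtime overhead on top, both land right at the edge of a 24 GB card, which is why the table calls them "marginal".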
Decision: for beginners, Mixtral 8×7B or Qwen3 30B-A3B; both are feasible with 1-2 consumer GPUs plus cloud spillover.
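The recipe's "router-aware LoRA" step can be sketched as a target-module policy: adapt the attention and expert MLP projections but leave the router linear frozen. The module names below follow the Hugging Face Qwen3-MoE implementation and are an assumption; verify them against `model.named_modules()` for your exact checkpoint.

```python
# Router-aware LoRA targeting sketch for a Qwen3-MoE checkpoint (names assumed).
ATTN_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]
EXPERT_TARGETS = ["gate_proj", "up_proj", "down_proj"]  # inside each expert MLP

def lora_target_modules(include_experts: bool = True) -> list:
    """Suffix-matched module names for peft.LoraConfig(target_modules=...).
    The router linear ('gate' in the sparse MoE block) is deliberately
    excluded: adapting it can destabilize load balance early in SFT."""
    targets = list(ATTN_TARGETS)
    if include_experts:
        targets += EXPERT_TARGETS
    return targets

# Usage (assumes peft is installed):
# from peft import LoraConfig
# cfg = LoraConfig(r=16, lora_alpha=32, target_modules=lora_target_modules())
```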
✅ Deliverables
1. Mini fine-tune Qwen3 30B-A3B on 1×H100 80GB.
2. Monitor the aux-loss balance metrics.
3. Next lesson: 5.5, Sparse Upcycling (Dense → MoE).
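Deliverable 2's balance monitoring can be sketched with a Switch-style auxiliary load-balancing loss over the router logits (the function name and exact formula are my assumption; the Hugging Face Qwen3-MoE implementation computes a similar quantity internally):

```python
import numpy as np

def load_balance_aux_loss(router_logits: np.ndarray, top_k: int) -> float:
    """Switch-style balance loss: num_experts * sum_i f_i * P_i, where
    f_i is the fraction of dispatched token-slots hitting expert i and
    P_i is the mean router probability of expert i.
    Perfectly balanced routing gives ~1.0; router collapse drives it
    toward num_experts."""
    num_tokens, num_experts = router_logits.shape
    z = router_logits - router_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=-1, keepdims=True)          # softmax over experts
    topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]   # top-k experts per token
    counts = np.bincount(topk_idx.ravel(), minlength=num_experts)
    f = counts / (num_tokens * top_k)                   # dispatch fractions
    P = probs.mean(axis=0)                              # mean router probability
    return num_experts * float((f * P).sum())
```

Logging this per layer during the mini-FT run makes collapse visible early: a healthy run hovers near 1.0, while a value drifting toward the expert count means a few experts are absorbing all the traffic.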
