Gemma 3 1B / 4B / 12B / 27B: Google's 256K Vocab + Multimodal (4B+)

Gemma 3 is Google's 2025 family of open models: 256K vocab (Turkish-friendly), multimodal from 4B up (SigLIP vision tower), GeGLU activation, RMSNorm, up to 128K context, and the ShieldGemma safety classifier. This section covers Gemma 3 4B/12B QLoRA on an RTX 4090, the missing system role (the system prompt is prepended to the user turn), and Gemma 3 ToS caveats.

Şükrü Yusuf KAYA

35-minute read

14.05.2026

Advanced

1. Gemma 3 Architecture Features

Feature	1B	4B	12B	27B
Layers	26	34	48	62
Hidden	1152	2560	3840	5376
KV heads (GQA)	1	4	8	16
Vocab	256,000	256,000	256,000	256,000
Multimodal	❌	✅	✅	✅
Vision encoder	—	SigLIP 400M	SigLIP 400M	SigLIP 400M
Native context	32K	128K	128K	128K
Total params	1.0B	4.3B	12.2B	27.0B

Architecture details:

GeGLU (Gated GELU) instead of SwiGLU
RMSNorm (Llama-like)
256K vocab, Gemma's signature: Turkish token efficiency 1.95 (Llama 3.21)
No system role: the system prompt is prepended to the user message
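
To make the GeGLU versus SwiGLU distinction concrete, here is a minimal pure-Python sketch of a gated MLP and a plain RMSNorm. The helper names (`gated_mlp`, `rms_norm`, `matvec`) are hypothetical, the GELU uses the common tanh approximation, and real implementations may parameterize the norm scale differently, so treat this as an illustration rather than Gemma's actual code.

```python
import math

def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (the gate activation in GeGLU)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def silu(x: float) -> float:
    """SiLU / Swish (the gate activation in SwiGLU)."""
    return x / (1.0 + math.exp(-x))

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def gated_mlp(x, w_gate, w_up, w_down, act):
    """down( act(gate(x)) * up(x) ); act=gelu_tanh gives GeGLU, act=silu gives SwiGLU."""
    gate = [act(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

# Toy example with identity projections: only the gate activation differs.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, -1.0]
print("GeGLU :", gated_mlp(x, identity, identity, identity, gelu_tanh))
print("SwiGLU:", gated_mlp(x, identity, identity, identity, silu))
```

The two variants share the same three projections (`gate_proj`, `up_proj`, `down_proj`); swapping the activation is the only structural difference shown here.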

Chat template:

<start_of_turn>user
{system_message}\n\n{user_message}<end_of_turn>
<start_of_turn>model
{assistant_message}<end_of_turn>
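
The template above can be reproduced by hand to see exactly where the system prompt ends up. This is a minimal sketch with a hypothetical helper name (`render_gemma_turn`); in practice the tokenizer's own `apply_chat_template` is the source of truth and may also add a BOS token, which this sketch leaves out.

```python
from typing import Optional

def render_gemma_turn(
    user_message: str,
    system_message: Optional[str] = None,
    assistant_message: Optional[str] = None,
) -> str:
    """Render one user/model exchange in the Gemma turn format."""
    user_content = user_message
    if system_message:
        # No system role: the system prompt is prepended to the user turn.
        user_content = f"{system_message}\n\n{user_message}"
    text = f"<start_of_turn>user\n{user_content}<end_of_turn>\n<start_of_turn>model\n"
    if assistant_message is not None:
        # Training sample: close the model turn. For inference, stop after
        # "<start_of_turn>model\n" and let the model generate the rest.
        text += f"{assistant_message}<end_of_turn>\n"
    return text

print(render_gemma_turn("Merhaba!", system_message="Kısa cevap ver.", assistant_message="Selam!"))
```

Leaving `assistant_message` unset yields a generation prompt; passing it yields a complete training example.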

2. RTX 4090 Memory (Gemma 3 4B/12B QLoRA)

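Model	W (NF4)	A (batch=2, seq=4096)	Total	Fits?
Gemma 3 1B	0.5 GB	2.0 GB	~6 GB	✅✅ comfortable
Gemma 3 4B	2.2 GB	3.5 GB	~9 GB	✅✅ comfortable
Gemma 3 12B	6.1 GB	5.5 GB	~15 GB	✅
Gemma 3 27B	13.5 GB	8.0 GB	~25 GB	⚠️ batch=1 required

W is the NF4 weight footprint and A is activation memory. Each total is about 3.5 GB higher than W + A, which presumably covers LoRA adapters, optimizer state, and CUDA overhead.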
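
The W column can be approximated with simple arithmetic: 4 bits per parameter is half a byte, so the weight footprint in GB is roughly half the parameter count in billions. The sketch below uses the parameter counts from the architecture table; the function names are hypothetical, and it ignores the small extra cost of quantization constants.

```python
def nf4_weight_gb(params_billion: float, bits_per_param: float = 4.0) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes) for a quantized model."""
    return params_billion * bits_per_param / 8.0

def fits(total_gb: float, vram_gb: float = 24.0) -> bool:
    """Check an estimated total against the RTX 4090's 24 GB of VRAM."""
    return total_gb <= vram_gb

# Parameter counts (billions) and estimated totals (GB) taken from the tables.
PARAMS_B = {"1B": 1.0, "4B": 4.3, "12B": 12.2, "27B": 27.0}
TOTAL_GB = {"1B": 6.0, "4B": 9.0, "12B": 15.0, "27B": 25.0}

for name, params in PARAMS_B.items():
    w = nf4_weight_gb(params)
    print(f"Gemma 3 {name}: W ~{w:.2f} GB, total ~{TOTAL_GB[name]:.0f} GB, fits at batch=2: {fits(TOTAL_GB[name])}")
```

With these numbers the 27B estimate exceeds 24 GB at batch=2, which is why that row calls for batch=1.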

Advantage: with the 256K vocab, the 27B embedding matrix is vocab × hidden ≈ 256K × 5376 ≈ 1.4B params. That is about 5% of W, in line with modern large-vocab designs.
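
As a quick cross-check, the embedding size can be computed as vocab × hidden from the architecture table. This sketch assumes the table's 256,000 vocabulary and parameter counts, and counts the embedding matrix once (that is, it assumes input and output embeddings are tied); the exact figures shift with the real tokenizer size.

```python
VOCAB = 256_000  # vocabulary size as listed in the architecture table
HIDDEN = {"1B": 1152, "4B": 2560, "12B": 3840, "27B": 5376}
TOTAL_PARAMS = {"1B": 1.0e9, "4B": 4.3e9, "12B": 12.2e9, "27B": 27.0e9}

def embedding_params(vocab: int, hidden: int) -> int:
    """Number of parameters in a vocab x hidden embedding matrix."""
    return vocab * hidden

def embedding_share(size: str) -> float:
    """Fraction of the model's parameters that sit in the embedding matrix."""
    return embedding_params(VOCAB, HIDDEN[size]) / TOTAL_PARAMS[size]

for size in HIDDEN:
    n = embedding_params(VOCAB, HIDDEN[size])
    print(f"Gemma 3 {size}: {n / 1e9:.2f}B embedding params, {100 * embedding_share(size):.1f}% of total")
```

The share shrinks as the model grows: the same vocabulary that is a modest fraction of the 27B model is a large fraction of the 1B model.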

```python
# === Gemma 3 12B Turkish QLoRA Lab ===
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Load the pre-quantized 4-bit checkpoint.
model, tok = FastLanguageModel.from_pretrained(
    "unsloth/gemma-3-12b-it-bnb-4bit",
    max_seq_length=4096,
    dtype=None,  # auto-detect; resolves to bfloat16 on an RTX 4090
    load_in_4bit=True,
)

# Attach LoRA adapters to all attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model, r=32, lora_alpha=64, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Gemma chat template: no system role, so prepend it to the user message.
def to_chat_gemma(ex):
    user_msg = ex["instruction"]
    if ex.get("input"):  # optional Alpaca-style context field, if the dataset has one
        user_msg = user_msg + "\n\n" + ex["input"]
    if ex.get("system"):
        user_msg = ex["system"] + "\n\n" + user_msg
    messages = [
        {"role": "user", "content": user_msg},
        # Pass "assistant" here; the template renders it as the "model" turn.
        {"role": "assistant", "content": ex["output"]},
    ]
    text = tok.apply_chat_template(messages, tokenize=False)
    # The template already emits <bos>; strip it so the trainer's tokenizer
    # does not add a second one.
    return {"text": text.removeprefix("<bos>")}

dataset = load_dataset("malhajar/alpaca-gpt4-tr", split="train").map(to_chat_gemma, num_proc=8)

# Note: newer TRL releases rename max_seq_length to max_length and
# SFTTrainer's tokenizer= to processing_class=; adjust to your installed version.
cfg = SFTConfig(
    output_dir="gemma-3-12b-tr",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size of 8
    learning_rate=2e-4,
    bf16=True, optim="paged_adamw_8bit",
    max_seq_length=4096, packing=True,
    dataset_text_field="text",
    logging_steps=5, report_to="wandb",
)

trainer = SFTTrainer(model=model, tokenizer=tok, train_dataset=dataset, args=cfg)
trainer.train()
```

Gemma 3 12B Turkish QLoRA Lab

3. Gemma 3 ToS & License Caveats

Gemma 3 was released under the Gemma Terms of Use, which resemble the Llama Community License but differ in a few ways:

✅ Commercial use OK (unrestricted)
✅ Derivative work OK
✅ Distribution OK (with notice + copy of terms)
❌ Uses that conflict with the "Prohibited Use Policy" are forbidden
⚠️ There is no restriction on derivatives using the Gemma name or on disabling ShieldGemma, but "responsible use" is expected

The cookbook's rule: before a production deployment, read the Gemma 3 model card and follow the prohibited-use guidance.
