Qwen 2.5-VL: Dynamic Resolution + M-RoPE + Turkish OCR FT (Invoice/Petition)
Qwen 2.5-VL (3B/7B/72B) — a modern multimodal champion: **dynamic resolution** (no fixed 224×224 input), **M-RoPE** (temporal + height + width axes), document understanding, video, multilingual. End-to-end Turkish invoice/petition OCR fine-tuning: dataset prep, vision tower freezing, LoRA targets, accuracy measurement.
Şükrü Yusuf KAYA
38 min read
1. Qwen 2.5-VL Architectural Features
| Aspect | Detail |
|---|---|
| Vision encoder | Qwen native ViT (672M params) |
| Resolution | Dynamic — accepts any input resolution |
| Image tokens | ~1 token per 28×28 pixel patch (e.g. 1024×1024 → 1296 tokens) |
| Position encoding | M-RoPE (multi-axis) |
| Video support | Yes (frame sequence) |
| Languages | TR/EN/ZH + 30+ others |
| Long context | 128K |
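To make the table's token math concrete, here is a minimal sketch of how resolution maps to token count, under simplified assumptions: each side is floored to a multiple of 28, while the real processor uses a smarter rounding resize, and the default pixel budget below is illustrative, not the library's.

```python
import math

def approx_image_tokens(height: int, width: int,
                        patch: int = 28,
                        max_pixels: int = 16384 * 28 * 28) -> int:
    """Approximate Qwen 2.5-VL image token count: one token per
    28x28 patch, downscaling first if the pixel budget is exceeded."""
    if height * width > max_pixels:
        # Aspect-preserving downscale into the pixel budget
        scale = math.sqrt(max_pixels / (height * width))
        height, width = int(height * scale), int(width * scale)
    return max(1, height // patch) * max(1, width // patch)

print(approx_image_tokens(1024, 1024))                             # 36*36 = 1296
print(approx_image_tokens(1024, 1024, max_pixels=1280 * 28 * 28))  # capped: 35*35 = 1225
```

Capping `max_pixels` is the main lever for trading OCR detail against sequence length, which is exactly what the fine-tuning script below does.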
M-RoPE in detail: classic RoPE encodes position along a single 1D axis; M-RoPE splits it across three:
- Temporal (video frame index)
- Height (image y-coordinate)
- Width (image x-coordinate)
This gives the model stronger spatial reasoning, since every vision token carries its grid coordinates instead of a flattened 1D index (see the sketch below).
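A toy illustration of the position-id layout, a minimal sketch rather than the exact Hugging Face implementation; the function name `mrope_position_ids` is an assumption. Each vision token receives a (temporal, height, width) triple, and each component rotates its own slice of the query/key dimensions.

```python
import torch

def mrope_position_ids(n_frames: int, h_patches: int, w_patches: int) -> torch.Tensor:
    """Build the three M-RoPE axes for a visual input of
    n_frames x h_patches x w_patches tokens; returns shape (3, N)."""
    n = h_patches * w_patches
    t = torch.arange(n_frames).repeat_interleave(n)                             # frame index
    h = torch.arange(h_patches).repeat_interleave(w_patches).repeat(n_frames)   # row
    w = torch.arange(w_patches).repeat(h_patches * n_frames)                    # column
    return torch.stack([t, h, w])

pos = mrope_position_ids(n_frames=1, h_patches=2, w_patches=3)
print(pos[:, 4])  # tensor([0, 1, 1]): frame 0, row 1, column 1
```

For a static image the temporal axis stays constant; for text tokens all three axes carry the same index, so M-RoPE reduces to ordinary 1D RoPE there.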
```python
# === Turkish invoice OCR FT — Qwen 2.5-VL 7B + RTX 4090 ===
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig
import torch

# 4-bit NF4 quantization to fit the 7B model on a single 24 GB card
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    quantization_config=bnb,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
# min/max_pixels belong to the processor, which controls dynamic resolution
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    min_pixels=256 * 28 * 28,
    max_pixels=1280 * 28 * 28,  # cap at ~1280 patches per image
)

# Freeze the vision tower; train only the LLM + projector (merger)
for name, param in model.named_parameters():
    if "visual" in name and "merger" not in name:
        param.requires_grad = False

lora = LoraConfig(
    r=32, lora_alpha=64, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Turkish invoice dataset — expected example format:
# {"image": <PIL>, "extracted_fields": {"vergi_no": "...", "tutar": "...", "tarih": "..."}}
dataset = load_dataset("user/turkish-invoices", split="train")

def format_invoice(example):
    fields = example["extracted_fields"]
    answer = (f"Vergi No: {fields['vergi_no']}\n"
              f"Tutar: {fields['tutar']} TL\n"
              f"Fatura Tarihi: {fields['tarih']}")
    messages = [
        {"role": "user", "content": [
            {"type": "image", "image": example["image"]},
            {"type": "text", "text": "Bu Türkçe faturadan vergi numarası, tutar ve tarih bilgilerini çıkar."},
        ]},
        {"role": "assistant", "content": answer},
    ]
    return processor.apply_chat_template(messages, tokenize=False)

# Training config (~8 hours for 1000 invoices on an RTX 4090)
cfg = SFTConfig(
    output_dir="qwen-2.5-vl-tr-invoice",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    bf16=True,
    optim="paged_adamw_8bit",
    max_seq_length=8192,
    logging_steps=5,
    report_to="wandb",
)

# Wire up the trainer. NOTE: formatting_func covers the text side only;
# a full multimodal run also needs a collator that feeds pixel values.
trainer = SFTTrainer(
    model=model,
    args=cfg,
    train_dataset=dataset,
    formatting_func=format_invoice,
)
trainer.train()

# Benchmark:
# Base Qwen 2.5-VL field extraction accuracy: ~76%
# After FT (1000 invoices): ~94%
```
Turkish invoice OCR — Qwen 2.5-VL 7B FT
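For the field-extraction accuracy quoted above (~76% → ~94%), a minimal scoring sketch, assuming the exact answer format the FT target uses; `extract_fields` and `field_accuracy` are hypothetical helpers, not part of any library:

```python
import re

def extract_fields(text: str) -> dict:
    """Parse the trained 'Vergi No / Tutar / Fatura Tarihi' answer
    format back into a field dict (None for missing fields)."""
    patterns = {
        "vergi_no": r"Vergi No:\s*(\S+)",
        "tutar": r"Tutar:\s*([\d.,]+)",
        "tarih": r"Fatura Tarihi:\s*(\S+)",
    }
    return {key: (m.group(1) if (m := re.search(pat, text)) else None)
            for key, pat in patterns.items()}

def field_accuracy(pred_texts: list[str], gold_fields: list[dict]) -> float:
    """Exact-match accuracy over every (sample, field) pair."""
    hits = total = 0
    for text, gold in zip(pred_texts, gold_fields):
        pred = extract_fields(text)
        for key, value in gold.items():
            total += 1
            hits += int(pred.get(key) == str(value).strip())
    return hits / max(total, 1)
```

Exact match is strict; normalizing amounts (thousands separators, "TL" suffix) and date formats before comparing usually reflects real extraction quality better.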
✅ Deliverables
1) Find an open Turkish invoice/petition dataset (or generate one synthetically; see the sketch below) and fine-tune. 2) Measure field extraction accuracy. 3) Next lesson: 6.5 — Pixtral 12B.
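If no open dataset fits, a synthetic seed can be rendered directly with PIL; a minimal sketch with made-up field values and layout (a real pipeline would vary fonts, layouts, logos, and scan noise):

```python
import random
from PIL import Image, ImageDraw

def synth_invoice(fields: dict, size=(1024, 1448)) -> Image.Image:
    """Render invoice fields as plain text on a white page;
    only a starting point for a synthetic OCR dataset."""
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    y = 80
    for label, value in fields.items():
        draw.text((60, y), f"{label}: {value}", fill="black")
        y += 48
    return img

fields = {
    "Vergi No": str(random.randint(10**9, 10**10 - 1)),  # 10-digit tax ID
    "Tutar": f"{random.uniform(100, 99999):.2f}",
    "Fatura Tarihi": "12.03.2025",
}
synth_invoice(fields).save("synthetic_invoice_0001.png")
```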