Document VLM FT: DocVQA + ChartQA + TableVQA + Turkish Invoice/Petition Dataset

Document AI use cases: DocVQA (document Q&A), ChartQA (chart understanding), TableVQA (table extraction). TR-specific dataset generation: synthetic invoice, petition, and contract images for structured field extraction. Fine-tuning the Qwen 2.5-VL 7B baseline raises field accuracy from 76% to 94%.

Şükrü Yusuf KAYA

30-minute read

24.06.2026

Advanced

1. Document VLM Datasets

Dataset	Size	Language	Notes
DocVQA	50K	EN	document Q&A
ChartQA	21K	EN	chart understanding
TableVQA	23K	EN	table extraction
InfographicVQA	30K	EN	infographic Q&A
TR-Doc-Synthetic (cookbook)	10K	TR	the one you build yourself
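
To put the table sizes in perspective, the sketch below computes each dataset's share if all five were simply pooled. Pooling is an assumption made here for illustration; the lesson does not specify a mixing ratio. The helper name `mixture_shares` is hypothetical.

```python
# Sizes in thousands of examples, copied from the table above.
SIZES_K = {
    "DocVQA": 50,
    "ChartQA": 21,
    "TableVQA": 23,
    "InfographicVQA": 30,
    "TR-Doc-Synthetic": 10,
}


def mixture_shares(sizes: dict) -> dict:
    """Return each dataset's fraction of the pooled total."""
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


shares = mixture_shares(SIZES_K)
for name, share in shares.items():
    print(f"{name:18s} {share:6.1%}")
```

Under naive pooling the Turkish set is 10K of 134K examples, about 7.5%, which is one reason to consider oversampling it or fine-tuning on it separately.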

Building the TR synthetic dataset:

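# Synthetic Turkish invoice generation (Playwright + HTML template)
import asyncio
from pathlib import Path

from playwright.async_api import async_playwright

async def render_invoice(data: dict) -> str:
    # Read as UTF-8 so Turkish characters (ş, ğ, ı, İ) survive intact.
    # str.format fills {placeholders}; literal braces in the template
    # (for example in inline CSS) must be escaped as {{ and }}.
    template = Path("invoice_template.html").read_text(encoding="utf-8")
    html = template.format(**data)
    out_path = f"invoice_{data['id']}.png"

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.set_content(html)
            await page.screenshot(path=out_path, full_page=True)
        finally:
            await browser.close()  # close the browser even if rendering fails
    return out_path

# Run one render with: asyncio.run(render_invoice(data))
# 10K random invoices: ~30 minutes on the 4090 machine (CPU-bound)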
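
The renderer above needs a `data` dict per invoice, and the same values serve as ground-truth labels for field extraction. The following is a minimal sketch of a seeded random generator. The field names, company names, the `FTR...` numbering scheme, the 2025 date range, and the 20% VAT rate are all illustrative assumptions, not taken from the lesson; adjust them to the placeholders in your own `invoice_template.html`.

```python
import json
import random
from datetime import date, timedelta
from decimal import Decimal

# Made-up names for illustration only.
SELLERS = ["Örnek Ticaret A.Ş.", "Deneme Gıda Ltd. Şti.", "Anadolu Yazılım A.Ş."]
BUYERS = ["Ayşe Yılmaz", "Mehmet Demir", "Çınar İnşaat Ltd. Şti."]


def random_invoice(idx: int, rng: random.Random,
                   vat_rate: Decimal = Decimal("0.20")) -> dict:
    """Build one invoice record; vat_rate is an assumed default."""
    subtotal = Decimal(rng.randint(10_000, 5_000_000)) / 100  # 100.00 to 50000.00
    vat = (subtotal * vat_rate).quantize(Decimal("0.01"))
    day = date(2025, 1, 1) + timedelta(days=rng.randint(0, 364))
    return {
        "id": idx,
        "invoice_no": f"FTR{day.year}{idx:09d}",
        "date": day.strftime("%d.%m.%Y"),  # Turkish day.month.year format
        "seller": rng.choice(SELLERS),
        "buyer": rng.choice(BUYERS),
        "tax_id": "".join(str(rng.randint(0, 9)) for _ in range(10)),
        "subtotal": f"{subtotal:.2f}",
        "vat": f"{vat:.2f}",
        "total": f"{subtotal + vat:.2f}",
    }


def make_dataset(n: int, seed: int = 0) -> list:
    """Generate n invoices reproducibly from a single seed."""
    rng = random.Random(seed)
    return [random_invoice(i, rng) for i in range(n)]


def write_labels(rows: list, path: str = "labels.jsonl") -> None:
    """Write one JSON line per invoice, pairing image file name and fields."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            record = {"image": f"invoice_{row['id']}.png", "fields": row}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


invoices = make_dataset(100)
```

Each dict in `invoices` can be passed to `render_invoice`, and `write_labels(invoices)` stores the matching labels. Amounts use `Decimal` so that subtotal plus VAT always equals the printed total exactly.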

✅ Deliverables

1) Generate 100 synthetic TR invoices with Playwright.
2) Fine-tune Qwen 2.5-VL for field extraction.
3) Report the field accuracy delta between the baseline and the fine-tuned model.
4) Next lesson: 6.9, Grounding FT (Bounding Box).
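
The lesson does not define how field accuracy is computed. One plausible definition, sketched below, is the fraction of reference fields whose predicted value matches exactly after Unicode and whitespace normalization. The function names and the toy predictions are hypothetical.

```python
import unicodedata


def normalize(value) -> str:
    """Unicode-normalize, trim, and collapse whitespace before comparing."""
    text = unicodedata.normalize("NFKC", str(value))
    return " ".join(text.split())


def field_accuracy(predictions: list, references: list) -> float:
    """Fraction of reference fields whose predicted value matches exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    correct = total = 0
    for pred, ref in zip(predictions, references):
        for key, ref_value in ref.items():
            total += 1
            # A missing field counts as wrong.
            if key in pred and normalize(pred[key]) == normalize(ref_value):
                correct += 1
    return correct / total if total else 0.0


# Toy example: two documents, two fields each.
refs = [
    {"invoice_no": "FTR001", "total": "1200.00"},
    {"invoice_no": "FTR002", "total": "85.50"},
]
baseline_preds = [
    {"invoice_no": "FTR001", "total": "1200.00"},
    {"invoice_no": "FTR0O2", "total": " 85.50"},  # letter O instead of zero
]
finetuned_preds = [dict(r) for r in refs]  # pretend the FT model is perfect

baseline_acc = field_accuracy(baseline_preds, refs)
finetuned_acc = field_accuracy(finetuned_preds, refs)
delta = finetuned_acc - baseline_acc
print(f"baseline={baseline_acc:.2f} finetuned={finetuned_acc:.2f} delta={delta:+.2f}")
```

The comparison deliberately avoids case folding, because Turkish dotted and dotless i (İ/i, I/ı) do not round-trip under default lowercasing.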
