Skip to content

Document VLM FT: DocVQA + ChartQA + TableVQA + Turkish Invoice/Petition Dataset

Document AI use-cases: DocVQA, ChartQA, TableVQA. TR-specific dataset generation: synthetic invoice + petition + contract images, structured field extraction. Qwen 2.5-VL 7B baseline → FT → field accuracy 76% → 94%.

Şükrü Yusuf KAYA
30 min read
Advanced
Document VLM FT: DocVQA + ChartQA + TableVQA + Türkçe Fatura/Dilekçe Dataset

1. Document VLM Dataset'leri#

DatasetSizeLanguageNotlar
DocVQA50KENdocument Q&A
ChartQA21KENchart understanding
TableVQA23KENtable extraction
InfographicVQA30KENinfographic Q&A
TR-Doc-Synthetic (cookbook)10KTRsenin oluşturduğun
TR synthetic dataset oluşturma:
# Synthetic Türkçe fatura üretimi (Playwright + HTML template) from playwright.async_api import async_playwright async def render_invoice(data): template = open("invoice_template.html").read() html = template.format(**data) async with async_playwright() as p: browser = await p.chromium.launch() page = await browser.new_page() await page.set_content(html) await page.screenshot(path=f"invoice_{data['id']}.png", full_page=True) await browser.close() # 10K random fatura: ~30 dakika 4090 (CPU-bound)
✅ Teslim
  1. Playwright ile 100 synthetic TR fatura üret. 2) Qwen 2.5-VL field extraction FT. 3) Field accuracy delta. 4) Sonraki ders: 6.9 — Grounding FT (Bounding Box).

Yorumlar & Soru-Cevap

(0)
Yorum yazmak için giriş yap.
Yorumlar yükleniyor...

Related Content