TR Embedding FT: BGE-M3, jina-v3, nomic-embed TR Adaptation + MTEB-TR Eval

TR embedding model fine-tuning for RAG: BGE-M3 (multilingual, strong TR baseline), jina-embeddings-v3, nomic-embed-text. Covers TR-specific query/document pair generation, contrastive learning (InfoNCE), and MTEB-TR evaluation. BGE-M3 TR fine-tuning takes about 6 hours on an RTX 4090.

Şükrü Yusuf KAYA
28 min read
Advanced

1. TR Embedding Baseline Table (MTEB-TR 2026)

| Model | Size | TR-MTEB Avg | License |
| --- | --- | --- | --- |
| BGE-M3 | 568M | 62.1 | MIT |
| jina-embeddings-v3 | 570M | 60.4 | CC-BY-NC |
| nomic-embed-text-v2-multilingual | 137M | 55.8 | Apache 2.0 |
| multilingual-e5-large | 559M | 58.2 | MIT |
| TR-specific FT (BGE-M3 base + 50K TR pairs) | 568M | 66.8 (+4.7) | Apache 2.0 |
Decision: BGE-M3 as the baseline. In production, fine-tuning on a custom domain yields a 5-8% boost.
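
The fine-tuning script below iterates over a `tr_dataset` of (query, positive document, hard-negative list) tuples. A minimal sketch for mining those hard negatives with the baseline model, assuming you already have (query, positive) pairs; `tr_pairs` and `corpus` are illustrative names, not part of the lesson:

```python
# Hard-negative mining sketch: retrieve top-k with the baseline model,
# drop the true positive, keep the rest as hard negatives.
import numpy as np
from sentence_transformers import SentenceTransformer

baseline = SentenceTransformer("BAAI/bge-m3", device="cuda")

# corpus: list[str] of TR documents; tr_pairs: list of (query, positive_doc)
# tuples -- both assumed to exist already.
corpus_emb = baseline.encode(corpus, normalize_embeddings=True, batch_size=64)

tr_dataset = []
for query, pos_doc in tr_pairs:
    q_emb = baseline.encode(query, normalize_embeddings=True)
    scores = corpus_emb @ q_emb           # cosine similarity (vectors normalized)
    top_idx = np.argsort(-scores)[:10]    # top-10 retrieved documents
    negs = [corpus[i] for i in top_idx if corpus[i] != pos_doc][:7]
    tr_dataset.append((query, pos_doc, negs))
```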
```python
# === BGE-M3 TR Fine-Tuning ===
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("BAAI/bge-m3", device="cuda")

# Dataset: (query, positive_doc, [negative_docs]) tuples; each query is
# expanded into up to 7 (query, positive, hard-negative) triplets
train_examples = []
for query, pos_doc, neg_docs in tr_dataset:
    for neg in neg_docs[:7]:
        train_examples.append(InputExample(texts=[query, pos_doc, neg]))

train_dataloader = DataLoader(train_examples, batch_size=8, shuffle=True)

# Loss: MultipleNegativesRankingLoss (an InfoNCE variant); in-batch positives
# of the other queries act as additional negatives
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, loss)],
    epochs=3,
    warmup_steps=500,
    optimizer_params={"lr": 2e-5},
    output_path="bge-m3-tr-finetuned",
)

# Eval: select the Turkish-language MTEB tasks
from mteb import MTEB

benchmark = MTEB(task_langs=["tr"])
results = benchmark.run(model, output_folder="results/bge-m3-tr")
```

BGE-M3 TR contrastive FT
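
For reference, `MultipleNegativesRankingLoss` optimizes the InfoNCE objective: for each query q with positive d+, every other document in the batch (plus the explicit hard negatives) serves as a negative d_i:

```latex
\mathcal{L}_{\text{InfoNCE}} =
  -\log \frac{\exp\left(\operatorname{sim}(q, d^{+}) / \tau\right)}
             {\sum_{i} \exp\left(\operatorname{sim}(q, d_{i}) / \tau\right)}
```

Here sim is cosine similarity and τ the temperature; in sentence-transformers this is controlled by the loss's `scale` parameter (scale = 1/τ, default 20).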
✅ Deliverables
  1. Generate 5K TR query-document pairs (see the sketch after this list).
  2. Fine-tune BGE-M3.
  3. Compare against the baseline on MTEB-TR.
  4. Next lesson: 9.8, TR Reranker FT.
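
For step 1, one common approach is to synthesize a Turkish query for each document with an LLM and keep each (query, document) pair as a positive. A minimal sketch using the OpenAI chat API; the model choice, prompt, and `tr_docs` variable are assumptions for illustration, not prescribed by the lesson:

```python
# Synthetic TR query generation sketch: one search query per document.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_tr_query(doc: str) -> str:
    """Ask the LLM for a realistic Turkish search query answered by `doc`."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            # Prompt (Turkish): "Write a single search query a user might ask
            # about the Turkish document below. Return only the query."
            "content": (
                "Aşağıdaki Türkçe belge hakkında kullanıcıların sorabileceği "
                "tek bir arama sorgusu yaz. Sadece sorguyu döndür.\n\n" + doc
            ),
        }],
    )
    return resp.choices[0].message.content.strip()

# tr_docs: list[str] of Turkish documents (illustrative name)
tr_pairs = [(generate_tr_query(doc), doc) for doc in tr_docs[:5000]]
```

The resulting `tr_pairs` feed directly into the hard-negative mining sketch above.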

