
Capstone Module 7: Turkish Semantic Search System — sentence-transformers + FAISS + Mini-RAG

Module 7's capstone project: a Turkish semantic search system from scratch. Choosing a Turkish sentence-transformers model, building a FAISS vector index, a production-grade query pipeline, a mini-RAG architecture (retriever + generator), benchmarking + deployment. The practical application of embedding theory.

Şükrü Yusuf KAYA
75-minute read
Advanced
🏆 Module 7 Capstone — the practical proof of embedding theory
Over 5 lessons we covered the mathematics of embeddings, their history, their modern LLM implementation, and their geometry. Now we build a real system: a Turkish semantic search engine. We will use a Turkish-tuned sentence-transformers model, index with FAISS (which scales to billions of vectors), and set up a mini-RAG architecture that integrates retrieval + generation. 75 minutes from now you will have a deployable real-world artifact. This is the curriculum's second capstone — the 'production' proof of embeddings.

Capstone Flow (10 Stages)#

  1. System architecture — semantic search vs keyword, RAG flow
  2. Turkish sentence-transformer model — model selection (TR-tuned)
  3. Document corpus — Wikipedia subset or a Turkish blog corpus
  4. Embedding pipeline — batch encoding, dimensionality
  5. FAISS index — IndexFlatIP vs IndexIVFPQ
  6. Query pipeline — query embed → top-k retrieve
  7. Mini-RAG — retrieved docs + LLM generation
  8. Evaluation — recall@k, MRR, NDCG metrics
  9. Production deployment — FastAPI service, caching
  10. Improvements — re-ranking, hybrid search, monitoring

1. System Architecture#

1.1 Semantic search vs keyword search#

  • Keyword (BM25): the query 'Türkiye'nin başkenti' → only docs containing these literal words
  • Semantic: the query 'ülkenin merkezi' → understands the meaning and retrieves 'Türkiye'nin başkenti' content
Semantic search handles synonyms, paraphrases, and multilingual queries.
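The contrast can be made concrete with a toy sketch: literal token overlap (the core of keyword matching) scores the paraphrase pair at zero, while cosine similarity over embeddings stays high. The two 3-dim vectors below are hand-made stand-ins for real sentence-transformer outputs, not actual model embeddings:

```python
import numpy as np

def keyword_overlap(query: str, doc: str) -> float:
    # Toy keyword matcher: fraction of query tokens appearing literally in the doc
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = "ülkenin merkezi"
doc = "Türkiye'nin başkenti Ankara'dır"

# Hand-made toy vectors standing in for model.encode(...) outputs;
# a real Turkish-tuned model maps these paraphrases to nearby points.
emb_query = np.array([0.9, 0.1, 0.3])
emb_doc = np.array([0.8, 0.2, 0.35])

print(keyword_overlap(query, doc))           # 0.0 — no literal word overlap
print(round(cosine(emb_query, emb_doc), 2))  # 0.99 — semantically close
```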

1.2 RAG (Retrieval-Augmented Generation) flow#

[1] User query: "İstanbul'un en yüksek noktası neresidir?"
      ↓
[2] Query embedding (sentence-transformer)
      ↓
[3] FAISS top-k search (vector index)
      ↓
[4] Retrieved documents (top-5 chunks)
      ↓
[5] Prompt construction: query + retrieved docs
      ↓
[6] LLM generation (GPT-4 / Claude / Llama)
      ↓
[7] Final answer (grounded in retrieved facts)

1.3 What happens at the embedding stage#

Query → 768-dim vector (BERT-base) or 1024-dim vector (BERT-large). The docs are already pre-indexed.

1.4 Capstone scope#

In this capstone:
  • Load a Turkish sentence-transformers model
  • Index 10K Turkish Wikipedia paragraphs
  • Query → retrieve top-5
  • (Optional) generation with an LLM
  • Evaluation
  • A mini FastAPI service

2. Turkish Sentence-Transformer Model#

2.1 Popular options (2026)#

| Model | Vocab | d | Turkish quality |
|---|---|---|---|
| sentence-transformers/distiluse-base-multilingual-cased-v2 | 30K | 512 | Good |
| sentence-transformers/paraphrase-multilingual-mpnet-base-v2 | 30K | 768 | Excellent |
| emrecan/bert-base-turkish-cased-mean-nli-stsb-tr | 32K | 768 | Turkish-specific |
| BAAI/bge-m3 (multilingual) | 250K | 1024 | State of the art |
| OpenAI text-embedding-3-small | API | 1536 | Excellent, paid |

2.2 Recommendation#

Production: BAAI/bge-m3 (open source, SOTA, multilingual). Research/lightweight: paraphrase-multilingual-mpnet-base-v2.

2.3 Install + load#

```bash
pip install sentence-transformers faiss-cpu  # or faiss-gpu
```

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
print(f"Embedding dim: {model.get_sentence_embedding_dimension()}")  # 1024
```

2.4 Encoding#

```python
import numpy as np

text = "İstanbul'un en yüksek noktası Çamlıca Tepesi'dir."
emb = model.encode(text)
print(f"Vector shape: {emb.shape}")  # (1024,)
print(f"Norm: {np.linalg.norm(emb):.3f}")  # ~1.0 if the model normalizes its outputs
                                           # (pass normalize_embeddings=True to be sure)
```

2.5 Batch encoding#

```python
texts = ["cümle 1", "cümle 2", ...]
embs = model.encode(texts, batch_size=32, show_progress_bar=True)
# embs shape: [N, 1024]
```
In production, a GPU with batch_size 64-128 is ideal.
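Whether inner-product search later behaves like cosine similarity depends on the embeddings being unit-length. In sentence-transformers, `model.encode(..., normalize_embeddings=True)` handles this; the numpy equivalent of that normalization, shown here on random stand-in vectors rather than real model output, is:

```python
import numpy as np

def l2_normalize(embs: np.ndarray) -> np.ndarray:
    # Row-wise L2 normalization: after this, inner product == cosine similarity
    norms = np.linalg.norm(embs, axis=1, keepdims=True)
    return embs / np.clip(norms, 1e-12, None)

embs = np.random.rand(16, 1024).astype("float32")  # stand-in for model.encode(texts)
embs = l2_normalize(embs)
print(np.linalg.norm(embs, axis=1)[:3])  # all ~1.0
```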

5. FAISS Vector Index#

5.1 What is FAISS#

Facebook AI Similarity Search — a C++ library with Python bindings. Billion-scale vector indexing + search, with GPU/CPU support.

5.2 Index types#

| Index | Memory | Speed | Accuracy |
|---|---|---|---|
| IndexFlatIP | High | Slow (linear scan) | Exact |
| IndexFlatL2 | High | Slow | Exact |
| IndexIVFFlat | Medium | Fast (clustering) | Approximate |
| IndexIVFPQ | Low (PQ compression) | Very fast | Approximate |
| IndexHNSWFlat | Medium | Very fast | Approximate (HNSW graph) |

For this capstone's 10K vectors, IndexFlatIP is enough (exact, and fast enough).
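For intuition on what IndexFlatIP actually does, here is the same exact linear scan written in plain numpy, run over random unit vectors as a toy stand-in for the real corpus embeddings:

```python
import numpy as np

def flat_ip_search(index_embs: np.ndarray, q_emb: np.ndarray, k: int):
    # Exact inner-product scan over every indexed vector, like faiss.IndexFlatIP
    scores = index_embs @ q_emb
    top = np.argsort(-scores)[:k]
    return scores[top], top

rng = np.random.default_rng(0)
embs = rng.normal(size=(10_000, 64)).astype("float32")
embs /= np.linalg.norm(embs, axis=1, keepdims=True)  # unit-length rows

q = embs[42]  # query identical to doc 42, so doc 42 must rank first
scores, ids = flat_ip_search(embs, q, k=5)
print(ids[0], round(float(scores[0]), 3))  # 42 1.0
```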

5.3 Build index#

```python
import faiss
import numpy as np

d = 1024  # embedding dim
index = faiss.IndexFlatIP(d)  # inner product (cosine for normalized vectors)

# Add embeddings
embs_np = embs.astype("float32")
index.add(embs_np)
print(f"Indexed {index.ntotal} vectors")
```

5.4 Save + load#

```python
faiss.write_index(index, "tr-wiki-faiss.index")
# ...
index = faiss.read_index("tr-wiki-faiss.index")
```

5.5 Search#

```python
query = "İstanbul'un en yüksek tepesi neresidir?"
q_emb = model.encode(query).astype("float32").reshape(1, -1)

k = 5
scores, indices = index.search(q_emb, k)
print(scores)   # e.g. [[0.87 0.85 0.83 0.80 0.78]]
print(indices)  # e.g. [[ 423 8912 1023  456 6789]]
```

5.6 Production: IndexIVFPQ for scale#

```python
nlist = 100  # number of IVF clusters
m = 16       # number of subquantizers
bits = 8     # bits per subquantizer
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, bits)
index.train(embs_np)  # IVF/PQ needs a training pass over representative vectors
index.add(embs_np)
index.nprobe = 10  # search depth: clusters probed per query
```
Ideal for millions of vectors: memory drops to a small fraction of FlatIP (the exact ratio depends on m and bits), with a 10-100× speedup.
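The memory savings are simple arithmetic: a float32 vector costs 4 bytes per dimension, while a PQ code costs m × bits / 8 bytes (plus IVF bookkeeping overhead, not counted here). With the parameters above:

```python
d, m, bits = 1024, 16, 8

flat_bytes = d * 4        # float32: 4096 bytes per vector in IndexFlatIP
pq_bytes = m * bits // 8  # PQ code: 16 bytes per vector in IndexIVFPQ
ratio = pq_bytes / flat_bytes

print(flat_bytes, pq_bytes)         # 4096 16
print(f"{ratio:.2%} of flat size")  # 0.39% of flat size
```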

5.7 GPU acceleration#

```python
res = faiss.StandardGpuResources()
index_gpu = faiss.index_cpu_to_gpu(res, 0, index)
```
On billion-vector scale, GPU search is 10-50× faster.
```python
# Turkish Mini-RAG: the full pipeline
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from datasets import load_dataset
import openai  # requires OPENAI_API_KEY in the environment

# 1. Load model
model = SentenceTransformer("BAAI/bge-m3")

# 2. Load Turkish corpus (Wikipedia subset)
wiki = load_dataset("wikipedia", "20240401.tr", split="train[:1000]")
docs = []
for row in wiki:
    for para in row["text"].split("\n"):
        if 100 < len(para) < 500:
            docs.append(para)
print(f"Total docs: {len(docs):,}")

# 3. Embed docs
embs = model.encode(docs, batch_size=32, show_progress_bar=True)
embs_np = embs.astype("float32")

# 4. Build FAISS index
d = embs_np.shape[1]
index = faiss.IndexFlatIP(d)
index.add(embs_np)

# 5. Query function
def semantic_search(query, k=5):
    q_emb = model.encode([query]).astype("float32")
    scores, indices = index.search(q_emb, k)
    return [(docs[i], scores[0][j]) for j, i in enumerate(indices[0])]

# 6. Test retrieval
results = semantic_search("Türkiye'nin başkenti neresidir?", k=3)
for doc, score in results:
    print(f"[{score:.3f}] {doc[:200]}...")

# 7. Mini-RAG with LLM generation
def rag_generate(query, k=5, llm="gpt-4o"):
    retrieved = semantic_search(query, k)
    context = "\n\n".join([doc for doc, _ in retrieved])
    prompt = f"""Aşağıdaki bağlamı kullanarak soruyu Türkçe yanıtla:

Bağlam:
{context}

Soru: {query}

Yanıt:"""
    response = openai.chat.completions.create(
        model=llm,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,
    )
    return response.choices[0].message.content

# 8. End-to-end test
answer = rag_generate("İstanbul'un en yüksek noktası neresidir?")
print(answer)
```
Mini-RAG end to end — Turkish semantic search + LLM generation

8. Evaluation Metrics#

8.1 Recall@k#

Is the 'correct' doc among the first k retrieved docs?
Recall@5 = |{queries where correct doc in top-5}| / total_queries

8.2 MRR (Mean Reciprocal Rank)#

The inverse of the correct doc's rank, averaged over queries.
MRR = mean(1 / rank_of_correct_doc)
Example: 100 queries where the correct doc typically lands at rank 3 → MRR ≈ 1/3 ≈ 0.33.
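Both metrics are a few lines of Python. The toy run below uses hand-made retrieval results (lists of doc ids) rather than real FAISS output:

```python
def recall_at_k(retrieved, correct, k=5):
    # 1 if the correct doc id appears in the top-k results, else 0
    return int(correct in retrieved[:k])

def mrr(all_retrieved, all_correct):
    # Mean of 1/rank of the correct doc; contributes 0 when it is missing
    total = 0.0
    for retrieved, correct in zip(all_retrieved, all_correct):
        if correct in retrieved:
            total += 1.0 / (retrieved.index(correct) + 1)
    return total / len(all_correct)

# Toy data: 3 queries; correct doc at rank 1, rank 3, and not retrieved
runs = [[7, 2, 9], [4, 7, 1], [5, 6, 8]]
gold = [7, 1, 0]

r3 = sum(recall_at_k(r, g, k=3) for r, g in zip(runs, gold)) / len(gold)
print(round(r3, 3))               # 0.667
print(round(mrr(runs, gold), 3))  # 0.444 = (1/1 + 1/3 + 0) / 3
```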

8.3 NDCG (Normalized Discounted Cumulative Gain)#

Uses graded relevance ratings (0-5). Computes the cumulative gain of the top-k docs, discounted by rank, and normalizes against the ideal ordering.
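A minimal NDCG@k implementation, run on a hand-made relevance list (graded 0-5, in retrieved order):

```python
import math

def dcg(rels):
    # Gains discounted by log2(rank + 1); rank is 1-based, so index i → log2(i + 2)
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

def ndcg_at_k(rels, k):
    ideal = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal if ideal > 0 else 0.0

rels = [3, 2, 3, 0, 1]  # toy graded relevance of the top-5 retrieved docs
print(round(ndcg_at_k(rels, k=5), 3))  # 0.972
```

A perfectly ordered result list scores exactly 1.0, since the discounted gain then equals the ideal.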

8.4 Turkish benchmark#

Benchmark datasets: TR-Quad, Turkish SQuAD. 100 queries over a 1000-doc corpus. For the capstone, create a small custom benchmark.

8.5 Empirical results (capstone)#

BAAI/bge-m3 + 1000 Turkish Wikipedia docs:
  • Recall@5: ~0.85
  • MRR: ~0.65
  • Search latency: ~5 ms (CPU)

9. FastAPI Service Deploy#

9.1 Service code#

```python
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

app = FastAPI()

# Globals (loaded once at startup)
model = SentenceTransformer("BAAI/bge-m3")
index = faiss.read_index("tr-wiki-faiss.index")
with open("docs.txt") as f:
    docs = [line.strip() for line in f]

class Query(BaseModel):
    text: str
    k: int = 5

@app.post("/search")
def search(q: Query):
    q_emb = model.encode([q.text]).astype("float32")
    scores, indices = index.search(q_emb, q.k)
    return {
        "results": [
            {"doc": docs[i], "score": float(scores[0][j])}
            for j, i in enumerate(indices[0])
        ]
    }
```

9.2 Run#

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```

9.3 Production considerations#

  • Cold start: loading the model + index takes ~5 s. Tune K8s liveness/readiness probes accordingly.
  • Memory: BAAI/bge-m3 needs ~2 GB RAM. Set the pod limit to at least 4 GB.
  • Throughput: 100-500 q/s on CPU, 5000+ q/s on GPU.
  • Caching: query hash → cached result (Redis). Skip recomputing repeated queries.
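The caching idea in the last bullet, sketched with an in-process dict standing in for Redis (the `search_fn` hook is a hypothetical stand-in for `semantic_search`; in production, use redis-py with a TTL instead):

```python
import hashlib

_cache = {}  # in-process stand-in for Redis

def cache_key(query: str, k: int) -> str:
    # Stable hash of the (query, k) pair, usable directly as a Redis key
    return hashlib.sha256(f"{query}|{k}".encode("utf-8")).hexdigest()

def cached_search(query: str, k: int, search_fn):
    key = cache_key(query, k)
    if key in _cache:
        return _cache[key]  # cache hit: skip embedding + FAISS search entirely
    result = search_fn(query, k)
    _cache[key] = result
    return result

# Demo with a fake backend that counts how often it is actually called
calls = []
def fake_search(query, k):
    calls.append(query)
    return [("doc", 0.9)] * k

cached_search("Türkiye'nin başkenti", 5, fake_search)
cached_search("Türkiye'nin başkenti", 5, fake_search)
print(len(calls))  # 1 — the repeated query never reached the backend
```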
🎉 Module 7 Complete — the Embedding Layer
Across 6 lessons: the mathematical anatomy of embeddings, the Word2Vec/GloVe/FastText classics, the modern LLM embedding layer + tying, embedding geometry + isotropy, and this capstone Turkish semantic search. You built a production-deployable Turkish semantic search system: sentence-transformers + FAISS + mini-RAG + FastAPI. This is the curriculum's second real-world artifact. Next up: Module 8 — Attention Mathematics. How embedding vectors enter the attention layer, scaled dot-product attention, multi-head attention, and FlashAttention. This is the heart of the transformer.

Module 7 Inventory (Complete)#

| # | Lesson | Duration |
|---|---|---|
| 7.1 | What Is an Embedding — From Token ID to Vector | 65 min |
| 7.2 | Word2Vec Line by Line (Mikolov 2013) | 70 min |
| 7.3 | GloVe + FastText (Subword Extension) | 65 min |
| 7.4 | Modern LLM Embedding + Tying | 70 min |
| 7.5 | Embedding Geometry + Isotropy + BERTology | 70 min |
| 7.6 | Capstone — Turkish Semantic Search Mini-RAG | 75 min |
| Total | 6 lessons | 415 min (~7 hours) |

Overall Curriculum Progress#

8 modules done: 58 lessons, ~49 hours fully produced. Remaining: Part II Modules 8-13 + Parts III-V.

Frequently Asked Questions

Yes, ideal! Build a Turkish semantic search service over your own content corpus (blog posts, documentation, learn-portal articles). An 'AI search' feature on sukruyusufkaya.com could be built with exactly this approach.
