
Capstone Module 7: Turkish Semantic Search System — sentence-transformers + FAISS + Mini-RAG

Module 7 capstone project: Turkish semantic search system from scratch. sentence-transformers Turkish model selection, FAISS vector index, production-grade query pipeline, mini-RAG architecture (retriever + generator), benchmark + deployment. Practical application of embedding theory.

Şükrü Yusuf KAYA
75 min read
Advanced
🏆 Module 7 Capstone — the practical proof of embedding theory
Across 5 lessons we covered the mathematics of embeddings, their history, their modern LLM implementation, and their geometry. Now we build a real system: a Turkish semantic search engine. We will use a Turkish-tuned model via sentence-transformers, index at billion-vector scale with FAISS, and set up a mini-RAG architecture that integrates retrieval + generation. 75 minutes from now you will have a deployable real-world artifact. This is the curriculum's second capstone — the 'production' proof of embeddings.

Capstone Flow (10 Stages)#

  1. System architecture — semantic search vs keyword, RAG flow
  2. Turkish sentence-transformer model — model selection (TR-tuned)
  3. Document corpus — Wikipedia subset or a Turkish blog corpus
  4. Embedding pipeline — batch encoding, dimensionality
  5. FAISS index — IndexFlatIP vs IndexIVFPQ
  6. Query pipeline — query embed → top-k retrieve
  7. Mini-RAG — retrieved docs + LLM generation
  8. Evaluation — recall@k, MRR, NDCG metrics
  9. Production deployment — FastAPI service, caching
  10. Improvements — re-ranking, hybrid search, monitoring

1. System Architecture#

1.1 Semantic search vs keyword search#

  • Keyword (BM25): the query 'Türkiye'nin başkenti' ("Turkey's capital") → only docs containing those literal words
  • Semantic: the query 'ülkenin merkezi' ("the country's center") → understands the meaning and still retrieves 'Türkiye'nin başkenti' content
Semantic search handles synonyms, paraphrases, and multilingual queries.
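A toy sketch of the difference: keyword overlap scores a paraphrase as zero, while vector similarity stays high. The 2-dim vectors here are hand-picked stand-ins, not real embeddings.

```python
# Keyword overlap vs. vector similarity on a paraphrase (toy sketch:
# the 2-dim "embeddings" below are hand-picked, not from a real model).
import numpy as np

def keyword_overlap(query: str, doc: str) -> int:
    """Literal word matching: zero when no words are shared."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

doc, query = "Türkiye'nin başkenti Ankara'dır.", "ülkenin merkezi"
print(keyword_overlap(query, doc))  # 0: keyword search finds nothing

# Pretend embeddings: paraphrases point in nearly the same direction.
e_doc, e_query = np.array([0.9, 0.1]), np.array([0.8, 0.2])
cos = e_doc @ e_query / (np.linalg.norm(e_doc) * np.linalg.norm(e_query))
print(f"{cos:.2f}")  # high similarity despite zero word overlap
```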

1.2 RAG (Retrieval-Augmented Generation) flow#

[1] User query: "İstanbul'un en yüksek noktası neresidir?"
        ↓
[2] Query embedding (sentence-transformer)
        ↓
[3] FAISS top-k search (vector index)
        ↓
[4] Retrieved documents (top-5 chunks)
        ↓
[5] Prompt construction: query + retrieved docs
        ↓
[6] LLM generation (GPT-4 / Claude / Llama)
        ↓
[7] Final answer (grounded in retrieved facts)

1.3 What happens at the embedding stage#

The query becomes a 768-dim vector (BERT-base) or a 1024-dim vector (BERT-large); the docs are already pre-indexed.

1.4 Capstone scope#

In this capstone:
  • Load a Turkish sentence-transformers model
  • Index 10K Turkish Wikipedia paragraphs
  • Query → retrieve top-5
  • (Optional) generation with an LLM
  • Evaluation
  • A mini FastAPI service

2. Turkish Sentence-Transformer Model#

2.1 Popular options (2026)#

Model                                                         Vocab   d      Turkish quality
sentence-transformers/distiluse-base-multilingual-cased-v2    30K     512    Good
sentence-transformers/paraphrase-multilingual-mpnet-base-v2   30K     768    Excellent
emrecan/bert-base-turkish-cased-mean-nli-stsb-tr              32K     768    Turkish-specific
BAAI/bge-m3 (multilingual)                                    250K    1024   State of the art
OpenAI text-embedding-3-small                                 API     1536   Excellent, paid

2.2 Recommendation#

Production: BAAI/bge-m3 (open source, SOTA, multilingual). Research/lightweight: paraphrase-multilingual-mpnet-base-v2.

2.3 Install + load#

pip install sentence-transformers faiss-cpu   # or faiss-gpu

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
print(f"Embedding dim: {model.get_sentence_embedding_dimension()}")  # 1024

2.4 Encoding#

import numpy as np

text = "İstanbul'un en yüksek noktası Çamlıca Tepesi'dir."
emb = model.encode(text, normalize_embeddings=True)
print(f"Vector shape: {emb.shape}")        # (1024,)
print(f"Norm: {np.linalg.norm(emb):.3f}")  # 1.000 — unit-normalized for cosine/IP

2.5 Batch encoding#

texts = ["cümle 1", "cümle 2", ...]
embs = model.encode(texts, batch_size=32, show_progress_bar=True)
# embs shape: [N, 1024]

In production, a GPU with batch sizes of 64-128 is ideal.

5. FAISS Vector Index#

5.1 What is FAISS#

Facebook AI Similarity Search — a C++ library with Python bindings. Billion-scale vector indexing and search, with GPU and CPU support.

5.2 Index types#

Index           Memory                 Speed                Accuracy
IndexFlatIP     High                   Slow (linear scan)   Exact
IndexFlatL2     High                   Slow                 Exact
IndexIVFFlat    Medium                 Fast (clustering)    Approximate
IndexIVFPQ      Low (PQ compression)   Very fast            Approximate
IndexHNSWFlat   Medium                 Very fast            Approximate (HNSW graph)
For this capstone's 10K vectors, IndexFlatIP is sufficient (exact, and fast enough).
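For intuition, IndexFlatIP is conceptually just an exact inner-product scan. A numpy-only sketch with random unit vectors (all data here is synthetic, for illustration):

```python
# Exact inner-product top-k, i.e. what IndexFlatIP does conceptually,
# as a brute-force numpy scan over random unit vectors (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 64
docs = rng.standard_normal((n, d)).astype("float32")
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # unit norm → IP == cosine

# Query: doc 42 plus a little noise, so doc 42 should rank first.
query = docs[42] + 0.01 * rng.standard_normal(d).astype("float32")
query /= np.linalg.norm(query)

scores = docs @ query            # one inner product per doc: O(n·d)
top5 = np.argsort(-scores)[:5]   # exact top-k by score
print(top5[0])  # 42
```

This linear scan is exactly why FlatIP is "slow" in the table above: cost grows with n, which is fine at 10K but not at billions.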

5.3 Build index#

import faiss
import numpy as np

d = 1024                       # embedding dim
index = faiss.IndexFlatIP(d)   # inner product (cosine for normalized vectors)

# Add embeddings
embs_np = embs.astype("float32")
index.add(embs_np)
print(f"Indexed {index.ntotal} vectors")

5.4 Save + load#

faiss.write_index(index, "tr-wiki-faiss.index")
# ...
index = faiss.read_index("tr-wiki-faiss.index")

5.5 Search#

query = "İstanbul'un en yüksek tepesi neresidir?"
q_emb = model.encode(query).astype("float32")
q_emb = q_emb.reshape(1, -1)

k = 5
scores, indices = index.search(q_emb, k)
print(scores)   # e.g. [[0.87, 0.85, 0.83, 0.80, 0.78]]
print(indices)  # e.g. [[423, 8912, 1023, 456, 6789]]

5.6 Production: IndexIVFPQ for scale#

nlist = 100   # number of clusters
m = 16        # number of subquantizers
bits = 8      # bits per subquantizer

quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, bits)
index.train(embs_np)   # IVFPQ must be trained before adding vectors
index.add(embs_np)
index.nprobe = 10      # search depth (clusters probed per query)

Ideal for millions of vectors: a small fraction of FlatIP's memory, and 10-100x faster.
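A back-of-envelope check on the compression, using the parameters above (d=1024 float32, m=16 codes at 8 bits each). Note this counts only the raw PQ codes; a real index also stores per-vector IDs and cluster metadata on top.

```python
# Bytes per vector: flat float32 storage vs. the IVFPQ codes alone
# (d=1024, m=16 subquantizers, 8 bits per code, as in the snippet above).
d, m = 1024, 16
flat_bytes = d * 4   # float32 → 4096 bytes per vector
pq_bytes = m * 1     # one 8-bit code per subquantizer → 16 bytes
print(flat_bytes, pq_bytes, f"{pq_bytes / flat_bytes:.2%}")
```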

5.7 GPU acceleration#

res = faiss.StandardGpuResources()
index_gpu = faiss.index_cpu_to_gpu(res, 0, index)

GPU is 10-50x faster at billion-vector scale.
# Turkish Mini-RAG: the full pipeline
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from datasets import load_dataset
import openai

# 1. Load model
model = SentenceTransformer("BAAI/bge-m3")

# 2. Load Turkish corpus (Wikipedia subset)
wiki = load_dataset("wikipedia", "20240401.tr", split="train[:1000]")
docs = []
for row in wiki:
    for para in row["text"].split("\n"):
        if len(para) > 100 and len(para) < 500:
            docs.append(para)
print(f"Total docs: {len(docs):,}")

# 3. Embed docs
embs = model.encode(docs, batch_size=32, show_progress_bar=True)
embs_np = embs.astype("float32")

# 4. FAISS index
d = embs_np.shape[1]
index = faiss.IndexFlatIP(d)
index.add(embs_np)

# 5. Query function
def semantic_search(query, k=5):
    q_emb = model.encode([query]).astype("float32")
    scores, indices = index.search(q_emb, k)
    return [(docs[i], scores[0][j]) for j, i in enumerate(indices[0])]

# 6. Test
results = semantic_search("Türkiye'nin başkenti neresidir?", k=3)
for doc, score in results:
    print(f"[{score:.3f}] {doc[:200]}...")

# 7. Mini-RAG with LLM generation
def rag_generate(query, k=5, llm="gpt-4o"):
    retrieved = semantic_search(query, k)
    context = "\n\n".join([doc for doc, _ in retrieved])
    prompt = f"""Aşağıdaki bağlamı kullanarak soruyu Türkçe yanıtla:

Bağlam:
{context}

Soru: {query}

Yanıt:"""
    response = openai.chat.completions.create(
        model=llm,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,
    )
    return response.choices[0].message.content

# 8. End-to-end test
answer = rag_generate("İstanbul'un en yüksek noktası neresidir?")
print(answer)

Mini-RAG end-to-end — Turkish semantic search + LLM generation

8. Evaluation Metrics#

8.1 Recall@k#

Is the 'correct' doc among the first k retrieved docs?
Recall@5 = |{queries where correct doc in top-5}| / total_queries

8.2 MRR (Mean Reciprocal Rank)#

The reciprocal of the correct doc's rank, averaged over queries.
MRR = mean(1 / rank_of_correct_doc)
Example: over 100 queries, if the correct doc always lands at rank 3 → MRR = 1/3 ≈ 0.33.
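Both metrics are a few lines each. A minimal sketch over toy ranked lists (the doc ids and gold labels below are made up):

```python
# Recall@k and MRR from per-query ranked doc-id lists (toy data).
def recall_at_k(ranked, gold, k=5):
    """Fraction of queries whose gold doc appears in the top-k."""
    return sum(g in r[:k] for r, g in zip(ranked, gold)) / len(gold)

def mrr(ranked, gold):
    """Mean of 1/rank of the gold doc (0 when it is not retrieved)."""
    total = 0.0
    for r, g in zip(ranked, gold):
        if g in r:
            total += 1 / (r.index(g) + 1)  # ranks are 1-based
    return total / len(gold)

ranked = [[3, 1, 2], [9, 7, 8], [5, 4, 6]]  # top-3 ids per query
gold   = [1, 7, 0]                          # correct id per query
print(recall_at_k(ranked, gold, k=3))  # 2 of 3 queries hit
print(mrr(ranked, gold))               # (1/2 + 1/2 + 0) / 3
```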

8.3 NDCG (Normalized Discounted Cumulative Gain)#

Uses graded relevance ratings (0-5). Computes the cumulative gain of the top-k docs, discounted by rank, and normalizes against the ideal ordering.
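A minimal NDCG@k sketch, using the common exponential-gain DCG with a log2 rank discount (the relevance grades below are toy numbers):

```python
import math

def dcg(rels):
    """Discounted cumulative gain: exponential gain, log2 rank discount."""
    return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels, k=5):
    """Normalize the top-k DCG against the ideal (sorted) ordering."""
    ideal = sorted(rels, reverse=True)
    return dcg(rels[:k]) / dcg(ideal[:k])

print(round(ndcg([3, 2, 3, 0, 1]), 3))  # close to 1: near-ideal ordering
print(ndcg([3, 3, 2, 1, 0]))            # 1.0: already ideal
```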

8.4 Turkish benchmark#

Benchmark dataset: TR-Quad, the Turkish SQuAD. 100 queries over a 1000-doc corpus. For the capstone, build a small custom benchmark.

8.5 Empirical results (capstone)#

BAAI/bge-m3 + 1000 Turkish Wikipedia docs:
  • Recall@5: ~0.85
  • MRR: ~0.65
  • Search latency: 5 ms (CPU)

9. FastAPI Service Deploy#

9.1 Service code#

from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

app = FastAPI()

# Globals (load once)
model = SentenceTransformer("BAAI/bge-m3")
index = faiss.read_index("tr-wiki-faiss.index")
with open("docs.txt") as f:
    docs = [line.strip() for line in f]

class Query(BaseModel):
    text: str
    k: int = 5

@app.post("/search")
def search(q: Query):
    q_emb = model.encode([q.text]).astype("float32")
    scores, indices = index.search(q_emb, q.k)
    return {
        "results": [
            {"doc": docs[i], "score": float(scores[0][j])}
            for j, i in enumerate(indices[0])
        ]
    }

9.2 Run#

uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

9.3 Production considerations#

  • Cold start: model + index load takes ~5 sec. Tune the K8s liveness/readiness probes accordingly.
  • Memory: BAAI/bge-m3 needs ~2 GB RAM. Set the pod limit to 4 GB minimum.
  • Throughput: 100-500 q/s on CPU, 5000+ q/s on GPU.
  • Caching: query hash → cached result (Redis). Skip recomputing repeated queries.
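The caching idea, sketched with an in-memory dict as a stand-in for Redis (`expensive_search` is a hypothetical placeholder for the real embed + FAISS call):

```python
import hashlib

cache: dict = {}  # in production: Redis, with a TTL

calls = []
def expensive_search(query: str, k: int):
    """Hypothetical stand-in for the real embed + FAISS search."""
    calls.append(query)
    return [f"doc {i} for {query}" for i in range(k)]

def cached_search(query: str, k: int = 5):
    # Key on a hash of the query + k, as in the Redis setup above.
    key = hashlib.sha256(f"{query}:{k}".encode()).hexdigest()
    if key not in cache:
        cache[key] = expensive_search(query, k)
    return cache[key]

cached_search("başkent neresi?")  # computed
cached_search("başkent neresi?")  # served from cache
print(len(calls))  # the expensive path ran only once
```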
🎉 Module 7 Complete — The Embedding Layer
Across 6 lessons: the mathematical anatomy of embeddings, the Word2Vec/GloVe/FastText classics, the modern LLM embedding layer + tying, embedding geometry + isotropy, and the capstone Turkish semantic search. You built a production-deployable Turkish semantic search system: sentence-transformers + FAISS + mini-RAG + FastAPI. This is the curriculum's second real-world artifact. Next up: Module 8 — Attention Mathematics. How embedding vectors enter the attention layer, scaled dot-product attention, multi-head attention, and FlashAttention. This is the heart of the transformer.

Module 7 Inventory (Completed)#

#       Lesson                                            Duration
7.1     What Is an Embedding — From Token ID to Vector    65 min
7.2     Word2Vec Line by Line (Mikolov 2013)              70 min
7.3     GloVe + FastText (Subword Extension)              65 min
7.4     Modern LLM Embedding + Tying                      70 min
7.5     Embedding Geometry + Isotropy + BERTology         70 min
7.6     Capstone — Turkish Semantic Search Mini-RAG       75 min
Total   6 lessons                                         415 min (~7 hours)

Overall Curriculum Progress#

8 modules finished: 58 lessons, ~49 hours fully produced. Remaining: Part II Modules 8-13 + Parts III-V.

Frequently Asked Questions

Can I apply this capstone to my own content?
Yes, ideal! Build a Turkish semantic search service over your own content corpus (blog, documentation, learn-portal articles). An 'AI search' feature on sukruyusufkaya.com can be built this way.