Capstone Module 7: Turkish Semantic Search System — sentence-transformers + FAISS + Mini-RAG
Module 7 capstone project: a Turkish semantic search system from scratch. Turkish model selection for sentence-transformers, a FAISS vector index, a production-grade query pipeline, a mini-RAG architecture (retriever + generator), benchmarking, and deployment. A practical application of embedding theory.
Şükrü Yusuf KAYA
75 min read
Advanced 🏆 Module 7 Capstone — Practical proof of embedding theory
Over 5 lessons we covered the mathematics of embeddings, their history, their modern LLM implementation, and their geometry. Now we build a real system: a Turkish semantic search engine. We will use a Turkish-tuned model via sentence-transformers, index with FAISS (a library built for billion-vector scale), and set up a mini-RAG architecture that integrates retrieval and generation. 75 minutes from now you will have a deployable real-world artifact. This is the curriculum's second capstone — the 'production' proof of embeddings.
Capstone Flow (10 Stages)#
- System architecture — semantic search vs keyword, RAG flow
- Turkish sentence-transformer model — model selection (TR-tuned)
- Document corpus — Wikipedia subset or a Turkish blog corpus
- Embedding pipeline — batch encoding, dimensionality
- FAISS index — IndexFlatIP vs IndexIVFPQ
- Query pipeline — query embed → top-k retrieve
- Mini-RAG — retrieved docs + LLM generation
- Evaluation — recall@k, MRR, NDCG metrics
- Production deployment — FastAPI service, caching
- Improvements — re-ranking, hybrid search, monitoring
1. System Architecture#
1.1 Semantic search vs keyword search#
- Keyword (BM25): the query 'Türkiye'nin başkenti' → only docs containing these literal words
- Semantic: the query 'ülkenin merkezi' ("the country's center") → understands the meaning and retrieves 'Türkiye'nin başkenti' ("Turkey's capital") content
Semantic search handles synonyms, paraphrases, and multilingual queries.
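A toy numpy sketch of why this works — hand-made 3-dim "embeddings" (real models output hundreds of dimensions), where the paraphrase shares no words with the query yet sits nearby in vector space:

```python
import numpy as np

# Toy 3-dim "embeddings", made up for illustration — not real model output
vecs = {
    "Türkiye'nin başkenti": np.array([0.9, 0.1, 0.1]),
    "ülkenin merkezi":      np.array([0.8, 0.2, 0.1]),  # paraphrase → nearby vector
    "futbol maçı sonucu":   np.array([0.0, 0.1, 0.9]),  # unrelated → far away
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def keyword_overlap(a, b):
    # Keyword matching degenerates to literal word overlap here
    return len(set(a.split()) & set(b.split()))

q = "ülkenin merkezi"
print(keyword_overlap(q, "Türkiye'nin başkenti"))     # 0 — no shared words, BM25 misses
print(cosine(vecs[q], vecs["Türkiye'nin başkenti"]))  # high — semantic match
print(cosine(vecs[q], vecs["futbol maçı sonucu"]))    # low — unrelated
```

Keyword overlap is zero for the paraphrase, while cosine similarity separates the related and unrelated pairs cleanly.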
1.2 RAG (Retrieval-Augmented Generation) flow#
```
[1] User query: "İstanbul'un en yüksek noktası neresidir?"
        ↓
[2] Query embedding (sentence-transformer)
        ↓
[3] FAISS top-k search (vector index)
        ↓
[4] Retrieved documents (top-5 chunks)
        ↓
[5] Prompt construction: query + retrieved docs
        ↓
[6] LLM generation (GPT-4 / Claude / Llama)
        ↓
[7] Final answer (grounded in retrieved facts)
```
1.3 What happens at the embedding stage#
The query becomes a 768-dim vector (BERT-base) or a 1024-dim vector (BERT-large). The docs are already pre-indexed.
1.4 Capstone scope#
In this capstone:
- Load a Turkish sentence-transformers model
- Index 10K Turkish Wikipedia paragraphs
- Query → top-5 retrieval
- (Optional) generation with an LLM
- Evaluation
- A mini FastAPI service
2. Turkish Sentence-Transformer Model#
2.1 Popular options (2026)#
| Model | Vocab | d | Turkish quality |
|---|---|---|---|
| sentence-transformers/distiluse-base-multilingual-cased-v2 | 30K | 512 | Good |
| sentence-transformers/paraphrase-multilingual-mpnet-base-v2 | 30K | 768 | Excellent |
| emrecan/bert-base-turkish-cased-mean-nli-stsb-tr | 32K | 768 | Turkish-specific |
| BAAI/bge-m3 (multilingual) | 250K | 1024 | State of the art |
| OpenAI text-embedding-3-small | API | 1536 | Excellent, paid |
2.2 Recommendation#
Production: BAAI/bge-m3 (open source, SOTA, multilingual).
Research/lightweight: paraphrase-multilingual-mpnet-base-v2.
2.3 Install + load#
```bash
pip install sentence-transformers faiss-cpu   # or faiss-gpu
```

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
print(f"Embedding dim: {model.get_sentence_embedding_dimension()}")  # 1024
```
2.4 Encoding#
```python
import numpy as np

text = "İstanbul'un en yüksek noktası Çamlıca Tepesi'dir."
emb = model.encode(text)
print(f"Vector shape: {emb.shape}")        # (1024,)
print(f"Norm: {np.linalg.norm(emb):.3f}")  # ~1.0 — bge-m3 output is already normalized
```
2.5 Batch encoding#
```python
texts = ["cümle 1", "cümle 2", ...]  # full corpus
embs = model.encode(texts, batch_size=32, show_progress_bar=True)
# embs shape: [N, 1024]
```
In production, a GPU with batch_size 64-128 is ideal.
5. FAISS Vector Index#
5.1 What is FAISS#
Facebook AI Similarity Search — a C++ library with Python bindings. Billion-scale vector indexing and search, with GPU and CPU support.
5.2 Index types#
| Index | Memory | Speed | Accuracy |
|---|---|---|---|
| IndexFlatIP | High | Slow (linear scan) | Exact |
| IndexFlatL2 | High | Slow | Exact |
| IndexIVFFlat | Medium | Fast (clustering) | Approximate |
| IndexIVFPQ | Low (PQ compression) | Very fast | Approximate |
| IndexHNSWFlat | Medium | Very fast | Approximate (HNSW graph) |
For this capstone's ~10K vectors, IndexFlatIP is sufficient (exact, and fast enough).
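Under the hood, IndexFlatIP is just an exact inner-product scan over every stored vector. A numpy sketch with the same semantics (toy dimensions, random vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 1000                                        # toy sizes (real: d=1024)
embs = rng.normal(size=(n, d)).astype("float32")
embs /= np.linalg.norm(embs, axis=1, keepdims=True)   # normalize → IP == cosine

def flat_ip_search(q, k=5):
    # Exact linear scan: score every vector, take top-k — IndexFlatIP's semantics
    scores = embs @ q                  # [n] inner products
    top = np.argsort(-scores)[:k]
    return top, scores[top]

q = embs[42]                           # query with a known nearest neighbor: itself
idx, sc = flat_ip_search(q)
print(idx[0], round(float(sc[0]), 3))  # exact search always finds vector 42, score 1.0
```

The approximate indexes (IVF, PQ, HNSW) trade this exhaustive scan for cheaper candidate selection.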
5.3 Build index#
```python
import faiss
import numpy as np

d = 1024                        # embedding dim
index = faiss.IndexFlatIP(d)    # inner product (= cosine for normalized vectors)

# Add embeddings
embs_np = embs.astype("float32")
index.add(embs_np)
print(f"Indexed {index.ntotal} vectors")
```
5.4 Save + load#
```python
faiss.write_index(index, "tr-wiki-faiss.index")
# ...
index = faiss.read_index("tr-wiki-faiss.index")
```
5.5 Search#
```python
query = "İstanbul'un en yüksek tepesi neresidir?"
q_emb = model.encode(query).astype("float32")
q_emb = q_emb.reshape(1, -1)

k = 5
scores, indices = index.search(q_emb, k)
print(scores)   # [[0.87, 0.85, 0.83, 0.80, 0.78]]
print(indices)  # [[423, 8912, 1023, 456, 6789]]
```
5.6 Production: IndexIVFPQ for scale#
```python
nlist = 100   # number of clusters
m = 16        # number of subquantizers
bits = 8      # bits per subquantizer

quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, bits)
index.train(embs_np)
index.add(embs_np)
index.nprobe = 10   # search depth
```
Ideal for millions of vectors: with this config each vector compresses to m × bits / 8 = 16 bytes versus 4 KB of raw float32, and search runs 10-100× faster than FlatIP.
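The compression arithmetic for this config — 16 subquantizers × 8 bits = 16 bytes per vector versus 4 KB of raw float32 at d=1024 (centroid tables and index overhead ignored in this back-of-envelope sketch):

```python
d = 1024
raw_bytes = d * 4                    # float32: 4096 bytes per vector
m, bits = 16, 8
pq_bytes = m * bits // 8             # one byte-sized code per subquantizer → 16 bytes
print(f"ratio: {pq_bytes / raw_bytes:.4f}")   # fraction of flat storage per vector

n = 10_000_000                       # hypothetical 10M-vector corpus
print(f"{n * raw_bytes / 1e9:.1f} GB flat vs {n * pq_bytes / 1e6:.0f} MB PQ")
```

At 10M vectors the flat index would need ~41 GB while the PQ codes fit in ~160 MB, which is what makes RAM-resident billion-scale search feasible.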
5.7 GPU acceleration#
```python
res = faiss.StandardGpuResources()
index_gpu = faiss.index_cpu_to_gpu(res, 0, index)
```
At billion-vector scale, the GPU is 10-50× faster.
```python
# Turkish Mini-RAG: the full pipeline
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from datasets import load_dataset
import openai

# 1. Load model
model = SentenceTransformer("BAAI/bge-m3")

# 2. Load Turkish corpus (Wikipedia subset)
wiki = load_dataset("wikipedia", "20240401.tr", split="train[:1000]")
docs = []
for row in wiki:
    for para in row["text"].split("\n"):
        if len(para) > 100 and len(para) < 500:
            docs.append(para)
print(f"Total docs: {len(docs):,}")

# 3. Embed docs
embs = model.encode(docs, batch_size=32, show_progress_bar=True)
embs_np = embs.astype("float32")

# 4. FAISS index
d = embs_np.shape[1]
index = faiss.IndexFlatIP(d)
index.add(embs_np)

# 5. Query function
def semantic_search(query, k=5):
    q_emb = model.encode([query]).astype("float32")
    scores, indices = index.search(q_emb, k)
    return [(docs[i], scores[0][j]) for j, i in enumerate(indices[0])]

# 6. Test
results = semantic_search("Türkiye'nin başkenti neresidir?", k=3)
for doc, score in results:
    print(f"[{score:.3f}] {doc[:200]}...")

# 7. Mini-RAG with LLM generation
def rag_generate(query, k=5, llm="gpt-4o"):
    retrieved = semantic_search(query, k)
    context = "\n\n".join([doc for doc, _ in retrieved])
    prompt = f"""Aşağıdaki bağlamı kullanarak soruyu Türkçe yanıtla:

Bağlam:
{context}

Soru: {query}

Yanıt:"""
    response = openai.chat.completions.create(
        model=llm,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,
    )
    return response.choices[0].message.content

# 8. End-to-end test
answer = rag_generate("İstanbul'un en yüksek noktası neresidir?")
print(answer)
```
Mini-RAG end-to-end — Turkish semantic search + LLM generation
8. Evaluation Metrics#
8.1 Recall@k#
Is the 'correct' doc among the top k retrieved docs?
Recall@5 = |{queries where correct doc in top-5}| / total_queries
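The formula above as a tiny helper — `retrieved` and `gold` are hypothetical structures (ranked doc ids per query, one gold doc id per query):

```python
def recall_at_k(retrieved, gold, k=5):
    """retrieved: {query: [doc ids in rank order]}, gold: {query: correct doc id}."""
    hits = sum(1 for q, ids in retrieved.items() if gold[q] in ids[:k])
    return hits / len(retrieved)

# Toy example: the gold doc is in the top-5 for 2 of 3 queries
retrieved = {"q1": [3, 7, 1, 9, 4], "q2": [8, 2, 5, 0, 6], "q3": [4, 1, 7, 2, 9]}
gold = {"q1": 1, "q2": 9, "q3": 4}
print(recall_at_k(retrieved, gold, k=5))   # 2/3 ≈ 0.667
print(recall_at_k(retrieved, gold, k=1))   # 1/3 — only q3's gold doc is ranked first
```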
8.2 MRR (Mean Reciprocal Rank)#
The reciprocal of the correct doc's rank, averaged over queries.
MRR = mean(1 / rank_of_correct_doc)
Example: 100 queries; if the correct doc is ranked 3rd for every query → MRR = 1/3 ≈ 0.33.
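Note that MRR averages 1/rank per query, which is not the same as 1/(average rank). A minimal sketch with the same hypothetical `retrieved`/`gold` shape as above (ranked doc ids per query, one gold id each):

```python
def mrr(retrieved, gold):
    """retrieved: {query: [doc ids in rank order]}, gold: {query: correct doc id}."""
    total = 0.0
    for q, ids in retrieved.items():
        if gold[q] in ids:
            rank = ids.index(gold[q]) + 1   # ranks are 1-based
            total += 1.0 / rank
        # gold doc not retrieved at all → contributes 0
    return total / len(retrieved)

retrieved = {"q1": [3, 1, 7], "q2": [9, 8, 2], "q3": [5, 0, 4]}
gold = {"q1": 1, "q2": 9, "q3": 6}
print(mrr(retrieved, gold))   # (1/2 + 1/1 + 0) / 3 = 0.5
```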
8.3 NDCG (Normalized Discounted Cumulative Gain)#
Uses graded relevance ratings (0-5). Compute the rank-discounted cumulative gain over the top-k docs and normalize by the ideal (relevance-sorted) ordering.
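A minimal NDCG sketch over a single ranked list of graded relevances (toy numbers, not from the capstone benchmark), using the standard log2 discount:

```python
import math

def dcg(relevances):
    # relevances: graded relevance of docs in retrieved order; rank i discounted by log2(i+2)
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ordering
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(round(ndcg([2, 3, 0]), 3))  # best doc ranked 2nd → ≈ 0.913 (< 1.0)
print(ndcg([3, 2, 0]))            # ideal ordering → 1.0
```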
8.4 Türkçe benchmark#
Benchmark dataset: TR-Quad (Turkish SQuAD). 100 queries over a 1000-doc corpus. For the capstone, create a small custom benchmark.
8.5 Empirical results (capstone)#
BAAI/bge-m3 + 1000 Turkish Wikipedia docs:
- Recall@5: ~0.85
- MRR: ~0.65
- Search latency: 5 ms (CPU)
9. FastAPI Service Deploy#
9.1 Service code#
```python
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

app = FastAPI()

# Globals (load once at startup)
model = SentenceTransformer("BAAI/bge-m3")
index = faiss.read_index("tr-wiki-faiss.index")
with open("docs.txt") as f:
    docs = [line.strip() for line in f]

class Query(BaseModel):
    text: str
    k: int = 5

@app.post("/search")
def search(q: Query):
    q_emb = model.encode([q.text]).astype("float32")
    scores, indices = index.search(q_emb, q.k)
    return {
        "results": [
            {"doc": docs[i], "score": float(scores[0][j])}
            for j, i in enumerate(indices[0])
        ]
    }
```
9.2 Run#
```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```
9.3 Production considerations#
- Cold start: model + index load takes ~5 s. Tune K8s liveness/readiness probes accordingly.
- Memory: BAAI/bge-m3 needs ~2 GB RAM. Set the pod limit to at least 4 GB.
- Throughput: 100-500 q/s on CPU, 5000+ q/s on GPU.
- Caching: query hash → cached result (Redis). Skip re-computing repeated queries.
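The caching idea, sketched as a tiny in-process LRU keyed by a query hash — a stand-in for Redis to show the pattern (`QueryCache` is a hypothetical helper; in production you would use a shared store across workers):

```python
import hashlib
from collections import OrderedDict

class QueryCache:
    """In-process LRU cache keyed by hash(query, k) — illustrates the Redis pattern."""
    def __init__(self, max_items=10_000):
        self.store = OrderedDict()
        self.max_items = max_items

    def _key(self, text, k):
        return hashlib.sha256(f"{k}:{text}".encode()).hexdigest()

    def get(self, text, k):
        key = self._key(text, k)
        if key in self.store:
            self.store.move_to_end(key)      # mark as recently used
            return self.store[key]
        return None                          # miss → caller runs the real search

    def put(self, text, k, results):
        self.store[self._key(text, k)] = results
        if len(self.store) > self.max_items:
            self.store.popitem(last=False)   # evict least recently used

cache = QueryCache()
cache.put("Türkiye'nin başkenti neresidir?", 5, ["Ankara ..."])
print(cache.get("Türkiye'nin başkenti neresidir?", 5))  # ['Ankara ...'] — hit
print(cache.get("başka bir soru", 5))                   # None — miss
```

In the FastAPI handler you would check `cache.get` before calling `index.search` and `cache.put` after, turning repeated queries into dictionary lookups.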
🎉 Module 7 Complete — The Embedding Layer
Over 6 lessons: the mathematical anatomy of embeddings, the Word2Vec/GloVe/FastText classics, the modern LLM embedding layer + weight tying, embedding geometry + isotropy, and the capstone Turkish semantic search system. You built a production-deployable Turkish semantic search system: sentence-transformers + FAISS + mini-RAG + FastAPI. This is the curriculum's second real-world artifact. Next up: Module 8 — Attention Mathematics. How embedding vectors enter the attention layer, scaled dot-product attention, multi-head attention, and FlashAttention. This is the heart of the transformer.
Module 7 Inventory (Complete)#
| # | Lesson | Duration |
|---|---|---|
| 7.1 | What Is an Embedding — From Token ID to Vector | 65 min |
| 7.2 | Word2Vec Line by Line (Mikolov 2013) | 70 min |
| 7.3 | GloVe + FastText (Subword Extension) | 65 min |
| 7.4 | Modern LLM Embedding + Tying | 70 min |
| 7.5 | Embedding Geometry + Isotropy + BERTology | 70 min |
| 7.6 | Capstone — Turkish Semantic Search Mini-RAG | 75 min |
| Total | 6 lessons | 415 min (~7 hours) |
Overall Curriculum Progress#
8 modules complete: 58 lessons, ~49 hours fully produced.
Remaining: Part II Modules 8-13, plus Parts III-V.
Frequently Asked Questions
Yes — ideal! Build a Turkish semantic search service over your own content corpus (blog posts, documentation, learning-portal articles). The 'AI search' feature on sukruyusufkaya.com can be built exactly this way.