Capstone Module 7: Turkish Semantic Search System — sentence-transformers + FAISS + Mini-RAG
Module 7 capstone project: a Turkish semantic search system from scratch. Turkish model selection for sentence-transformers, a FAISS vector index, a production-grade query pipeline, a mini-RAG architecture (retriever + generator), benchmarking, and deployment. A practical application of embedding theory.
Şükrü Yusuf KAYA
75 min read
Advanced 🏆 Module 7 Capstone — Practical proof of embedding theory
Over 5 lessons we covered the mathematics of embeddings, their history, their modern LLM implementation, and their geometry. Now we build a real system: a Turkish semantic search engine. We will use a Turkish-tuned model via sentence-transformers, index with FAISS (a library built for billion-vector scale), and set up a mini-RAG architecture that integrates retrieval and generation. 75 minutes from now you will have a deployable real-world artifact. This is the curriculum's second capstone — the 'production' proof of embeddings.
Capstone Flow (10 Stages)#
- System architecture — semantic search vs keyword, RAG flow
- Turkish sentence-transformer model — model selection (TR-tuned)
- Document corpus — Wikipedia subset or a Turkish blog corpus
- Embedding pipeline — batch encoding, dimensionality
- FAISS index — IndexFlatIP vs IndexIVFPQ
- Query pipeline — query embed → top-k retrieve
- Mini-RAG — retrieved docs + LLM generation
- Evaluation — recall@k, MRR, NDCG metrics
- Production deployment — FastAPI service, caching
- Improvements — re-ranking, hybrid search, monitoring
1. System Architecture#
1.1 Semantic search vs keyword search#
- Keyword (BM25): the query 'Türkiye'nin başkenti' → only docs containing these literal words
- Semantic: the query 'ülkenin merkezi' ("the country's center") → understands the meaning and retrieves 'Türkiye'nin başkenti' ("Turkey's capital") content
Semantic search handles synonyms, paraphrases, and multilingual queries.
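A toy numpy sketch of why this works — hand-made 3-dim "embeddings" (real models output hundreds of dimensions), where the paraphrase shares no words with the query yet sits nearby in vector space:

```python
import numpy as np

# Toy 3-dim "embeddings", made up for illustration — not real model output
vecs = {
    "Türkiye'nin başkenti": np.array([0.9, 0.1, 0.1]),
    "ülkenin merkezi":      np.array([0.8, 0.2, 0.1]),  # paraphrase → nearby vector
    "futbol maçı sonucu":   np.array([0.0, 0.1, 0.9]),  # unrelated → far away
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def keyword_overlap(a, b):
    # Keyword matching degenerates to literal word overlap here
    return len(set(a.split()) & set(b.split()))

q = "ülkenin merkezi"
print(keyword_overlap(q, "Türkiye'nin başkenti"))     # 0 — no shared words, BM25 misses
print(cosine(vecs[q], vecs["Türkiye'nin başkenti"]))  # high — semantic match
print(cosine(vecs[q], vecs["futbol maçı sonucu"]))    # low — unrelated
```

Keyword overlap is zero for the paraphrase, while cosine similarity separates the related and unrelated pairs cleanly.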
1.2 RAG (Retrieval-Augmented Generation) flow#
```
[1] User query: "İstanbul'un en yüksek noktası neresidir?"
        ↓
[2] Query embedding (sentence-transformer)
        ↓
[3] FAISS top-k search (vector index)
        ↓
[4] Retrieved documents (top-5 chunks)
        ↓
[5] Prompt construction: query + retrieved docs
        ↓
[6] LLM generation (GPT-4 / Claude / Llama)
        ↓
[7] Final answer (grounded in retrieved facts)
```
1.3 What happens at the embedding stage#
The query becomes a 768-dim vector (BERT-base) or a 1024-dim vector (BERT-large). The docs are already pre-indexed.
1.4 Capstone scope#
In this capstone:
- Load a Turkish sentence-transformers model
- Index 10K Turkish Wikipedia paragraphs
- Query → top-5 retrieval
- (Optional) generation with an LLM
- Evaluation
- A mini FastAPI service
2. Turkish Sentence-Transformer Model#
2.1 Popular options (2026)#
| Model | Vocab | d | Turkish quality |
|---|---|---|---|
| sentence-transformers/distiluse-base-multilingual-cased-v2 | 30K | 512 | Good |
| sentence-transformers/paraphrase-multilingual-mpnet-base-v2 | 30K | 768 | Excellent |
| emrecan/bert-base-turkish-cased-mean-nli-stsb-tr | 32K | 768 | Turkish-specific |
| BAAI/bge-m3 (multilingual) | 250K | 1024 | State of the art |
| OpenAI text-embedding-3-small | API | 1536 | Excellent, paid |
2.2 Recommendation#
Production: BAAI/bge-m3 (open source, SOTA, multilingual).
Research/lightweight: paraphrase-multilingual-mpnet-base-v2.
2.3 Install + load#
```bash
pip install sentence-transformers faiss-cpu   # or faiss-gpu
```

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
print(f"Embedding dim: {model.get_sentence_embedding_dimension()}")  # 1024
```
2.4 Encoding#
```python
import numpy as np

text = "İstanbul'un en yüksek noktası Çamlıca Tepesi'dir."
emb = model.encode(text)
print(f"Vector shape: {emb.shape}")        # (1024,)
print(f"Norm: {np.linalg.norm(emb):.3f}")  # ~1.0 — bge-m3 output is already normalized
```
2.5 Batch encoding#
```python
texts = ["cümle 1", "cümle 2", ...]  # full corpus
embs = model.encode(texts, batch_size=32, show_progress_bar=True)
# embs shape: [N, 1024]
```
In production, a GPU with batch_size 64-128 is ideal.
5. FAISS Vector Index#
5.1 What is FAISS#
Facebook AI Similarity Search — a C++ library with Python bindings. Billion-scale vector indexing and search, with GPU and CPU support.
5.2 Index types#
| Index | Memory | Speed | Accuracy |
|---|---|---|---|
| IndexFlatIP | High | Slow (linear scan) | Exact |
| IndexFlatL2 | High | Slow | Exact |
| IndexIVFFlat | Medium | Fast (clustering) | Approximate |
| IndexIVFPQ | Low (PQ compression) | Very fast | Approximate |
| IndexHNSWFlat | Medium | Very fast | Approximate (HNSW graph) |
For this capstone's ~10K vectors, IndexFlatIP is sufficient (exact, and fast enough).
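Under the hood, IndexFlatIP is just an exact inner-product scan over every stored vector. A numpy sketch with the same semantics (toy dimensions, random vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 1000                                        # toy sizes (real: d=1024)
embs = rng.normal(size=(n, d)).astype("float32")
embs /= np.linalg.norm(embs, axis=1, keepdims=True)   # normalize → IP == cosine

def flat_ip_search(q, k=5):
    # Exact linear scan: score every vector, take top-k — IndexFlatIP's semantics
    scores = embs @ q                  # [n] inner products
    top = np.argsort(-scores)[:k]
    return top, scores[top]

q = embs[42]                           # query with a known nearest neighbor: itself
idx, sc = flat_ip_search(q)
print(idx[0], round(float(sc[0]), 3))  # exact search always finds vector 42, score 1.0
```

The approximate indexes (IVF, PQ, HNSW) trade this exhaustive scan for cheaper candidate selection.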
5.3 Build index#
```python
import faiss
import numpy as np

d = 1024                        # embedding dim
index = faiss.IndexFlatIP(d)    # inner product (= cosine for normalized vectors)

# Add embeddings
embs_np = embs.astype("float32")
index.add(embs_np)
print(f"Indexed {index.ntotal} vectors")
```
5.4 Save + load#
```python
faiss.write_index(index, "tr-wiki-faiss.index")
# ...
index = faiss.read_index("tr-wiki-faiss.index")
```
5.5 Search#
```python
query = "İstanbul'un en yüksek tepesi neresidir?"
q_emb = model.encode(query).astype("float32")
q_emb = q_emb.reshape(1, -1)

k = 5
scores, indices = index.search(q_emb, k)
print(scores)   # [[0.87, 0.85, 0.83, 0.80, 0.78]]
print(indices)  # [[423, 8912, 1023, 456, 6789]]
```
5.6 Production: IndexIVFPQ for scale#
```python
nlist = 100   # number of clusters
m = 16        # number of subquantizers
bits = 8      # bits per subquantizer

quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, bits)
index.train(embs_np)
index.add(embs_np)
index.nprobe = 10   # search depth
```
Ideal for millions of vectors: with this config each vector compresses to m × bits / 8 = 16 bytes versus 4 KB of raw float32, and search runs 10-100× faster than FlatIP.
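The compression arithmetic for this config — 16 subquantizers × 8 bits = 16 bytes per vector versus 4 KB of raw float32 at d=1024 (centroid tables and index overhead ignored in this back-of-envelope sketch):

```python
d = 1024
raw_bytes = d * 4                    # float32: 4096 bytes per vector
m, bits = 16, 8
pq_bytes = m * bits // 8             # one byte-sized code per subquantizer → 16 bytes
print(f"ratio: {pq_bytes / raw_bytes:.4f}")   # fraction of flat storage per vector

n = 10_000_000                       # hypothetical 10M-vector corpus
print(f"{n * raw_bytes / 1e9:.1f} GB flat vs {n * pq_bytes / 1e6:.0f} MB PQ")
```

At 10M vectors the flat index would need ~41 GB while the PQ codes fit in ~160 MB, which is what makes RAM-resident billion-scale search feasible.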
5.7 GPU acceleration#
```python
res = faiss.StandardGpuResources()
index_gpu = faiss.index_cpu_to_gpu(res, 0, index)
```
At billion-vector scale, the GPU is 10-50× faster.
```python
# Turkish Mini-RAG: the full pipeline
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from datasets import load_dataset
import openai

# 1. Load model
model = SentenceTransformer("BAAI/bge-m3")

# 2. Load Turkish corpus (Wikipedia subset)
wiki = load_dataset("wikipedia", "20240401.tr", split="train[:1000]")
docs = []
for row in wiki:
    for para in row["text"].split("\n"):
        if len(para) > 100 and len(para) < 500:
            docs.append(para)
print(f"Total docs: {len(docs):,}")

# 3. Embed docs
embs = model.encode(docs, batch_size=32, show_progress_bar=True)
embs_np = embs.astype("float32")

# 4. FAISS index
d = embs_np.shape[1]
index = faiss.IndexFlatIP(d)
index.add(embs_np)

# 5. Query function
def semantic_search(query, k=5):
    q_emb = model.encode([query]).astype("float32")
    scores, indices = index.search(q_emb, k)
    return [(docs[i], scores[0][j]) for j, i in enumerate(indices[0])]

# 6. Test
results = semantic_search("Türkiye'nin başkenti neresidir?", k=3)
for doc, score in results:
    print(f"[{score:.3f}] {doc[:200]}...")

# 7. Mini-RAG with LLM generation
def rag_generate(query, k=5, llm="gpt-4o"):
    retrieved = semantic_search(query, k)
    context = "\n\n".join([doc for doc, _ in retrieved])
    prompt = f"""Aşağıdaki bağlamı kullanarak soruyu Türkçe yanıtla:

Bağlam:
{context}

Soru: {query}

Yanıt:"""
    response = openai.chat.completions.create(
        model=llm,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,
    )
    return response.choices[0].message.content

# 8. End-to-end test
answer = rag_generate("İstanbul'un en yüksek noktası neresidir?")
print(answer)
```
Mini-RAG end-to-end — Turkish semantic search + LLM generation
8. Evaluation Metrics#
8.1 Recall@k#
Is the 'correct' doc among the top k retrieved docs?
Recall@5 = |{queries where correct doc in top-5}| / total_queries
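The formula above as a tiny helper — `retrieved` and `gold` are hypothetical structures (ranked doc ids per query, one gold doc id per query):

```python
def recall_at_k(retrieved, gold, k=5):
    """retrieved: {query: [doc ids in rank order]}, gold: {query: correct doc id}."""
    hits = sum(1 for q, ids in retrieved.items() if gold[q] in ids[:k])
    return hits / len(retrieved)

# Toy example: the gold doc is in the top-5 for 2 of 3 queries
retrieved = {"q1": [3, 7, 1, 9, 4], "q2": [8, 2, 5, 0, 6], "q3": [4, 1, 7, 2, 9]}
gold = {"q1": 1, "q2": 9, "q3": 4}
print(recall_at_k(retrieved, gold, k=5))   # 2/3 ≈ 0.667
print(recall_at_k(retrieved, gold, k=1))   # 1/3 — only q3's gold doc is ranked first
```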
8.2 MRR (Mean Reciprocal Rank)#
The reciprocal of the correct doc's rank, averaged over queries.
MRR = mean(1 / rank_of_correct_doc)
Example: 100 queries; if the correct doc is ranked 3rd for every query → MRR = 1/3 ≈ 0.33.
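Note that MRR averages 1/rank per query, which is not the same as 1/(average rank). A minimal sketch with the same hypothetical `retrieved`/`gold` shape as above (ranked doc ids per query, one gold id each):

```python
def mrr(retrieved, gold):
    """retrieved: {query: [doc ids in rank order]}, gold: {query: correct doc id}."""
    total = 0.0
    for q, ids in retrieved.items():
        if gold[q] in ids:
            rank = ids.index(gold[q]) + 1   # ranks are 1-based
            total += 1.0 / rank
        # gold doc not retrieved at all → contributes 0
    return total / len(retrieved)

retrieved = {"q1": [3, 1, 7], "q2": [9, 8, 2], "q3": [5, 0, 4]}
gold = {"q1": 1, "q2": 9, "q3": 6}
print(mrr(retrieved, gold))   # (1/2 + 1/1 + 0) / 3 = 0.5
```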
8.3 NDCG (Normalized Discounted Cumulative Gain)#
Uses graded relevance ratings (0-5). Compute the rank-discounted cumulative gain over the top-k docs and normalize by the ideal (relevance-sorted) ordering.
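A minimal NDCG sketch over a single ranked list of graded relevances (toy numbers, not from the capstone benchmark), using the standard log2 discount:

```python
import math

def dcg(relevances):
    # relevances: graded relevance of docs in retrieved order; rank i discounted by log2(i+2)
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ordering
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(round(ndcg([2, 3, 0]), 3))  # best doc ranked 2nd → ≈ 0.913 (< 1.0)
print(ndcg([3, 2, 0]))            # ideal ordering → 1.0
```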
8.4 Türkçe benchmark#
Benchmark dataset: TR-Quad (Turkish SQuAD). 100 queries over a 1000-doc corpus. For the capstone, create a small custom benchmark.
8.5 Empirical results (capstone)#
BAAI/bge-m3 + 1000 Turkish Wikipedia docs:
- Recall@5: ~0.85
- MRR: ~0.65
- Search latency: 5 ms (CPU)
9. FastAPI Service Deploy#
9.1 Service code#
```python
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

app = FastAPI()

# Globals (load once at startup)
model = SentenceTransformer("BAAI/bge-m3")
index = faiss.read_index("tr-wiki-faiss.index")
with open("docs.txt") as f:
    docs = [line.strip() for line in f]

class Query(BaseModel):
    text: str
    k: int = 5

@app.post("/search")
def search(q: Query):
    q_emb = model.encode([q.text]).astype("float32")
    scores, indices = index.search(q_emb, q.k)
    return {
        "results": [
            {"doc": docs[i], "score": float(scores[0][j])}
            for j, i in enumerate(indices[0])
        ]
    }
```
9.2 Run#
```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```
9.3 Production considerations#
- Cold start: model + index load takes ~5 s. Tune K8s liveness/readiness probes accordingly.
- Memory: BAAI/bge-m3 needs ~2 GB RAM. Set the pod limit to at least 4 GB.
- Throughput: 100-500 q/s on CPU, 5000+ q/s on GPU.
- Caching: query hash → cached result (Redis). Skip re-computing repeated queries.
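The caching idea, sketched as a tiny in-process LRU keyed by a query hash — a stand-in for Redis to show the pattern (`QueryCache` is a hypothetical helper; in production you would use a shared store across workers):

```python
import hashlib
from collections import OrderedDict

class QueryCache:
    """In-process LRU cache keyed by hash(query, k) — illustrates the Redis pattern."""
    def __init__(self, max_items=10_000):
        self.store = OrderedDict()
        self.max_items = max_items

    def _key(self, text, k):
        return hashlib.sha256(f"{k}:{text}".encode()).hexdigest()

    def get(self, text, k):
        key = self._key(text, k)
        if key in self.store:
            self.store.move_to_end(key)      # mark as recently used
            return self.store[key]
        return None                          # miss → caller runs the real search

    def put(self, text, k, results):
        self.store[self._key(text, k)] = results
        if len(self.store) > self.max_items:
            self.store.popitem(last=False)   # evict least recently used

cache = QueryCache()
cache.put("Türkiye'nin başkenti neresidir?", 5, ["Ankara ..."])
print(cache.get("Türkiye'nin başkenti neresidir?", 5))  # ['Ankara ...'] — hit
print(cache.get("başka bir soru", 5))                   # None — miss
```

In the FastAPI handler you would check `cache.get` before calling `index.search` and `cache.put` after, turning repeated queries into dictionary lookups.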
🎉 Module 7 Complete — The Embedding Layer
Over 6 lessons: the mathematical anatomy of embeddings, the Word2Vec/GloVe/FastText classics, the modern LLM embedding layer + weight tying, embedding geometry + isotropy, and the capstone Turkish semantic search system. You built a production-deployable Turkish semantic search system: sentence-transformers + FAISS + mini-RAG + FastAPI. This is the curriculum's second real-world artifact. Next up: Module 8 — Attention Mathematics. How embedding vectors enter the attention layer, scaled dot-product attention, multi-head attention, and FlashAttention. This is the heart of the transformer.
Module 7 Inventory (Complete)#
| # | Lesson | Duration |
|---|---|---|
| 7.1 | What Is an Embedding — From Token ID to Vector | 65 min |
| 7.2 | Word2Vec Line by Line (Mikolov 2013) | 70 min |
| 7.3 | GloVe + FastText (Subword Extension) | 65 min |
| 7.4 | Modern LLM Embedding + Tying | 70 min |
| 7.5 | Embedding Geometry + Isotropy + BERTology | 70 min |
| 7.6 | Capstone — Turkish Semantic Search Mini-RAG | 75 min |
| Total | 6 lessons | 415 min (~7 hours) |
Overall Curriculum Progress#
8 modules complete: 58 lessons, ~49 hours fully produced.
Remaining: Part II Modules 8-13, plus Parts III-V.
Frequently Asked Questions
Yes — ideal! Build a Turkish semantic search service over your own content corpus (blog posts, documentation, learning-portal articles). The 'AI search' feature on sukruyusufkaya.com can be built exactly this way.