
Vector Database Comparison: pgvector, Qdrant, Milvus, Weaviate (2026)

Which vector database is right for RAG and agents? I compare pgvector, Qdrant, Milvus, and Weaviate on performance, scale, hybrid search, and data sovereignty.

Şükrü Yusuf KAYA
AI Expert · Enterprise AI Consultant

TL;DR — Choosing a vector database is not the "which benchmark is fastest" race people imagine. Here is what I see in the field: the right decision is made along four axes — managed versus self-hosted, which scale tier you are in, how much hybrid search you actually need, and your team's existing data-platform commitments. The headline milliseconds are the least important part. My practical recipe for most Turkish organizations is clear: if you already run Postgres and you are under ~50M vectors, start with pgvector; if raw speed truly matters, move to the Rust-based Qdrant; if hybrid search (vector + BM25 in one query) matters, use Weaviate; if you have billions of vectors and a real ops team, use Milvus. On the KVKK front the choice gets even sharper: with managed cloud services like Pinecone, your data usually leaves the country; if data sovereignty matters, self-hosted open-source options (pgvector, Qdrant, Milvus, Weaviate) are the only ones left on the table. In this article I walk through all eight databases, the index concepts behind them, and a concrete decision framework, with examples from the field.

First, Let's Be Clear: What Does a Vector Database Actually Do?

Most teams entering RAG (Retrieval-Augmented Generation) and agent projects treat the vector database as "a box we throw embeddings into and search." That picture is not wrong, but it is incomplete. Let me put the essence plainly.

When you feed a chunk of text into an embedding model, you get back an array of numbers — a vector of 384, 768, 1024, or 1536 dimensions. This vector represents the "meaning" of that text as a point in a high-dimensional space. Two semantically similar texts become two nearby points in that space. The single job of a vector database is this: from among millions, even billions, of such points, quickly find the ones nearest to your query vector.

That sounds simple, but there is a catch. Suppose you have 10 million vectors and compute the distance to every one of them on each query (what we call "brute force" or exact search). The results are correct, but the search is horrifyingly slow. This is the point where a vector database separates itself from an ordinary database, through Approximate Nearest Neighbor (ANN) indexes. These indexes give up a small amount of accuracy to make search hundreds or thousands of times faster.
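
The brute-force baseline is easiest to see in code, so here is a minimal sketch of exact nearest-neighbor search in plain Python. It assumes cosine similarity as the measure; dot product and Euclidean distance are also common. The document IDs and 3-dimensional vectors are invented toy data, and the function names are illustrative rather than any database's API.

```python
import heapq
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_knn(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Brute-force (exact) search: score every stored vector and keep the k most similar.

    Each query costs O(N * d) work, which becomes painful at tens of millions of vectors.
    """
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

# Toy 3-dimensional "embeddings"; real models produce hundreds to thousands of dimensions.
docs = {
    "leave_policy": [0.9, 0.1, 0.0],
    "payroll": [0.7, 0.3, 0.1],
    "server_logs": [0.0, 0.2, 0.9],
}
print(exact_knn([1.0, 0.2, 0.0], docs, k=2))  # ['leave_policy', 'payroll']
```

Every query touches every stored vector. That O(N·d) cost is exactly what ANN indexes are designed to avoid.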

The name for that trade-off is recall. Recall measures what fraction of the true nearest neighbors you actually retrieved: 95% recall means you found 95 of the 100 truly closest neighbors. All the engineering in the vector database world really comes down to the tension between three things: recall (quality), latency (delay), and cost. Improve one and you usually give up something on another, and the art is finding the right balance for your scenario. The rest of this article builds on this triangle, because a database is only as valuable as the control it gives you over that balance.
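
Once you have exact results to compare against, recall is simple to compute. This is a minimal sketch. It assumes the ground truth comes from a brute-force search like the one described above and is compared with an ANN index's output for the same query. The ID lists here are synthetic.

```python
def recall_at_k(exact_ids: list[str], approx_ids: list[str], k: int) -> float:
    """Share of the true top-k neighbours that also appear in the ANN's top-k."""
    truth = set(exact_ids[:k])
    return len(truth.intersection(approx_ids[:k])) / k

exact = [f"doc{i}" for i in range(100)]               # ground truth from exact search
approx = exact[:95] + [f"miss{i}" for i in range(5)]  # an ANN result that missed 5 neighbours
print(recall_at_k(exact, approx, k=100))  # 0.95
```

In practice you would average this over a few hundred representative queries before trusting a tuning choice.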

The Two Main Index Families: HNSW and IVF

At the heart of every vector database lies the index. Nearly every product on the market uses one or both of two fundamental index families. You cannot choose well without understanding them, so let's go through them carefully.

HNSW (Hierarchical Navigable Small World). Think of it as a multi-layered road map. The top layer has a few "highway" nodes; as you descend, denser "streets" appear. A search starts at the top, quickly reaches roughly the right region via the highways, then descends into lower layers, walking neighbors one by one to sharpen. HNSW's beauty is delivering very high recall with very low latency. Its downside is a hunger for memory: all those layered connections must live in RAM, so on large datasets the memory cost climbs fast. Qdrant, Weaviate, and the modern versions of pgvector all lean on HNSW.
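
Inside each layer, HNSW's core move is a best-first walk over a neighbor graph. The sketch below is a deliberately simplified, single-layer version. It has no hierarchy and no neighbor-selection heuristic, and it builds the graph by brute force. The parameters m and ef echo HNSW's M and ef. The helpers build_graph and search_layer are illustrative names, not any library's API.

```python
import heapq
import math

def build_graph(points: list[tuple[float, ...]], m: int) -> list[set[int]]:
    """Link each node to its m nearest neighbours, in both directions (brute-force build)."""
    graph: list[set[int]] = [set() for _ in points]
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:m]:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def search_layer(points, graph, query, entry: int, ef: int, k: int) -> list[int]:
    """Best-first walk from the entry node, keeping the ef closest nodes found so far."""
    d0 = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest node still to expand
    results = [(-d0, entry)]     # max-heap via negation: farthest kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:   # nearest unexpanded node is worse than every kept result
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, points[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # drop the farthest to stay at ef
    return [node for _, node in sorted((-nd, n) for nd, n in results)][:k]

points = [(float(i), 0.0) for i in range(20)]   # 20 points on a line
graph = build_graph(points, m=2)
print(search_layer(points, graph, (13.2, 0.0), entry=0, ef=4, k=3))  # [13, 14, 12]
```

A larger ef keeps more candidates alive during the walk. HNSW-based databases expose this same recall-for-latency dial at query time; pgvector, for example, calls it hnsw.ef_search.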

IVF (Inverted File Index). This follows a different philosophy. It first partitions all vectors into clusters, say 4096 of them, and each cluster has a center (centroid). When a query arrives, IVF finds the few clusters closest to the query and searches only within them. Instead of scanning everything, it focuses on a small subset. IVF uses less memory than HNSW, so it is more economical at enormous scales (billions of vectors). But its tuning is more delicate: the number of clusters you probe (nprobe) directly affects recall. Milvus and Faiss offer a rich set of IVF variants (Milvus, for example, has IVF_FLAT, IVF_SQ8, and IVF_PQ), and pgvector also ships an IVFFlat index alongside HNSW.
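
A toy IVF index shows how nprobe trades recall for work. This sketch assumes plain k-means for the coarse clustering and Euclidean distance. IVFIndex, kmeans, and the cluster counts are illustrative choices, not how Milvus or Faiss implement it internally.

```python
import math
import random

def kmeans(vectors, n_clusters, iterations=10, seed=0):
    """Plain Lloyd's k-means; IVF uses the resulting centroids as its coarse partition."""
    rng = random.Random(seed)
    centroids = [tuple(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(iterations):
        buckets = [[] for _ in range(n_clusters)]
        for v in vectors:
            nearest = min(range(n_clusters), key=lambda c: math.dist(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids

class IVFIndex:
    def __init__(self, vectors, n_clusters, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_clusters, seed=seed)
        self.lists = [[] for _ in self.centroids]   # inverted lists: cluster -> vector ids
        for i, v in enumerate(vectors):
            self.lists[self._closest_clusters(v, 1)[0]].append(i)

    def _closest_clusters(self, v, n):
        return sorted(range(len(self.centroids)),
                      key=lambda c: math.dist(v, self.centroids[c]))[:n]

    def search(self, query, k, nprobe):
        """Scan only the vectors in the nprobe clusters whose centroids are closest to the query."""
        ids = [i for c in self._closest_clusters(query, nprobe) for i in self.lists[c]]
        return sorted(ids, key=lambda i: math.dist(query, self.vectors[i]))[:k]

rng = random.Random(42)
data = [tuple(rng.gauss(0, 1) for _ in range(8)) for _ in range(500)]
index = IVFIndex(data, n_clusters=16)
print(index.search(data[0], k=5, nprobe=2))
```

When nprobe equals the number of clusters, the search degenerates into an exact scan. Each cluster you stop probing saves work, but it also risks missing a true neighbor that landed in an adjacent cluster.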

As a practical summary, here is what I tell people in the field:

If you have up to tens of millions of vectors and want low latency, HNSW is almost always the right answer. Once you climb into the billions and memory cost starts drowning you, IVF (and its derivatives) enters the picture. Most Turkish organizations are in the first group, so don't build a needlessly complex system on an inflated scale assumption.

Quantization: The Art of Shrinking Memory

The bulk of a vector database's cost is memory, because for fast search the index has to live in RAM. The life-saving technique here is quantization. The idea is simple: represent each vector component with fewer bits so it takes less space.

  • Scalar quantization: Typically reduces 32-bit float values to 8-bit integers (int8). Memory drops to a quarter, and accuracy loss is negligible in most scenarios. For most teams this is the first stop.
  • Product quantization (PQ): Splits the vector into small sub-vectors and represents each one by the ID of its nearest centroid in a learned codebook. The memory saving is far more aggressive (8x, 16x, or more), but accuracy loss and tuning complexity rise. It is tailor-made for enormous scales.
  • Binary quantization: The extreme case. It reduces each component to a single bit (positive or negative). Memory drops by up to 32x, and search becomes dramatically faster because distances can be computed as Hamming distances. But the accuracy loss is serious, so it is generally paired with a "rescoring" stage: a coarse filter on the binary codes, then re-ranking of the top candidates with the full vectors.
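
The scalar and binary variants fit in a few lines of code. This minimal sketch assumes a single global [lo, hi] range for scalar quantization; real engines often calibrate ranges per dimension or from quantiles. It also assumes dot-product similarity for the rescoring step. The oversample factor and all function names are illustrative.

```python
import random

def quantize_int8(vec, lo, hi):
    """Scalar quantization: map floats in [lo, hi] onto 256 levels stored as int8 (-128..127)."""
    scale = (hi - lo) / 255
    return [max(-128, min(127, round((x - lo) / scale) - 128)) for x in vec]

def dequantize_int8(codes, lo, hi):
    scale = (hi - lo) / 255
    return [(c + 128) * scale + lo for c in codes]

def binarize(vec):
    """Binary quantization: one bit per component (1 if positive), packed into an int."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")   # number of differing bits

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def binary_search_with_rescore(query, vectors, codes, k, oversample=4):
    """Coarse ranking on bits, then full-precision re-ranking of a k * oversample shortlist."""
    q_bits = binarize(query)
    shortlist = sorted(range(len(vectors)), key=lambda i: hamming(q_bits, codes[i]))[: k * oversample]
    return sorted(shortlist, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]

rng = random.Random(7)
vectors = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(1000)]
codes = [binarize(v) for v in vectors]   # computed once at index time: 64 bits instead of 64 floats
print(binary_search_with_rescore(vectors[3], vectors, codes, k=5))
```

The shortlist size (k * oversample) is the knob. A larger shortlist recovers more of the accuracy lost to binarization, at the cost of more full-precision comparisons.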

Why does it matter? Because embedding dimension translates directly into storage cost. Stored as 32-bit floats, 1536-dimensional OpenAI embeddings for 10 million documents take roughly 60 GB of raw space. You can shrink the same data to about 15 GB with int8 scalar quantization, and to under 2 GB with binary quantization. When you rent cloud servers priced in Turkish lira, that difference can double, triple, or halve your monthly bill. A rule I apply in the field: ask about the embedding dimension first. If a 768-dimensional model gives you quality similar to a 1536-dimensional one, you have halved your storage from the start and roughly halved the cost of every distance computation.
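
These storage figures follow from simple arithmetic: vectors × dimensions × bytes per component. The minimal sketch below counts only the raw vector payload and assumes nothing about a specific database. A real HNSW index adds graph links, IDs, and metadata on top.

```python
def index_bytes(num_vectors: int, dims: int, bits_per_component: int) -> int:
    """Raw storage for the vectors alone (no graph links, ids, or metadata)."""
    return num_vectors * dims * bits_per_component // 8

N, D = 10_000_000, 1536
for label, bits in [("float32", 32), ("int8 scalar", 8), ("binary", 1)]:
    print(f"{label:12s} {index_bytes(N, D, bits) / 1e9:6.2f} GB")
```

It prints 61.44, 15.36, and 1.92 GB, which correspond to the float32, int8, and binary figures quoted above.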

Filtering, Metadata, and Why Most Projects Stumble Here

In the real world, pure vector search is rarely enough. Usually a query like this arrives: "Among documents this user can access, from the year 2025, belonging to the HR department, find the ones closest to this question." The date, department, and access constraints here are metadata filters, and they have to work together with vector similarity.

There is a trap here: do you filter first (pre-filter) or afterward (post-filter)? If you filter first and then search, the ANN index can lose its efficiency, because the index was built over all the data and the filter punches holes in it. If you filter afterward, you may not have enough results left once the top-k candidates from the index pass through the filter. Good vector databases solve this elegantly, and Qdrant's "filterable HNSW" approach and Weaviate's filtered searches are mature here. On the pgvector side, combining vector search with Postgres's own WHERE conditions is powerful. With an approximate index, however, the filter is applied after the index scan, so a selective condition can return fewer rows than your LIMIT asks for. Newer pgvector releases (0.8.0 and later) add iterative index scans to mitigate this, but recall under filtering still deserves testing.
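
The post-filter shortfall is easy to reproduce. In this minimal sketch an exact sort stands in for the ANN index, and the points, the department labels, and the ef value are all invented. Only 10% of rows match the filter, which is enough to starve the post-filtered result.

```python
import math

def nearest(query, ids, vectors, n):
    return sorted(ids, key=lambda i: math.dist(query, vectors[i]))[:n]

def post_filter_search(query, vectors, dept, allowed, k, ef):
    # The index hands back its ef best candidates over ALL rows; the filter runs afterwards.
    candidates = nearest(query, range(len(vectors)), vectors, ef)
    return [i for i in candidates if dept[i] in allowed][:k]

def pre_filter_search(query, vectors, dept, allowed, k):
    # Restrict to matching rows first, then search only those. Done exactly here; a real
    # engine needs a filter-aware index to keep this path fast at scale.
    subset = [i for i in range(len(vectors)) if dept[i] in allowed]
    return nearest(query, subset, vectors, k)

vectors = [(float(i), 0.0) for i in range(100)]
dept = ["HR" if i % 10 == 0 else "IT" for i in range(100)]   # only 10% of rows are HR
query = (50.0, 0.0)
print(post_filter_search(query, vectors, dept, {"HR"}, k=5, ef=10))  # [50]
print(pre_filter_search(query, vectors, dept, {"HR"}, k=5))          # [50, 40, 60, 30, 70]
```

The post-filtered query returns a single hit instead of five, while the pre-filtered one returns a full result set. Filter-aware indexes exist to deliver the second behavior without scanning the whole subset.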

My advice in the field is clear: take your filtering needs seriously from day one. In a multi-tenant enterprise RAG system, access control is enforced with metadata filters. If those filters are set up wrong, one user can see another department's documents, and at that point the problem is no longer performance but a KVKK breach.
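
One way to keep tenant isolation out of the caller's hands is to build the access filter on the server and apply it last. This is a minimal sketch. The User type and the plain-dict filter format are hypothetical, and you would need to translate them into your database's own filter syntax.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class User:
    user_id: str
    departments: frozenset   # departments this user may read

def build_search_filter(user: User, requested: Optional[Iterable[str]] = None,
                        extra: Optional[dict] = None) -> dict:
    """Server-side filter: caller input may narrow access but never widen it."""
    allowed = set(user.departments)
    if requested is not None:
        allowed &= set(requested)
    flt = dict(extra or {})
    flt["department"] = sorted(allowed)   # written last, so a caller-supplied value is overridden
    return flt

alice = User("alice", frozenset({"HR"}))
print(build_search_filter(alice, requested=["HR", "Finance"], extra={"year": 2025}))
# {'year': 2025, 'department': ['HR']}
```

An empty department list must be passed through as "match nothing". An engine that treated an empty list as "no filter" would silently reopen every document.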

Hybrid Search: Why Isn't Vector Alone Enough?

Everyone working with Turkish content should know one thing: pure vector search is weak on rare, specific terms. Product codes, law article numbers ("Law No. 6698, Article 11"), proper nouns, abbreviations... For these, semantic similarity captures the general meaning but can miss the exact term match.
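
A lexical scorer catches an exact token such as a law number, and a minimal Okapi BM25 sketch shows why. It assumes naive whitespace tokenization, parameter values k1 = 1.5 and b = 0.75 (typical choices; engines differ), and the log(1 + ...) IDF form that Lucene uses to keep scores non-negative. Production Turkish search needs a proper analyzer. For example, Python's str.lower() maps "I" to "i", not to the Turkish dotless "ı".

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]   # naive tokenizer, not Turkish-aware
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for tokens in tokenized for term in set(tokens))   # document frequency
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)   # length normalization
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "Law 6698 Article 11 defines data subject rights",
    "data protection law overview",
    "leave policy for employees",
]
print(bm25_scores("6698 article 11", docs))
```

Only the first document contains the tokens "6698", "article", and "11", so it is the only one with a non-zero score. The matching is literal, and that is precisely what pure semantic similarity can blur.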

This is where hybrid search comes in: combining vector similarity (which captures meaning) with a classic keyword search like BM25 (which captures the word). The results of these two worlds are usually blended with a method like Reciprocal Rank Fusion (RRF).
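
Reciprocal Rank Fusion uses only the rank positions from each list, which is why it blends vector and BM25 results so easily. The sketch below is minimal. It uses k = 60, the constant commonly used with RRF, and the document IDs are invented.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["kvkk_overview", "privacy_faq", "law_6698_art11"]
bm25_hits = ["law_6698_art11", "kvkk_overview", "fines_table"]
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# ['kvkk_overview', 'law_6698_art11', 'privacy_faq', 'fines_table']
```

Because RRF ignores raw scores, you never have to normalize cosine similarities against BM25 scores, which live on different scales.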

There is a big difference between databases here. Weaviate stands out with its ability to combine vector similarity and BM25 in a single query; it has put hybrid search at the heart of the product. This is a lifesaver especially in multi-criteria, term-heavy enterprise queries. Milvus and Qdrant also offer hybrid support, but Weaviate's maturity in this area is notable. On the pgvector side, hybrid search is possible but not integrated. You have to combine vector search yourself with Postgres's full-text search (tsvector/tsquery, whose built-in ranking is not BM25). If you already run Postgres, though, that do-it-yourself combination can turn into an advantage.

Replication, Sharding, and Scalability

A vector database running on a single server is great for a POC (proof of concept), but in production two questions await you: what if the server crashes (high availability), and what if the data does not fit on a single machine (horizontal scale)?

  • Replication: Keeping the same data on more than one node. If one goes down, another takes over. It also distributes read load.
  • Sharding: Splitting the data into logical parts and distributing them across nodes. If you have billions of vectors, they will not fit on a single machine; sharding becomes mandatory.
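
Sharding changes how a query runs: every shard returns its own local top-k, and a coordinator merges them. This is a minimal in-memory sketch that assumes hash partitioning by document ID and Euclidean distance. ShardedIndex is a hypothetical class, and the sketch leaves out replication (keeping copies of each shard on other nodes).

```python
import hashlib
import heapq
import math

class ShardedIndex:
    """Toy sharded store: hash-partition by document ID, fan out queries, merge partial top-k."""

    def __init__(self, n_shards: int):
        self.shards: list[dict] = [{} for _ in range(n_shards)]

    def _shard_for(self, doc_id: str) -> int:
        # A stable hash (Python's built-in hash() of str is randomized per process).
        digest = hashlib.sha256(doc_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def upsert(self, doc_id: str, vector: tuple) -> None:
        self.shards[self._shard_for(doc_id)][doc_id] = vector

    def search(self, query: tuple, k: int) -> list[str]:
        partials = []
        for shard in self.shards:   # a real cluster queries shards in parallel
            partials.extend(heapq.nsmallest(k, ((math.dist(query, v), d) for d, v in shard.items())))
        return [d for _, d in heapq.nsmallest(k, partials)]

index = ShardedIndex(n_shards=4)
for i in range(200):
    index.upsert(f"doc{i}", (float(i), 0.0))
print(index.search((57.3, 0.0), k=3))   # ['doc57', 'doc58', 'doc56']
```

The merge is correct because every global top-k result must also appear in the local top-k of the shard that holds it.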

At this point Milvus differentiates itself. Milvus has a distributed architecture that separates compute and storage, runs on Kubernetes, and takes sharding and replication seriously. It shines at the billions-of-vectors scale and with GPU acceleration. But there is a price: running Milvus correctly in production requires a real ops team. One of the biggest mistakes I see in the field is a team that actually has 5 million vectors standing up a Milvus cluster and getting crushed under it. If your scale doesn't demand it, don't carry Milvus's complexity.

Qdrant and Weaviate also run in distributed mode and are more than enough for most enterprise needs. pgvector, meanwhile, inherits Postgres's replication mechanisms — which means your existing Postgres operational knowledge works directly, a hidden superpower for most teams.

Eight Vector Databases: A Positioning Map

Now let's position the eight main players on the market, one by one and honestly. Each has a "personality," and the right choice depends on how well your scenario overlaps with that personality.

Pinecone — The comfort of the managed leader. Pinecone is a fully managed service. You never worry about servers, indexes, or scaling; you call an API and they handle the rest. The speed-quality balance is mature and the operational burden is nearly zero. But there are two prices: your data lives in their cloud (the critical point for KVKK), and as scale grows the cost can exceed that of self-hosting.

Qdrant — The Rust-based open-source speed leader. Qdrant stands out for the raw performance that comes from being written in Rust. In the benchmark figures cited later in this article, it posts the lowest p50 and p99 latencies. It is open source, you can run it on your own servers, and its filtered search is very mature. If speed truly matters to you and you want to keep control in your own hands, this is the first place to look.

Weaviate — Hybrid search and GraphQL. Weaviate's distinguishing feature is combining hybrid search (vector + BM25) in a single query and offering a rich GraphQL-based query interface. It is strong in complex, term-heavy enterprise searches.

Milvus — The database of enormous scale. Billions of vectors, GPU acceleration, distributed architecture. It shines at large scale but demands a real ops team.

Chroma — The developer-experience leader. Chroma embraces the "start in five minutes" philosophy. In local development, prototyping, and small projects it is incredibly comfortable. At production scale it is not as ambitious as the others, but for quickly trying an idea it is unmatched.

pgvector — The Postgres-integrated default. If you already use Postgres (and in Turkey most organizations do), pgvector is an extension. You keep vectors right next to your relational data, in the same database, within the same transaction, with the same backup and access control. This integration is a far bigger advantage than you think.

Vertex Vector Search — GCP's solution. If you live in the Google Cloud ecosystem, Vertex's managed vector search service is a natural option. Tightly integrated with GCP services, scalable. The comfort of being managed and the data-sovereignty bargain are similar to Pinecone's.

Vespa — Large-scale hybrid. Vespa is a powerful platform rooted in search engines, offering both very large scale and rich hybrid ranking capabilities. It is complex, but in the toughest, large-scale hybrid scenarios it is a serious option.

Benchmarks: Let's Look at the Numbers, But With the Right Eye

Now let's get to the numbers everyone is curious about. But let me warn you upfront: these figures were measured in a context (the 10-million-vector range, specific hardware, specific tuning) and may not reflect your reality one to one. Still, they give a relative feel.

| Database | p50 latency | p99 latency (10M vectors) | Standout feature |
|---|---|---|---|
| Qdrant | ~4 ms (lowest) | ~12 ms (lowest) | Rust-based raw speed |
| Redis | ~5 ms | n/a | Highest indexing throughput (~15k–40k vectors/sec) |
| Milvus | ~6 ms (with GPU) | ~18 ms | GPU acceleration, enormous scale |
| Weaviate | n/a | ~16 ms | Hybrid search + GraphQL |

The lessons to draw from this table are these:

  • Qdrant is the latency champion. It holds the lowest values both at p50 (~4 ms) and p99 (~12 ms). The fruit of a Rust-based architecture.
  • Redis stands out in indexing throughput. Alongside ~5 ms p50 latency, it offers the highest write throughput, able to index ~15k–40k vectors per second. If you update your data frequently and fast, this matters.
  • Milvus drops to ~6 ms p50 with GPU and shows the difference GPU acceleration makes at enormous scale.
  • Weaviate at ~16 ms p99 combines reasonable latency with its hybrid-search capabilities.

But here is the critical caveat. At 10 million vectors, the spread in this table, from about 4 ms (Qdrant p50) to about 18 ms (Milvus p99), is imperceptible to users in most enterprise RAG applications. The real latency lies in embedding generation and in the LLM writing the answer, which together take on the order of seconds. The roughly 14 ms gap in vector search gets lost in the noise of the total pipeline. This is why putting benchmarks at the center of your selection criteria is the most common mistake I see in the field.
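
The share of latency that vector search accounts for is a simple division. In this sketch, the 30 ms query-embedding time and the 2 s LLM time are assumptions chosen only to illustrate "on the order of seconds"; they are not measurements. The 4 ms and 18 ms values come from the table above.

```python
def vector_search_share(search_ms: float, embed_ms: float = 30.0, llm_ms: float = 2000.0) -> float:
    """Fraction of one request's latency spent in vector search (defaults are assumptions)."""
    return search_ms / (embed_ms + search_ms + llm_ms)

for label, ms in [("~4 ms (fastest p50 in the table)", 4), ("~18 ms (slowest p99 in the table)", 18)]:
    print(f"{label}: {vector_search_share(ms):.1%} of the request")
```

Under those assumptions, vector search accounts for roughly 0.2% to 0.9% of the request, so switching databases to save 14 ms barely moves the total.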

What Is the Decision Actually Based On? Four Axes

If it is not the headline numbers, what is the decision based on? The four axes that work in the field are these:

1. Managed or self-hosted? This is usually the first and most decisive question. If you have no ops team, want to start fast, and your data can leave the country — managed (Pinecone, Vertex) offers comfort. If you must keep control, optimize cost, and especially keep data inside the country per KVKK — self-hosted open source (pgvector, Qdrant, Weaviate, Milvus) is the only realistic path.

2. Scale tier. Hundreds of thousands, millions, tens of millions, or billions? This axis naturally filters products. Billions of vectors is the Milvus/Vespa world. Up to tens of millions, pgvector and Qdrant are comfortably enough. At the hundreds-of-thousands level, even Chroma is enough.

3. Hybrid search depth. Are your queries term-heavy? Are you searching for law articles, product codes, proper nouns? Then hybrid search is non-negotiable and Weaviate stands out. If your queries are more conceptual/semantic, pure vector may suffice.

4. The team's existing data-platform commitments. This is the most overlooked but in practice the most decisive axis. If you are a team already leaning on Postgres, pgvector brings you zero new operational burden. If you live on GCP, Vertex is natural. If you already use Redis, Redis's vector capabilities can save you from standing up a separate system. A new database means a new operational discipline, a new backup strategy, a new learning curve. Do not underestimate it.

KVKK and Data Sovereignty: The Real Breaking Point in Turkey

Now we come to the most important Turkey-specific part of this article. In an enterprise RAG project, the vector database holds your embeddings. And embeddings are not innocent numbers — they carry the semantic essence of your documents, customer records, and internal correspondence. With enough embeddings and the right technique, a significant part of the original text can be reconstructed. In other words, your vector database actually holds a derivative of your sensitive data.

For KVKK, this directly raises the question: where does this data reside?

  • Managed cloud (like Pinecone): Your data usually lives in a region abroad, on a third party's infrastructure. Cross-border data transfer is one of KVKK's most sensitive headings and the one that produces the most administrative fines. It requires conditions like explicit consent, an adequacy decision, or the presence of appropriate safeguards. A municipality, a bank, or a healthcare institution moving citizen data abroad this way is often an architecture to be rejected from the start.
  • Self-hosted open source (pgvector, Qdrant, Milvus, Weaviate): The data never leaves the institution's boundary. It sits in a data center in Turkey, on servers under your own control. For data sovereignty, this is the only compliant option for most public-sector and regulated-industry projects.

My clear advice in the field: if you are in a regulated sector (public, finance, healthcare, telecom) or working with citizen/customer data, narrow the selection window from the start to self-hosted open-source options. This does not mean Pinecone is bad — it is a technically excellent product — but it may not fit your compliance context. Make the architecture decision by legal reality, not technical admiration.

On the cost side, too, think in Turkish lira. Managed services bill in dollars/euros, and exchange-rate swings make your budget unpredictable. In a self-hosted open-source solution, your cost is largely the lira price of the server you rent — more predictable and generally cheaper at scale. When you cut memory to a quarter with int8 quantization, that reflects directly on your monthly server bill.

Start Simple, Then Graduate: Why Is pgvector the Right First Step?

There is a mistake I see again and again in the field: teams try to stand up the "most scalable, fastest, coolest" vector database in their very first POC. Then they wrestle with operational complexity for weeks, and the real work — RAG quality — never gets its turn.

The path I recommend is this: start simple.

If you already use Postgres and you are under ~50 million vectors, start with pgvector. Why?

  • Zero new operational burden. You are not standing up a new system, a new backup, a new monitoring. Your Postgres knowledge works directly.
  • Same transaction, same access control. Vectors live next to your relational data. When you delete a document, its embedding goes in the same transaction — consistency for free.
  • KVKK-friendly. Already on your in-house Postgres server, inside the country, under your control.
  • Mature HNSW support. pgvector has offered an HNSW index since version 0.5.0, and it delivers reasonable latencies up to tens of millions of vectors.
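
You can demonstrate the "same transaction" point without a Postgres server. In this minimal sketch, Python's built-in sqlite3 stands in for Postgres purely so it runs anywhere, and the embedding is stored as JSON text. With pgvector, the column would instead be declared as vector(1536), and you would create an HNSW index with something like CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops). Table and function names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    department TEXT NOT NULL,
    body TEXT NOT NULL
);
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    embedding TEXT NOT NULL   -- JSON array here; vector(1536) with pgvector
);
""")

with conn:   # one transaction: commit on success, roll back on any exception
    conn.execute("INSERT INTO documents (id, department, body) VALUES (1, 'HR', 'Leave policy')")
    conn.execute("INSERT INTO chunks (document_id, embedding) VALUES (?, ?)",
                 (1, json.dumps([0.1, 0.2, 0.3])))

def delete_document(conn: sqlite3.Connection, doc_id: int) -> None:
    with conn:   # the document and, via the cascade, its embeddings go in one transaction
        conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))

delete_document(conn, 1)
print(conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0])   # 0
```

The chunk rows reference their document with ON DELETE CASCADE, so deleting the document removes its embeddings in the same transaction. No orphaned vector is left to keep surfacing in search results.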

When do you graduate? When these signals appear:

When your latencies climb to unacceptable levels, when your vector count exceeds ~50 million, or when your filtered/hybrid search needs start pushing beyond pgvector's comfort zone. That is when, and only then, you should consider moving to Qdrant (for speed), Weaviate (for hybrid), or Milvus (for enormous scale).

The beauty of this "simple first, then graduate" approach is this: your embeddings are portable. Vectors are ultimately arrays of numbers; moving them from one database to another is far easier than moving a relational schema. So starting with pgvector does not lock you in; it just lets you focus faster on the real work — RAG quality, proper chunking, good reranking.
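
A migration between vector stores is mostly a batched copy of (id, vector, payload) records. In this minimal sketch, scan_source and upsert_target are hypothetical adapters you would write around the two databases' real scan and bulk-insert calls. Dicts stand in for both databases here.

```python
from typing import Callable, Iterable, Iterator

Record = tuple  # (doc_id, vector, payload)

def batched(records: Iterable[Record], size: int) -> Iterator[list]:
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def migrate(scan_source: Callable[[], Iterable[Record]],
            upsert_target: Callable[[list], None], batch_size: int = 500) -> int:
    """Stream records out of one store and into another in fixed-size batches."""
    moved = 0
    for batch in batched(scan_source(), batch_size):
        upsert_target(batch)   # hypothetical adapter around the target's bulk insert
        moved += len(batch)
    return moved

# In-memory stand-ins for two databases.
source = {f"doc{i}": ([i * 0.1, 1.0], {"department": "HR"}) for i in range(1234)}
target = {}

def scan_source():
    for doc_id, (vector, payload) in source.items():
        yield doc_id, vector, payload

def upsert_target(batch):
    for doc_id, vector, payload in batch:
        target[doc_id] = (vector, payload)

print(migrate(scan_source, upsert_target))  # 1234
```

One caveat keeps portability honest: the copied vectors only stay meaningful if queries against the new store are embedded with the same model (and dimension) that produced them. Switching models means re-embedding, not copying. Python 3.12 and later also ship itertools.batched, which can replace the helper.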

Decision Framework: Ask Yourself These Questions in Order

If I want to reduce this whole article to a single flow, here is the decision framework I walk through with clients in the field. Answer them in order, top to bottom:

  1. Can the data leave the country? If not (KVKK/regulated sector), eliminate managed cloud options (Pinecone, Vertex) from the start. Continue with self-hosted open source.

  2. Do you already use Postgres and are you under ~50M vectors? If yes, start with pgvector. Nothing else is needed. Spend your energy on RAG quality.

  3. Are raw speed and low latency truly critical for you? (In a high-traffic, latency-sensitive product.) If yes, look at Qdrant — the Rust-based speed leader.

  4. Are your queries term-heavy, is hybrid search a must? (Law articles, product codes, proper nouns.) If yes, Weaviate stands out — vector + BM25 in a single query.

  5. Do you have billions of vectors and a real ops team? If yes, Milvus (or Vespa) opens the door to enormous scale and GPU acceleration. If not, don't carry that complexity.

  6. Are you just trying a quick prototype? Start with Chroma in five minutes, make the production decision later.

  7. Are you locked into a specific cloud's ecosystem? If you are on GCP, Vertex; if you have existing Redis infrastructure, Redis's vector capabilities can spare you from standing up a separate system.
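
The seven questions can also be encoded as an ordered checklist. This is a rough sketch, not a substitute for judgment. The Situation fields are hypothetical, and "billions" is interpreted as at least one billion vectors. Every earlier recommendation is already self-hosted open source, so question 1 only has to gate the managed Vertex option.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Situation:
    data_may_leave_country: bool
    uses_postgres: bool
    vector_count: int
    latency_critical: bool = False
    term_heavy_queries: bool = False
    has_ops_team: bool = False
    prototype_only: bool = False
    cloud_ecosystem: Optional[str] = None   # e.g. "gcp" or "redis"

def recommend(s: Situation) -> str:
    """Walk the seven questions in order and return the first match."""
    managed_allowed = s.data_may_leave_country                 # Q1: can data leave the country?
    if s.uses_postgres and s.vector_count < 50_000_000:        # Q2
        return "pgvector"
    if s.latency_critical:                                     # Q3
        return "Qdrant"
    if s.term_heavy_queries:                                   # Q4
        return "Weaviate"
    if s.vector_count >= 1_000_000_000 and s.has_ops_team:     # Q5
        return "Milvus (or Vespa)"
    if s.prototype_only:                                       # Q6
        return "Chroma"
    if s.cloud_ecosystem == "gcp" and managed_allowed:         # Q7 (Vertex is managed)
        return "Vertex Vector Search"
    if s.cloud_ecosystem == "redis":
        return "Redis vector search"
    return "no single match: revisit the four axes"

print(recommend(Situation(data_may_leave_country=False, uses_postgres=True, vector_count=5_000_000)))
```

Returning the first match mirrors the "answer them in order" advice. The fallback string is a placeholder for cases the list does not cover.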

When you answer these seven questions in order, the right database often emerges on its own. And notice: none of the questions is "which is 2 ms faster." Because the right choice is hidden not in headline benchmarks — but in data sovereignty, in your existing platform commitments, in your scale reality, and in the operational burden your team can carry. Choose your vector database with this eye, and let what you choose leave you the time to focus on the real work of RAG — producing quality answers.
