NHT

pgvector vs Dedicated Vector DB: When to Use Which

A decision guide comparing pgvector with Pinecone/Qdrant/Weaviate on cost, recall, ops and scale — with real query benchmarks and a default to start from.

Nguyen Hoang TuanNguyen Hoang Tuan22 Jun 20268 min read

The pillar of this series made a claim and then deferred the proof: start on pgvector, and graduate to a dedicated vector database only when the numbers force you to. This post is the benchmark behind that advice. The question is not "which is better" — it is "at what scale, with what team, does the second moving part start to earn its keep."

Everyone selling a vector database wants the answer to be "you need us on day one." For most backend teams it isn't. So let's look at what vector search actually demands from storage, what a managed vector DB gives you that Postgres doesn't, and where the crossover sits.

What "vector search" needs from a database

Strip away the marketing and a vector store does exactly three things:

  • Store a fixed-length float array per row (your embedding), ideally next to the data it describes.
  • Index those arrays so nearest-neighbour queries don't scan the whole table.
  • Query for the top-k rows closest to a query vector under some distance metric (cosine, L2, inner product).

That's it. The hard part is the index: an exact nearest-neighbour search is O(n) per query, which is fine at 10k rows and unusable at 10M. Every serious vector store — pgvector included — solves this with an approximate index (ANN) that trades a little recall for a lot of speed. The two parameters you actually care about are recall (did I get the truly-nearest neighbours, or close-enough ones?) and p95 latency (how slow is the slow query?). Hold those two fixed and most of the "which database" debate collapses into ops and cost.

pgvector: one less moving part

If you already run Postgres — and if you've followed this blog's full-text address autocomplete work, you do — adding vector search is one extension and one column. No new service to deploy, secure, back up, or keep in sync.

CREATE EXTENSION IF NOT EXISTS vector;

ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- HNSW index for fast approximate nearest-neighbour search
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Recall/speed knob, set per-session or per-transaction
SET hnsw.ef_search = 40;

-- Top-5 nearest neighbours, with a relational filter in the SAME query
SELECT id, content
FROM documents
WHERE tenant_id = $2 AND published
ORDER BY embedding <=> $1   -- $1 = query embedding, <=> = cosine distance
LIMIT 5;

The line that matters is the WHERE. Your embedding lives in the same row as tenant_id and published, so a filtered vector search is one query the planner can reason about — no fetching candidates from one system and filtering them in another. The vectors sit inside the same transaction as your writes, so there is no dual-write window and no consistency drift. That single property is why pgvector is the right default for the long tail of products.

In a dedicated store, the equivalent is two systems talking over the network:

// Pinecone-style: vectors live OUTSIDE your database
const res = await index.query({
  vector: queryEmbedding,
  topK: 5,
  filter: { tenant_id: tenantId, published: true },
});
// ...then hydrate full rows from Postgres by id, in a SECOND round trip
const rows = await db.documents.findMany({ where: { id: { in: res.matches.map((m) => m.id) } } });

Two stores, two round trips, two things that can drift out of sync. Sometimes worth it — but you should be buying something concrete in return.

Dedicated vector DBs: what you actually buy

Pinecone, Qdrant, Weaviate, and Milvus aren't snake oil. At scale they give you things Postgres makes you build yourself:

  • Horizontal sharding of the vector index across nodes — the real reason to leave, past tens of millions of vectors.
  • Index builds that don't block the table. A large HNSW build in Postgres competes with production traffic; managed stores build and swap indexes out of band.
  • Purpose-built memory layout and quantization (e.g. scalar/product quantization) that fits more vectors per GB of RAM.
  • Managed ops — replication, backups, and rebalancing handled for you, which is worth real money to a small team.

None of these matter at 100k vectors. All of them start to matter somewhere between 1M and 100M, depending on your latency target and budget. The job is to find your number, not to adopt someone else's.

Benchmark: recall & p95 at 100k / 1M / 10M vectors

Here's the shape that holds up across the public benchmarks (ANN-Benchmarks, pgvector's own HNSW numbers) and my own runs on 1536-dim OpenAI embeddings. Treat these as rough orders of magnitude, not lab-exact figures — they move with dimension, m/ef_search, hardware, and how warm your cache is:

| Dataset size | pgvector (HNSW) p95 | Dedicated DB p95 | Recall@10 (both, tuned) | | --- | --- | --- | --- | | 100k vectors | single-digit ms | single-digit ms | ~0.98+ | | 1M vectors | ~10–30 ms | ~10–20 ms | ~0.95–0.98 | | 10M vectors | ~50–150 ms (one box, tuned) | ~10–40 ms (sharded) | ~0.95 if RAM holds |

The headline: up to ~1M vectors the two are a wash on recall and latency. pgvector with a tuned HNSW index hits the same recall as a managed store at comparable p95. The gap opens at ~10M+, and even then only once a single Postgres box can no longer keep the index hot in RAM. Recall is a knob, not a property: raise hnsw.ef_search and recall climbs while latency rises — both engines pay that same tax.

Memory footprint and index build time

The constraint that actually bites first is RAM, not query speed. An HNSW index wants to live in memory; once it spills to disk, p95 falls off a cliff. A rough sizing rule for 1536-dim float32 vectors: about 6 KB per vector for the raw data plus index overhead, so ~1M vectors is single-digit GB and ~10M is tens of GB before you've counted your actual table. Index build time scales with m and ef_construction and is the other surprise — a multi-million-row HNSW build is minutes-to-hours and, in Postgres, contends with live traffic unless you build concurrently and off-peak. Dedicated stores hide both costs behind quantization and out-of-band builds; that convenience is a real part of what you're paying for.

Decision matrix: pick by scale, team size, and ops budget

| Your situation | Use pgvector | Use a dedicated vector DB | | --- | --- | --- | | < 1M vectors, already on Postgres | ✅ default | rarely worth it | | 1–10M vectors, p95 budget ≥ 50 ms | ✅ tune HNSW + RAM | only if ops is the bottleneck | | 10M+ vectors or strict low-latency SLA | feasible but you're now a vector-DB operator | ✅ sharding earns its keep | | Heavy relational filtering / multi-tenant | ✅ filter + vector in one query | filter pushdown is the weak spot | | Tiny team, no platform engineers | ✅ one system to run | ✅ if you'd rather pay than operate | | Frequent full re-index of huge corpora | build blocks traffic | ✅ out-of-band rebuilds |

Read it as two axes: scale pushes you right, operational simplicity pulls you left. Most teams overweight scale (a number they're guessing at) and underweight ops (a cost they pay every week).

My default: start on pgvector, graduate later

The cost side seals it for the common case. If you already run Postgres, vectors add a column and an index — effectively $0 of new infrastructure. A managed vector DB starts in the tens of dollars a month for a small index and climbs from there, on top of the Postgres you're still running to hold the source rows. You don't pay that until it buys you something measurable.

So the rule from the pillar, now with numbers behind it:

  1. Start on pgvector. Below ~1M vectors it matches a dedicated store on recall and latency, in one system, in one transaction, with relational filters for free.
  2. Instrument from day one. Track p95 and recall per query, exactly like the cost-and-latency discipline this series keeps returning to — the same observability mindset behind real-time usage analytics with ClickHouse.
  3. Graduate when a number forces you — index no longer fits in RAM, p95 blows the SLA at your real ef_search, or re-index time starts hurting production. By then you'll know exactly why you're migrating, which makes the migration boring instead of speculative.

The worst outcome isn't picking pgvector and outgrowing it — that's a good problem with an obvious trigger. The worst outcome is standing up a second datastore on day one to solve a scale you don't have yet. Bookmark the series map; the next posts go deep on the embeddings, indexes, and retrieval that make this stack actually work.

Facing performance issues or scaling challenges?

I specialize in building low-latency map infrastructure, real-time streaming pipelines (Kafka, ClickHouse), and highly optimized backend systems. Let's work together to scale your product.

Let's Work Together

Written by

Nguyen Hoang Tuan

Nguyen Hoang Tuan

Full-stack developer focused on practical backend architecture, web performance, and production delivery.

Related Articles