NHT

Hybrid Search: Full-Text + Vector in One Postgres Query

Combine PostgreSQL full-text search and pgvector in a single query with Reciprocal Rank Fusion — keyword precision plus semantic recall, with a real benchmark.

Nguyen Hoang TuanNguyen Hoang Tuan25 Jun 20269 min read

The embeddings post ended on a cliffhanger: vector search nails "search by meaning" but ranks keyword-exact matches poorly. The fix isn't to pick a winner — it's to run both and fuse the results. This post builds hybrid search: PostgreSQL full-text search for keyword precision, pgvector for semantic recall, merged in one query with a fusion formula. No second database, no app-side merge loop.

We continue the running example from this series — a documents table of GoGoDuk API-doc chunks and Vietnamese-address knowledge, each row carrying content (text) and embedding (a 1536-dim vector). We'll add the full-text half beside the vector half it already has.

The problem: keyword and semantic search each miss

The two retrieval methods fail on opposite inputs, and that's exactly why pairing them works:

  • Full-text search matches tokens. Query "rate limit 429" finds the doc that literally contains 429 — a vector model often blurs that exact code into "errors in general." But FTS whiffs on "how do I stop too many requests": no shared keywords, zero results.
  • Vector search matches meaning. "how do I stop too many requests" retrieves the rate-limiting doc even with no words in common. But ask for the error code "E_QUOTA_EXCEEDED" and a fuzzy nearest-neighbour may rank three vaguely-related chunks above the one exact hit.

A real query stream contains both shapes: precise identifiers (SKUs, error codes, street names) and fuzzy natural language. Keyword-only loses the paraphrases; semantic-only loses the exact tokens. Hybrid search keeps a candidate that either method ranks highly, then lets fusion decide the final order.

The two halves: tsvector and pgvector on the same row

The vector column is already there from the previous post. We add a tsvector for the keyword side. To stay accent-insensitive for Vietnamese, generate it through unaccent — the same trick the Vietnamese FTS post covers in depth:

-- A custom config so the dictionary strips diacritics (Nguyễn -> nguyen).
CREATE TEXT SEARCH CONFIGURATION vi_unaccent ( COPY = simple );
ALTER TEXT SEARCH CONFIGURATION vi_unaccent
  ALTER MAPPING FOR asciiword, word, hword, hword_part
  WITH unaccent, simple;

-- A generated tsvector column, kept in sync automatically.
ALTER TABLE documents
  ADD COLUMN content_tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('vi_unaccent', content)) STORED;

CREATE INDEX documents_tsv_idx ON documents USING gin (content_tsv);

Now each row carries both signals side by side: content_tsv for the GIN/keyword path, embedding for the HNSW/semantic path. Same table, same backup, same transaction — the property that makes the rest of this a single query instead of two systems negotiating over the network.

Why you can't just add the scores

The naive instinct is ORDER BY ts_rank(...) + (1 - (embedding <=> $q)). It doesn't work, because the two scores live on incompatible scales:

  • ts_rank is an unbounded relevance number (depends on term frequency, document length, weights) — it might range 0.0–0.6 on your corpus, or 0.0–12.0 on another.
  • Cosine distance (<=>) is bounded 0–2, and "good" matches cluster in a narrow band near 0.

Add them and whichever metric happens to have the larger numeric range silently dominates; the weight you think you set is fiction. You could min-max normalize each score per query, but that's brittle — one outlier rescales everything. The robust move is to throw away the raw scores and fuse on rank position instead.

Reciprocal Rank Fusion: the mixing formula

Reciprocal Rank Fusion (RRF) ignores the magnitude of each score and uses only where a document placed in each list. A document at rank r contributes 1 / (k + r) to its fused score, summed across both lists:

score(doc) = Σ  1 / (k + rank_in_list)
            lists

k is a small constant (60 is the standard default) that dampens the gap between rank 1 and rank 2 so a single list can't completely dominate. The beauty of RRF is what it doesn't need: no per-query normalization, no tuned weight between "keyword importance" and "semantic importance," no assumption that the two scores are comparable. A doc that ranks #1 in keyword search and #50 in vector search still scores well; a doc that's #2 in both often wins outright — which is exactly the "either method liked it, and one method loved it" behaviour we want.

It's also why RRF is the cheapest quality win in retrieval, in the same spirit as reranking — except RRF costs nothing extra, because both rankings already come out of the same database.

The whole thing in one SQL query

Postgres gives us ROW_NUMBER() to turn each ORDER BY into a rank, and a FULL OUTER JOIN to union the two candidate sets (a doc found by only one method still participates). You pass two parameters: $1 the tsquery text, $2 the query embedding.

WITH fts AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY ts_rank(content_tsv, query) DESC) AS rank
  FROM documents, websearch_to_tsquery('vi_unaccent', $1) query
  WHERE content_tsv @@ query
  LIMIT 50
),
vec AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY embedding <=> $2) AS rank
  FROM documents
  WHERE embedding IS NOT NULL
  ORDER BY embedding <=> $2
  LIMIT 50
)
SELECT d.id, d.content,
       COALESCE(1.0 / (60 + fts.rank), 0.0) +
       COALESCE(1.0 / (60 + vec.rank), 0.0) AS rrf_score
FROM fts
FULL OUTER JOIN vec USING (id)
JOIN documents d USING (id)
ORDER BY rrf_score DESC
LIMIT 10;

Read it top to bottom: fts ranks the keyword matches, vec ranks the nearest neighbours, the FULL OUTER JOIN merges them by id, and COALESCE(..., 0.0) means "not in this list" contributes zero instead of NULL. Each side caps at 50 candidates so the join stays cheap; the final LIMIT 10 is what you feed the LLM. One round trip, one transaction, both indexes (GIN + HNSW) doing their job — and the answer set is strictly a superset of what either method alone would have returned.

Benchmark: hybrid vs FTS vs vector alone

Numbers from the GoGoDuk doc/address corpus (~1M chunks, 1536-dim embeddings, HNSW + GIN indexes hot in RAM), evaluated against a hand-labeled set of 200 queries split between exact-token and natural-language phrasings. Metric is recall@10 — did the correct chunk land in the top 10:

| Method | Recall@10 (exact-token Q) | Recall@10 (natural-language Q) | p95 latency | | --- | --- | --- | --- | | FTS only | 0.94 | 0.61 | ~12 ms | | Vector only | 0.70 | 0.91 | ~18 ms | | Hybrid (RRF) | 0.95 | 0.93 | ~24 ms |

The story is in the off-diagonal cells: FTS collapses to 0.61 on paraphrases, vector drops to 0.70 on exact tokens, and hybrid stays high on both — it inherits each method's strength and patches its blind spot. The cost is one extra index scan and a join: p95 roughly doubles versus a single method but stays well under any LLM call that consumes the results, so on the end-to-end request budget it's noise. Treat the absolute numbers as directional — they shift with corpus, embedding model, and k — but the shape (hybrid ≥ max of the two columns) is robust and reproduces across datasets.

Tuning and gotchas

  • k is forgiving, but not free. 60 is a fine default. Smaller k sharpens the influence of rank-1 hits (good when your top result is usually correct); larger k flattens the lists (safer when both signals are noisy). Sweep it on your labeled set — it's one number.
  • Cap each side before fusing. The LIMIT 50 per CTE matters: without it, a broad tsquery can match tens of thousands of rows and the rank window scans them all. 50 is plenty — RRF only cares about the top of each list.
  • Use websearch_to_tsquery, not raw to_tsquery. It parses user input forgivingly (quotes, or, bare words) instead of throwing a syntax error on a stray character — the difference between a search box that works and one that 500s on an apostrophe.
  • Weight the halves only if you must. RRF needs no weights, but if your domain is keyword-heavy you can bias it: 1.5 * COALESCE(1.0/(60+fts.rank),0). Add weights last, after measuring — most corpora don't need them.

That's hybrid search: two indexes you may already run, fused on rank in a single SELECT. It's the retrieval upgrade that makes a RAG pipeline stop embarrassing itself on exact identifiers, and it's pure Postgres — no new service to deploy or secure. Where it goes next is splitting documents well enough that retrieval has good chunks to find in the first place, and squeezing the last bit of ranking quality with a reranker. The map for the whole series is the pillar guide.

Facing performance issues or scaling challenges?

I specialize in building low-latency map infrastructure, real-time streaming pipelines (Kafka, ClickHouse), and highly optimized backend systems. Let's work together to scale your product.

Let's Work Together

Written by

Nguyen Hoang Tuan

Nguyen Hoang Tuan

Full-stack developer focused on practical backend architecture, web performance, and production delivery.

Related Articles

23 Jun 2026

Building RAG From Scratch in Node.js

Build a working Retrieval-Augmented Generation pipeline in Node.js — ingest, embed, retrieve, prompt — no framework, just the moving parts explained with real code.