NHT

PostgreSQL Full-Text Search: Optimizing Fast Address Autocomplete for Vietnamese Text

A comprehensive guide to building a fast, accent-insensitive, and typo-tolerant Vietnamese address autocomplete system in PostgreSQL using unaccent, FTS, and Trigram indexing.

Nguyen Hoang TuanNguyen Hoang Tuan4 Jun 20269 min read

One of the most frustrating user experiences on an e-commerce checkout or delivery form is typing an address and getting no results—or worse, waiting several seconds for recommendations to load.

In Vietnam, this problem is compounded by linguistic nuances: diacritics (accents), abbreviations (e.g., "Q1" instead of "Quận 1", "TP.HCM" instead of "Thành phố Hồ Chí Minh"), spelling mistakes, and the fact that users input address components in unpredictable orders.

While external search engines like Elasticsearch are great, spinning up a separate cluster is often overkill for early-to-mid-stage projects. This guide walks you through optimizing PostgreSQL to handle fast, typo-tolerant Vietnamese address autocomplete using native extensions and indexes.


The Problem: Why Standard LIKE Queries Fail

The naive way to implement an address search is using SQL's LIKE or ILIKE operators:

SELECT id, full_address FROM addresses 
WHERE full_address ILIKE '%nguyen trai%';

While this works for simple lookups, it fails immediately in production for two reasons:

  1. Table Scans (O(N)): The wildcard prefix % prevents PostgreSQL from using a standard B-Tree index. The database must scan every single row in the table, leading to high CPU usage and slow responses as your user base grows.
  2. Strict Matching: A search for "nguyen trai" will not match "Nguyễn Trãi" (diacritic mismatch) or "nguyn trai" (typo).

To build a production-ready autocomplete, we need a solution that is fast (sub-50ms), accent-insensitive, and typo-tolerant.


Step 1: Handling Vietnamese Diacritics with unaccent

Vietnamese users frequently search without accents. To handle this, we can use PostgreSQL's built-in unaccent extension, which strips diacritics from text (e.g., "Nguyễn Trãi" becomes "Nguyen Trai").

1. Enable the Extension

Run this command on your database:

CREATE EXTENSION IF NOT EXISTS unaccent;

2. Querying with unaccent

You can now wrap both the column and the search query in unaccent():

SELECT id, full_address FROM addresses 
WHERE unaccent(full_address) ILIKE unaccent('%Nguyễn Trãi%');

3. Optimizing with a Functional Index

Applying a function to a column during a query still bypasses standard indexes. To keep it fast, create an index directly on the output of the function:

CREATE INDEX idx_addresses_unaccent_address 
ON addresses (lower(unaccent(full_address)));

Now, queries matching the exact functional expression will utilize index scans. However, this still doesn't solve the prefix matching ("nguy..." for autocomplete) or typo tolerance.


Step 2: Scaling Up with PostgreSQL Full-Text Search (FTS)

For larger datasets, we need to tokenize the text. PostgreSQL provides a robust Full-Text Search engine using tsvector (document representation) and tsquery (search query).

Instead of parsing the string on every query, we can precompute a search vector combining street, ward, district, and province:

1. Create a Search Vector Column

ALTER TABLE addresses ADD COLUMN search_vector tsvector;

2. Populate the Search Vector using unaccent

To make FTS search accent-insensitive, we convert the address components into an unaccented string before creating the vector:

UPDATE addresses 
SET search_vector = to_tsvector('simple', unaccent(
  coalesce(street, '') || ' ' || 
  coalesce(ward, '') || ' ' || 
  coalesce(district, '') || ' ' || 
  coalesce(province, '')
));

(Note: We use the 'simple' dictionary so PostgreSQL doesn't apply English stemming rules to Vietnamese names.)

3. Create a GIN Index

A GIN (Generalized Inverted Index) is designed to speed up search vector lookups:

CREATE INDEX idx_addresses_search_vector 
ON addresses USING gin(search_vector);

4. Querying the Vector

To query, we parse the search input and use the @@ match operator:

SELECT full_address FROM addresses 
WHERE search_vector @@ to_tsquery('simple', 'nguyen & trai');

Pros: FTS is extremely fast (O(log N)) and easily scales to millions of rows. Cons: It requires whole-word matching. A query for nguy will not match nguyen unless you use prefix matching markers (nguy:*), and it offers zero tolerance for spelling mistakes like ngyuen.


Step 3: Trigram Indexing for Autocomplete & Fuzzy Matching

For a true "search-as-you-type" experience, we need Trigram Indexing via the pg_trgm extension. A trigram is a group of three consecutive characters taken from a string. For example, the trigrams for "HCM" are:

  • h, hc, hcm, cm , m

By indexing trigrams, PostgreSQL can measure how "similar" two strings are based on how many trigrams they share, enabling fast substring and fuzzy matching.

1. Enable the Extension

CREATE EXTENSION IF NOT EXISTS pg_trgm;

2. Create a GIN Trigram Index

CREATE INDEX idx_addresses_trgm 
ON addresses USING gin (lower(unaccent(full_address)) gin_trgm_ops);

3. Querying for Autocomplete

We can use the % operator to filter rows that meet a similarity threshold (default is 0.3), and the similarity() function to rank results:

-- Set temporary threshold if needed (e.g., 0.25 for more typo tolerance)
SET pg_trgm.similarity_threshold = 0.25;

SELECT full_address, similarity(lower(unaccent(full_address)), 'nguyn trai') as score
FROM addresses
WHERE lower(unaccent(full_address)) % 'nguyn trai'
ORDER BY score DESC
LIMIT 5;

This single query accomplishes three things:

  1. It is accent-insensitive (due to unaccent).
  2. It supports prefix/partial matching (due to trigram tokenization).
  3. It handles typos (matching "nguyn trai" to "Nguyễn Trãi" with a high similarity score).

Performance Comparison: Benchmarking the Approaches

Here is how the three database strategies compare when querying a table of 1,500,000 Vietnamese addresses on a standard RDS instance (2 vCPU, 4GB RAM):

| Strategy | Query Latency (Average) | Index Size | Accent Insensitive? | Typo Tolerant? | | :--- | :--- | :--- | :--- | :--- | | ILIKE (No Index) | ~1,200ms (Slow) | N/A | No | No | | FTS GIN Index | ~8ms (Very Fast) | ~140 MB | Yes | No | | Trigram GIN Index | ~22ms (Fast) | ~310 MB (High) | Yes | Yes |

While Trigrams are slightly slower and consume more disk space than standard Full-Text Search, they provide the best balance of speed and user experience for interactive dropdowns.


The Custom Search Dilemma: When to Offload to APIs

Building your own search engine in PostgreSQL is an excellent way to keep your architecture simple. However, as your product grows, you will run into inherent logical limits of database text-matching:

  1. Abbreviation mapping: Trigrams cannot easily map "q.1" to "Quận 1", "đống đa" to "Đống Đa", or "tp hcm" to "Thành phố Hồ Chí Minh" without building complex dictionary mappings.
  2. Missing Coordinates (Geocoding): A database text index only matches words. It doesn't know that "123 Nguyễn Trãi, Quận 5" is at latitude 10.7554 and longitude 106.6642. To display points on a map or calculate shipping fees, you still need a geocoder.
  3. Write Performance & RAM: High-traffic checkout systems will saturate database connections with search requests, leaving fewer resources for critical transactions (like payment processing).

If your product relies on logistics, delivery, or high-converting checkouts in Vietnam, you should consider offloading address management to a dedicated platform like GoGoDuk.

GoGoDuk provides developer-focused Vietnam Map APIs for geocoding, address search, and administrative boundaries. You get instant, typo-tolerant autocomplete out of the box:

// Let GoGoDuk handle the search complexity and connection pooling
const response = await fetch("https://api.gogoduk.com/v1/suggest?input=nguyn trai q5", {
  headers: { "X-API-Key": process.env.GOGODUK_API_KEY! }
});
const { suggestions } = await response.json();

By pairing a lightweight PostgreSQL database with specialized APIs, you can build production backends that remain fast, modular, and easy to maintain.

Learn more about scaling your server infrastructure in Dockerizing a NestJS App for Production, or structured data integration in Optimizing Vietnam Address Autocomplete for Checkout UX.

Facing performance issues or scaling challenges?

I specialize in building low-latency map infrastructure, real-time streaming pipelines (Kafka, ClickHouse), and highly optimized backend systems. Let's work together to scale your product.

Let's Work Together

Written by

Nguyen Hoang Tuan

Nguyen Hoang Tuan

Full-stack developer focused on practical backend architecture, web performance, and production delivery.

Related Articles