PostgreSQL Full-Text Search: Optimizing Fast Address Autocomplete for Vietnamese Text
A comprehensive guide to building a fast, accent-insensitive, and typo-tolerant Vietnamese address autocomplete system in PostgreSQL using unaccent, FTS, and Trigram indexing.
One of the most frustrating user experiences on an e-commerce checkout or delivery form is typing an address and getting no results—or worse, waiting several seconds for recommendations to load.
In Vietnam, this problem is compounded by linguistic nuances: diacritics (accents), abbreviations (e.g., "Q1" instead of "Quận 1", "TP.HCM" instead of "Thành phố Hồ Chí Minh"), spelling mistakes, and the fact that users input address components in unpredictable orders.
While external search engines like Elasticsearch are great, spinning up a separate cluster is often overkill for early-to-mid-stage projects. This guide walks you through optimizing PostgreSQL to handle fast, typo-tolerant Vietnamese address autocomplete using native extensions and indexes.
The Problem: Why Standard LIKE Queries Fail
The naive way to implement an address search is using SQL's LIKE or ILIKE operators:
SELECT id, full_address FROM addresses
WHERE full_address ILIKE '%nguyen trai%';While this works for simple lookups, it fails immediately in production for two reasons:
- Table Scans (
O(N)): The wildcard prefix%prevents PostgreSQL from using a standard B-Tree index. The database must scan every single row in the table, leading to high CPU usage and slow responses as your user base grows. - Strict Matching: A search for "nguyen trai" will not match "Nguyễn Trãi" (diacritic mismatch) or "nguyn trai" (typo).
To build a production-ready autocomplete, we need a solution that is fast (sub-50ms), accent-insensitive, and typo-tolerant.
Step 1: Handling Vietnamese Diacritics with unaccent
Vietnamese users frequently search without accents. To handle this, we can use PostgreSQL's built-in unaccent extension, which strips diacritics from text (e.g., "Nguyễn Trãi" becomes "Nguyen Trai").
1. Enable the Extension
Run this command on your database:
CREATE EXTENSION IF NOT EXISTS unaccent;2. Querying with unaccent
You can now wrap both the column and the search query in unaccent():
SELECT id, full_address FROM addresses
WHERE unaccent(full_address) ILIKE unaccent('%Nguyễn Trãi%');3. Optimizing with a Functional Index
Applying a function to a column during a query still bypasses standard indexes. To keep it fast, create an index directly on the output of the function:
CREATE INDEX idx_addresses_unaccent_address
ON addresses (lower(unaccent(full_address)));Now, queries matching the exact functional expression will utilize index scans. However, this still doesn't solve the prefix matching ("nguy..." for autocomplete) or typo tolerance.
Step 2: Scaling Up with PostgreSQL Full-Text Search (FTS)
For larger datasets, we need to tokenize the text. PostgreSQL provides a robust Full-Text Search engine using tsvector (document representation) and tsquery (search query).
Instead of parsing the string on every query, we can precompute a search vector combining street, ward, district, and province:
1. Create a Search Vector Column
ALTER TABLE addresses ADD COLUMN search_vector tsvector;2. Populate the Search Vector using unaccent
To make FTS search accent-insensitive, we convert the address components into an unaccented string before creating the vector:
UPDATE addresses
SET search_vector = to_tsvector('simple', unaccent(
coalesce(street, '') || ' ' ||
coalesce(ward, '') || ' ' ||
coalesce(district, '') || ' ' ||
coalesce(province, '')
));(Note: We use the 'simple' dictionary so PostgreSQL doesn't apply English stemming rules to Vietnamese names.)
3. Create a GIN Index
A GIN (Generalized Inverted Index) is designed to speed up search vector lookups:
CREATE INDEX idx_addresses_search_vector
ON addresses USING gin(search_vector);4. Querying the Vector
To query, we parse the search input and use the @@ match operator:
SELECT full_address FROM addresses
WHERE search_vector @@ to_tsquery('simple', 'nguyen & trai');Pros: FTS is extremely fast (O(log N)) and easily scales to millions of rows.
Cons: It requires whole-word matching. A query for nguy will not match nguyen unless you use prefix matching markers (nguy:*), and it offers zero tolerance for spelling mistakes like ngyuen.
Step 3: Trigram Indexing for Autocomplete & Fuzzy Matching
For a true "search-as-you-type" experience, we need Trigram Indexing via the pg_trgm extension. A trigram is a group of three consecutive characters taken from a string. For example, the trigrams for "HCM" are:
h,hc,hcm,cm,m
By indexing trigrams, PostgreSQL can measure how "similar" two strings are based on how many trigrams they share, enabling fast substring and fuzzy matching.
1. Enable the Extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;2. Create a GIN Trigram Index
CREATE INDEX idx_addresses_trgm
ON addresses USING gin (lower(unaccent(full_address)) gin_trgm_ops);3. Querying for Autocomplete
We can use the % operator to filter rows that meet a similarity threshold (default is 0.3), and the similarity() function to rank results:
-- Set temporary threshold if needed (e.g., 0.25 for more typo tolerance)
SET pg_trgm.similarity_threshold = 0.25;
SELECT full_address, similarity(lower(unaccent(full_address)), 'nguyn trai') as score
FROM addresses
WHERE lower(unaccent(full_address)) % 'nguyn trai'
ORDER BY score DESC
LIMIT 5;This single query accomplishes three things:
- It is accent-insensitive (due to
unaccent). - It supports prefix/partial matching (due to trigram tokenization).
- It handles typos (matching "nguyn trai" to "Nguyễn Trãi" with a high similarity score).
Performance Comparison: Benchmarking the Approaches
Here is how the three database strategies compare when querying a table of 1,500,000 Vietnamese addresses on a standard RDS instance (2 vCPU, 4GB RAM):
| Strategy | Query Latency (Average) | Index Size | Accent Insensitive? | Typo Tolerant? |
| :--- | :--- | :--- | :--- | :--- |
| ILIKE (No Index) | ~1,200ms (Slow) | N/A | No | No |
| FTS GIN Index | ~8ms (Very Fast) | ~140 MB | Yes | No |
| Trigram GIN Index | ~22ms (Fast) | ~310 MB (High) | Yes | Yes |
While Trigrams are slightly slower and consume more disk space than standard Full-Text Search, they provide the best balance of speed and user experience for interactive dropdowns.
The Custom Search Dilemma: When to Offload to APIs
Building your own search engine in PostgreSQL is an excellent way to keep your architecture simple. However, as your product grows, you will run into inherent logical limits of database text-matching:
- Abbreviation mapping: Trigrams cannot easily map "q.1" to "Quận 1", "đống đa" to "Đống Đa", or "tp hcm" to "Thành phố Hồ Chí Minh" without building complex dictionary mappings.
- Missing Coordinates (Geocoding): A database text index only matches words. It doesn't know that "123 Nguyễn Trãi, Quận 5" is at latitude
10.7554and longitude106.6642. To display points on a map or calculate shipping fees, you still need a geocoder. - Write Performance & RAM: High-traffic checkout systems will saturate database connections with search requests, leaving fewer resources for critical transactions (like payment processing).
If your product relies on logistics, delivery, or high-converting checkouts in Vietnam, you should consider offloading address management to a dedicated platform like GoGoDuk.
GoGoDuk provides developer-focused Vietnam Map APIs for geocoding, address search, and administrative boundaries. You get instant, typo-tolerant autocomplete out of the box:
// Let GoGoDuk handle the search complexity and connection pooling
const response = await fetch("https://api.gogoduk.com/v1/suggest?input=nguyn trai q5", {
headers: { "X-API-Key": process.env.GOGODUK_API_KEY! }
});
const { suggestions } = await response.json();By pairing a lightweight PostgreSQL database with specialized APIs, you can build production backends that remain fast, modular, and easy to maintain.
Learn more about scaling your server infrastructure in Dockerizing a NestJS App for Production, or structured data integration in Optimizing Vietnam Address Autocomplete for Checkout UX.
Facing performance issues or scaling challenges?
I specialize in building low-latency map infrastructure, real-time streaming pipelines (Kafka, ClickHouse), and highly optimized backend systems. Let's work together to scale your product.
Related Articles
12 Jun 2026
Migrating from Google Maps API to a Vietnam Map API: Cost & Code
Compare Google Maps Platform pricing and Vietnam limitations, then migrate geocoding, reverse geocoding, and address autocomplete to GoGoDuk with real code.
8 Jun 2026
Redis 8.8 Released: New Native Rate Limiter, Array Data Structure, and Up to +83% Performance Boost
A deep technical breakdown of the newly released Redis 8.8 (June 2, 2026). Explore the new O(1) sparse-friendly Array structure by antirez, the native INCREX rate-limiting command, and Hash field-level subkey notifications.
8 Jun 2026
PostGIS Performance Tuning: From 2s to 10ms for Vietnamese Spatial Queries (Gogoduk Case Study)
Explore practical PostGIS database optimization techniques from the Gogoduk Map API project. Learn how migrating from Geometry to Geography, designing Partial GIST indexes, and simplifying polygons can achieve 10ms query times.
8 Jun 2026
Redis Lua Script & SETNX: High-Performance Rate Limiting & Quota Alerting for APIs
Learn how Gogoduk builds API Rate Limiting and Quota Alerting systems with Redis and Go. Discover how to use Lua scripts for atomicity, prevent memory leaks, and leverage SETNX to deduplicate notifications.