NHT

PostgreSQL Partitioning: Taming 100M+ Row Tables with Declarative Partitioning

A practical guide to partitioning large PostgreSQL tables (100M+ rows): choosing RANGE/LIST/HASH, creating declarative partitions, using partition pruning to speed up queries, indexing per partition, and near-free data retention with DETACH/DROP PARTITION.

Nguyen Hoang TuanNguyen Hoang Tuan20 Jun 202610 min read

A PostgreSQL table runs smoothly with a few million rows, then suddenly slows down once it crosses a few hundred million. Indexes grow larger than RAM, autovacuum can't keep up, and a single DELETE to clean out old data locks the table for minutes. Adding indexes or bumping work_mem only buys you so much—past a threshold, the problem isn't "how do I write this query" but "the table is too big to handle as one monolithic block."

That's where partitioning earns its keep. This post walks through PostgreSQL's declarative partitioning (available since PG 11) the practical way: when you actually need it, how to choose between RANGE/LIST/HASH, how partition pruning makes queries touch only the data they need, how per-partition indexing works, and why dropping old data becomes nearly free.

Note: the row counts and query timings here are illustrative to make the trade-offs concrete. The PARTITION BY, PARTITION OF ... FOR VALUES, and ATTACH/DETACH/DROP PARTITION syntax is standard PostgreSQL 11+ behavior.

When a PostgreSQL table needs partitioning

Partitioning isn't free—it adds complexity to table creation, indexing, and migrations. Don't partition just because a table "sounds big." Reach for it when several of these signs show up together:

  • The table is already large and still growing—typically from around 100 million rows, or many times the size of available RAM.
  • The data has a clear time dimension—logs, events, orders, transactions… constantly appended and aging out. You usually query recent ranges and want to delete old data periodically.
  • Cleaning out old data is painful—a DELETE FROM events WHERE created_at < now() - interval '90 days' scans and locks tens of millions of rows and dumps a mountain of dead tuples on autovacuum.
  • Indexes no longer fit in memory—indexes on a huge table don't stay cached, so every read hits disk.

If your data has no natural split dimension (by time, by customer, by region), or the table is only a few million rows, index tuning—like in the PostGIS Performance Tuning post—is usually the right move before reaching for partitioning.

Declarative partitioning: RANGE, LIST, or HASH

PostgreSQL supports three declarative partitioning strategies. Pick based on how the data splits naturally and how you query it:

  • RANGE—partition by continuous value ranges. The most common, used for time columns: each month (or day/week) is a partition. Ideal for logs, events, orders.
  • LIST—partition by a discrete set of values. Fits when data splits along fixed categories: by country, by event type, by status.
  • HASH—PostgreSQL hashes the partition key and spreads rows evenly across N partitions. Use it when there's no natural split but you still want to shrink each piece (e.g. partition by customer_id).

In practice, RANGE by time is the most-used case, because it helps both at query time (most queries filter by a recent time range, enabling pruning) and at cleanup time (dropping an old month is cheap). The rest of this post focuses on RANGE.

Creating a RANGE-partitioned table

The parent table is declared as partitioned via the PARTITION BY clause. The parent itself holds no data—it only defines the structure and the partition key. The real rows live in the child partitions.

-- Parent table: defines structure + partition key
CREATE TABLE events (
  id          bigserial,
  customer_id bigint        NOT NULL,
  event_type  text          NOT NULL,
  payload     jsonb,
  created_at  timestamptz   NOT NULL
) PARTITION BY RANGE (created_at);

-- Child partitions, one per month
CREATE TABLE events_2026_05 PARTITION OF events
  FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

CREATE TABLE events_2026_06 PARTITION OF events
  FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');

One important detail: the range is lower-inclusive, upper-exclusiveFROM is included, TO is excluded. That's why two adjacent partitions never overlap (the boundary '2026-06-01' belongs to June, not May). When you INSERT into the parent events, PostgreSQL routes each row to the correct partition based on created_at—your application never needs to know which partitions exist.

If a row falls into a range no partition covers, the INSERT fails. You can create a DEFAULT partition to catch stray rows, but it's better to always pre-create partitions for the future (see automation below).

Partition pruning: why queries get much faster

This is the biggest performance win. Partition pruning is PostgreSQL recognizing that a query only concerns a few partitions, and skipping the rest entirely—not reading them, not scanning their indexes.

EXPLAIN
SELECT count(*) FROM events
WHERE created_at >= '2026-06-10' AND created_at < '2026-06-20';

With a filter on the partition column created_at, the plan only touches events_2026_06 instead of the whole table. If you have 36 partitions across 3 years of data, a query over one recent week touches just 1 partition—skipping the other 35. The amount of data to read drops by exactly that ratio.

The core condition for pruning to work: the WHERE clause must filter on the partition key. If you only filter by customer_id with no time constraint, PostgreSQL can't eliminate partitions and is forced to scan them all. So choose a partition key that matches the dimension your hottest queries filter on. Modern PostgreSQL also supports pruning at execution time (runtime), so parameterized queries and joins can benefit too.

Indexes and constraints on partitioned tables

When you create an index on the parent table, PostgreSQL automatically creates the matching index on each child partition (and on partitions created later). This is the key reason performance doesn't degrade: instead of one giant index over a hundred million rows, you get many small indexes—each serving only one partition's data, so they fit in memory and stay cheap to maintain.

-- Created on the parent => applies to every partition, including future ones
CREATE INDEX ON events (customer_id, created_at);

There's one important constraint to remember about unique keys: every UNIQUE or PRIMARY KEY on a partitioned table must include the partition key. PostgreSQL can't guarantee uniqueness across partitions without the partition column in the key. So a plain PRIMARY KEY (id) is invalid here; you need PRIMARY KEY (id, created_at).

-- Valid: the primary key includes the partition column created_at
ALTER TABLE events ADD PRIMARY KEY (id, created_at);

This is a common trade-off when moving to partitioning, and it needs to be accounted for at schema-design time—especially if the table is referenced by foreign keys from elsewhere.

Near-free retention and data cleanup

This is what makes time-based partitioning most worth it. On a regular table, deleting old data means a DELETE scanning tens of millions of rows: it locks, generates dead tuples, piles work onto autovacuum, and fragments indexes. On a partitioned table, cleaning out old data is a metadata operation:

-- Detach the partition (data is kept, e.g. to archive)
ALTER TABLE events DETACH PARTITION events_2026_05;

-- Or drop a whole month of old data outright — nearly instant
DROP TABLE events_2026_05;

DROP TABLE on a partition is nearly instant because it just removes a separate data file, rather than deleting row by row. No dead tuples, no vacuum cleanup afterward, no long table lock. DETACH is useful when you want to peel a partition off to archive (e.g. dump to cold storage) before dropping it.

This approach mirrors TTL-based retention in the OLAP world. The ClickHouse usage & billing analytics post shows ClickHouse using PARTITION BY toYYYYMM(...) together with TTL ... DELETE to automatically drop whole old months—the same "delete by partition, not by row" idea, except OLAP does it automatically while PostgreSQL leaves you in explicit control.

Automating partitions and common mistakes

A time-partitioned table needs a steady supply of partitions for upcoming data. If it's the end of the month and you haven't created next month's partition, the first INSERT of that month will fail. There are two ways to handle this:

  • pg_partman—the standard extension for managing partitions: it pre-creates future partitions and prunes expired ones per a retention policy. This is the choice for serious production systems.
  • A homegrown cron job—a periodic task that runs CREATE TABLE ... PARTITION OF ... for the next few months. Simple, and good enough for small systems.

Common traps when rolling out partitioning:

  • Forgetting to create future partitions. The most common failure and a direct cause of downtime. Always pre-create several periods ahead, and monitor to alert early.
  • Queries that don't filter on the partition key. You lose pruning entirely and scan every partition. Make sure your hot queries all carry a condition on the partition column.
  • Too many partitions. Splitting too finely (e.g. hourly across years) creates thousands of partitions and slows down query planning. Pick a sensible granularity—usually monthly or weekly.
  • Forgetting the unique-key constraint. Discovering late that the existing PK doesn't include the partition column, forcing a schema change once data is already large.
  • Migrating a live production table. Converting a huge existing table to partitioned form needs a careful plan (create a new partitioned table, copy data in batches, then ATTACH)—you can't flip it in place with a single command.

Wrapping up: OLTP partitioning vs OLAP

Declarative partitioning in PostgreSQL comes down to a few core ideas: choose a partition key that matches your hottest query dimension (usually time), let partition pruning shrink how much you read, index on the parent so each partition gets a small index, and clean up old data with DROP PARTITION instead of DELETE. Get those four right and a 100-million-row table stays as manageable as it was when small.

A worthwhile perspective: the "table too big" problem shows up in both OLTP and OLAP, and the solution converges on the same idea—split data by time and operate on it one piece at a time. PostgreSQL (OLTP) gives you explicit control over partitions; ClickHouse (OLAP) does it almost automatically via PARTITION BY + TTL, as in the ClickHouse usage analytics post. Understanding both lets you pick the right tool: keep hot transactional data in a partitioned PostgreSQL, and push huge analytical data to an OLAP engine when you need to.

Facing performance issues or scaling challenges?

I specialize in building low-latency map infrastructure, real-time streaming pipelines (Kafka, ClickHouse), and highly optimized backend systems. Let's work together to scale your product.

Let's Work Together

Written by

Nguyen Hoang Tuan

Nguyen Hoang Tuan

Full-stack developer focused on practical backend architecture, web performance, and production delivery.

Related Articles