One of our tables is over 200gb and the old records are hardly ever queried, and this makes me think partitioning is a good solution, however I am having trouble figuring out which column to use as a Partition Key for a table.
We have 2 tables as follows (names and structures are altered for confidentiality):
transactionstable – with a transaction ID,
checkstable – With the schema
(id, transcation_id, created_at, metadata, last_updated_at)there can be multiple different checks with for a given
As part of data aggregation we write a summarized version of the checks into a single column in the
transactions table. The way we do this aggregation is by running an
transactions table with joins on the checks table filtering by
last_updated_at to only check for checks that have been updated recently.
Users however query using
created_at in the
checks table, and also
last_updated_at could potentially change for a given row (although not likely).
Which one should I use as partition key?
Most queries become slower with partitioning.
The exceptions I know are:
- a query that needs a sequential scan, but contains the partitioning key in the
- a query that joins two partitioned tables, and the join condition matches the partition key for both
- a query that groups and aggregates by the partitioning key
So you should rather consider how data expire. Partition your tables so that you can get rid of old, obsolete data by dropping partitions.