Does the partition key also have to be part of the primary key?

Posted on

Question :

I am partitioning a table based on a column that is not a primary key? I’ve read some conflicting information today on whether the partition column must be a part of the primary key. My gut says no, but I am not 100% sure. So questions…

  1. Must the partition column be part of the primary? Is it recommended one way or the other?
  2. Do I have to create an index for the partition key, or does the DBMS do it automatically on its own?

Answer :

Not at all.

One of the most common scenarios for partitioning is to use a date field, which is totally unrelated to your PK.

For instance, if you have a table Orders with the field OrderDate you would most likely partition based on the month and year of OrderDate.

When records age out and are no longer relevant you can move those partitions off to an archive table or database so they are no longer processed.

Partitioning will work with pretty much any field, but in order for it to work WELL the field(s) you partition on should be used in most, if not all, of your queries. If you don’t include your partition keys then you will get essentially an expensive table scan that goes across multiple tables (partitions).

EDIT

For part 2, I think the answer is no as well. The partition key is used to determine which partition to put the row in, but I don’t think an index is maintained. There may be stats in the back end on it though.

In addition to JNK’s answer, you probably should read this article which discusses aligning table partitions and index partitions.

There are many types of scenarios where partitioning scheme does exactly follows the primary key’s first column – for instance in a data warehouse scenario where the snapshot date of a fact table is usually the partition column as well as the first column in the primary key.

But equally, in OLTP environments where the PK is an IDENTITY or other surrogate key, it makes little sense to use this for the partition, since partitioning on arbitrary numbers is not normally terribly useful. In OLTP systems, you also tend to partition by date the most (probably not in the PK), but potentially also regionally or by some kind of organizational division (maybe in the PK if you aren’t using a surrogate).

But it’s not a requirement.

It has to be part of a Candidate Key if not part of the Primary key itself. Idea being, your partitioning should align itself with the primary key.

So the answer is, yes, it is preferred to be part of the PK. If not another key, which is equally good enough to be a PK.

Leave a Reply

Your email address will not be published.