Best practices for generating unique multi-column keys for weak entities?

Posted on

Question :

How should one generate non-unique, non-natural identifiers for weak entities?

For example, if order_id is the primary key for an order table, and (order_id, item_number) is the primary key for an order_item table with a foreign key on order_id, how best to generate item_number?

A few of possibilities come to mind, but none seem ideal:

  1. Autoincrement item_number: the order_item entity is no longer weak, and the composite key is redundant.

  2. Use a trigger to search for the current max item_number for a given order_id, then increment: if a row is deleted this could lead to reassigning a PK to a different record – that doesn’t seem like a good idea? (edit: this can also be done without using triggers as explained in joanolo’s answer)

  3. Use a trigger to create a new sequence for every order_id, and somehow draw item_numbers from the appropriate sequence: this is functionally the desired behaviour, but seems like it would be a mess to implement. Is it even possible to reference a sequence by an order_id?

Edit – closely related (if not duplicate):

Answer :

1. is the least error-prone, simplest and fastest.
Trigger solutions like in 2. or 3. are subject to subtle race conditions under concurrent write access.

Make item_number a serial column and also the PK for order_item in this case. Stick with the default values drawn from the underlying sequence and never update the column.

Create a multi-column index on (order_id, item_number) for performance of typical queries. (Might as well be UNIQUE, but does not have to be.) In a typical setup (order_id and item_number can both be plain integer), the multicolumn index happens to be just as small and fast as an index on just order_id:

(Like I commented:) Typically, the only important role of an item number is to be unique (and immutable). If you need a stable sort order among items, you might just rely on the serial value of item_number. Be aware that those numbers are not necessarily in order of transaction commits. It may be useful to add the transaction timestamp current_timestamp (or possibly statement_timestamp() or clock_timestamp()) to the row. Depends on requirements and access patterns.

You can add a VIEW for the human eye, with item-numbers per order_id starting from 1, dynamically generated with row_number(), ordered by above criteria. But operate with the unique, immutable item_number internally.

You can have a fourth alternative, that is consistent with your definition, where the database takes care of itemNumbers when you INSERT your data; without the need for triggers. [Although you could use them as well.]

Assuming this is your schema:

    "orderId" integer primary key,  -- this should probably be serial
    "order" text not null
) ;

    "orderId" integer not null REFERENCES "Order"("orderId") 
    "itemNumber" integer not null,
    "orderItem" text,

    PRIMARY KEY ("orderId", "itemNumber")
) ;

and that we have three orders:

INSERT INTO "Order" ("orderId", "order") VALUES(1, 'Order number 1') ;
INSERT INTO "Order" ("orderId", "order") VALUES(2, 'Order number 2') ;
INSERT INTO "Order" ("orderId", "order") VALUES(3, 'Order number 3') ;

If you need to add ONE item to order 3, you’d do:

    ("orderId", "orderItem", "itemNumber")
    (3, 'whichever item you need', 
     (SELECT coalesce(max("itemNumber"),0) + 1 
        FROM "OrderItem" 
       WHERE "orderId"=3)
    ) ;

That is, when you INSERT, you just look for the max("itemNumber") for your specific “order_Id” and add 1 to it. If this is the first item you insert an item to an order, the COALESCE(___, 0) + 1 will give you a 0 + 1, instead of NULL + 1 (which wouldn’t work).

If you need to add several items to several orders (which isn’t that common for most applications), you can do it in a slightly more complicated fashion, but using the same principle:

-- Assume you want to add items to orders 1 and 2
WITH "orderItemsToAdd" ("orderId", "orderItem") AS
    (1, 'first item, 1st order'), 
    (1, 'second item, 1st order'),
    (2, 'first item, 2nd order'),
    (2, 'second item, 2nd order'),
    (2, 'third item, 2nd order')

-- Get the max item numbers (or zero, via coalesce) for each order

, "maxItemNumbers" AS
    "orderId", coalesce(max("itemNumber"), 0) AS "baseItemNumber"
    LEFT JOIN "OrderItem" USING("orderId")

-- We insert new items, using row_number() + baseItemNumber as the new item numbers
-- (NOTE: orderItemsToAdd should, in practice be ORDERed BY something!)
    "OrderItem" ("orderId", "itemNumber", "orderItem")
    "orderId", "baseItemNumber" + (row_number() over (partition by "orderId")), "orderItem"
    JOIN "maxItemNumbers" USING("orderId")
    * ;

This is what you’ll get:

    ¦ 1 ¦ 1 ¦ first item, 1st order  ¦
    ¦ 1 ¦ 2 ¦ second item, 1st order ¦
    ¦ 2 ¦ 1 ¦ first item, 2nd order  ¦
    ¦ 2 ¦ 2 ¦ second item, 2nd order ¦
    ¦ 2 ¦ 3 ¦ third item, 2nd order  ¦

You can check it at rextester.

You need to be aware of:

As a side note, if you use PostgreSQL, I’d advise using underscored_lowercase_identifiers instead of camelCaseOnes. You save a lot of time not having to type " (and the risk of forgetting them) 😉

Second side-note: this approach can lead to race conditions (and transactions having to be aborted then retried) if many people try to change the same order at the same time. See comments by @Erwin.

Leave a Reply

Your email address will not be published. Required fields are marked *