Question :
How should one generate non-unique, non-natural identifiers for weak entities?
For example, if order_id
is the primary key for an order
table, and (order_id, item_number)
is the primary key for an order_item
table with a foreign key on order_id
, how best to generate item_number
?
A few of possibilities come to mind, but none seem ideal:
-
Autoincrement
item_number
: theorder_item
entity is no longer weak, and the composite key is redundant. -
Use a trigger to search for the current max
item_number
for a givenorder_id
, then increment: if a row is deleted this could lead to reassigning a PK to a different record – that doesn’t seem like a good idea? (edit: this can also be done without using triggers as explained in joanolo’s answer) -
Use a trigger to create a new sequence for every
order_id
, and somehow drawitem_number
s from the appropriate sequence: this is functionally the desired behaviour, but seems like it would be a mess to implement. Is it even possible to reference a sequence by anorder_id
?
Edit – closely related (if not duplicate):
Answer :
1. is the least error-prone, simplest and fastest.
Trigger solutions like in 2. or 3. are subject to subtle race conditions under concurrent write access.
Make item_number
a serial
column and also the PK for order_item
in this case. Stick with the default values drawn from the underlying sequence and never update the column.
Create a multi-column index on (order_id, item_number)
for performance of typical queries. (Might as well be UNIQUE
, but does not have to be.) In a typical setup (order_id
and item_number
can both be plain integer
), the multicolumn index happens to be just as small and fast as an index on just order_id
:
(Like I commented:) Typically, the only important role of an item number is to be unique (and immutable). If you need a stable sort order among items, you might just rely on the serial value of item_number
. Be aware that those numbers are not necessarily in order of transaction commits. It may be useful to add the transaction timestamp current_timestamp
(or possibly statement_timestamp()
or clock_timestamp()
) to the row. Depends on requirements and access patterns.
You can add a VIEW
for the human eye, with item-numbers per order_id
starting from 1, dynamically generated with row_number()
, ordered by above criteria. But operate with the unique, immutable item_number
internally.
You can have a fourth alternative, that is consistent with your definition, where the database takes care of itemNumbers when you INSERT
your data; without the need for triggers. [Although you could use them as well.]
Assuming this is your schema:
CREATE TABLE "Order"
(
"orderId" integer primary key, -- this should probably be serial
"order" text not null
) ;
CREATE TABLE "OrderItem"
(
"orderId" integer not null REFERENCES "Order"("orderId")
ON DELETE CASCADE ON UPDATE CASCADE,
"itemNumber" integer not null,
"orderItem" text,
PRIMARY KEY ("orderId", "itemNumber")
) ;
and that we have three orders:
INSERT INTO "Order" ("orderId", "order") VALUES(1, 'Order number 1') ;
INSERT INTO "Order" ("orderId", "order") VALUES(2, 'Order number 2') ;
INSERT INTO "Order" ("orderId", "order") VALUES(3, 'Order number 3') ;
If you need to add ONE item to order 3, you’d do:
INSERT INTO
"OrderItem"
("orderId", "orderItem", "itemNumber")
VALUES
(3, 'whichever item you need',
(SELECT coalesce(max("itemNumber"),0) + 1
FROM "OrderItem"
WHERE "orderId"=3)
) ;
That is, when you INSERT
, you just look for the max("itemNumber")
for your specific “order_Id” and add 1 to it. If this is the first item you insert an item to an order, the COALESCE(___, 0) + 1 will give you a 0 + 1, instead of NULL + 1 (which wouldn’t work).
If you need to add several items to several orders (which isn’t that common for most applications), you can do it in a slightly more complicated fashion, but using the same principle:
-- Assume you want to add items to orders 1 and 2
WITH "orderItemsToAdd" ("orderId", "orderItem") AS
(
VALUES
(1, 'first item, 1st order'),
(1, 'second item, 1st order'),
(2, 'first item, 2nd order'),
(2, 'second item, 2nd order'),
(2, 'third item, 2nd order')
)
-- Get the max item numbers (or zero, via coalesce) for each order
, "maxItemNumbers" AS
(
SELECT
"orderId", coalesce(max("itemNumber"), 0) AS "baseItemNumber"
FROM
"orderItemsToAdd"
LEFT JOIN "OrderItem" USING("orderId")
GROUP BY
"orderId"
)
-- We insert new items, using row_number() + baseItemNumber as the new item numbers
-- (NOTE: orderItemsToAdd should, in practice be ORDERed BY something!)
INSERT INTO
"OrderItem" ("orderId", "itemNumber", "orderItem")
SELECT
"orderId", "baseItemNumber" + (row_number() over (partition by "orderId")), "orderItem"
FROM
"orderItemsToAdd"
JOIN "maxItemNumbers" USING("orderId")
RETURNING
* ;
This is what you’ll get:
+--------------------------------+
¦ 1 ¦ 1 ¦ first item, 1st order ¦
¦---+---+------------------------¦
¦ 1 ¦ 2 ¦ second item, 1st order ¦
¦---+---+------------------------¦
¦ 2 ¦ 1 ¦ first item, 2nd order ¦
¦---+---+------------------------¦
¦ 2 ¦ 2 ¦ second item, 2nd order ¦
¦---+---+------------------------¦
¦ 2 ¦ 3 ¦ third item, 2nd order ¦
+--------------------------------+
You can check it at rextester.
You need to be aware of:
- row_number() window function
- Window function calls
- Window Function Processing
As a side note, if you use PostgreSQL, I’d advise using underscored_lowercase_identifiers
instead of camelCaseOnes
. You save a lot of time not having to type "
(and the risk of forgetting them) 😉
Second side-note: this approach can lead to race conditions (and transactions having to be aborted then retried) if many people try to change the same order at the same time. See comments by @Erwin.