Question :
I’m working on a database design that extensively uses UUID
s for PRIMARY KEY
s. However, this confronts me with a very consequential choice. How do I name these columns? I would call them uuid
, except that, UUID
being an identifier, I then have to quote the field name everywhere:
CREATE TABLE thingie (
"uuid" UUID PRIMARY KEY DEFAULT public.gen_random.uuid(),
foo VARCHAR,
bar VARCHAR,
);
A straight-forward alternative seems to be to instead call these columns id
:
CREATE TABLE thingie (
id UUID PRIMARY KEY DEFAULT public.gen_random.uuid(),
foo VARCHAR,
bar VARCHAR,
);
That way, I don’t have the column name, and semantically I can argue that a UUID is indeed a sort of ID; in a Venn diagram, the UUID circle would be placed wholly in the ID circle.
However, I (and I am sure many others with) have grown so accustomed to associating id
with an auto-incrementing INTEGER
column that I’m afraid to break some sort of unspoken rule by calling these IDs id
s.
If you would brighten up my confusion with some solid bike-shedding, I would be much obliged. Indeed my question is: How would you call your UUID
-typed surrogate keys and why?
Answer :
You shouldn’t ascribe meaning to your surrogate keys beyond the fact that they are a unique identifier for each row, as such it doesn’t matter what format it takes from the point of view of your schema design. All you should care about at that point is that it is a value that identifies something, so I stick with calling them ID
â°.
In fact what you have with a UUID, unless you have had a brain-fart and stored them as strings¹, is effectively a large integer. You can’t do arithmetic on it³ but the engine doesn’t see it as any different to a bigger BIGINT
being used as a key.
Another general thing is that I try to avoid any name (for a column, table, function, procedure, …) that could be a keyword so need escaping. This is a point against calling them uuid
if the main point for calling them id
is not enough. Of course this can’t be perfect, I would avoid uuid
as I know it is a type name in Postgres but devs who have only ever use SQL Server up to this point might not know that as it isn’t a reserved word (or even a type name) there, and I might well use words that are not portable elsewhere similarly.
If you effectively store multiple surrogate keys due to integrating with another systems that use something else, your internal identifier, is id
. The others are real data with meaning so far as your system is concerned, not actually surrogate keys, and should be named in a way that describes their content or use. Real world examples include ISBN, StaffReferenceNumber/SRN, IRN, PPN, …, but you might also have something less general like SalesForceId or JoesPartStoreId. Even if you don’t define a surrogate key in these circumstancesâ´, keep the meaningful names instead of just id
so it is obvious that your rules probably don’t control their generation and use.
[0] or id or Id, keeping consistent with the casing rules you follow elsewhere just in case your stuff ends up being interpreted in a case sensitive way later.
[1] I still maintain a legacy system that made this mistake two decades ago, a little before my time. It all works but of course there are storage size and performance “considerationsâ€, and interesting bugs² lead to invalid UUID values such as empty-string turning up.
[2] I’m also counting past bad colleagues, including that young David Spillett who messed up a few such times many years ago, as bugs in the business!
[3] Well, you could if you tried hard enough, but not with any built-in functions.
[4] I would always have a surrogate key in these cases anyway. This means only your bugs can cause issues like duplicate keys or the need for expensive primary key value changes that affect many foreign keys, instead of being beholden to dealing with other systems’ bugs too.
“uuid” is not an SQL key word. Neither in standard SQL, nor in PostgreSQL. It’s the name of the data type uuid
– which, technically, does not stand against using “uuid” as identifier (even without double-quotes).
“id” is a non-reserved key word in standard SQL since SQL:2011. But non-reserved key words can be used as identifier freely.
So you can use either. Doesn’t mean you should. My personal opinion is that “id” (as well as “uuid”) are not sufficiently descriptive names. I consider the wide-spread use of “id” to be an anti-pattern. Any non-trivial query in a relational database joins multiple tables. Then you end up with multiple columns, all named “id”, and you have to table-qualify or deal out aliases. While you still can (or even should) do that, it’s better if don’t have to.
On top of that, I avoid basic type names as identifiers. Unnecessary confusion. Confusing error messages, problems with searching, typo traps, etc.
Use descriptive, legal, lower-case, unquoted identifiers, as short as possible, as long as necessary.
Related:
My preferred naming pattern is to use “foo_id” as name for a surrogate primary key column of a table named “foo”. It’s the ID of my foo entity, no matter its data type. And I use the same name for any foreign key column referencing it (with the same content). The label on the column represents what’s in it, queries are clear. Quick ad-hoc queries can join tables with USING (foo_id)
.
I do not normally include the data type in the column name. (Except when that’s common practice for the term.) That’s just too noisy. I don’t call my price column “price_numeric”, nor my name column “surname_text”. What if you have to change the data type of your foo_id
column from integer
to bigint
? You wouldn’t want to change the column name and propagate that change to all written code …
Accordingly, I use “foo_id”, not “uuid” or “id_uuid”. The regular user only needs to know that “foo_id” is a unique (PK) ID column of the table “foo”. The data type is an implementation detail interesting for special / advanced purposes.