Question :
I have a very strange kind of question in my mind, I’m using the EAV
model in my E-COMMERCE DB. I have a product
table with primary key p_id
which is auto_increment, and to save its attributes I have a different table product_attribute
, that also stores SKU of every product and have p_id
as FOREIGN KEY, which there I am using attribute-value
pair. So, my question is, if I missed storing a product of size
7
of any product of a company A
in product_attribute
table and stored all the other sizes. How, will I know that what was my p_id
for the product I missed at a later time when I want to store it? SHOULD I use UPC instead of p_id
?
Answer :
Auto-increment IDs are what’s called surrogate keys. In most cases, data has some sort of a natural key as well. In your case, it sounds like the natural key is the UPC.
So, if there’s a natural key, why create a surrogate key? There are several reasons. Two of the most important to me are size and changeability.
When a natural key is a varchar
or nvarchar
value, that key is frequently much larger than 4 bytes (the size of an int
). For frequently used keys, the space difference involved can be noticeable.
Also, natural keys are most often values that have a meaning outside of the database. It may happen rarely, but occasionally something like a UPC code may change. If it’s used as the key to your product table, updating all references to it can take a long time, and require a large number of locks.
In your example, I assume that the product of size 7 that didn’t get entered would still have its UPC available. If so, you’d look up the UPC code’s product_id
, and use that value in your EAV table.
If the UPC is not stored in the database, some sort of identifying information still must be. Use the SKU of one of the related products (for example), or (better yet) add an appropriate overall description to the product
record so you can locate it. If there’s nothing related in the database, then logically the product ID could be anything, and wouldn’t have to be connected to any existing data.