I’m designing a database that will store users of varying types. Predominantly (but not exclusively) they will be Actors, Directors and Writers. Currently there are just four user types that are of relevance. There is a outside chance that this number may increase, but the probability is low – and in such a case would be by a very small number.
The plan is to have a
users table responsible pretty much solely for logging into the site (
password columns plus one or two others such as whether they’ve been approved, and updated_at), and additional tables for each of the respective user types that each have their own unique set of columns. Only actors, for example, will have an ethnicity column, only Directors would have a bio column, and only Writers would need to provide their location. However, as I’ve not managed a database of this complexity before, I’m wondering how to organise a couple of aspects:
Firstly, users can be any one, or any combination, of the above types. So I understand I would need something like (for example) a
director_user table with
user_id columns. Would this then be sufficient to be able to filter all users by role type and so on?
Secondly, most users will the option of a twitter profile and phone number. And all actors will have to include at least one URL for any of their other online actor profiles; currently there are three that they can include, but this number may increase. Am I right in assuming that a separate table for each of the possible profiles/contact methods is an optimal way to organise data?
According to my interpretation of your description of the relevant business context, you are dealing with a supertype-subtype1 structure where (a) Actor, Director and Writer are entity subtypes of (b) Person, their entity supertype, and (c) said subtypes are not mutually exclusive.
In this way, if you are interested in building a relational database that mirrors such a scenario accurately â€”and hence are expecting that it functions as suchâ€”, your following comment clarifications are quite significant with respect to the previous points, because they have implications at both (1) the conceptual and (2) the logical levels of representation of the database in question:
- [â€¦] additional tables for each of the respective user types that each have their own unique set of columns.
- [â€¦] there are just four user types that are of relevance. There is a outside chance that this number may increase, but the probability is low – and in such a case would be by a very small number.
I will elaborate on all those aspects and several other critical factors in the sections below.
In order to first define the corresponding conceptual schema â€”which can be used as a subsequent reference so that you can adapt it to make certain that it meets the exact informational requirementsâ€”, I have formulated some business rules that are of particular importance:
- A Person may carry out one-two-or-three (i.e., one-to-all) Roles2. In other words, a Person may be
- an Actor and/or
- a Director and/or
- a Writer.
- A Person may log in via zero-or-one UserProfile.
- An Actor provides one-two-or-three URLs3.
- An Actor is grouped by one Ethnicity.
- An Ethnicity groups zero-one-or-many Actors.
- A Writer is based in one Location.
- A Location is the base of zero-one-or-more Writers.
Expository IDEF1X diagram
Then, I created the IDEF1X4 diagram shown in Figure 1, which groups all the formulations above along with other rules that appear pertinent:
As demonstrated, the Person supertype (i) has its own box, (ii) possesses the properties or attributes that apply to all the subtypes, and (iii) presents lines that connect it with the boxes of every subtype.
In turn, every subtype (a) appears in its own dedicated box, and (b) holds only its applicable properties. The PRIMARY KEY of the supertype, PersonId, migrates5 to the subtypes with the role names6 ActorId, DirectorId, and WriterId respectively.
Also, I avoided coupling Person with the UserProfile entity type, which permits separating all their contextual implications, associations or relationships, etc. The PersonId property has migrated to UserProfile with the role name UserId.
You state in the question body that
And all actors will have to include at least one URL for any of their other online actor profiles; currently there are three that they can include, but this number may increase.
â€¦so URL is an entity type in its own right, and is directly associated with the Actor subtype in accordance with this quote.
And, in comments, you specify that
[â€¦] an actor will have a headshot (photo), while a writer will not [â€¦]
â€¦then, among other features, I included Headshot as a property of the Actor entity type.
As for the Ethnicity and the Location entity types, they of course may entail more complex organizations (e.g., an Actor may belong to one, two or more different ethnic groups in distinct proportions, and a Writer may be based on a place that requires recording country, administrative region, county, etc.) but it looks like that the needs of your business context are covered successfully with the structures here modelled.
Naturally, you can make as many adjustments as necessary.
Illustrative SQL-DDL logical design
Consequently, based on the IDEF1X diagram shown and described above, I wrote the logical DDL layout that is shown as follows (I have supplied notes as comments that explain some of the characteristics that I esteem particularly important with respect to the tables, columns and constraints declared):
-- You should determine which are the most fitting -- data types and sizes for all your table columns -- depending on your business context characteristics. -- Also, you should make accurate tests to define the -- most convenient INDEX strategies based on the exact -- data manipulation tendencies of your business needs. -- As one would expect, you are free to utilize -- your preferred (or required) naming conventions. CREATE TABLE Person ( -- Represents the supertype. PersonId INT NOT NULL, FirstName CHAR(30) NOT NULL, LastName CHAR(30) NOT NULL, BirthDate DATE NOT NULL, GenderCode CHAR(3) NOT NULL, TwitterProfile CHAR(30) NOT NULL, PhoneNumber CHAR(30) NOT NULL, EmailAddress CHAR(30) NOT NULL, CreatedDateTime DATETIME NOT NULL, -- CONSTRAINT Person_PK PRIMARY KEY (PersonId), CONSTRAINT Person_AK1 UNIQUE ( -- Composite ALTERNATE KEY. FirstName, LastName, GenderCode, BirthDate ), CONSTRAINT Person_AK2 UNIQUE (TwitterProfile), -- ALTERNATE KEY. CONSTRAINT Person_AK3 UNIQUE (EmailAddress) -- ALTERNATE KEY. ); CREATE TABLE Ethnicity ( -- Its rows will serve a â€œlook-upâ€ purpose. EthnicityId INT NOT NULL, Name CHAR(30) NOT NULL, Description CHAR(30) NOT NULL, CreatedDateTime DATETIME NOT NULL, -- CONSTRAINT Ethnicity_PK PRIMARY KEY (EthnicityId), CONSTRAINT Ethnicity_AK UNIQUE (Description) ); CREATE TABLE Actor ( -- Stands for one of the subtypes. ActorId INT NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY. Headshot CHAR(30) NOT NULL, -- May, e.g., contain a URL indicating the path where the photo file is actually stored. EthnicityId INT NOT NULL, CreatedDateTime DATETIME NOT NULL, -- CONSTRAINT Actor_PK PRIMARY KEY (ActorId), CONSTRAINT ActorToPerson_FK FOREIGN KEY (ActorId) REFERENCES Person (PersonId), CONSTRAINT ActorToEthnicity_FK FOREIGN KEY (EthnicityId) REFERENCES Ethnicity (EthnicityId) ); CREATE TABLE Director ( -- Denotes one of the subtypes DirectorId INT NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY. Bio CHAR(120) NOT NULL, Etcetera CHAR(30) NOT NULL, CreatedDateTime DATETIME NOT NULL, -- CONSTRAINT Director_PK PRIMARY KEY (DirectorId), CONSTRAINT DirectorToPerson_FK FOREIGN KEY (DirectorId) REFERENCES Person (PersonId) ); CREATE TABLE Country ( CountryCode CHAR(2) NOT NULL, Name CHAR(30) NOT NULL, CreatedDateTime DATETIME NOT NULL, -- CONSTRAINT Country_PK PRIMARY KEY (CountryCode), CONSTRAINT Country_AK UNIQUE (Name) ); CREATE TABLE Location ( -- Its rows will serve a â€œlook-upâ€ purpose. CountryCode CHAR(2) NOT NULL, LocationCode CHAR(3) NOT NULL, Name CHAR(30) NOT NULL, CreatedDateTime DATETIME NOT NULL, -- CONSTRAINT Location_PK PRIMARY KEY (CountryCode, LocationCode), CONSTRAINT Location_AK UNIQUE (CountryCode, Name) ); CREATE TABLE Writer ( -- Represents one of the subtypes. WriterId INT NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY. CountryCode CHAR(2) NOT NULL, LocationCode CHAR(3) NOT NULL, CreatedDateTime DATETIME NOT NULL, -- CONSTRAINT Writer_PK PRIMARY KEY (WriterId), CONSTRAINT WriterToPerson_FK FOREIGN KEY (WriterId) REFERENCES Person (PersonId), CONSTRAINT WriterToLocation_FK FOREIGN KEY (CountryCode, LocationCode) REFERENCES Location (CountryCode, LocationCode) ); CREATE TABLE UserProfile ( UserId INT NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY. UserName CHAR(30) NOT NULL, Etcetera CHAR(30) NOT NULL, CreatedDateTime DATETIME NOT NULL, -- CONSTRAINT UserProfile_PK PRIMARY KEY (UserId), CONSTRAINT UserProfile_AK UNIQUE (UserName), -- ALTERNATE KEY. CONSTRAINT UserProfileToPerson_FK FOREIGN KEY (UserId) REFERENCES Person (PersonId) ); CREATE TABLE URL ( ActorId INT NOT NULL, Address CHAR(90) NOT NULL, Etcetera CHAR(30) NOT NULL, AddedDateTime DATETIME NOT NULL, -- CONSTRAINT URL_PK PRIMARY KEY (ActorId, Address), -- Composite PRIMARY KEY. CONSTRAINT URLtoActor_FK FOREIGN KEY (ActorId) REFERENCES Actor (ActorId) );
This has been tested in this db<>fiddle that runs on MySQL 8.0.
Therefore, (1) every singular aspect of the logical layout above carries a very precise meaning from (2) a singular feature of the business environment of interest7 â€”in agreement with the spirit of the relational framework by Dr. Edgar Frank Coddâ€”, because:
Each base table represents an individual entity type.
Each column stands for a single property of the respective entity type.
A specific data type is fixed for each column in order to ensure that all the values it contains belong to a particular and properly-delimited set a , be it INT, DATETIME, CHAR, etc (and let us hope that MySQL will finally incorporate DOMAIN support in near-future version).
Multiple constraints are configured (declaratively) in order to guarantee that the assertions in form of rows retained in all the tables comply with the business rules determined at the conceptual level.
Each row is meant to convey well-defined semantics, e.g., a
Personrow is read:
The Person identified by PersonId
ris called by the FirstName
sand the LastName
t, was born on BirthDate
u, has the GenderCode
v, tweets on the TwitterProfile
w, is reached through PhoneNumber
x, is contacted via the EmailAddress
y, and was registered on CreatedDateTime
Having a layout like this is decidedly favourable, as you can derive new tables (e.g., SELECT operations that gather columns FROM multiple tables with the help of the JOIN clause) that â€”in successionâ€” carry a very precise meaning too (see the section entitled â€œViewsâ€ below).
It is woth to mention that, with this configuration, (i) a row representing a subtype instance is identified by (ii) the same PRIMARY KEY value that distinguishes the row denoting the complementary supertype occurrence. Thus, it is more than opportune to note that
- (a) attaching an extra column to hold system-generated and system-assigned surrogates8 to (b) the tables standing for the subtypes is (c) entirely superfluous.
With this logical design, if new subtypes are defined as relevant in your business context, you would have to declare a new base table, but that happens as well when other kinds of entity types are deemed of significance, so said situation would be, in fact, ordinary.
In order to â€œfetchâ€, e.g., all the information that corresponds to an Actor, Director or Writer, you can declare some views (i.e., derived or expressible tables) so that you can SELECT directly from one single resource without having to write the concerning JOINs every time; e.g., with the VIEW declared below, you can obtain the â€œfullâ€ Actor information:
-- CREATE VIEW FullActor AS SELECT P.FirstName, P.Lastname, P.BirthDate, P.GenderCode, P.TwitterProfile, P.PhoneNumber, P.EmailAddress, A.Headshot, E.Name AS Ethnicity FROM Person P JOIN Actor A ON A.ActorId = P.PersonId JOIN Ethnicity E ON E.EthnicityId = A.EthnicityId; --
Of course, you may follow a similar approach in order to retrieve the â€œfullâ€ Director and Writer information:
-- CREATE VIEW FullDirector AS SELECT P.FirstName, P.Lastname, P.BirthDate, P.GenderCode, P.TwitterProfile, P.PhoneNumber, P.EmailAddress, D.Bio, D.Etcetera FROM Person P JOIN Director D ON D.DirectorId = P.PersonId; -- CREATE VIEW FullWriter AS SELECT P.FirstName, P.Lastname, P.BirthDate, P.GenderCode, P.TwitterProfile, P.PhoneNumber, P.EmailAddress, L.Name AS Location, C.Name AS Country FROM Person P JOIN Writer W ON W.WriterId = P.PersonId JOIN Country C ON C.CountryCode = W.CountryCode JOIN Location L ON L.LocationCode = W.LocationCode; --
The DML views here discussed are as well included in this MySQL 8.0 db<>fiddle so that you can see and test them â€œin actionâ€.
1 In some conceptual modelling techniques, supertype-subtype associations are referred to as superclass-subclass relationships.
2 Although you mention that there exist in fact more Roles that a Person may perform, but the three you revealed are good enough to discuss the scenario exposing several important ramifications.
3 But, as you noted, in the future an Actor might eventually provide one-to-many URLs.
4 Integration Definition for Information Modeling (IDEF1X) is a highly recommendable modelling technique that was established as a standard in December 1993 by the United States National Institute of Standards and Technology (NIST). It is based on (a) the early theoretical works authored by the sole originator of the relational model of data, i.e., Dr. E. F. Codd; on (b) the entity-relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.
5 The IDEF1X standard defines key migration as â€œThe modeling process of placing the primary key of a parent or generic [i.e., a supertype] entity in its child or category entity [i.e., a subtype] as a foreign keyâ€.
6 In IDEF1X, a role name is a distinctive label assigned to a FK attribute in order to express the meaning that it holds within the scope of its respective entity type.
7 Except, naturally, for the hypothetical conceptual properties (and logical columns) Director.Etcetera and UserProfile.Etcetera, which are merely placeholders that I used to expose the posibility of adding more properties (and columns) that apply to the corresponding conceptual entity type (and logical table).
8 E.g., appending an additional column with the AUTO_INCREMENT attribute to a table of a database â€œrunningâ€ on MySQL.
You should split this across tables like this (only showing the columns needed to present the concept, not necessarily all columns):
Users ID Username FirstName LastName PasswordHash ... 1 'Joe1' 'Joe' 'Smith' 2 'Freddy' 'Fred' 'Jones' Roles ID RoleType .... 1 'Writer' 2 'Director' 3 'Actor' User_Roles User_ID Role_ID ... 1 1 1 2 2 2 2 3
This gives you a table full of users with all the various user columns, a table of roles, and a linking table to connect the two together.
You can see that Joe1 is both a writer and director by the entries in User_Roles. And Freddy is both a director and an actor.
This also allows you to add more roles later on without changing the system. Just insert records for Producer or Editor or whatever down the line.
So to find all usernames of actors, you have a couple of choices:
Select Distinct Username from Users Where User_ID in (select User_ID from User_Roles where Role_ID = 3)
Or if you don’t know the role_ID number, then:
Select Distinct Username from Users Where User_ID in (Select User_ID from User_Roles where Role_ID = (Select ID from Roles where RoleType = 'Actor') )
Or you can also do this:
select u.Username, r.RoleType from Users u inner join User_Roles ur on ur.User_ID = u.ID inner join Roles r on r.ID = ur.Role_ID where r.RoleType = 'Actor'
(In this version, you could use also use
Where r.Role_ID = 3 to get the same results.)
But I’d use the 1st query and whichever WHERE clause I knew. In a big system, knowing the Role_ID will, generally, run quicker than the text, since numeric data is “easier” and more efficient for most SQL engines to process the indexes for.
As for contact methods or pics or whatever, I’d do them in a similar fashion:
Attributes ID MethodText ... 1 'TwitterID' 2 'URL' 3 'CellPhone' 4 'Email' 5 'PictureLink' Role_Attributes Role_ID Attribute_ID isRequired 3 5 1 3 4 1 3 3 0 User_Attributes User_ID Attribute_ID AttributeData 1 4 'Joe@Example.com' 1 1 '@joe' 1 3 '555-555-5555' 1 5 'www.example.com/pics/myFace.png'
…and so on. These would link in the same way as the users to roles.
This shows that each role has 0 to many attributes, which may be optional. Then each user has 0 to many attributes, with the data for those attributes.
This lets you add new attributes as time goes on, without rewriting any code; just update the attributes and role_attributes tables to match your new rules. It also lets you share attributes between roles, without re-entering the same data for each user. If two roles require pics, then they only have to upload 1 pic to get that requirement filled.