I have a
tasks table in my database and, in the business domain of interest, tasks can have multiple states: “open”, “begun”, “in review” and “completed”.
Despite having “open” and “begun” in the same table, identified using flags, I decided for some reason early on to have “in review” and “completed” tasks in their own separate tables. While this seemed like a good idea to start off with, making it easy to query for specific things, when I thought about implementing a comment feature for tasks I realised that the
task_id value would be changing across the three tables.
Have I gone about this all wrong?
Some solutions I thought up would be to redo the organisation and put everything in one table, using flags to differentiate purely between the states (this would lead to a fair amount of reworking); to create some kind of
UUID that is transferable across the tables – though I think there might issues with performance if this is done; and finally, to set the
IDs across the tables to be no-longer auto incremented and to simply “inherit” the original value from the first
task table (which auto increments).
What’s the done thing in this situation? I’m not the most versed in database design and I’m kind of making this up as I go along.
Any help would be greatly appreciated.
Responses to further information requests made by @MDCCL via comments
So, the state (or status) of a task can change over time, right? How many times can the same concrete task present the same state, let us say, “open”? Will it vary depending on the kind of state presented by the relevant task?
Yes, a task.state could theoretically change up to four times. I won’t go into specifics of the logic but the longest chain would be
“begun” → “completed” → “review” → “completed”. And each time, currently, they’re being put into another table.
The exception is whether it’s “begun” or not, a user may abandon and then re-undertake the task infinitely – in theory at least. Which is at least helped by the fact that the “begun” status is a flag.
Ok, but a certain task will present, at all times, exactly one particular “present” state, right? For example, when a
taskrow is INSERTed, it should be accompannied by a certain
statevalue, which I assume would be “open”. Later, the state of such a task will evolve to only one of the other possible options, that is, either “begun” or “in review” or “completed”.
Yea, that’s exactly right. A task will always be in a stage, and can’t be in multiple ones.
Once a particular task is deemed as “completed”, can it change to another state?
Yes, it can be sent back to “review” if certain parameters are reached. And from “review” it can either go back to “begun” or back to “completed”.
The way I see it, the fact that (a) a concrete Task can present distinct States at different points in time entails (b) the creation of a table that is going to contain a time series, as I will detail below.
In order to define the structure and constraints of a relational database, it is paramount to first identify and formulate the corresponding business domain rules (i.e., delineate the conceptual schema) with precision, taking into account the relevant entity types, properties and the corresponding interconnections. In this way, some of the formulations that are especially significant in the scenario under consideration are:
- A Task presents one-to-many StateSpecifications
- A Task cannot present more than one StateSpecification at the same Instant
- A StateSpecification is considered Current
- from the particular Instant in which it is entered
- until the exact Instant when the successive
StateSpecification is entered
- A State defines zero-one-or-many StateSpecifications
- A Task receives zero-one-or-many Comments
Consequently, based on the aforementioned formulations, I created the IDEF1X† diagram shown in Figure 1:
As you can see,
State are depicted as individual entity types, each with its own group of properties (or attributes) and its own (direct) associations (or relationships), which are expressed via the corresponding verb phrases, lines, cardinalities and FOREIGN KEY (FK) marks.
The entity type denominated
StateSpecification, associated with the two brought up before, is a central aspect in the solution I will explain below.
Expository logical design and sample data
I shaped an expository logical design, based on the IDEF1X diagram above, by means of the DDL structure that follows:
-- You should determine which are the most fitting -- data types and sizes for all your table columns -- depending on your business context characteristics. -- Also, you should make accurate tests to define the most -- convenient physical implementation settings; e.g., -- a good indexing strategy based on query tendencies. -- As one would expect, you are free to use your -- preferred or required naming conventions. CREATE TABLE Task ( TaskNumber INT NOT NULL, Description CHAR(90) NOT NULL, Etcetera CHAR(30) NOT NULL, CreatedDateTime DATETIME NOT NULL, -- CONSTRAINT Task_PK PRIMARY KEY (TaskNumber) ); CREATE TABLE State ( StateCode CHAR(1) NOT NULL, Name CHAR(30) NOT NULL, Description CHAR(60) NOT NULL, -- CONSTRAINT State_PK PRIMARY KEY (StateCode), CONSTRAINT State_AK1 UNIQUE (Name), -- ALTERNATE KEY. CONSTRAINT State_AK2 UNIQUE (Description) -- ALTERNATE KEY. ); CREATE TABLE StateSpecification ( TaskNumber INT NOT NULL, SpecifiedDateTime DATETIME NOT NULL, StateCode CHAR(1) NOT NULL, -- CONSTRAINT StateSpecification_PK PRIMARY KEY (TaskNumber, SpecifiedDateTime), -- Composite PRIMARY KEY. CONSTRAINT StateSpecification_to_Task_FK FOREIGN KEY (TaskNumber) REFERENCES Task (TaskNumber), CONSTRAINT StateSpecification_to_State FOREIGN KEY (StateCode) REFERENCES State (StateCode) ); CREATE TABLE TaskComment ( TaskNumber INT NOT NULL, CreatedDateTime DATETIME NOT NULL, Body TEXT NOT NULL, -- CONSTRAINT TaskComment_PK PRIMARY KEY (TaskNumber, CreatedDateTime), CONSTRAINT TaskComment_to_Task FOREIGN KEY (TaskNumber) REFERENCES Task (TaskNumber) ); -- --
The State table
State table previously illustrated fulfills a look-up role, and is meant to contain five rows (as per the five State samples you provided):
+-—————————-+-—————————-+-———————————————-+ | StateCode | Name | Description | +-—————————-+-—————————-+-———————————————-+ | O | Open | Indicates that… | +-----------+-----------+-----------------+ | B | Begun | Indicates that… | +-----------+-----------+-----------------+ | R | In review | Indicates that… | +-----------+-----------+-----------------+ | C | Completed | Indicates that… | +-----------+-----------+-----------------+
Note the definition of a PRIMARY KEY (PK) constraint on a column retaining values that are, at the same time, meaninfgul (regarding interpretation by end users and technicians) and small (in terms of bytes at the physical implementation level), which makes it both readable and fast regarding, e.g., data retrieval.
If new business domain States arise, you can of course INSERT the corresponding rows INTO the
The State Specification table
Now, suposse that Task no. 1750 presents the following State History in the form of (unordered) rows held in the
+-——————————-+-———————————————————————-+-—————————-+ | TaskNumber | SpecifiedDateTime | StateCode | +-——————————-+-———————————————————————-+-—————————-+ | 1750 | 2016-12-01 16:58:12.000 | O | +------------+-------------------------+-----------+ | 1750 | 2016-12-02 09:12:05.000 | B | +------------+-------------------------+-----------+ | 1750 | 2016-12-04 10:57:01.000 | R | +------------+-------------------------+-----------+ | 1750 | 2016-12-07 07:33:08.000 | C | +------------+-------------------------+-----------+ | 1750 | 2016-12-08 12:12:09.000 | R | +------------+-------------------------+-----------+ | 1750 | 2016-12-08 19:46:01.000 | B | +------------+-------------------------+-----------+ | 1750 | 2016-12-09 06:24:07.000 | R | +------------+-------------------------+-----------+ | 1750 | 2016-12-11 07:24:07.000 | C | +------------+-------------------------+-----------+
As demonstrated, each value in the
SpecifiedDateTime column indicates the precise point in time in which a given Task (via the
TaskNumber) started presenting a particular State (by virtue of
Seeing that the
StateCode column keeps values that have a quite clear intention, it helps to save the usage of a JOIN clause. It is paramount to make clear that this does not imply that JOINs are something detrimental (they are, in fact, fundamental and powerful instruments in any relational database) but, whenever suitable, reducing the consumption of the valuable resources of the system is decidedly practical.
The Comment table
Comment table, as exemplified above, can easily be related to the
Task table via a FK constraint definition, because the PK value of a
Task will remain the same, no matter if its State undergoes zero, one or more modifications.
I assume that, in the business context of relevance, a Comment (a) will always depend on the existence of a Task occurrence and (b) will always belong to the same indidivual Task occurrence; therefore, I defined the
Comment table with a composite PK that incorporates the PK column of
Task and the exact
DateTime in which a particular
Comment was created. Apart from that, you might like to evaluate the attachment of a column for system-controled surrogates (e.g., a column with the AUTO_INCREMENT attribute in MySQL) to this table, it may or may not offer a better execution performance (probably not), so carrying out some thorough testing sessions would clarify the case (a non-data addition, as is the case of columns holding system-controled surrogates, should always be well assessed and justified).
Data derivation code samples
There are some important data points that will very likely be required with respect to the scenario in question, but they should be obtained as derived (or calculated) values.
For example, if you need to obtain the
CurrentState for Task no. 1750, you can derive it with the next SELECT statement:
SELECT T.TaskNumber, SS.StateCode AS CurrentStateCode, SS.SpecifiedDateTime FROM Task T JOIN StateSpecification SS ON T.TaskNumber = SS.TaskNumber WHERE T.TaskNumber = 1750 -- You can provide a parameter instead of a fixed value. AND SS.SpecifiedDateTime = ( SELECT MAX(SpecifiedDateTime) FROM StateSpecification InnerSS WHERE T.TaskNumber = InnerSS.TaskNumber );
When you face the necessity to get the
CurrentState for all Tasks, you can declare a statement like the one that follows:
SELECT T.TaskNumber, SS.StateCode AS CurrentStateCode, SS.SpecifiedDateTime FROM Task T JOIN StateSpecification SS ON T.TaskNumber = SS.TaskNumber WHERE SS.SpecifiedDateTime = ( SELECT MAX(SpecifiedDateTime) FROM StateSpecification InnerSS WHERE T.TaskNumber = InnerSS.TaskNumber );
If you have to derive the
EndDate of all the
StateSpecifications that correspond to Task no. 1750, you may utilize:
SELECT T.TaskNumber, T.Description, SS.StateCode, SS.SpecifiedDateTime AS StartDateTime, ( SELECT MIN(SpecifiedDateTime) FROM StateSpecification InnerSS WHERE T.TaskNumber = InnerSS.TaskNumber AND InnerSS.SpecifiedDateTime > SS.SpecifiedDateTime ) AS EndDateTime FROM Task T JOIN StateSpecification SS ON T.TaskNumber = SS.TaskNumber WHERE T.TaskNumber = 1750 -- You can provide a parameter instead of a fixed value. ORDER BY StartDateTime DESC;
In case you must obtain the
EndDate of all the
StateSpecifications, i.e., the whole period in which they were current:
SELECT T.TaskNumber, T.Description, SS.StateCode, SS.SpecifiedDateTime AS StartDateTime, ( SELECT MIN(SpecifiedDateTime) FROM StateSpecification InnerSS WHERE T.TaskNumber = InnerSS.TaskNumber AND InnerSS.SpecifiedDateTime > SS.SpecifiedDateTime ) AS EndDateTime FROM Task T JOIN StateSpecification SS ON T.TaskNumber = SS.TaskNumber ORDER BY StartDateTime DESC;
Naturally, one or more of the SELECT statements displayed above can be set up as VIEWs, so that you can arrive at, e.g., the pertinent information directly FROM one single table (a derived one, yes, but it would still be a table).
Responses to comments
I think I shall have to go about performing this modification, and think the system will benefit from the additional functionality of the
StateSpecificationtable. It goes to show that good database design can accommodate all our needs without making things messy.
I could not agree more with your second assertion. A database that is designed properly (conceptually, logically and physically), usually becomes a key element in a software development project. If a relational database reflects the business context of interest with accuracy, it can —to be brief— ease its maintenance substantially and offer vast versatility. That is why it is very useful to analyze carefully the structure of the relevant things, to observe them in terms of what they are in the real world, and then represent them with precision in a database implementation.
The data, in and of itself, is a highly valuable organizational asset and, a consequence, it should be administered as such. A reliable manner to achieve said objective is employing technical means that are supported on sound theory, and in the field of data management there is nothing more sound than the relational model. This is in agreement with the Dr. E. F. Codd’s 1981 Turing Award Lecture, entitled Relational Database: A Practical Foundation for Productivity.
Also, just to clarify, the
StateSpecificationtable has to be updated manually on every change of state, correct?
Basically yes, the
StateSpecification table has to be brought up to date each time that a Task presents a State change. This procedure should be performed by way of an INSERT operation that introduces a new
StateSpecification row that should include a FK value that points to the
Task row that applies.
It is opportune to stress that when a
Task row is INSERTed, it will have to present a
StateSpecification indicating the State deemed as ‘Open’ (by dint of an ‘O’ value retained in the
StateCode column), hence you should make use of well defined ACID TRANSACTIONS so that both the
Task row and its
StateSpecification counterpart are treated as a single unit. In this manner, the two respective INSERT operations can either succeed or fail as a whole, maintaining the involved data consistent with the identified business rules.
You might find of help
as it includes a case that has similar features.
† Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in December 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) the early theoretical work authored by the originator of the relational model, i.e., Dr. E. F. Codd; on (b) the Entity-Relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.
Honestly at this point you would do better to go back and put everything into one table. I’m going to assume that your other two tables have additional information on them, that isn’t on the “base” table.
You can handle that by either having nullable columns or providing default data that you can safely ignore in your business logic in your app. Alternately you can keep the other two tables but consolidate the common functionality into the base table. Then as you query, if an inner join returns the record in both tables, it must be in that status. (however, this also allows for some auditing tricks later as well, which I’m not going into right now)
Because you want to track a singular event through the system of processing, you should have a singular table to track that entity. You can have child tables for additional detail, but what you’ve described is either three distinct models or a broken schema.
I would pursue consolidation.