Question :
I am completely new to database modelling (except for some academic knowledge from a long time ago) but for my new project I want to build a database for the following application.
I am collecting different income figures from different companies from different product categories. I calculate the aggregate of these figures and calculate some metrics such as averages and growth of the figures over the years/over product categories and over the sectors of these companies. I wanted to build a database to accomplish this and already made the following draft. It is based on a simple star model but I would like to get your input on what can be better.
Answer :
I would like to get your input on what can be better.
There is no way to judge the design of a Dimensional Data Model without an understanding of the business context and the kinds of questions it’s intended to answer.
So make a list of the questions you would like to answer from the data model, providing a specific example for each type of question.
Translate each question to a SQL query against your model.
The design is good to the extent the SQL queries are simple for a human to write and understand, and for an RDBMS to execute.
Well, this does not exactly look like a simple star schema to me. Anyway, maybe first decide if you are designing an analytical (OLAP) or a transactional (OLTP) system. If you are building an analytical/reporting DB, then decide what (where, and how) is going to provide data integrity.
In the diagram, it is possible to insert a row in statements for a company, about a product from a sector that that company does not operate in. Not good for a transactional DB. Ok for an OLAP system, where the integrity is taken care of by a separate ETL process, and a transactional system the data is coming from.
For a star schema I would suggest a fact statements with four dimensions (company, product, sector, time) around it — all FKs are from the fact to dimensions.
For an OLTP type DB (based on the diagram) something like:
-- Industry sector SEC exists.
--
sector {SEC}
PK {SEC}
-- Product type PTY is made by companies
-- in industry sector SEC.
--
product_type {PTY, SEC}
PK {PTY}
SK {PTY, SEC}
FK {SEC} REFERENCES sector {SEC}
-- Company CMP operates in industry sector SEC.
--
company {CMP, SEC}
PK {CMP}
SK {CMP, SEC}
FK {SEC} REFERENCES sector {SEC}
-- Time period TIM exists.
--
time_dim {TIM}
PK {TIM}
-- Company CMP, from industry sector SEC,
-- issued a statement about product type
-- PTY (from that sector), for period TIM.
--
statement {CMP, SEC, PTY, TIM}
PK {CMP, PTY, TIM}
FK1 {CMP, SEC} REFERENCES company {CMP, SEC}
FK2 {PTY, SEC} REFERENCES product_type {PTY, SEC}
FK3 {TIM} REFERENCES time_dim {TIM}
Note:
All attributes (columns) NOT NULL
PK = Primary Key
SK = Proper Superkey (Unique)
FK = Foreign Key
BTW, ERDs are great only if you already grok how to automatically verbalize diagrams. If not yet, then use plain text editor, it is better and more powerful than an ERD.