optimize design for a table with 50 columns [closed]

Question:

I have a table with 50 columns, and a friend has suggested splitting it into four tables of about 12 columns each. He proposed a design with a master table whose PK is referenced by each of the four child tables, and then creating a NONCLUSTERED INDEX over all the columns of each new table, since the index key limit is 900 bytes.

The idea is to improve performance by converting key lookups into seeks. I am not sure this will really improve performance given the extra joins; rather, I think it may hamper it. Please suggest what to do here.


Existing table Test:

  B   BIGINT        -- FK from some other table in the DB
 ,C1  VARCHAR(100), C2 VARCHAR(50), C3 VARCHAR(50), C4 VARCHAR(50), C5 VARCHAR(50)
 ,C6  VARCHAR(50),  C7 VARCHAR(50), C8 VARCHAR(50), C9 VARCHAR(50), C10 VARCHAR(50)
 ,D1  VARCHAR(100), D2 VARCHAR(50), D3 VARCHAR(50), D4 VARCHAR(50), D5 VARCHAR(50)
 ,D6  VARCHAR(50),  D7 VARCHAR(50), D8 VARCHAR(50), D9 VARCHAR(50), D10 VARCHAR(50)
 ,E1  VARCHAR(100), E2 VARCHAR(50), E3 VARCHAR(50), E4 VARCHAR(50), E5 VARCHAR(50)
 ,E6  VARCHAR(50),  E7 VARCHAR(50), E8 VARCHAR(50), E9 VARCHAR(50), E10 VARCHAR(50)
 ,F1  VARCHAR(100), F2 VARCHAR(50), F3 VARCHAR(50), F4 VARCHAR(50), F5 VARCHAR(50)
 ,F6  VARCHAR(50),  F7 VARCHAR(50), F8 VARCHAR(50), F9 VARCHAR(50), F10 VARCHAR(50)

The proposed design is four child tables, each with columns like:

  A   BIGINT        -- FK from the master table Test
 ,C1  VARCHAR(100), C2 VARCHAR(50), C3 VARCHAR(50), C4 VARCHAR(50), C5 VARCHAR(50)
 ,C6  VARCHAR(50),  C7 VARCHAR(50), C8 VARCHAR(50), C9 VARCHAR(50), C10 VARCHAR(50)

The other tables follow the same pattern, and these tables are mostly used for reading data. With the big table Test as it stands, I think a covering index could be used instead.
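A covering-index sketch for the existing wide table, assuming a hypothetical query that seeks on B and returns only C1-C3 (the index name and column choice are illustrative):

```sql
-- The key column matches the seek predicate; INCLUDE columns are stored
-- only at the leaf level, so they do not count against the 900-byte
-- index key limit and the query needs no key lookup.
CREATE NONCLUSTERED INDEX IX_Test_B_Covering
    ON Test (B)
    INCLUDE (C1, C2, C3);
```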


My requirement is to read the data from these tables after rows are added. In some cases, rows are later updated with a status column set to 0; after some days those rows are no longer used, and the table receives a fresh entry for the same user.

The target is to optimize the read operations while also minimizing insert and update time. Which design would be the better choice here? Please help.

Answer:

I believe it’s necessary to consider what kind of data your table is storing. Say columns C1-C10 hold one group of data, and that group may be redundant (i.e., the same row of C1-C10 values may repeat throughout your master table). Redundant data is wasteful because you spend storage on the same repeating values.

So, what your friend is advising you to do is called normalization. Normalization is a process in which we systematically examine relations for anomalies and, when detected, remove them by splitting the relation into two new, related relations. Normalization can also be thought of as a trade-off between data redundancy and performance: normalizing a relation reduces redundancy but introduces the need for joins when all of the data is required, for example by a report query.
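As a concrete sketch of that trade-off (all names here are hypothetical): if the same C1-C10 group repeats across many master rows, normalization stores each distinct group once and keeps only a key in the master table:

```sql
-- Each distinct C-group is stored once instead of repeating per row.
CREATE TABLE CGroup (
    CGroupId INT IDENTITY PRIMARY KEY,
    C1 VARCHAR(100), C2 VARCHAR(50) /* ... and so on for the group */
);

CREATE TABLE Master (
    A        BIGINT NOT NULL PRIMARY KEY,
    CGroupId INT NOT NULL REFERENCES CGroup (CGroupId)
);

-- The cost: reading the full row now requires a join.
SELECT m.A, g.C1, g.C2
FROM Master AS m
JOIN CGroup AS g ON g.CGroupId = m.CGroupId;
```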

Recall that the relational model consists of relations, which are made up of attributes.

A relation is a set of attributes with values for each attribute such that:

  • Each attribute (column) value must be a single value only.
  • All values for a given attribute (column) must be of the same data type.
  • Each attribute (column) name must be unique.
  • The order of attributes (columns) is insignificant.
  • No two tuples (rows) in a relation can be identical.
  • The order of the tuples (rows) is insignificant.

So, if there’s a possibility that the data is redundant, you may need to separate it into a different table, as your friend suggested. Joining the tables back together will be a pain, but consider it a trade-off for storage capacity and, potentially, performance.
