Need Advice on Normalizing Database

Posted on

Question :

I have a group of database tables that lack normalization. They all have a similar primary key called ‘doc_index’, which is either a PK by itself or part a composite PK.

As you can see below, there are no foreign key (FK) columns in any of the tables. Given what I have below could some one show be how to best have PK and FK relationships on these tables?

The major table is tblDocument. Which in my opinion has the main PK doc_index column. All other tables revolve around tblDocument.

See script below:

CREATE TABLE [dbo].[tblDocument](
    [doc_index] [int] IDENTITY(1,1) NOT NULL,
    [annual_review_reqd] [bit] NOT NULL,
    [available] [nvarchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [doc_number] [nvarchar](18) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [doc_status_code] [nvarchar](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [doc_subtype_code] [nvarchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [doc_type_code] [nvarchar](15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [file_location] [nvarchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [inactive_reason] [nvarchar](45) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [is_form] [bit] NOT NULL,
    [keywd_index] [nvarchar](90) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [opr_agency_code] [nvarchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [original_ao] [nvarchar](25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [sec_class_code] [nvarchar](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [special_tag] [nvarchar](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [special_tag_rmk] [nvarchar](30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [stocked_char] [nvarchar](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [internet_approved] [nvarchar](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [title] [nvarchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [SSMA_TimeStamp] [timestamp] NOT NULL,
    [birth_date] [datetime] NULL,
    [cancellation_date] [datetime] NULL,
    [doc_num_assn_date] [datetime] NULL,
    [next_review_date] [datetime] NULL,
 CONSTRAINT [tblDocument$PrimaryKey] PRIMARY KEY CLUSTERED 
(
    [doc_index] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]


CREATE TABLE [dbo].[tblForm](
    [doc_index] [int] NOT NULL,
    [elect_auth_code] [nvarchar](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [is_electronic] [bit] NOT NULL,
    [prescr_doc_index] [int] NULL,
    [privacy_act] [bit] NOT NULL,
    [unit_issue_code] [nvarchar](7) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [SSMA_TimeStamp] [timestamp] NOT NULL,
    [when_electronic] [datetime] NULL,
 CONSTRAINT [tblForm2$PrimaryKey] PRIMARY KEY CLUSTERED 
(
    [doc_index] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[tblReview_Remark](
    [doc_index] [int] NOT NULL,
    [rmk_index] [smallint] NOT NULL,
    [rmk_initials] [nvarchar](8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [rmk_text] [nvarchar](165) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [rvw_created_date] [datetime] NOT NULL,
    [rmk_created_date] [datetime] NULL,
 CONSTRAINT [tblReview_Remark2$PrimaryKey] PRIMARY KEY CLUSTERED 
(
    [rvw_created_date] ASC,
    [doc_index] ASC,
    [rmk_index] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[tblReview](
    [Field1] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [doc_index] [int] NOT NULL,
    [action_number] [nvarchar](15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [action_officer] [nvarchar](25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [ao_phone] [nvarchar](8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [rvw_status_code] [nvarchar](16) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [rvw_type_code] [nvarchar](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [rvw_created_date] [datetime] NOT NULL,
    [rvw_completen_date] [datetime] NULL,
    [rvw_tasked_date] [datetime] NULL,
    [rvw_suspense] [datetime] NULL,
 CONSTRAINT [tblReview3$PrimaryKey] PRIMARY KEY CLUSTERED 
(
    [doc_index] ASC,
    [rvw_created_date] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[tblJs_Document](
    [doc_index] [int] NOT NULL,
    [cmd_distribution] [nvarchar](15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [fiscal_year] [nvarchar](5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [num_pages] [smallint] NULL,
    [system_char] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [cjcs_jel_www] [bit] NULL,
    [jcs_jwfc_lan] [bit] NULL,
    [jcs_jel_cd_rom] [bit] NULL,
    [cinc_cd_rom] [bit] NULL,
    [controled_internet] [bit] NULL,
    [SSMA_TimeStamp] [timestamp] NOT NULL,
 CONSTRAINT [tblJs_Document$PrimaryKey] PRIMARY KEY CLUSTERED 
(
    [doc_index] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Any help would be greatly appreciated.
Thanks.

Answer :

Clearly [doc_index] is the PK in tblDocument linked to by the FK’s in all the other tables. While they may not be actually linked (i.e. the FK may be enforced in stored procs or code), it appears the database is actually normalised in terms of having PK/FK’s.

The first step though is to ensure that every other table that has [doc_index] has a corresponding record in the tblDocument field. So you will need to run something like:

SELECT * FROM tblForm
WHERE doc_index NOT IN (SELECT doc_index FROM tblDocument)

This needs to be run for every “FK” table (tblForm/tblReview_Remark/tblReview/tblJs_Document). If any are missing, they need to be deleted or fixed to link to an existing record or a new record in tblDocument to be made that reflects this record (or potentially a new tblDocument record that reads “Unknown”).

You then have a bit of work to do to be able to put in proper FKs. I think the best way forward would probably be to clean it up, which is what I am detailing:

  • It appears the tblForm may in fact just have a 1:1 relationship with tblDocument. This needs to be checked, it if is the case, just create new fields in tblDocument and do away with the table.
  • tblReview appears to be a M:1 relationship with tblDocument. This is a good candidate for a surrogate key, recommend you create one for the table, remove the Primary Key constraint then apply it to the new surrogate key (I will call it ReviewID from now on). “Field1” in this table sounds highly suspect, remove it if it is full of garbage and/or not needed.
  • tblReview_Remark by its name sounds like it should be linked to tblReview in a M:1 configuration. Create a new field called “ReviewID” and use the current primary key fields to decide which ReviewID (created in the last step) to link to. Something like:

    UPDATE tblReview_Remark SET ReviewID = r.ReviewID
    FROM tblReviewRemark rr
    INNER JOIN tblReview r ON rr.rvw_created_date = r.rvw_created_date
    AND rr.doc_index = r.doc_index

Hopefully you should have every record linked, but check for records that have nothing in the ReviewID column. Those records may need to be manually checked or deleted.

  • tblJsDocument (shake’s head), check if this table is used anywhere and if it even needs to exist. Sometimes DBAs make copies of tables for testing purposes and forget to clean up – this could be one such case. If it IS needed, it appears to be a 1:1 relationship with your tblDocument, recreate the fields in tblDocument and merge over joining on doc_index.

Once all of that clean up stuff is done, you should have doc_index as a Primary Key, ReviewID as a Primary Key on tblReview. You should then be ready to create the FK relationships using the fields available.

tblReview FK doc_index referencing tblDocument
tblReview_Remark FK ReviewID referencing tblReview
tblForm and tblJsDocument should no longer need to exist
tblReview_Remark can now have doc_index and rvw_created_date and possibly rmk_index removed.

While you may be tempted to just insert new surrogate keys, remove the current PK’s and link everything by the new keys because its easy, it really appears you need to tidy stuff up. That is my recommendation anyway!

Leave a Reply

Your email address will not be published. Required fields are marked *