help in multi-language medical database?

Posted on

Question :

My project is to design a medical database for many countries.

This database contains many parts: medicines, medical centers, patients, etc. for every country with its own language.

I started my design with medicine part but I’m having many issues continuing:

  • How can I put all these complex parts in one database with different languages? I can’t imagine any suitable design for that complex idea.
  • How can I deal with the database growth from data coming fast for each part from each country?
  • I read about primary files and secondary files and how they increase performance of a database. Should I made them for my database? Or should I use the default files?
  • Should I separate every country with its own parts in a separated database?

I uploaded 2 pics for the initial design for medicines part in English language only.

NOTE: the 2 pics are only one but it’s too large so I divided it into 2 halves

I use SQL Server 2008.

part1
part2

Answer :

  • How can i put all these complexed parts in one DB with different languages??

Define the requirements

I can give you an example of how to model multilingual name data by assuming these business requirements:

  • The database should store the names of the ingredients of drugs in English, French and Spanish, and more languages yet to be defined.
  • Each ingredient is known by one name in each langauge.

Identify the languages

You need a way to identify langauges in your database so that you can associate drug names with a particular langauge.

A simple standard for identifying languages is ISO 619-1, which defines a two-letter code for the world’s most common languages. For example, the code for English is en, French is fr, and Spanish is es.

The following SQL creates a table with a language code column, then populates the table with the example languages:

CREATE TABLE Languages (
  LanguageCode CHAR(2) NOT NULL,
  CONSTRAINT PK_Languages PRIMARY KEY (LanguageCode),
  CONSTRAINT CK_LowerAlpha CHECK (LanguageCode NOT LIKE '%[^a-z]%' COLLATE Latin1_General_BIN2)
);

INSERT INTO Languages (
  LanguageCode
)
VALUES
  ('en'),
  ('fr'),
  ('es');

There is nothing in your entity model that corresponds to this table, which could explain why you found it difficult to model multiligual data.

Identify the chemicals

Next you need a way to identify the chemicals that compose each drug.

A common way to identify chemicals is CAS registry numbering, which defines a numerical code for every chemical known to Science. For example, the code for water is 7732-18-5, formaldehyde is 50-00-0, and mustard oil is 8007-40-7.

The following SQL creates a table with a chemical code column, then populates the table with the example chemicals:

CREATE TABLE Chemicals (
  ChemicalCode VARCHAR(12) NOT NULL,
  CONSTRAINT PK_Chemicals PRIMARY KEY (ChemicalCode),
  CONSTRAINT CK_Dashes CHECK (ChemicalCode LIKE '%_-__-_'),
  CONSTRAINT CK_Digits CHECK (REPLACE(ChemicalCode, '-', '') NOT LIKE '%[^0-9]%')
);

INSERT INTO Chemicals (
  ChemicalCode
)
VALUES
  ('7732-18-5'),
  ('50-00-0'),
  ('8007-40-7');

The Chemicals table roughly corresponds to the Drugs_Ingredients entity in your diagram, but lacks a column for the name.

For simplicity, the table does not enforce the checksum constraint. You should enforce the checksum constraint before using CAS registry numbers in production.

Identify the chemical name

If you were to store the name in a column in the Chemicals table, it would be difficult to support more than one language. You could have one column per langauge, but you would have to modify the schema every time you wanted to support a new langauge.

To free yourself from these limitations, you should model the chemical name as an identifiable entity in itself. You can identify the chemical name using a compound identity of the chemical code and the language code.

The following SQL creates a table with columns for the chemical code, the language code, and the checmical name, then populates the table with one name in each language for each chemical:

CREATE TABLE ChemicalNames (
  ChemicalCode VARCHAR(12) NOT NULL,
  LanguageCode CHAR(2) NOT NULL,
  ChemicalName NVARCHAR(50) NOT NULL,
  CONSTRAINT PK_ChemicalNames PRIMARY KEY (ChemicalCode, LanguageCode),
  CONSTRAINT FK_ChemicalNames_ChemicalCode FOREIGN KEY (ChemicalCode) REFERENCES Chemicals(ChemicalCode),
  CONSTRAINT FK_ChemicalNames_LanguageCode FOREIGN KEY (LanguageCode) REFERENCES Languages(LanguageCode)
);

INSERT INTO ChemicalNames (
  ChemicalCode,
  LanguageCode,
  ChemicalName
)
VALUES
  ('7732-18-5', 'en', 'Water'),
  ('7732-18-5', 'fr', 'Eau'),
  ('7732-18-5', 'es', 'Agua'),
  ('50-00-0', 'en', 'Formaldehyde'),
  ('50-00-0', 'fr', 'Méthanal'),
  ('50-00-0', 'es', 'Formaldehído'),
  ('8007-40-7', 'en', 'Mustard oil'),
  ('8007-40-7', 'fr', 'Huile de moutarde'),
  ('8007-40-7', 'es', 'Aceite de mostaza');

The ChemicalName column is of type NVARCHAR so that it can store any Unicode string. You should always use Unicode in international applications to store text so that you can handle the writing systems of different langauges in a uniform way.

Query the chemical names

I can think of two questions that you might reasonably ask of your database. With the ChemicalNames table, you can answer both of them with simple queries.

What are the names of all the chemicals in Spanish?

Query:

SELECT ChemicalCode, ChemicalName
FROM ChemicalNames
WHERE LanguageCode = 'es';

Result:

CHEMICALCODE    CHEMICALNAME
50-00-0 Formaldehído
7732-18-5   Agua
8007-40-7   Aceite de mostaza

What is water called in each langauge?

Query:

SELECT LanguageCode, ChemicalName
FROM ChemicalNames
WHERE ChemicalCode = '7732-18-5';

Result:

LANGUAGECODE    CHEMICALNAME
en  Water
es  Agua
fr  Eau

Multiple DBs – This is an exact repeat of the thread you started yesterday. The answer is still a single DB.

Multiple Files – Yea, multiple files help for performance, but they need to be on separate LUNs or you will get no benefit. Also separate out Lig file into its own spindle and TempDB into its own spindle. You should also consider multiple file groups so that you can control or at least aid parellel reads/writes.

Auto grow – you should perform capacity planning before deciding on the initial file size and appropriately size the files to avoid them growing if possible. Auto growth should still be set as a safe guard

For multiple languages – I suggest you use UNICODE data types for text and store the different languages in the same tables/columns, thus reducing the complexity of your model.

Leave a Reply

Your email address will not be published. Required fields are marked *