Database design – load multiple excels daily

Posted on

Question :

My first post ever 🙂 Apologies if I am not so clear with my question – here I go…

I have a design problem from database schema perspective.

My use case is following:

On a daily basis, system will receive around 30 Load files on a daily basis. These files will be basically excel files – with rows and columns. Each file will at most be related to one other file in terms of data contained. For other files, they might at all not be dependant on other file data. We’re thinking of using SQL Server to contain this data. This will be loaded into database using a programming language.

A plain relational Database approach would be:

  1. Create around 20 Tables (As some files are related to other file(s)).

  2. Every table will have all the fields corresponding to the file(s) it has been created for (fields in the excel file)

  3. On a daily basis, load the data of these files into the corresponding tables.

Each file will load 10 KB to 15 MB of data per day. Also, there can be multiple iterations of loading data from these files in a day (Max 24 – hourly once). I am dreading pushing so much data into database with above proposed structure, as it will become too loaded in a few days.

If I think of non-relational, I need to also process/clean/filter this data once it is loaded (I might want to create a stored procedure) – so that I can render useful data on UI – I think it will be more complicated on no-sqls due to non-availability of joins.

Moreover, my files are fixed and I have a structure. To my understanding – NoSqls are more apt when your data doesn’t have any fixed format.

Can somebody please suggest what kind of database & the design would be good for this use case? I don’t want to create 20 tables for 20 files 🙁

Answer :

If the data needs to be clean and relational before being presented in your UI, I’d use SSIS to handle reading, cleaning, inserting your data into SQL Server, handling rejects, and archiving the input files.

If your data does not need to be clean/relational before presenting in the UI, then a NoSQL solution might work for you.

By the way, Excel is horrible to work with as an input file format. You might want to see if you can receive the data as CSV or some other flat file format.

Leave a Reply

Your email address will not be published. Required fields are marked *