500 databases or 1 database with 500 tables or just 1 table with all the records? [closed]

Question :

I currently have an application that is used by a single end customer. For ease of discussion assume the application only needs a single database table for all records. I now need to support multi-tenancy and so in the future I will have about 500 customers using the same database server.

1) A possible solution is to have a separate database for each customer. As there will be around 500 customers that means having MySQL/SQL Server/Oracle/etc… host 500 separate databases. This sounds like overkill for a database server running on an average cloud based server.

2) Another solution is to stick with the single database I already have but to have 500 different tables, one for each end customer. Each customer only averages around 2000 records in their own table. This is easy to implement and I would guess makes it easy to migrate the customer to another server, just move the entire table data over.

3) Lastly I could stick to using a single table in the single database. Instead I add an additional column that identifies the customer the record belongs to. But then the table ends up with about 1,000,000 records which is the aggregate of all 500 customers that average 2,000 each.

I do not know enough about performance and scaling to know whether 1, 2, or 3 is going to give the best performance. Any ideas?

Answer :

Having been on the receiving end of this multiple times from COTS applications, I would choose multiple databases almost every time. In this case, I would still choose multiple databases over any other option, and here is why:

Note: this is for the SQL Server related tag, and the logic is not transferable to other RDBMSs.

  1. Standards – each database is set up the same way with the same standard names and tables.

  2. Flexibility – a new customer needs Finnish as their language and a different collation? Easy enough, since it’s in its own database. Try dealing with that when everyone’s information is in a single table. Once you need to make a change for a request or feature, but only for a single customer, you’ll start to like having a database for each customer (see the sketch after this list).

  3. Security – depending on needs, a different login and user for each database to keep security tight.

  4. Automation – since each database is an exact copy (or base tables are exact copies at least), this makes it extremely easy to automate tasks.

  5. Upgrades/Changes – changes and updates can be rolled out for each customer on their specific schedules and needs. If a single upgrade or schema change goes wrong, only a single customer is down and not all customers.

  6. Granularity – ever need to restore older data that was accidentally deleted? Yep, much easier to do when the data is segregated by database rather than all jumbled together in the same table, or when the alternative is restoring a huge database just to recover a few records from one schema.

  7. Locking and Blocking – if all of the records are in the same table, some type of optimistic locking will be needed if you don’t want serious performance problems from all of the customers attempting to read and write their data at the same time. This leads to more tempdb utilization and makes the process much heavier in terms of resources than it should be.
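
To illustrate points 2 and 3, here is a minimal T-SQL sketch of spinning up one tenant’s database with its own collation and its own login; the database name, login name, password, and collation choice are all placeholders:

    -- One database per tenant: pick the collation this customer needs.
    CREATE DATABASE CustomerFinnish COLLATE Finnish_Swedish_CI_AS;
    GO
    -- A login and user scoped to just this tenant's database.
    CREATE LOGIN customer_fi_login WITH PASSWORD = 'ChangeMe!Str0ng';
    GO
    USE CustomerFinnish;
    GO
    CREATE USER customer_fi_user FOR LOGIN customer_fi_login;
    ALTER ROLE db_datareader ADD MEMBER customer_fi_user;
    ALTER ROLE db_datawriter ADD MEMBER customer_fi_user;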

Some of the downsides:

  1. Admin Overhead – if automation is not used and scripts are not created for re-use, completing tasks manually is going to be a large effort. If automated or scripted properly, there should be little administrative overhead.

  2. Backups – now there are more databases, and your restore strategy has to take that into account. The backup routine needs to automatically understand that new databases should be backed up (see the sketch after this list).

  3. Disk Space – sure, using individual databases could lead to lower space-utilization efficiency. I find this to be a very small downside, almost non-existent, but depending on implementation it could have an impact, especially in the full recovery model.
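
On the automation and backup points, a rough T-SQL sketch of a routine that backs up every online user database, so newly created tenant databases are picked up automatically (the backup path is an assumption):

    DECLARE @db sysname, @sql nvarchar(max);
    DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
        SELECT name FROM sys.databases
        WHERE database_id > 4              -- skip the system databases
          AND state_desc = 'ONLINE';
    OPEN dbs;
    FETCH NEXT FROM dbs INTO @db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@db)
                 + N' TO DISK = N''D:\Backups\' + @db + N'.bak'' WITH INIT;';
        EXEC sys.sp_executesql @sql;
        FETCH NEXT FROM dbs INTO @db;
    END
    CLOSE dbs;
    DEALLOCATE dbs;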

The pros (IMHO) far outweigh the cons. We don’t know your specifics, so some of these items might be moot, but there are many different things to think about, and I hope I’ve captured most of the big ones that apply to what you’ll be using these databases for.

MySQL Perspective

I once had a client with a CRM application laid out as one database per customer: 780 customer databases, each containing the same 162 tables.

The challenge I faced doing database maintenance came down to the following 5 procedures (a sketch follows the list):

  1. troubleshooting and optimizing slow queries
  2. adding supportive indexes
  3. archiving old data
  4. defragmenting tables
  5. doing backups
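
With one database per client, each of those procedures can be scoped to a single customer. A small sketch, using hypothetical client database and table names:

    -- Defragment and refresh statistics for one client only; the other
    -- 779 client databases are untouched.
    OPTIMIZE TABLE client_0042.orders;
    ANALYZE TABLE client_0042.orders;
    -- Backups can be scoped the same way, e.g. from the shell:
    --   mysqldump --single-transaction client_0042 > client_0042.sql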

It would have been 100 times more difficult to do such maintenance if all 780 clients occupied the same 162 tables. The flip side is that a shared layout would leave only one set of 162 tables to tune, rather than one copy per client.

Of course, all of this was on 3 bare-metal DB servers in circular replication.

Some 3 years later, this same client decided to give each customer’s data its own Amazon MySQL RDS instance. Naturally, this meant having each customer pay for its own Amazon account and its server instances. Many customers still have not migrated and stayed on the bare-metal servers, due to either budgetary concerns or performance (some liked the performance of the new setup, others preferred the old).

Your Actual Question

1) A possible solution is to have a separate database for each customer. As there will be around 500 customers that means having MySQL/SQL Server/Oracle/etc… host 500 separate databases. This sounds like overkill for a database server running on an average cloud based server.

This is a situation for a bare-metal server with some firepower and loads of RAM.

2) Another solution is to stick with the single database I already have but to have 500 different tables, one for each end customer. Each customer only averages around 2000 records in their own table. This is easy to implement and I would guess makes it easy to migrate the customer to another server, just move the entire table data over.

Taking a customer offline from the database and doing backups would be rather clumsy.
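
For what it’s worth, here is roughly what that per-customer “move the table over” operation looks like in MySQL; all names are hypothetical, and note it has to be repeated (or scripted) for every customer handled this way:

    -- Within the same server, a cross-database rename moves one
    -- customer's table cheaply:
    RENAME TABLE app.customer_0042 TO archive.customer_0042;
    -- To a different server, dump and reload just that table, e.g.:
    --   mysqldump app customer_0042 | mysql -h other-server app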

3) Lastly I could stick to using a single table in the single database. Instead I add an additional column that identifies the customer the record belongs to. But then the table ends up with about 1,000,000 records which is the aggregate of all 500 customers that average 2,000 each.

Definitely not a good idea, because such a design would need some normalization, and running the 5 maintenance procedures listed above against one shared table could cause performance issues and possible outages for all of the other clients.
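
For example, with one shared table (hypothetical names), even per-customer housekeeping operates across everyone’s rows:

    -- Archiving one customer still scans and locks rows that are
    -- interleaved with every other tenant's data:
    DELETE FROM records WHERE customer_id = 42;
    -- And defragmenting rebuilds all 500 customers' rows at once:
    OPTIMIZE TABLE records;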

Conclusion

At first glance, one might think these issues overlap. Which of them stand out will depend on which storage engine you use, how well you tune that storage engine, how you lay out the system tablespace and redo logs, how beefy the hardware is, how noninvasive the backups are, and how independent and mutually exclusive each customer’s data needs to be.

You can choose the third option, a single table with all records: give the table an auto-incrementing (IDENTITY) bigint column as its PRIMARY KEY, and add a bigint CustomerId column that identifies which customer each record belongs to.

You can then think about indexes later, depending on how you want to fetch data from the table, and you will be fine in terms of performance.
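
A minimal sketch of that design in MySQL syntax (table and column names are placeholders; SQL Server would use IDENTITY where MySQL uses AUTO_INCREMENT):

    CREATE TABLE records (
        record_id   BIGINT NOT NULL AUTO_INCREMENT,  -- surrogate primary key
        customer_id BIGINT NOT NULL,                 -- which customer owns the row
        payload     VARCHAR(255),                    -- stand-in for the real columns
        PRIMARY KEY (record_id),
        KEY ix_records_customer (customer_id)        -- supports per-customer queries
    );

    -- Every application query then filters on the customer column:
    SELECT * FROM records WHERE customer_id = 42;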

The first two options will create too many objects on the server/database and will be difficult to manage in the future; any schema change, for example, would have to be applied 500 times.
