We have a situation where we can (A) deploy instances of an application in one MySQL database using table prefixing, or (B) use a different MySQL database for each instance of the application. For example:
Setup A, one central database:

    central_database
        app1_table1
        app1_table2
        app1_tablen
        ...
        appn_table1
        appn_table2
        appn_tablen

The end result being one large database with many tables.

Setup B, one database per app instance:

    app1_db
        table1
        table2
        tablen
    ...
    appn_db
        table1
        table2
        tablen

The end result being many databases, each with a few tables.
All things equal (e.g., amount of data, number of app instances, etc), what are the pros and cons of going with either approach? What would be detrimental to database performance and maintenance? The application is PHP 5 based, run over Apache 2.x, and we’re running MySQL 5.x.
Many thanks for your time and thoughts!
I ran a system with the best part of a thousand databases, spread across multiple servers. They all had an identical structure and were synchronised against a template database held on each of the machines.
This gave me the ability to migrate databases from one server to another if one server was getting excessively overloaded, and, as the client mix changed, I could create new databases on different servers to load-balance across them. This was the biggest advantage I got from the system: I had multiple large lumps of tin performing multiple complicated queries simultaneously on separate servers.
The great thing about this, is that you can add servers to the configuration at your own speed, as each server starts to get over-loaded, add another into the mix, migrate some dbs across to the new server and end up with a nicely load balanced set of servers. A really nice and simple way to add scale to the system as and when it is required!
The reason I went with this approach rather than the single huge database approach, was the sheer size of the potential database that would have been created… each of the 1000 databases had 200 tables, and many of the individual tables within each of the databases comprised many hundreds of millions of rows of data!
A single-database configuration would have required certain tables (approximately 8 of them) to hold multiple billions of rows of data, and the total database size would have been over 10 TB. Instead, we were able to run multiple servers, each with 5 TB of RAID 10 storage and many databases on it.
That’s what I would do! Hope it helps your decision making… 🙂
Is the application you are building a SaaS application? If so, I would suggest you consider a third approach: have one DB with a common structure for all application instances, with one difference – add a userid/applicationid column to all tables. This will greatly reduce your application development and maintenance costs. In my experience, this is one of the best approaches to storing multi-tenant data.
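The tenant-column idea can be sketched like this (SQLite stands in for MySQL here, and the table and column names are invented for illustration):

```python
import sqlite3

# Sketch of the shared-schema, tenant-column approach. SQLite stands in for
# MySQL, and the table/column names are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE orders (
        app_id INTEGER NOT NULL,  -- the tenant discriminator column
        id     INTEGER NOT NULL,
        item   TEXT,
        PRIMARY KEY (app_id, id)  -- the tenant id leads every key
    )"""
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, "widget"), (1, 2, "gadget"), (2, 1, "sprocket")],
)

def orders_for(app_id):
    # Every query must filter on the tenant column, or tenants would leak
    # into each other's results.
    return conn.execute(
        "SELECT id, item FROM orders WHERE app_id = ? ORDER BY id",
        (app_id,),
    ).fetchall()

print(orders_for(1))
print(orders_for(2))
```

The trade-off is that tenant isolation now depends entirely on the application remembering that `WHERE` clause.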
Also see this great white paper by Microsoft on multi-tenant data architecture
It also highlights the advantages and disadvantages of the approaches you have mentioned.
Setup B is way easier to manage
Each database's set of tables sits in a different folder. That can be very beneficial if you do not want to test OS limits.
For example, my employer hosts MySQL for a CRM system for car dealerships. One client has 800 dealerships, and each dealership's database has 160 tables. That's 128,000 tables.
- Under Setup A, all 128,000 tables would sit under one database.
- Under Setup B, each set of 160 tables sits in a subfolder under /var/lib/mysql.
From the perspective of the OS and its ability to handle i-nodes (or FAT tables for Windows), which includes having a maximum number of files per folder:
- Under Setup A, you would worry about 128,000 files under one folder. Can your OS support that many files under a single folder?
- Under Setup B, no such worry.
If you had to tweak table structures using ALTER TABLE or some other DDL:
- Under Setup A, you would have to script the needed DDL (in PHP, or with specialized MySQL scripts) against each specific prefixed table name, generating the corresponding queries before accessing it and making changes.
- Under Setup B, connect to the right database, then access the same table name every time. The access paradigm would always be clean:
  - specific database
  - specific folder under /var/lib/mysql
  - specific table name
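A quick sketch of what that scripting difference looks like in practice (all database and table names here are invented):

```python
# Hypothetical sketch of rolling one schema change across all app instances.
# All database/table names are invented.
instances = ["app1", "app2", "app3"]
change = "ADD COLUMN notes TEXT"

def setup_a_statements():
    # Setup A: one database; the prefixed table name must be baked into
    # every generated statement.
    return [f"ALTER TABLE {app}_customers {change}" for app in instances]

def setup_b_statements():
    # Setup B: switch to each database, then run the identical statement.
    stmts = []
    for app in instances:
        stmts.append(f"USE {app}_db")
        stmts.append(f"ALTER TABLE customers {change}")
    return stmts

print(setup_a_statements())
print(setup_b_statements())
```

Under Setup B the DDL itself never changes; only the `USE` target does, which makes the patch scripts trivially reusable.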
If you want to put different databases on different disks:
- Under Setup A, symlinks for every table moved to a separate disk would only exacerbate the "number of inodes in a folder" problem. Disk I/O and overall table access become more complicated, and overall server load increases, since the .frm files are repeatedly accessed.
- Under Setup B, simply move a whole database folder into a separate data mount. Disk I/O can be distributed on demand.
- CAVEAT: this is highly discouraged for InnoDB, whose data may live in the shared system tablespace (ibdata1) rather than in the per-database folders.
Speaking metaphorically, which would you rather have?
- a gigantic apartment with one bedroom, one bathroom and one kitchen (SetupA)
- multiple apartments, each with its own bedroom, bathroom and kitchen (SetupB)
When it comes to fixing a radiator in an apartment:
- With Setup A, every tenant can be inconvenienced and must be involved, because you have to talk with the affected tenants in front of everyone, as if it were everybody's business
- With Setup B, other than hearing some banging on the wall or in the pipes, tenants can go on with their private lives
- This list and its metaphors can go on and on
IMHO, although budgets may be a driving force in design/infrastructure decisions, I would easily be in favor of separate databases per client.
I also have a SaaS product and use the same setup as Dave Rix mentioned.
Each customer has their own database
I’d make a few more suggestions:
You should have a load-balanced (master-master) "controller" database that stores each customer database's location (IP), database name, and customer name. This controller is how your application knows where each customer's database is.
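A minimal sketch of that controller lookup (the customer names, IPs, and database names are invented, and a plain dict stands in for the real master-master replicated controller database):

```python
# Minimal sketch of the "controller" lookup. In production this would be a
# small master-master replicated MySQL database; a dict stands in for it here,
# and all customer names, IPs, and database names are invented.
controller = {
    # customer name -> (database server IP, database name)
    "acme":   ("10.0.1.11", "acme_db"),
    "globex": ("10.0.2.25", "globex_db"),
}

def locate(customer):
    """Return the (host, database name) the application should connect to."""
    try:
        return controller[customer]
    except KeyError:
        raise LookupError(f"unknown customer: {customer}")

# At login, the application looks up the customer, then opens a connection
# to that host/database instead of a single shared one.
host, dbname = locate("acme")
print(host, dbname)
```

Because the indirection lives in one small table, moving a customer to another server is just an update to that row.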
Your application can be anywhere you want – you can have databases in many datacenters around the globe.
Your application can grow as much as you want. If it's a web SaaS, you can build a load-balanced web-server farm that points each customer to the right database at login time.
You are able to create customized VIEW/Database for some customers – without impacting others. That’s important if you try to offer customization as part of your business.
You can set up two web farms + database farms: one for "EDGE" and another for "STABLE" releases. Then you'll need a small group of customers willing to test things and confirm that everything is working as expected (in other words, quality assurance [QA]) before you apply changes to all your customers.
You should have an automated backup job per database at least once a day.
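One way to sketch that nightly job (the hosts, database names, and backup path are invented; the mysqldump flags shown, `-h` and `--single-transaction`, are standard):

```python
from datetime import date

# Sketch: generate one nightly mysqldump command per customer database.
# Hosts, database names, and the /backups path are invented examples.
databases = [("10.0.1.11", "acme_db"), ("10.0.2.25", "globex_db")]

def backup_commands(dbs, day=None):
    day = day or date.today().isoformat()
    return [
        f"mysqldump -h {host} --single-transaction {name}"
        f" | gzip > /backups/{name}-{day}.sql.gz"
        for host, name in dbs
    ]

for cmd in backup_commands(databases, day="2012-01-15"):
    print(cmd)
```

A cron job iterating the controller's list of databases can emit and run these commands, so new customers are picked up automatically.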
You should have another server for replication. The same host can replicate many databases (run a separate MySQL instance on a different port for each source) if you can't afford as many "slave" host servers as "master" host servers.
For example, 5 master servers + 1 slave server running 5 MySQL instances on different ports – just make sure it has enough RAM.
You should build a "migration" tool that can move a database to another server whenever you want.
You should migrate VIP customers to a more secure/available database server in order to keep your revenue protected. Remember, 20% of customers often represent 80% of your revenue. Take care of your special customers.
You should have a backup-then-delete "garbage collector" that takes a last backup and deletes the database when a customer leaves your company.
You must have a template database image that you export and use for new accounts.
You must have a database patching tool to apply new patches to existing accounts.
Keep all your SQL patches under a versioning tool such as Subversion or Git, and create your own numbering scheme as well, e.g. xxx-4.3.0.sql – sometimes patching goes wrong, and you must know how to recover or complete the patching task.
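A sketch of why the numbering scheme needs numeric ordering (the filenames below are invented, following the xxx-4.3.0.sql pattern above): naive string sorting would put 4.10.0 before 4.3.0, so parse the version components as integers.

```python
import re

# Sort patch files by their embedded version so they can be applied (or
# resumed after a failure) in a deterministic order. Filenames are invented,
# following the xxx-4.3.0.sql pattern.
patches = ["billing-4.3.0.sql", "billing-4.2.1.sql", "billing-4.10.0.sql"]

def version_key(filename):
    # Extract "4.10.0" and compare numerically, so 4.10.0 sorts after 4.3.0
    # (plain string sorting would wrongly put "4.10.0" before "4.3.0").
    m = re.search(r"-(\d+(?:\.\d+)*)\.sql$", filename)
    return tuple(int(part) for part in m.group(1).split("."))

ordered = sorted(patches, key=version_key)
print(ordered)
```

The patching tool can then walk `ordered`, record the last version successfully applied per customer database, and resume from there when something goes wrong.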
Well, this is all I do in my company with a product that has about 5k databases with about 600 tables each.