I have seen this related post, but it focused more on the number of consumers of the databases, which is not my only problem.
I am in a situation where I am taking part in the design of a new data model involving many relationships across tables, and where different users will not be allowed to access all the data in the DB.
At the moment the infrastructure has many databases, even many database servers, with references to each other, which makes it chaotic (in my opinion). I am trying to determine whether my idea of putting the new data model in a single database is sound, or at least not ridiculous.
These are the reasons why the current architecture was on multiple databases (not servers, I mean databases):
- Users' access rights (with Windows authentication) can easily be handled at the database level.
- It is easier to restore a database if somebody makes a mistake and deletes all the records in a table (the size of a single DB would make the "rollback" from backups too lengthy and too difficult).
- If a single database hits a deadlock, the whole architecture is blocked (with multiple databases, at least the independent data is still accessible).
- Storing all the data in a single database becomes less efficient as the number of records grows very large.
For all these points, I have the following answers:
- It is possible to handle security very well using schemas. It comes down to having a good data model in which security and compliance requirements are part of the design.
- Maybe it is easier, but if somebody is able to damage the database that badly, then that user simply should not have had the right to perform that action in the first place. This again comes back to a good security policy design.
- Indeed, a server failure is an issue. My best answer is that we could have a secondary server replicating the first, ready to take over if the primary were to fail.
- I do not think this is true: proper storage management (partitioning, filegroups, indexing) should allow the database server to address performance issues.
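To illustrate the first point, here is a minimal sketch of schema-level security in SQL Server; the schema, role, and user names are made up for the example:

```sql
-- Create a schema per functional area and a role per consumer group.
CREATE SCHEMA Sales;
GO
CREATE ROLE SalesReaders;
GO
-- Grant read access on everything in the Sales schema to the role.
GRANT SELECT ON SCHEMA::Sales TO SalesReaders;
GO
-- Map a Windows-authenticated user into the role.
CREATE USER [DOMAIN\jdoe] FOR LOGIN [DOMAIN\jdoe];
ALTER ROLE SalesReaders ADD MEMBER [DOMAIN\jdoe];
```

A nice property of granting at the schema level is that the permission automatically covers tables created in that schema later, which keeps the policy maintainable as the model grows.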
Moreover, I would say that a model where every reference between tables is represented by a foreign key is a key element for maintaining data consistency. Having reference IDs in one DB that point at rows in another DB is inefficient and dangerous, since the server cannot enforce a foreign key across database boundaries.
Those are my thoughts, but I am no expert in DB management, and I would be extremely happy if somebody told me that I am wrong, or could help me add arguments to my cause.
I would like to add that I am discussing an infrastructure on SQL Server (the price of the required hardware is not really an issue, but I cannot change the underlying technology).
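As a minimal sketch of that point (table names are hypothetical), a foreign key within one database lets the server enforce the reference declaratively:

```sql
CREATE TABLE dbo.Customers (
    CustomerId INT NOT NULL PRIMARY KEY
);

CREATE TABLE dbo.Orders (
    OrderId    INT NOT NULL PRIMARY KEY,
    CustomerId INT NOT NULL,
    -- The server rejects any order whose CustomerId has no matching customer,
    -- and (by default) any deletion of a customer that still has orders.
    CONSTRAINT FK_Orders_Customers
        FOREIGN KEY (CustomerId) REFERENCES dbo.Customers (CustomerId)
);
```

If `Customers` lived in a different database, this constraint could not be declared at all, and consistency would have to be policed by application code or triggers.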
I would also be extremely happy to know if some academic research has been done on the subject.
I think you are correct in your answers and JNK is correct about locks. I see some reasons to have multiple databases.
- Replication. If parts of your data should be replicated to local read servers, they benefit from being in different databases. I don't know about SQL Server specifically, but in MySQL (and generally) replication filtering operates at the database level: if tables A and B should replicate to one server and tables C and D to another, then those tables either need to be in different databases or the binary log needs to be filtered on the replicas.
- Uptime and accessibility. If different parts of your data have different requirements on uptime and accessibility, it would make sense to put them in different databases. One could have a hot standby with all data replicated, and the other could have a nightly dump.
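For the MySQL case above, the per-database filtering might look like the following my.cnf sketch (the database names `db_ab` and `db_cd` are hypothetical; `replicate-do-db` is the standard replica-side filter option):

```ini
# On the replica that should only receive db_ab (containing tables A and B):
[mysqld]
replicate-do-db = db_ab

# On the other replica, the equivalent line would name db_cd (tables C and D):
# replicate-do-db = db_cd
```

This is exactly the kind of setup that becomes awkward when everything lives in one database, since the filter granularity is the database, not the table.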
But again, I think you’re right 🙂