I’m attempting to build a single application that consists of multiple databases each with very different requirements.
Server1: Files (.txt .docx .png) as attachments.
Server2: Calendar events.
Server3: User data.
Sure, I could all all of these to a single database. However, I’m sure a database used for storing files requires completely different optimisation to run efficiently in comparison to one that stores largely string values. So I’m looking to separate the load into multiple databases for that reason, and modularity/security. Having all the data in one database, if compromised, gives access to everything from a single hack.
The question is a fairly simple one. How do I link the databases across multiple servers so the data can be linked/queried e.g.
Example: User “Cathie” will have an entry in the Server3 containing her name and emails, she may also have files attached to her account (Server1), or own a company device (Server4). I would like to be able to search all 3 servers using a single search query, can this be done in MySQL relationships?
Happy to hear some out of the box ideas also.
MySQL doesn’t support one query against multiple instances.
Your application would need a separate connection to each MySQL instance, and then you would write code to execute a separate query on each of these connections, and then combine the results by appending one result set to the other using code.
I posted an answer to a related question on Stack Overflow recently:
A given MySQL connection only knows one MySQL instance. You can’t connect to two instances at once on a single connection. The simplest solution is for you to continue doing what you are currently doing: query both, and union the results in your client application.
There are proxy solutions like Vitess or ProxySQL, but they are just handling the multiple connections for you. You’re not really using two MySQL instances on the same connection, you’re just delegating that work to the proxy.
There’s a MySQL feature called the FEDERATED storage engine, which comes close to what you want. Basically, you can still only connect to one MySQL instance, but that MySQL instance can serve as a proxy to another MySQL instance on a table-by-table basis. That is, the federated table is not really stored on the primary MySQL instance to which you are connected. But you can query it like a view, and those queries are passed to the second MySQL instance. Results are passed through the federated engine and then to your calling application as if the table had existed on the primary MySQL instance. The FEDERATED engine is not enabled by default on modern versions of MySQL.
You seem to be suffering from the Premature Optimization™ syndrome.
I’m sure a database used for storing files requires completely different optimisation to run efficiently in comparison to one that stores largely string values.
Why are you sure? Have you done any testing to confirm that? What exactly would you set up differently for these two use cases?
Of course, you could use MySQL federation to present disparate servers as a single view, but it will offer its own performance challenges. In addition to that, you will be then wading into the morass of distributed transactions, which is something you will want to avoid as much as possible.
Also, by distributing your application data across multiple servers you multiply maintenance overhead. You will now need to ensure user authentication and authorisation rules are synchronised across all these servers. You will also face difficulties creating a database backup that is consistent across all servers.
In my view these issues far outweigh the ephemeral performance advantages that you seek to find.
Sometimes a “Server” (machine) is called a “Database” (aka “schema”). I assume you are not making that terminology error.
There are more disadvantages of segregating Databases across Servers than advantages.
These days, Servers come pretty much in “one configuration fits all”; all dimensions (CPU cores, RAM, disk, etc) tend to grow simultaneously. That is, it would be hard to build Servers that were enough different to matter.
A main reason for using multiple servers for a single “application” is “read scaling” or “write scaling”. “Read scaling” is easily achieved with a Primary and several Replicas (separate servers). Each has a copy of the same data. This gets you close to building a “High Availability” system — the other main reason for multiple servers. (Cf: “replication”, “clustering”, “InnoDB Clustering”, “Galera”, “HA”.)
“Write scaling” is more complex, involving “sharding”.
Others have discussed “federation” (cf “MariaDB FederatedX”) where a single client is trying, very inefficiently, to talk to multiple Servers. It is a terrible waste of resources.
(This Answer addresses the implied, but unspoken Question: “Should I put images in the database or in files.”)
When building a web page with images in it, I consider the following to be appropriate:
Put the images in files. Then write
<img src=...> in the HTML for the page. This instructs the client browser to fetch the images while rendering the page. The browser does not care whether the
src= is on the same server as the text of the page.
That is much simpler than having the image in a database table and being processed by the PHP/Java/VB/whatever.
The table would include the filename of the image. Either the table or the code would include the rest of the URL.
Note that the page and the images could be on the same or different server(s); it does not matter which.