I am designing a scale out solution for a globally dispersed company that has data centers on all continents. Over 10,000 servers report to a SQL Server cluster in Los Angeles, and also query it to perform certain tasks.
In order to move from clusters, the issues with vertical scaling, and to improve performance due to the global network distance between many nodes to the cluster in LA, I thought to use Transactional Replication on specific core tables to subscribers (single pub multi sub) spread out across the world as my first step.
Issue to be Resolved
If for whatever reason the data is over X minutes old (let’s say 10 minutes), I want them to go to the publisher (or another defined server) and read it from there. I would like to automatically grab data from the publisher instance if the preferred one is not available or stale.
I’m not sure what the best method is for doing this, especially if the data is ‘stale’. I know we have replication monitoring tools, but how would I use that to know if the data is stale and force the connections to another server that isn’t?
The reason is I’m concerned that certain issues will take over 10 minutes to resolve, during this time the critical apps such as maintenance and automatic provisioning will get stale data and the business could be greatly impacted.
Possible Solutions & Issues With Them:
Put the Subscribers/Publisher in a ‘read only’ load balancer and have all queries go there.
If the database servers were all in the same data center, I would probably use a load balancer and pull servers out of it, but that will require manual intervention, and I want the system to force all reads to the publisher (or another subscriber) and bypass the subscriber they connect to.
This would not resolve the issue with stale data however, such as if replication falls behind and it takes an hour to fix.
Modify Code To Check Outstanding Replicated Queries And Adjust Accordingly
This would require code modifications on many queries and the devs won’t have the cycle to do so, plus it doesn’t sound like a very plausible idea.
Query Replication Status And Change Connection String Automatically If X Number Of Queries Are not Replicated
I should be able to do this in Powershell, but the majority of servers are Linux so they would need an ‘app’ which queries the replication status, and modifies their connection string if it is over X number of unreplicated commands.
Thank you! I have 1 2012 environment and several 2008 environments I’d like to implement this on, with the idea that we will move them all to 2012. Enterprise is an option.
Having a load balancer in LA that the world sends a very quick small query to, to decide which server to pick might work well actually, as it would be 1 small query and perhaps the requesting end could cache that info temporary.
It seems like I’d have to have the SQL or other DB engine that is the ‘read only’ pair spit back to the app ‘this data is stale, go somewhere else, preferably here’
Transactional replication can be used for something like this, but it would be tricky. However it sounds like what you are doing is keeping multiple front facing caches up to date.
This is tricky as a polling mechanism limits scalability. What works best is a push mechanism where if a change occurs, or some data meets a certain criteria it is send to all caches, in other works a messaging framework.
If this is to be done within SQL Server, service broker might be a better fit if the number of nodes are high and the transactions low. If the transactions are high and the number of nodes relatively small, transactional replication might be a better fit.
If the database is a backend for an app which is providing global failover services an application solution will likely be a better fit. I happen to know a guy who implemented a similar caching mechanism for a large auction site and then