Question :
My site has a main MySQL InnoDB table that it does most of its work on. New rows get inserted at a rate of 1 million per week, and rows older than a week get moved over to an archive table on a daily basis. These archived rows are processed once a week for things like finding trends.
This archive table consequently grows by 1 million rows every week, and querying it can get really slow. Is MySQL suited for archiving data, or is my strategy very flawed?
Please advise, thank you!
Answer :
I would be very tempted to store this in a NoSQL data store like MongoDB or CouchDB: writes are incredibly fast, it scales well, etc.
You might even archive into a Mongo collection, then store the "processed" results in an RDBMS, which you can then query very quickly with SQL.
To stay in MySQL, you’re looking at some sort of partitioning scheme to get this to scale at all.
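As a sketch of what such a partitioning scheme could look like (the table and column names here are assumptions, not from the question), you could range-partition the archive table by week so that each weekly trend query scans only one or two partitions, and old weeks can be dropped cheaply:

```sql
-- Hypothetical sketch: range-partition the archive table by week.
-- Note: in MySQL, the partitioning column must be part of every
-- unique key, hence the composite primary key.
CREATE TABLE archive (
    id BIGINT NOT NULL,
    created_at DATETIME NOT NULL,
    payload TEXT,
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p_w01 VALUES LESS THAN (TO_DAYS('2023-01-08')),
    PARTITION p_w02 VALUES LESS THAN (TO_DAYS('2023-01-15')),
    PARTITION p_max VALUES LESS THAN MAXVALUE
);

-- Each week, split the catch-all partition to make room for new data:
ALTER TABLE archive REORGANIZE PARTITION p_max INTO (
    PARTITION p_w03 VALUES LESS THAN (TO_DAYS('2023-01-22')),
    PARTITION p_max VALUES LESS THAN MAXVALUE
);

-- Retiring an old week is a metadata operation, not a huge DELETE:
ALTER TABLE archive DROP PARTITION p_w01;
```

With this layout, a query restricted to one week's `created_at` range benefits from partition pruning instead of scanning the whole archive.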
If a similar amount of data keeps arriving every week and you have to run queries over a large data set, then whatever store you choose, the queries will take time proportional to the size of the data and the complexity of the query. Extracting trends in particular may require complex aggregations.
You might also have to shard once the data becomes really huge on MySQL.
I would suggest looking at the option of storing this data in Hadoop and querying it with Hive, which gives you ad-hoc SQL query capabilities over very large data sets. It won't be real time, but it gives you a hugely scalable solution for archival.
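To make that concrete, here is a hedged sketch of the Hive side, assuming the archived rows are exported to HDFS as delimited files under a path like `/data/archive/` (the path, table, and column names are illustrative, not from the question):

```sql
-- Hypothetical sketch: expose the exported archive as an external Hive table.
CREATE EXTERNAL TABLE archive (
    id BIGINT,
    created_at TIMESTAMP,
    payload STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/archive/';

-- Example ad-hoc trend query: row counts per week.
SELECT year(created_at) AS yr,
       weekofyear(created_at) AS wk,
       COUNT(*) AS rows_per_week
FROM archive
GROUP BY year(created_at), weekofyear(created_at);
```

Because the table is `EXTERNAL`, dropping it in Hive leaves the underlying files in place, so the archive export can be managed independently of the query layer.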