I have a table that I'm fairly sure will reach 14 million rows in the near future (within a year), and I'm trying to scale out my web app before hitting any bottleneck. I'm in a dilemma: should I use sharding or replication?
What is a good strategy for my web app?
The structure of the table that will hold millions of rows:
```
id              mediumint(8) unsigned   NOT NULL  auto_increment
user_id         smallint(5) unsigned    NOT NULL
jqci_id         mediumint(8) unsigned   NOT NULL
selected_answer enum('a','b','c','d')   NOT NULL
```
MySQL replication won't address performance issues on large tables. It only gives you additional copies of the data, which are useful for failover, backups, or offloading heavy reporting queries.
Your table schema is tiny: roughly 9 bytes of data per row (3 + 2 + 3 + 1). Even at 14 million rows, that's on the order of a few hundred megabytes including index overhead, which would fit in RAM on even modest hardware.
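A back-of-envelope calculation makes the point concrete (this ignores InnoDB per-row and per-page overhead, so treat it as a lower bound):

```python
# Raw data size estimate for the schema above.
# mediumint = 3 bytes, smallint = 2, mediumint = 3, enum with < 256 values = 1.
ROW_BYTES = 3 + 2 + 3 + 1
ROWS = 14_000_000

data_mb = ROW_BYTES * ROWS / 1024 / 1024
print(f"~{data_mb:.0f} MB of raw row data")  # ~120 MB
```

Even after doubling or tripling that for indexes and storage-engine overhead, the whole table sits comfortably in memory on commodity hardware.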
All the same, if you want to look at sharding, you could read up on MySQL partitioning.
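As a sketch of what partitioning could look like against a schema like the one above (the table name `answers` and the partition count are illustrative, not from the question):

```sql
-- Illustrative only: spread rows across 8 partitions by user_id.
-- MySQL requires every unique key (including the PK) to contain the
-- partitioning column, hence the composite primary key here.
CREATE TABLE answers (
  id              MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id         SMALLINT(5) UNSIGNED  NOT NULL,
  jqci_id         MEDIUMINT(8) UNSIGNED NOT NULL,
  selected_answer ENUM('a','b','c','d') NOT NULL,
  PRIMARY KEY (id, user_id)
)
PARTITION BY HASH(user_id)
PARTITIONS 8;
```

Note that this is partitioning within one server, not true sharding across servers, but it gives you many of the same maintenance and pruning benefits.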
You say you expect 14 million rows; that's dangerously close to the maximum value of your MEDIUMINT UNSIGNED primary key (16,777,215). Just go ahead and make it an INT instead of worrying about the extra byte per row.
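Widening the key is a one-line ALTER (the table name `answers` is a placeholder, since the real name isn't shown in the question); be aware that on a large table this rebuilds the table and can block writes on older MySQL versions:

```sql
-- Widen the PK before it approaches MEDIUMINT UNSIGNED's ceiling of 16,777,215.
-- `answers` is a placeholder; substitute your actual table name.
ALTER TABLE answers
  MODIFY id INT UNSIGNED NOT NULL AUTO_INCREMENT;
```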
If I understand the question, I presume your growing table needs all 14 million rows to stay live. If not, why not archive the rows you don't need during off-peak times?
Since your table captures a grain of fact (the selected_answer column), why not add a DATETIME column and use MySQL table partitioning to horizontally partition your data? The MySQL manual explains the details. Then run EXPLAIN on your queries to make sure they match up with your indexing strategy.
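A minimal sketch of that approach, assuming a new column named `answered_at` and a table named `answers` (both names are placeholders). MySQL requires the partitioning column in the primary key, hence (id, answered_at):

```sql
-- Hypothetical: RANGE-partition by year of a new datetime column.
CREATE TABLE answers (
  id              MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id         SMALLINT(5) UNSIGNED  NOT NULL,
  jqci_id         MEDIUMINT(8) UNSIGNED NOT NULL,
  selected_answer ENUM('a','b','c','d') NOT NULL,
  answered_at     DATETIME              NOT NULL,
  PRIMARY KEY (id, answered_at)
)
PARTITION BY RANGE (YEAR(answered_at)) (
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION p2024 VALUES LESS THAN (2025),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Then verify that queries prune partitions and use an index:
EXPLAIN
SELECT selected_answer
FROM answers
WHERE answered_at >= '2024-01-01' AND user_id = 42;
```

This layout also makes archiving cheap: dropping an old partition removes a year of data without a slow DELETE.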