Replication or Sharding for a huge table?

Posted on

Question :

I have a table that I’m pretty sure in the very near future (within a year) will reach 14 million rows, and I’m trying to scale out my web app before hitting any bottleneck. I’m in dilemma whether I should use sharding or replication?

What is a good strategy for my web app?

EDIT:

Table structure in which I’ll have millions of rows:

    id mediumint(8) unsigned NO  NULL  auto_increment   
    user_id  smallint(5) unsigned NO  NULL  
    jqci_id  mediumint(8) unsigned NO   NULL  
    selected_answer  enum('a','b','c','d')  NO  NULL   

Answer :

Mysql replication won’t address performance issues on large tables. It just allows you to have another copy you might use for failover, backup or reporting in cases of heavy hitting queries.

Your table schema is tiny. Even with 14 million rows you could fit that in ram on even modest hardware.

All the same, if you want to look at sharding you could read up on mysql partitioning

You say you expect 14 million rows; you’re dangerously close to the max value for your medium PK there. Just go head and make that an int instead of worrying about the extra byte/entry.

If I understand the question, I presume that your growing table needs all 14 million current rows. If not, why not archive what is not needed during non-peak times?

Since your table definition involves a grain of fact (the selected_answer column), why not add a datetime column, then utilize MySQL table partitioning to horizontally shard your data? Explained here. Then you can use EXPLAIN to examine your queries to make sure they match up with your indexing strategy.

Leave a Reply

Your email address will not be published. Required fields are marked *