Question :
I’ve been assigned to an eCommerce platform that has huge quantities of data ( hundreds of thousands of products, sellers, etc ) and I’ve been trying to choose the proper tools and approaches required to do this.
While there are many eCommerce platforms out there that use relational databases ( WooCommerce, Magento, Presta, etc ), I’m not fully convinced that it’s efficient to develop something like this using relational dbs. Since products form a major part of the dataset, and each have their own specific blueprints that can’t be mapped to a single blueprint unless you add massive metadata for each, store duplicate data and so. One could suggest using JSON, but I’m not sure if thats optimized when it comes to querying.
So I was wondering, would it be a good idea to develop such project using 2 different database types? A non-relational database for catalogs and a relational database for data such as user data?
I’m not an expert in NoSQL, and I’m concerned about the performance of this specifically when it comes to searching and querying the store.
What would be a good approach to maximize efficiency and performance? My hands are untied to choose any technology and stack required.
Answer :
There are really three different questions in here.
Q: Is a table with hundreds of thousands of rows huge?
Generally, no. For example, you can download the Stack Overflow public data export, and it has tables with over a hundred million of rows.
The definition of a very large database (VLDB) has changed over time, but you’ll often hear folks complaining about performance and maintenance issues when they have over a billion rows per table, or over 1 terabyte of data in a single database. That’s not to say you won’t hear folks complain earlier, of course.
Q: Would it be a good idea to develop such project using 2 different database types?
When you’re just getting started, I’d recommend focusing on a single database. It’s hard enough to build an app with one persistence layer that you don’t know well, let alone two.
When you’re experiencing growth pains and you’re considering an additional persistence layer, start with a caching layer like Redis instead. Save commonly accessed, rarely-changing data there – in your case, the Products table would be a good example. Products probably don’t change more than once every 5-10 minutes, but at the same time, they’re constantly being queried in order to render web pages. Caching the products is a quick way to reduce workload from your database without having to become an expert at multiple database platforms.
Q: Which database should I pick when developing a new app?
The one you know best.
All relational databases can perform well – or poorly – with tables in the hundreds-of-thousands-of-rows range. It’ll come down more to how you write the queries than the internals of the database itself.
Without more information on your use cases and the type of data, it’s hard to say what’s the right fit. Regarding performance, when architected properly, a relational database management system (RDBMS) will perform no differently than a NoSQL system, no matter how much data you plan to store.
Aside from that, some people do have the use cases for leveraging both a RDBMS and a NoSQL system concurrently for different purposes of their data. Facebook is a good example of this. They originally were primarily using a RDBMS when they started, and then incorporated a NoSQL system into the mix (in addition to their RDBMS) as a way to help them scale and develop certain parts of their system faster, as their schema changed, to keep pace with the rate they were getting new users. Nowadays they still use both, although recent news indicates they’re starting to rely more heavily on MySQL again.
In any case, neither tool is more peformant than the other in terms of data management. Also hundreds of thousands of rows in a table is small on a RDBMS, I’ve worked with 10s of billions in a single table. Rather, the question becomes what is the structure of your data, how frequently will it change in structure, and will NoSQL help you develop faster by offsetting when and how you’ll need to maintain that structure because of its schema-less nature? Typically if you have well structured data and don’t have any issues in maintaining that structure, then a RDBMS is a good fit.