Filtering a large number of rows in Cassandra

Question :

Let’s assume we have a lot of potentially heavy rows (e.g. 500k) in a table, which we want to filter by primary key and send over the internet to a processing engine. Is it reasonable to use the IN clause?

Answer :

No. IN should be used very carefully in Cassandra. It is fine when all the keys in the query belong to the same partition, but when the keys span multiple partitions it is better to send individual requests, one per partition key. That puts less load on the coordinator node, which would otherwise have to fan the IN query out to the replicas and collect every result itself. It is also faster, because each request goes directly to a node that holds the data (provided you use prepared statements with the default token-aware load balancing).
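
As a rough illustration of the "individual requests" approach, here is a minimal Python sketch. It assumes `session` is a DataStax Python driver `Session` and `prepared` is a statement returned by `session.prepare(...)`. The helper names `chunked` and `fetch_by_keys`, the `window` parameter and the table and column names in the usage comment are all made up for this example. It sends one request per key and limits how many are in flight at a time.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fetch_by_keys(session, prepared, keys, window=128):
    """Fetch one partition per request, at most `window` requests in flight.

    `window` is an illustrative knob, not a driver setting.
    """
    rows = []
    for chunk in chunked(keys, window):
        # execute_async returns a future right away, so the requests in
        # this chunk are in flight concurrently.
        futures = [session.execute_async(prepared, (key,)) for key in chunk]
        for future in futures:
            # result() blocks until that single request has completed.
            rows.extend(future.result())
    return rows


# Hypothetical usage (keyspace, table and column names are assumptions):
#   from cassandra.cluster import Cluster
#   session = Cluster(["127.0.0.1"]).connect("my_keyspace")
#   prepared = session.prepare("SELECT * FROM my_table WHERE id = ?")
#   rows = fetch_by_keys(session, prepared, ids)
```

For 500k keys the result list itself can get large, so in practice it may be preferable to process or forward each chunk as it arrives instead of collecting everything in memory.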
