Question:
Let’s assume we have a lot of potentially heavy rows (e.g. 500k) in a table that we want to filter by primary key and send over the internet to a processing engine. Is it reasonable to use the IN clause?
Answer:
No, IN should be used very carefully in Cassandra. It’s OK to use IN when the query targets a single partition, but if the query spans multiple partitions, it’s better to send individual requests. With a multi-partition IN, one coordinator node has to contact the replicas for every partition and assemble the whole result before it can respond. Individual requests put less load on the coordinator and are also faster, because each request is sent directly to a node that holds the data (if you use prepared statements with the driver’s default token-aware load balancing).
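A minimal sketch of the "individual requests" approach, assuming an asyncio-based client. `fetch_by_keys` and `fetch_one` are hypothetical names: `fetch_one` stands in for whatever your driver uses to run a prepared single-partition query such as `SELECT ... FROM table WHERE pk = ?`. The sketch only shows the fan-out with a cap on in-flight requests; routing each request to a replica is left to the driver's token-aware policy and is not modelled here. (With the DataStax Python driver, `cassandra.concurrent.execute_concurrent_with_args` is a comparable bounded-concurrency helper.)

```python
import asyncio
from typing import Awaitable, Callable, Dict, Hashable, Iterable, TypeVar

K = TypeVar("K", bound=Hashable)
R = TypeVar("R")


async def fetch_by_keys(
    keys: Iterable[K],
    fetch_one: Callable[[K], Awaitable[R]],
    max_in_flight: int = 100,
) -> Dict[K, R]:
    """Send one single-partition request per key instead of one multi-partition IN.

    fetch_one is a hypothetical stand-in for a driver call that executes a
    prepared statement for a single primary key.
    """
    # Caps how many requests are outstanding at once, so the cluster is not flooded.
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(key: K):
        async with semaphore:
            return key, await fetch_one(key)

    # gather preserves input order; each result is paired with its key.
    pairs = await asyncio.gather(*(guarded(k) for k in keys))
    return dict(pairs)


# Usage with a fake fetch function in place of a real driver call (assumption).
async def fake_fetch(key: int) -> str:
    await asyncio.sleep(0)  # stands in for network I/O
    return f"row-{key}"


if __name__ == "__main__":
    rows = asyncio.run(fetch_by_keys(range(5), fake_fetch, max_in_flight=2))
    print(rows)  # {0: 'row-0', 1: 'row-1', 2: 'row-2', 3: 'row-3', 4: 'row-4'}
```

The semaphore keeps the number of concurrent requests bounded while still letting many single-partition reads run in parallel. For something like 500k keys, you may prefer to process keys in chunks rather than creating every task up front, which keeps memory use predictable.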