MongoDB – slow remove query

Question :

I have a collection in MongoDB with 800M+ records. I wanted to do some cleanup, but the remove operation is unusably slow: even deleting 1,000 records can take 10+ minutes.

The structure of one record in the collection is something like this:

{
     'stID' : '00000001',
     'recordID' : '28-4',
     'date' : ISODate('2017-06-06 20:00:00'),
     ......
}

There is a compound index on { 'stID' : 1, 'recordID' : 1, 'date' : 1 }, and my delete query looks like this:

db.coll.remove({ 'stID' : '00000001', 'recordID' : '28-4' }, { justOne: false });

The database is part of a replica set with one secondary and one arbiter. I understand that the delete operation has to write an oplog entry with the _id of each deleted record, but I still find it hard to understand why it could be this slow, especially since:

  • there is no significant system or I/O activity on either the primary or the secondary
  • the secondary is not lagging behind the primary

We have never experienced problems with find or write/update operations. Can you suggest what other parameters/commands/tools I should use to see what is going on with the query and the servers?

The storage engine we use is WiredTiger. The server has 128GB of RAM and fast SSD drives in a RAID 1 configuration.

Answer :

I’ve seen improved performance by using bulkWrite:

db.ig_posts.bulkWrite([
   { deleteMany: { "filter": { /* your query here */ } } }
]);

Partition one of your indexed fields into ranges such that each range contains no more than 10,000 records. For example, if you have about 2,000 records per hour, create a list of hour boundaries similar to the following:

["2021-04-30T00:00:00Z", "2021-04-30T01:00:00Z", "2021-04-30T02:00:00Z", ...]

Run a for loop to remove the records in each hour.
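
A minimal sketch of that loop in the mongo shell, assuming the stID/recordID values from the question and a hypothetical one-day window (adjust the boundaries to your own data):

// Hypothetical hour boundaries; adjust to the range you want to purge.
var start = ISODate("2021-04-30T00:00:00Z");
var end   = ISODate("2021-05-01T00:00:00Z");
var HOUR  = 3600 * 1000;

for (var t = start; t < end; t = new Date(t.getTime() + HOUR)) {
    var next = new Date(t.getTime() + HOUR);
    // Each pass deletes at most one hour's worth of records,
    // so every individual deleteMany batch stays small.
    db.coll.bulkWrite([
        { deleteMany: { filter: {
            stID: '00000001',
            recordID: '28-4',
            date: { $gte: t, $lt: next }
        } } }
    ]);
}

Because date is the trailing field of the { stID, recordID, date } compound index, each hourly filter stays within a bounded index scan.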

I was able to remove about 10M records in 10 minutes.
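
As for the question about tools for seeing what the delete is doing: one standard check (a suggestion, not something confirmed in the question) is to run explain() on the remove in the mongo shell; for write operations, explain() reports the plan without actually applying the writes. For example:

// Report the query plan for the delete; the documents are not removed.
db.coll.explain("executionStats").remove(
    { 'stID' : '00000001', 'recordID' : '28-4' }
);

Look for an IXSCAN stage on { stID: 1, recordID: 1, date: 1 } in the winning plan; a COLLSCAN stage over 800M+ documents would account for the slowness.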
