mongodb cluster Data distribution is not uniform

Posted on

Question :

The chunk seems to average

{  "_id" : "flanker",  "primary" : "rs11",  "partitioned" : true }
            shard key: { "_id" : "hashed" }
            unique: false
            balancing: true
                rs1 73
                rs10    73
                rs11    73
                rs12    73
                rs13    73
                rs14    73
                rs15    73
                rs2 72
                rs3 72
                rs4 72
                rs5 72
                rs6 72
                rs7 72
                rs8 72
                rs9 72
            too many chunks to print, use verbose if you want to force print
but the Quantity and size are not same 
node 10:
rs10:PRIMARY> show dbs
admin            0.000GB
bjc_papa         0.351GB
flanker         35.605GB
hades            0.162GB
local           19.968GB
za-ordergather   0.000GB
rs10:PRIMARY> db.bill.find().count()
rs10:PRIMARY> db.claim.find().count()
rs10:PRIMARY> db.commissionBill.find().count()
rs10:PRIMARY> db.endorsement.find().count()
rs10:PRIMARY> db.policy.find().count()
any other node, for example:
rs2:PRIMARY> show dbs
admin           0.000GB
bjc_papa        0.453GB
flanker         3.889GB
hades           0.152GB
local           6.441GB
za-ordergather  0.000GB
rs2:PRIMARY> db.bill.find().count()
rs2:PRIMARY> db.claim.find().count()
rs2:PRIMARY> db.commissionBill.find().count()
rs2:PRIMARY> db.endorsement.find().count()
rs2:PRIMARY> db.policy.find().count()

Can anyone tell me why, Which makes me very confused´╝čthanks

Answer :

As at MongoDB 3.4, collection balancing is based on chunk counts and a migration threshold which is the delta between the shard with the most chunks for a collection and the shard with the least. For example, with your deployment having more than 80 chunks the migration threshold to trigger a balancing round would be a difference of 8 chunks. Your sample sh.status() output includes 15 shards with 72 or 73 chunks each, so the flanker.bill collection would be considered balanced.

A chunk represents a range of shard key values containing documents up to the configured chunkSize which is 64MB by default. Individual chunks are not expected to have similar document counts or data size, and some use cases (eg. highly variable document sizes, frequent data deletion, lack of cardinality in shard keys) may lead to more significant differences in the sum of documents represented by a chunk. The usual outcome is that on average chunk ranges provide a reasonable proxy for balancing data in a sharded collection while requiring minimal accounting/coordination between shards.

With a hashed shard key on a field with good cardinality you should expect reasonably random distribution of shard key values which helps distributes inserts across shards and reduces the need for rebalancing.

Since you appear to have a significant imbalance of data between shards, I would investigate further by calculating more details on actual data stored on the shards as per: How to Determine Chunk Distribution (data and number of docs) in a Sharded MongoDB Cluster. NOTE: calculating exact document totals (vs estimates) can be resource intensive, so exercise caution if this is your production environment and start with estimates.

It is likely that you have a number of chunk ranges which contain few (or in extreme cases, no) documents due to data deletion and more of these happen to have ended up on the same shard. If so, there is a manual administrative procedure to Merge Empty Chunks which may be relevant. Empty or sparsely populated chunks are not merged automatically as the assumption is that new documents will still be inserted into these chunk ranges in future.

Leave a Reply

Your email address will not be published. Required fields are marked *