Question:
The chunks seem to be evenly distributed across the shards:
{ "_id" : "flanker", "primary" : "rs11", "partitioned" : true }
flanker.bill
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
rs1 73
rs10 73
rs11 73
rs12 73
rs13 73
rs14 73
rs15 73
rs2 72
rs3 72
rs4 72
rs5 72
rs6 72
rs7 72
rs8 72
rs9 72
too many chunks to print, use verbose if you want to force print
but the document counts and data sizes are not the same on each shard.
On node rs10:
rs10:PRIMARY> show dbs
admin 0.000GB
bjc_papa 0.351GB
flanker 35.605GB
hades 0.162GB
local 19.968GB
za-ordergather 0.000GB
rs10:PRIMARY> db.bill.find().count()
12730327
rs10:PRIMARY> db.claim.find().count()
4889074
rs10:PRIMARY> db.commissionBill.find().count()
7657692
rs10:PRIMARY> db.endorsement.find().count()
7279615
rs10:PRIMARY> db.policy.find().count()
1729638
On any other node, for example rs2:
rs2:PRIMARY> show dbs
admin 0.000GB
bjc_papa 0.453GB
flanker 3.889GB
hades 0.152GB
local 6.441GB
za-ordergather 0.000GB
rs2:PRIMARY> db.bill.find().count()
3971029
rs2:PRIMARY> db.claim.find().count()
1969355
rs2:PRIMARY> db.commissionBill.find().count()
1521794
rs2:PRIMARY> db.endorsement.find().count()
951906
rs2:PRIMARY> db.policy.find().count()
1789974
Can anyone tell me why? This has me very confused. Thanks.
Answer:
As of MongoDB 3.4, collection balancing is based on chunk counts and a migration threshold, which is the difference between the shard with the most chunks for a collection and the shard with the least. For a collection with 80 or more chunks, as in your deployment, the migration threshold that triggers a balancing round is a difference of 8 chunks. Your sample sh.status() output shows 15 shards with 72 or 73 chunks each, so the flanker.bill collection is considered balanced.
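The threshold tiers the 3.4 balancer uses can be sketched as a small function. This is a simplified model for illustration only; the real balancer runs on the config server primary and tracks much more state:

```javascript
// Migration thresholds used by the MongoDB 3.4 balancer (per collection):
// fewer than 20 chunks -> 2, 20-79 chunks -> 4, 80 or more -> 8.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// A collection needs balancing when the gap between the most- and
// least-loaded shards reaches the threshold.
function needsBalancing(chunkCounts) {
  const total = chunkCounts.reduce((a, b) => a + b, 0);
  const gap = Math.max(...chunkCounts) - Math.min(...chunkCounts);
  return gap >= migrationThreshold(total);
}

// The deployment above: 7 shards with 73 chunks, 8 with 72 (1087 total).
const counts = [73, 73, 73, 73, 73, 73, 73, 72, 72, 72, 72, 72, 72, 72, 72];
console.log(needsBalancing(counts)); // false: the gap of 1 is below the threshold of 8
```

A gap of 1 chunk is nowhere near the threshold of 8, which is why the balancer leaves flanker.bill alone despite the skew in bytes and document counts.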
A chunk represents a range of shard key values containing documents up to the configured chunkSize, which is 64MB by default. Individual chunks are not expected to have similar document counts or data sizes, and some use cases (e.g. highly variable document sizes, frequent data deletion, or low shard key cardinality) may lead to more significant differences in the number of documents a chunk represents. The usual outcome is that, on average, chunk ranges provide a reasonable proxy for balancing data in a sharded collection while requiring minimal accounting and coordination between shards.
With a hashed shard key on a field with good cardinality, you should expect a reasonably random distribution of shard key values, which helps distribute inserts across shards and reduces the need for rebalancing.
Since you appear to have a significant imbalance of data between shards, I would investigate further by gathering more detail on the actual data stored on each shard, as per: How to Determine Chunk Distribution (data and number of docs) in a Sharded MongoDB Cluster. NOTE: calculating exact document totals (as opposed to estimates) can be resource intensive, so exercise caution if this is your production environment and start with estimates.
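As a quick first step, the mongo shell has a built-in helper that reports estimated data size, document count, and chunk count per shard. Run it from a shell connected to your mongos (the counts are estimates and can be off after unclean shutdowns or heavy deletes):

```javascript
// Run against mongos. Prints per-shard data size, estimated document
// count, and chunk count for the collection, plus cluster-wide totals.
db.getSiblingDB("flanker").bill.getShardDistribution()
```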
It is likely that you have a number of chunk ranges which contain few (or, in extreme cases, no) documents due to data deletion, and that more of these happen to have ended up on the same shard. If so, there is a manual administrative procedure to Merge Empty Chunks which may be relevant. Empty or sparsely populated chunks are not merged automatically, as the assumption is that new documents will still be inserted into these chunk ranges in future.
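For reference, merging is done with the mergeChunks admin command against mongos. The bounds below are hypothetical placeholders; you would substitute adjacent chunk boundaries read from sh.status(true) or the config.chunks collection, and both chunks must be contiguous and on the same shard:

```javascript
// Run against mongos as an administrator. The NumberLong bounds are
// hypothetical; use real adjacent chunk boundaries from config.chunks.
// mergeChunks only merges contiguous ranges held by the same shard.
db.adminCommand({
  mergeChunks: "flanker.bill",
  bounds: [
    { _id: NumberLong("-7000000000000000000") },  // hypothetical lower bound
    { _id: NumberLong("-6500000000000000000") }   // hypothetical upper bound
  ]
})
```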