Question :
I have two questions, I’ll give you the facts, then my questions.
The actual index I’m using for one of my queries has the following characteristics :
- Multikey (contains 4 keys).
- The average key length is 14 characters.
- 2 indexes values contains 2 letters (country code and state code).
- 1 index values contain numerical values (between 0 and 100).
- The index size is : ~395 MB.
Here’s an example of what should be indexed (I don’t know how mongodb actually is storing it’s indexes, so I’m going to represent it like if it was a collection document):
{
"geolocation.statecode": "AL",
"personnu_field": 50,
"geolocation.countrycode": "US"
"field1.sufield": "Awesome value"
}
Informations about the collection concerned by the index :
- The collection is a 6M documents.
- Fast growing.
- It’s actually a collection of Twitter users with some additional business related fields.
- field1 (see the example given above) is an array of subdocuments.
- avgObjSize: 3 KB.
- totalIndexSize: ~2.9 GB.
- storageSize: ~19 GB.
I’ve done an explain
to this long query :
db.crawler_users.find(
{
"geolocation.statecode": "AL",
personnu_field: { "$lte": 65, "$gte": 30 },
"geolocation.countrycode": "US"
},
{
personnu_field:1, _id:0
}
).hint(
{
"geolocation.countrycode" : 1,
"personnu_field" : -1,
"geolocation.statecode" : 1,
"field1.sufield" : 1
}
).explain()
And here’s the result:
{
"cursor" : "BtreeCursor geolocation.countrycode_1_personnu_field_-1_geolocation.statecode_1_field1.sufield_1",
"isMultiKey" : true,
"n" : 216,
"nscannedObjects" : 788609,
"nscanned" : 788609,
"nscannedObjectsAllPlans" : 788609,
"nscannedAllPlans" : 788609,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 128,
"nChunkSkips" : 0,
"millis" : 127451,
"indexBounds" : {
"geolocation.countrycode" : [
[
"US",
"US"
]
],
"personnu_field" : [
[
65,
-1.7976931348623157e+308
]
],
"geolocation.statecode" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"search.keyword" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
}
As you can see the query takes much time to be executed (>2 minutes). And it’s hitting the collection. even if the chosen fields already exists on the index.
I have two questions somehow related :
- Why is
indexOnly: false
. Isn’t it supposed to be a covered index query? (see the explain later) - I need to retrieve some additional fields from the collection (the id and the profile_picture url). Should I add them to the index to avoid hitting the collection, even if I’ll never have to query them?
Answer :
Why is indexOnly: false. Isn’t it supposed to be a covered index
query? (see the explain later)
I believe this is a result of the isMultiKey : true
field in the explain results. Basically, currently indexOnly
is never true when isMultiKey
is true.
This is a known problem in general with multi key indexes. You can find the relevant bug here:
https://jira.mongodb.org/browse/SERVER-3173
As well as some decent explanation in the linked/dupe bug here:
https://jira.mongodb.org/browse/SERVER-7595
I think you have done some manual munging of the fields here for some reason, but I would guess that search.keywords
is the problem here. Try an index without that as the final field and see if that performs better.
I need to retrieve some additional fields from the collection (the id
and the profile_picture url). Should I add them to the index to avoid
hitting the collection, even if I’ll never have to query them?
I’d recommend a separate index for those queries rather than massive single index. If you end up with too many fields in the index you are going to lose most of the benefit by simply having to scan through a massive index instead of a collection. An index that big will also likely have performance issues for updates/writes.
- Maybe the query is not
indexOnly
because ofgeolocation.statecode
. It is not in index, so mongo needs to get it from collection.A covered index query is a query in which all the queried fields are part of an index. They are “covered queries” because an index “covers” the query. MongoDB can fulfill the query by using only the index. MongoDB need not scan documents from the database.
- Avoid indexes with many keys. They are not so efficient, as 1,2-key indexes. They are query-tied and not generic (I mean, they are used with small set of queries, while other queries cannot use them). Indexes should be small enough to fit in RAM, take this in mind. Also, such indexes will slow down writes a lot.
Take a look at this doc