I have a MongoDB database with a data size of ~100 GB. I ran a test with 300 threads; all queries are reads, no writes (except writes to the db profiler, I guess). I enabled the database profiler to keep track of slow queries. I noticed that queries with a high ‘numYields’ result in high ‘millis’; likewise, queries with a low numYields responded very fast, in low millis.
90% of the queries ran very fast, in 1-2 ms; however, around 2% of queries ended up at 60,000 ms or higher.
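To illustrate the pattern, here is a minimal sketch (plain Python over exported profiler documents; the sample figures are made up, but the field names match MongoDB’s system.profile output) of pulling out the slow tail and seeing that it is exactly the high-numYields operations:

```python
# Sketch: surface the slow tail from profiler entries. The sample docs
# below are illustrative, shaped like db.system.profile documents
# (fields: op, millis, numYields).

def slow_ops(profile_docs, millis_threshold=100):
    """Return profiler entries at or above the threshold, slowest first."""
    hits = [d for d in profile_docs if d.get("millis", 0) >= millis_threshold]
    return sorted(hits, key=lambda d: d["millis"], reverse=True)

sample = [
    {"op": "query", "millis": 1, "numYields": 0},
    {"op": "query", "millis": 2, "numYields": 0},
    {"op": "query", "millis": 61234, "numYields": 843},
    {"op": "query", "millis": 60002, "numYields": 791},
]

for doc in slow_ops(sample):
    print(doc["millis"], doc["numYields"])  # only the high-yield ops remain
```

The same filter applied to a real `db.system.profile` collection (as a server-side query) would show the identical correlation.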
According to the MongoDB docs:
numYields is a counter that reports the number of times the operation has yielded to allow other operations to complete.
Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete quickly while MongoDB reads in data for the yielding operation.
I understand that a slow query was trying to read data from disk while yielding to other queries that already have their data in memory. However, if that results in 60,000 ms for a particular query, it becomes unreasonable.
Perhaps there’s a way to limit numYields? Or perhaps try to fit everything to memory? Any suggestions?
This is a common misconception, i.e. that yields are somehow causing the slowness. In fact they are a symptom, not a cause. Even if there is no lock contention that requires a yield (writes basically), the queries still yield when they have to page from disk. They then re-acquire the lock when a certain amount of paging is done and look to yield again if more paging is needed (repeat until complete). If there is no lock contention from writes, then this is all pretty much instantaneous and does not add to the overall execution time.
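The loop described above can be sketched conceptually (this is a toy simulation, not MongoDB internals; the 10 ms page-read cost and page-set model are assumptions for illustration):

```python
# Conceptual sketch (not MongoDB internals): an operation yields its lock
# each time it must fault a page from disk, then re-acquires and continues.
# With no writers contending, re-acquiring costs ~nothing, so total time
# is dominated by the disk reads themselves -- the yields just count them.

def run_query(pages_needed, in_memory, disk_read_ms=10):
    elapsed_ms = 0
    num_yields = 0
    for page in pages_needed:
        if page not in in_memory:
            num_yields += 1             # yield the lock while paging
            elapsed_ms += disk_read_ms  # the real cost: the disk read
            in_memory.add(page)         # page is cached afterwards
        # lock re-acquired; scanning an in-memory page is ~free here
    return elapsed_ms, num_yields

# Cold cache: every page faults, so many yields and high latency.
print(run_query(range(100), in_memory=set()))            # (1000, 100)
# Warm cache: same query, zero yields, near-zero latency.
print(run_query(range(100), in_memory=set(range(100))))  # (0, 0)
```

The point of the sketch: the yields and the latency rise and fall together because both are driven by the same thing, the disk reads.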
If a query yields a lot, then it was hitting disk a lot, and that disk access is the cause of the slowness. Hence, numYields is just a way to infer that it was indeed paging from disk that caused the query to be slow. If you want those queries to be fast, then you need to have that data set in memory, and have enough memory for it to stay there long term and not be evicted.
Note: by default the kernel will use LRU to decide what gets evicted, so the likely candidates for slowness are queries on (large) parts of your data set that are not accessed very often.
There is no way to limit numYields, and it wouldn’t really make sense to do so, but yes, the remedy is to identify the data being addressed by those slow queries and make it fit into memory (note: the first query on any data will still be slow unless you pre-heat the cache in some way; the second query will hit memory and be fast).
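As a back-of-the-envelope check for “does it fit”, something like the following helps (the index size and RAM figures are hypothetical; in practice you would read dataSize and index sizes from `db.stats()` output, and the 0.8 headroom factor is an assumption to leave room for connections and the OS):

```python
# Back-of-the-envelope working-set check. data_size/index_size mimic
# db.stats() output (bytes); ram_bytes is what the host can spare for
# the cache. All concrete figures below are illustrative assumptions.

GB = 1024 ** 3

def fits_in_memory(data_size, index_size, ram_bytes, headroom=0.8):
    """True if data + indexes fit within the RAM you can actually spare.

    headroom leaves a margin for connections, the OS, and other processes.
    """
    return (data_size + index_size) <= ram_bytes * headroom

# ~100 GB of data as in the question, plus a hypothetical 5 GB of indexes:
print(fits_in_memory(100 * GB, 5 * GB, ram_bytes=64 * GB))   # False
print(fits_in_memory(100 * GB, 5 * GB, ram_bytes=160 * GB))  # True
```

If the answer is False, the options are more RAM, sharding, or accepting that cold queries will page.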
In Jonathan Muller’s case it may be due to “heavy” operations like aggregations. I would expect a high load average in that case, though, so this is just my take on what the issue might be. Jonathan, can you see if there are very long-running queries with a possibly large CPU demand? If not, is ALL your data + indexes in memory already (i.e. the disk IO is not commensurate with the slowness)?