What’s the difference between DISTINCT_SCAN and IXSCAN in explain output?

Posted on

Question :

In Mongo 3.4 explain("executionStats").distinct(...), for some queries
the winning plan contains an IXSCAN step, and for other queries it contains an DISTINCT_SCAN step.

What is the difference between the two? Is one of them faster than the other?

Answer :

They are different stages for different purposes. I believe DISTINCT_SCAN is only present when there is an index in the field. An IXSCAN stage basically means that the query planner scans an index as one of the stages.

For example:

> db.test.createIndex({a:1})
{
  "createdCollectionAutomatically": true,
  "numIndexesBefore": 1,
  "numIndexesAfter": 2,
  "ok": 1
}

Doing find() on an indexed field will trigger the presence of an IXSCAN stage:

> db.test.explain().find({a:1})
{
  ...
    "winningPlan": {
      "stage": "FETCH",
      "inputStage": {
        "stage": "IXSCAN",
        "keyPattern": {
          "a": 1
        },
        "indexName": "a_1",
        ...

While doing distinct() on the indexed field will trigger the DISTINCT_SCAN stage:

> db.test.explain().distinct('a')
{
  ...
    "winningPlan": {
      "stage": "PROJECTION",
      "transformBy": {
        "_id": 0,
        "a": 1
      },
      "inputStage": {
        "stage": "DISTINCT_SCAN",
        "keyPattern": {
          "a": 1
        },
        "indexName": "a_1",
        ...

Note that in both cases, the index {a: 1} is used. Without the presence of the index, you will see a COLLSCAN stage instead.

One is not “faster” than the other. They’re different stages serving different purposes.

In short: a MongoDB DISTINCT_SCAN is a special kind of IXSCAN. The difference is that it returns just one document per index key value. It efficiently skips past duplicates rather than including them all.

For example, let’s say you insert the following documents:

  • {a:foo, b:10, c:rat}
  • {a:bar, b:1, c:cat}
  • {a:bar, b:2, c:bat}
  • {a:bar, b:1000000, c:mat}

Notice that there are lots of documents with a=bar.

If an index is defined as db.test.createIndex({a:1}) then a DISTINCT_SCAN of that index could be used to return just the following:

  • {a:foo, b:10, c:rat}
  • {a:bar, b:1, c:cat} <– an arbitrary document with the value a:bar is chosen from among the many available. Generally this will be the first one encountered by the index scan.

The original question was “which is faster?” All other things being equal, a DISTINCT_SCAN will be faster if there are multiple documents per index value. However, keep in mind that index scans are configurable with parameters like indexBounds. So, two IXSCANs using the exact same index can differ significantly in their execution time and the same variability applies to a comparison of DISTINCT_SCAN vs IXSCAN.

In practice I’d give this advice:

  • DISTINCT_SCAN and IXSCAN are both good things to see in a query plan.
  • Look at the .explain("executionStats") to see the details of the index scans’ execution.

Leave a Reply

Your email address will not be published. Required fields are marked *