PostgreSQL ignores index, runs seq scan

Question:

My table contains an index for column total_balance:

\d balances_snapshots
                                          Table "public.balances_snapshots"
    Column     |            Type             | Collation | Nullable |                    Default
---------------+-----------------------------+-----------+----------+------------------------------------------------
 user_id       | integer                     |           |          |
 asset_id      | text                        |           |          |
 timestamp     | timestamp without time zone |           |          | now()
 total_balance | numeric                     |           | not null |
 id            | integer                     |           | not null | nextval('balances_snapshots_id_seq'::regclass)
Indexes:
    "balances_snapshots_pkey" PRIMARY KEY, btree (id)
    "balances_snapshots_asset_id_idx" btree (asset_id)
    "balances_snapshots_timestamp_idx" btree ("timestamp")
    "balances_snapshots_user_id_idx" btree (user_id)
    "balances_total_balance_idx" btree (total_balance)
Foreign-key constraints:
    "balances_snapshots_asset_id_fkey" FOREIGN KEY (asset_id) REFERENCES assets(id) ON UPDATE CASCADE ON DELETE CASCADE
    "balances_snapshots_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE

And a simple query goes for a sequential scan:

explain analyze SELECT EXISTS (
  SELECT
    1
  FROM
    balances_snapshots
  WHERE
    total_balance = double precision 'NaN'
  LIMIT 1
) as exists;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=4.75..4.76 rows=1 width=1) (actual time=237365.680..237365.681 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Seq Scan on balances_snapshots  (cost=0.00..9257326.32 rows=1948181 width=0) (actual time=237365.675..237365.676 rows=0 loops=1)
           Filter: ((total_balance)::double precision = 'NaN'::double precision)
           Rows Removed by Filter: 389636289
 Planning Time: 23.985 ms
 Execution Time: 237365.719 ms
(7 rows)

How can I make PostgreSQL use the index? Or, in other words, is there a more efficient way to scan the table for the presence of NaN values?

Answer:

Precise explanation

Postgres has an operator for float8 = float8 as well as for numeric = numeric. But not for numeric = float8. One operand has to be cast.

double precision, a.k.a. float8, is the “preferred” data type among numeric types. See pg_type.typispreferred.

Operator type resolution is eventually decided by step 3.d. of the algorithm in the manual:

Run through all candidates and keep those that accept preferred types

float8 = float8 wins. Your index is built on the operator class for numeric and is not applicable. Bang.
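
You can see both pieces in the system catalogs. A quick sketch, querying only standard catalogs:

-- float8 is flagged "preferred" in the numeric type category ('N'); numeric is not:
SELECT typname, typcategory, typispreferred
FROM   pg_type
WHERE  typname IN ('float8', 'numeric');

-- The only '=' operator involving numeric is numeric = numeric;
-- none mixes numeric and float8:
SELECT oprname, oprleft::regtype AS lefttype, oprright::regtype AS righttype
FROM   pg_operator
WHERE  oprname = '='
AND    'numeric' IN (oprleft::regtype::text, oprright::regtype::text);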

Solution

Replace the cast to double precision in your query with a more sensible cast to numeric, matching the type of total_balance, as Stanislav already suggested.

Or simply use the untyped literal 'NaN' without an explicit cast. It resolves to the type of total_balance automatically.
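
For illustration, the original query with just the cast removed (a sketch; plan details will vary):

explain analyze SELECT EXISTS (
  SELECT
    1
  FROM
    balances_snapshots
  WHERE
    total_balance = 'NaN'  -- untyped literal, resolves to numeric
  LIMIT 1
) as exists;

This filter can use balances_total_balance_idx.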

Optimized solution

is there more efficient way to scan table on presence of NaN values?

If that’s the focus of your queries, a partial index will be much more efficient for your case:

CREATE INDEX balances_total_balance_nan_idx ON balances_snapshots ((true))
WHERE total_balance = 'NaN';

The actual index expression hardly matters; I used a constant.

Your EXPLAIN output reports close to 400 million rows (389636289), none of which actually have total_balance = 'NaN' (rows=0). The suggested partial index has a minimal size of 8 kB instead of ~10 GB (?) for a full index, costs hardly anything on writes, and makes the query near-instantaneous.
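
To verify the size difference on your system, a sketch using the index names from above:

SELECT pg_size_pretty(pg_relation_size('balances_total_balance_idx'))     AS full_index,
       pg_size_pretty(pg_relation_size('balances_total_balance_nan_idx')) AS partial_index;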

(Do you even need all the other existing indexes?)

The expression in the WHERE clause has to match the expression used in queries. If there is a good reason for the odd type mismatch in your query (?), make that:

...
WHERE total_balance = float8 'NaN'
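
Spelled out, with the same constant index expression as above (the index name is my own invention):

CREATE INDEX balances_total_balance_nan_f8_idx ON balances_snapshots ((true))
WHERE total_balance = float8 'NaN';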

Aside: wasted storage

Reordering your table columns like this avoids alignment padding and will save quite a few MB of storage and RAM:

                                          Table "public.balances_snapshots"
    Column     |            Type             | Collation | Nullable |                    Default
---------------+-----------------------------+-----------+----------+------------------------------------------------
 id            | integer                     |           | not null | nextval('balances_snapshots_id_seq'::regclass)
 user_id       | integer                     |           |          |
 total_balance | numeric                     |           | not null |
 asset_id      | text                        |           |          |
 timestamp     | timestamp without time zone |           |          | now()
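
The alignment requirements behind this can be inspected in the catalog (a sketch):

SELECT typname, typlen, typalign
FROM   pg_type
WHERE  typname IN ('int4', 'numeric', 'text', 'timestamp');

timestamp needs 8-byte alignment (typalign 'd'); in the original column order it follows a variable-length text column, which typically forces padding bytes before it in every row.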

The problem is that Postgres has to cast the original numeric value to double precision:

Filter: ((total_balance)::double precision = ...

So the index was built on numeric values, but the query needs double precision, hence the index can't be used. Stay away from casting the values in the column:

  WHERE
    total_balance = 'NaN'::numeric

PS: it’s weird to see NaN in a database. Why not store NULL?

As Stanislav said, the problem is that the query filters by first casting total_balance to double precision.

After I created an index on the cast value, the query started using this index:

create index balances_total_balance_nan_idx on balances_snapshots (cast(total_balance as double precision));
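
To confirm (plan output will vary), re-running the original query should now show a scan on that index instead of the seq scan, since the filter (total_balance)::double precision matches the index expression:

explain SELECT EXISTS (
  SELECT
    1
  FROM
    balances_snapshots
  WHERE
    total_balance = double precision 'NaN'
  LIMIT 1
) as exists;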
