Question :
My table contains an index for column total_balance
:
d balances_snapshots
Table "public.balances_snapshots"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+------------------------------------------------
user_id | integer | | |
asset_id | text | | |
timestamp | timestamp without time zone | | | now()
total_balance | numeric | | not null |
id | integer | | not null | nextval('balances_snapshots_id_seq'::regclass)
Indexes:
"balances_snapshots_pkey" PRIMARY KEY, btree (id)
"balances_snapshots_asset_id_idx" btree (asset_id)
"balances_snapshots_timestamp_idx" btree ("timestamp")
"balances_snapshots_user_id_idx" btree (user_id)
"balances_total_balance_idx" btree (total_balance)
Foreign-key constraints:
"balances_snapshots_asset_id_fkey" FOREIGN KEY (asset_id) REFERENCES assets(id) ON UPDATE CASCADE ON DELETE CASCADE
"balances_snapshots_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
And simple query goes for seq scan
explain analyze SELECT EXISTS (
SELECT
1
FROM
balances_snapshots
WHERE
total_balance = double precision 'NaN'
LIMIT 1
) as exists;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=4.75..4.76 rows=1 width=1) (actual time=237365.680..237365.681 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on balances_snapshots (cost=0.00..9257326.32 rows=1948181 width=0) (actual time=237365.675..237365.676 rows=0 loops=1)
Filter: ((total_balance)::double precision = 'NaN'::double precision)
Rows Removed by Filter: 389636289
Planning Time: 23.985 ms
Execution Time: 237365.719 ms
(7 rows)
How can I make PostgreSQL use index? Or in other words, is there more efficient way to scan table on presence of NaN values?
Answer :
Precise explanation
Postgres has an operator for float8 = float8
as well as for numeric = numeric
. But not for numeric = float8
. One operand has to be cast.
double precision
, a.k.a. float8
is the “preferred” data type among numeric types. See pg_type.typispreferred.
Operator type resolution is eventually decided in paragraph 3.d.
Run through all candidates and keep those that accept preferred types
float8 = float8
wins. Your index is built on the operator class for numeric
and is not applicable. Bang.
Solution
Replace that cast to double precision
in your query with a more sensible cast to numeric
, matching the type of total_balance
– like Stanislav already suggested.
Or simply use an untyped literal 'NaN'
without explicit cast. That resolves to the type of total_balance
in the assignment automatically. See:
Optimized solution
is there more efficient way to scan table on presence of NaN values?
If that’s the focus of your queries, a partial index will be much more efficient for your case:
CREATE INDEX balances_total_balance_nan_idx ON balances_snapshots ((true))
WHERE total_balance = 'NaN';
The actual index expression hardly matters. I used a constant. See:
Your EXPLAIN
output reports close to 400 million rows (389636289
) and none of them actually have total_balance = 'NaN'
(rows=0
). The suggested partial index has minimal size of 8 kB instead of ~ 10 GB (?) for a full index , hardly any write cost, and makes the query work near instantaneously.
(Do you even need all the other existing indexes?)
The expression in the WHERE
clause has to match the expression used in queries. If there is a good reason for the odd type-mismatch in your query (?) make that:
...
WHERE total_balance = float8 'NaN'
Aside: wasted storage
Speaking of which, reordering your table columns like this will save quite a few MB of storage and RAM:
Table "public.balances_snapshots"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+------------------------------------------------
id | integer | | not null | nextval('balances_snapshots_id_seq'::regclass)
user_id | integer | | |
total_balance | numeric | | not null |
asset_id | text | | |
timestamp | timestamp without time zone | | | now()
See:
The problem is that PG has to cast original numeric to double precision:
Filter: ((total_balance)::double precision = ...
So the index that was built contains numeric, but you need double precision – hence the index can’t be used. You need to stay away from casting the values in the column:
WHERE
total_balance = 'NaN'::numeric
PS: it’s weird to see NaN in a database. Why not store null?
As Stanislav said, the problem is that query filters by first casting total_balance to double precision.
After I created an index on casted value query started using this index:
create index balances_total_balance_nan_idx on balances_snapshots (cast(total_balance as double precision));