PostgreSQL is not using an index with large data

Question:


My PostgreSQL (9.2) schema looks like this:

CREATE TABLE first (
   id_first bigint NOT NULL,
   first_date timestamp without time zone NOT NULL,
   CONSTRAINT first_pkey PRIMARY KEY (id_first)
);

CREATE INDEX first_first_date_idx
   ON first
   USING btree
   (first_date);

CREATE TABLE second (
   id_second bigint NOT NULL,
   id_first bigint NOT NULL,
   CONSTRAINT second_pkey PRIMARY KEY (id_second),
   CONSTRAINT fk_first FOREIGN KEY (id_first)
      REFERENCES first (id_first) MATCH SIMPLE
);

CREATE INDEX second_id_first_idx
   ON second
   USING btree
   (id_first);

CREATE TABLE third (
   id_third bigint NOT NULL,
   id_second bigint NOT NULL,
   CONSTRAINT third_pkey PRIMARY KEY (id_third),
   CONSTRAINT fk_second FOREIGN KEY (id_second)
      REFERENCES second (id_second) MATCH SIMPLE
);

CREATE INDEX third_id_second_idx
   ON third
   USING btree
   (id_second);

So, I have three tables, each with its own PK. First has an index on first_date; Second has a FK to First with an index on it; Third has a FK to Second with an index on it as well:

 First (0 --> n) Second (0 --> n) Third

First table contains about 10 000 000 records.
Second table contains about 20 000 000 records.
Third table contains about 18 000 000 records.

Date range in column first_date is from 2016-01-01 till today.

random_page_cost is set to 2.0.
default_statistics_target is set to 100.
The STATISTICS targets for all FK, PK and first_date columns are set to 5000.
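For reference, a sketch of how such per-column statistics targets are typically applied (column names taken from the schema above; ANALYZE re-collects stats so the new targets take effect):

```sql
-- Raise the statistics target for the join/filter columns,
-- then re-analyze so the planner sees the richer histograms.
ALTER TABLE first  ALTER COLUMN first_date SET STATISTICS 5000;
ALTER TABLE first  ALTER COLUMN id_first   SET STATISTICS 5000;
ALTER TABLE second ALTER COLUMN id_first   SET STATISTICS 5000;
ALTER TABLE third  ALTER COLUMN id_second  SET STATISTICS 5000;

ANALYZE first;
ANALYZE second;
ANALYZE third;
```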

Task to do

I want to count all Third rows connected (through Second) with First rows where first_date < X.

My query:

SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first 
JOIN third t ON t.id_second = s.id_second
WHERE first_date < _my_date

Problem description

  • Asking for 2 days – _my_date = '2016-01-03'

Everything works pretty well; the query takes 1-2 seconds.

"Aggregate  (cost=8585512.55..8585512.56 rows=1 width=8) (actual time=67.310..67.310 rows=1 loops=1)"
"  ->  Merge Join  (cost=4208477.49..8583088.04 rows=969805 width=8) (actual time=44.277..65.948 rows=17631 loops=1)"
"        Merge Cond: (s.id_second = t.id_second)"
"        ->  Sort  (cost=4208477.48..4211121.75 rows=1057709 width=8) (actual time=44.263..46.035 rows=19230 loops=1)"
"              Sort Key: s.id_second"
"              Sort Method: quicksort  Memory: 1670kB"
"              ->  Nested Loop  (cost=0.01..4092310.41 rows=1057709 width=8) (actual time=6.169..39.183 rows=19230 loops=1)"
"                    ->  Index Scan using first_first_date_idx on first f  (cost=0.01..483786.81 rows=492376 width=8)  (actual time=6.159..12.223 rows=10346 loops=1)"
"                          Index Cond: (first_date < '2016-01-03 00:00:00'::timestamp without time zone)"
"                    ->  Index Scan using second_id_first_idx on second s  (cost=0.00..7.26 rows=7 width=16) (actual time=0.002..0.002 rows=2 loops=10346)"
"                          Index Cond: (id_first = f.id_first)"
"        ->  Index Scan using third_id_second_idx on third t  (cost=0.00..4316649.89 rows=17193788 width=16) (actual time=0.008..7.293 rows=17632 loops=1)"
"Total runtime: 67.369 ms"
  • Asking for 10 days or more – _my_date = '2016-01-11' or more

The query is no longer using an index scan – it is replaced by a seq scan and takes 3-4 minutes…
Query plan:

"Aggregate  (cost=8731468.75..8731468.76 rows=1 width=8) (actual time=234411.229..234411.229 rows=1 loops=1)"
"  ->  Hash Join  (cost=4352424.81..8728697.88 rows=1108348 width=8) (actual time=189670.068..234400.540 rows=138246 loops=1)"
"        Hash Cond: (t.id_second = s.id_second)"
"        ->  Seq Scan on third t  (cost=0.00..4128080.88 rows=17193788 width=16) (actual time=0.016..124111.453 rows=17570724 loops=1)"
"        ->  Hash  (cost=4332592.69..4332592.69 rows=1208810 width=8) (actual time=98566.740..98566.740 rows=151263 loops=1)"
"              Buckets: 16384  Batches: 16  Memory Usage: 378kB"
"              ->  Hash Join  (cost=561918.25..4332592.69 rows=1208810 width=8) (actual time=6535.801..98535.915 rows=151263 loops=1)"
"                    Hash Cond: (s.id_first = f.id_first)"
"                    ->  Seq Scan on second s  (cost=0.00..3432617.48 rows=18752248 width=16) (actual time=6090.771..88891.691 rows=19132869 loops=1)"
"                    ->  Hash  (cost=552685.31..552685.31 rows=562715 width=8) (actual time=444.630..444.630 rows=81650 loops=1)"
"                          ->  Index Scan using first_first_date_idx on first f  (cost=0.01..552685.31 rows=562715 width=8) (actual time=7.987..421.087 rows=81650 loops=1)"
"                                Index Cond: (first_date < '2016-01-13 00:00:00'::timestamp without time zone)"
"Total runtime: 234411.303 ms"

For test purposes, I have set:

 SET enable_seqscan = OFF;

My queries start using an index scan again and take 1-10 s (depending on the range).

Here is the EXPLAIN ANALYZE for the second query after turning off seqscan:

"Aggregate  (cost=8825283.56..8825283.57 rows=1 width=8) (actual time=641.201..641.201 rows=1 loops=1)"
"  ->  Merge Join  (cost=4445761.20..8822512.69 rows=1108348 width=8) (actual time=389.803..628.788 rows=138246 loops=1)"
"        Merge Cond: (s.id_second = t.id_second)"
"        ->  Sort  (cost=4445761.19..4448783.22 rows=1208810 width=8) (actual time=389.767..414.198 rows=151263 loops=1)"
"              Sort Key: s.id_second"
"              Sort Method: external sort  Disk: 2664kB"
"              ->  Merge Join  (cost=612085.24..4311835.22 rows=1208810 width=8) (actual time=132.170..304.782 rows=151263 loops=1)"
"                    Merge Cond: (f.id_first = s.id_first)"
"                    ->  Sort  (cost=611927.84..613334.63 rows=562715 width=8) (actual time=132.136..144.508 rows=81650 loops=1)"
"                          Sort Key: f.id_first"
"                          Sort Method: external sort  Disk: 1440kB"
"                          ->  Index Scan using first_first_date_idx on first f  (cost=0.01..552685.31 rows=562715 width=8) (actual time=9.036..82.768 rows=81650 loops=1)"
"                                Index Cond: (first_date < '2016-01-13 00:00:00'::timestamp without time zone)"
"                    ->  Index Scan using second_id_first_idx on second s  (cost=0.00..3638125.09 rows=18752248 width=16) (actual time=0.026..80.109 rows=151264 loops=1)"
"        ->  Index Scan using third_id_second_idx on third t  (cost=0.00..4316649.89 rows=17193788 width=16) (actual time=0.027..92.270 rows=138247 loops=1)"
"Total runtime: 643.523 ms"


Why is it working like that? How can I convince the query planner to use an index scan?


allow_system_table_mods             off
application_name                    pgAdmin III - Narzędzie Zapytania
archive_command                     (disabled)
archive_mode                        off
archive_timeout                     0
array_nulls                         on
authentication_timeout              60
autovacuum                          on
autovacuum_analyze_scale_factor     0.1
autovacuum_analyze_threshold        50
autovacuum_freeze_max_age           200000000
autovacuum_max_workers              3
autovacuum_naptime                  60
autovacuum_vacuum_cost_delay        20
autovacuum_vacuum_cost_limit        -1
autovacuum_vacuum_scale_factor      0.2
autovacuum_vacuum_threshold         50
backslash_quote                     safe_encoding
bgwriter_delay                      200
bgwriter_lru_maxpages               100
bgwriter_lru_multiplier             2
block_size                          8192
bonjour                             off
bytea_output                        escape
check_function_bodies               on
checkpoint_completion_target        0.9
checkpoint_segments                 32
checkpoint_timeout                  300
checkpoint_warning                  30
client_encoding                     UNICODE
client_min_messages                 notice
commit_delay                        0
commit_siblings                     5
constraint_exclusion                partition
cpu_index_tuple_cost                0.005
cpu_operator_cost                   0.0025
cpu_tuple_cost                      0.01
cursor_tuple_fraction               0.1
DateStyle                           ISO, MDY
db_user_namespace                   off
deadlock_timeout                    1000
debug_assertions                    off
debug_pretty_print                  on
debug_print_parse                   off
debug_print_plan                    off
debug_print_rewritten               off
default_statistics_target           100
default_text_search_config          pg_catalog.english
default_transaction_deferrable      off
default_transaction_isolation       read committed
default_transaction_read_only       off
default_with_oids                   off
effective_cache_size                393216
effective_io_concurrency            1
enable_bitmapscan                   on
enable_hashagg                      on
enable_hashjoin                     on
enable_indexonlyscan                on
enable_indexscan                    on
enable_material                     on
enable_mergejoin                    on
enable_nestloop                     on
enable_seqscan                      on
enable_sort                         on
enable_tidscan                      on
escape_string_warning               on
event_source                        PostgreSQL
exit_on_error                       off
extra_float_digits                  0
from_collapse_limit                 8
fsync                               on
full_page_writes                    on
geqo                                on
geqo_effort                         5
geqo_generations                    0
geqo_pool_size                      0
geqo_seed                           0
geqo_selection_bias                 2
geqo_threshold                      12
gin_fuzzy_search_limit              0
hot_standby                         off
hot_standby_feedback                off
ignore_system_indexes               off
integer_datetimes                   on
IntervalStyle                       postgres
join_collapse_limit                 8
krb_caseins_users                   off
krb_srvname                         postgres
lc_collate                          en_US.UTF-8
lc_ctype                            en_US.UTF-8
lc_messages                         en_US.UTF-8
lc_monetary                         en_US.UTF-8
lc_numeric                          en_US.UTF-8
lc_time                             en_US.UTF-8
listen_addresses                    *
lo_compat_privileges                off
log_autovacuum_min_duration         -1
log_checkpoints                     on
log_connections                     on
log_destination                     stderr
log_disconnections                  off
log_duration                        off
log_error_verbosity                 default
log_executor_stats                  off
log_file_mode                       600
log_hostname                        off
log_line_prefix                     %d %t %u
log_lock_waits                      off
log_min_duration_statement          3000
log_min_error_statement             error
log_min_messages                    warning
log_parser_stats                    off
log_planner_stats                   off
log_rotation_age                    1440
log_rotation_size                   0
log_statement                       ddl
log_statement_stats                 off
log_temp_files                      -1
log_timezone                        Poland
log_truncate_on_rotation            on
logging_collector                   on
maintenance_work_mem                262144
max_connections                     100
max_files_per_process               1000
max_function_args                   100
max_identifier_length               63
max_index_keys                      32
max_locks_per_transaction           64
max_pred_locks_per_transaction      64
max_prepared_transactions           0
max_stack_depth                     2048
max_standby_archive_delay           30000
max_standby_streaming_delay         30000
max_wal_senders                     5
password_encryption                 on
port                                5433
post_auth_delay                     0
pre_auth_delay                      0
quote_all_identifiers               off
random_page_cost                    2
replication_timeout                 60000
restart_after_crash                 on
search_path                         "$user",public
segment_size                        131072
seq_page_cost                       1
server_encoding                     UTF8
server_version                      9.2.13
server_version_num                  90213
session_replication_role            origin
shared_buffers                      131072
sql_inheritance                     on
ssl                                 off
ssl_cert_file                       server.crt
ssl_key_file                        server.key
ssl_renegotiation_limit             524288
standard_conforming_strings         on
statement_timeout                   0
superuser_reserved_connections      3
synchronize_seqscans                on
synchronous_commit                  on
syslog_facility                     local0
syslog_ident                        postgres
tcp_keepalives_count                9
tcp_keepalives_idle                 7200
tcp_keepalives_interval             75
temp_buffers                        1024
temp_file_limit                     -1
TimeZone                            Poland
timezone_abbreviations              Default
trace_notify                        off
trace_recovery_messages             log
trace_sort                          off
track_activities                    on
track_activity_query_size           1024
track_counts                        on
track_functions                     none
track_io_timing                     off
transaction_deferrable              off
transaction_isolation               read committed
transaction_read_only               off
transform_null_equals               off
unix_socket_permissions             777
update_process_title                on
vacuum_cost_delay                   0
vacuum_cost_limit                   200
vacuum_cost_page_dirty              20
vacuum_cost_page_hit                1
vacuum_cost_page_miss               10
vacuum_defer_cleanup_age            0
vacuum_freeze_min_age               50000000
vacuum_freeze_table_age             150000000
wal_block_size                      8192
wal_buffers                         2048
wal_keep_segments                   64
wal_level                           hot_standby
wal_receiver_status_interval        10
wal_segment_size                    2048
wal_sync_method                     fdatasync
wal_writer_delay                    200
work_mem                            5242
xmlbinary                           base64
xmloption                           content
zero_damaged_pages                  off


After reducing random_page_cost to 1.1, I can select about 30 days and still get an index scan. The query plan changed a little bit:

"Aggregate  (cost=8071389.47..8071389.48 rows=1 width=8) (actual  time=4915.196..4915.196 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.01..8067832.28 rows=1422878 width=8) (actual time=14.402..4866.937 rows=399184 loops=1)"
"        ->  Nested Loop  (cost=0.01..3492321.55 rows=1551849 width=8) (actual time=14.393..3012.617 rows=436794 loops=1)"
"              ->  Index Scan using first_first_date_idx on first f  (cost=0.01..432541.99 rows=722404 width=8) (actual time=14.372..729.233 rows=236007 loops=1)"
"                    Index Cond: (first_date < '2016-02-01 00:00:00'::timestamp without time zone)"
"              ->  Index Scan using second_id_first_idx on second s  (cost=0.00..4.17 rows=7 width=16) (actual time=0.008..0.009 rows=2 loops=236007)"
"                    Index Cond: (id_first = f.id_first)"
"        ->  Index Scan using third_id_second_idx on third t  (cost=0.00..2.94 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=436794)"
"              Index Cond: (id_second = s.id_second)"
"Total runtime: 4915.254 ms"
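For what it's worth, a cost-parameter change like this can be tried per session before touching postgresql.conf (a sketch; on 9.2 there is no ALTER SYSTEM, so a permanent change means editing the config file and reloading):

```sql
-- Lower the random-I/O cost estimate in this session only:
SET random_page_cost = 1.1;

EXPLAIN ANALYZE
SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first
JOIN third t ON t.id_second = s.id_second
WHERE first_date < '2016-02-01';

-- Revert to the server default when done:
RESET random_page_cost;
```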

However, I still don't get why asking for a larger range causes a seq scan.

Interestingly, when I ask for a range just above some kind of limit, I get a query plan like this (here a select for 40 days – asking for more produces a full seq scan again):

"Aggregate  (cost=8403399.27..8403399.28 rows=1 width=8) (actual time=138303.216..138303.217 rows=1 loops=1)"
"  ->  Hash Join  (cost=3887619.07..8399467.63 rows=1572656 width=8) (actual time=44056.443..138261.203 rows=512062 loops=1)"
"        Hash Cond: (t.id_second = s.id_second)"
"        ->  Seq Scan on third t  (cost=0.00..4128080.88 rows=17193788 width=16) (actual time=0.004..119497.056 rows=17570724 loops=1)"
"        ->  Hash  (cost=3859478.04..3859478.04 rows=1715203 width=8) (actual time=5695.077..5695.077 rows=560503 loops=1)"
"              Buckets: 16384  Batches: 16  Memory Usage: 1390kB"
"              ->  Nested Loop  (cost=0.01..3859478.04 rows=1715203 width=8) (actual time=65.250..5533.413 rows=560503 loops=1)"
"                    ->  Index Scan using first_first_date_idx on first f  (cost=0.01..477985.28 rows=798447 width=8) (actual time=64.927..1688.341 rows=302663 loops=1)"
"                          Index Cond: (first_date < '2016-02-11 00:00:00'::timestamp without time zone)"
"                    ->  Index Scan using second_id_first_idx on second s (cost=0.00..4.17 rows=7 width=16) (actual time=0.010..0.012 rows=2 loops=302663)"
"                          Index Cond: (id_first = f.id_first)"
"Total runtime: 138303.306 ms"

Answer:

You should benefit from increasing work_mem to a reasonable value. Since this is a test machine, you can push hard on the memory settings, so try the following:

work_mem = 32MB
shared_buffers = 4GB
effective_cache_size = 6GB

where shared_buffers and effective_cache_size are 50% and 75% of total RAM, respectively. work_mem should lie between 16MB and 32MB for your use case. Test it and give us feedback.
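A sketch of how these values could be applied on 9.2: work_mem can be tested per session, while shared_buffers and effective_cache_size belong in postgresql.conf (shared_buffers requires a server restart, effective_cache_size only a reload):

```sql
-- Quick per-session test of the sort/hash memory budget:
SET work_mem = '32MB';

-- In postgresql.conf (then restart for shared_buffers,
-- reload for the others):
--   work_mem = 32MB
--   shared_buffers = 4GB
--   effective_cache_size = 6GB
```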

There are also two other moves you should make:

  • Upgrade to 9.5 or 9.6, as suggested by @a_horse_with_no_name
  • Partition the first table by the first_date attribute, so the planner knows a priori in which partition a given date range is stored. A common criterion is partitioning by month, quarter, year, etc. The granularity should be chosen as the best trade-off between partition size (row count) and the number of partitions a typical query has to touch. Partitioning doesn't change your queries much (apart from re-writing some WHERE conditions to match the partition definitions so that specific partitions are used) or your application, but it does require writing more DDL statements to create the check constraints and indexes, and to vacuum the tables.

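On 9.2 this means inheritance-based partitioning. A minimal monthly-partition sketch for the first table (names and the routing approach are illustrative):

```sql
-- One child table per month; the CHECK constraint lets
-- constraint_exclusion skip partitions outside the WHERE range.
CREATE TABLE first_2016_01 (
    CHECK (first_date >= '2016-01-01' AND first_date < '2016-02-01')
) INHERITS (first);

CREATE INDEX first_2016_01_first_date_idx
    ON first_2016_01 (first_date);

-- Route inserts to the right child with a trigger, or insert
-- directly into the child tables from the application.
-- With constraint_exclusion = partition (already set here),
-- a query with WHERE first_date < '2016-01-13' only has to
-- scan the children whose CHECK constraints can match.
```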