We are using postgres 11.4 and nodejs to run sql scripts. We use node-postgres module and its Pool feature, setting it to 10.
Database is running on a server with 4GB total memory. When the script is not running the total available memory is around 200MB. When the script is running it can go down to 160MB.
We have a script that most of the time is able to complete on smaller operations but gets stuck because of a query / queries that’s being awaited and doesn’t finish on larger operations.
The queries that don’t finish have a state of ‘active’ when we do a select on pg_stat_activity. Other connections from the same pool (based on client_addr) have a state of ‘idle’.
The node script running these queries does not appear to have become unresponsive when this happens, (we will verify this as it has a message client in it).
In the node script there is a query that fetches 1000 records and these records are iterated over with some logic executed on each one. The loop does a progress update after every 500 records processed and then fetches another 1000 records. This is done to process large numbers of records. Eg; 2 – 3 million of records.
The progress updates are done as async database calls that consist of one update and one select. The update query is asyc/awaited but the select is just async fire and forget, and then loop continues.
We are seeing that when we process a large batch of records that some of the two update/select queries for progress get stuck at the same time. A select query and an update query on the same table. The update is for updating one of the progress columns, and the select computes total progress that is then sent to the client.
Sometimes it happens on the maindata table while doing a select of the 1000 records to be processed but that doesn’t happen often.
Sometimes when there is a query that gets stuck we also see autovacuum: VACUUM tablename in pg_stat_activity. And usually once that autovacuum disappears, one
pair of select,update query gets to finish and the script it is running in runs to completion. BUT not all pair of select,update or other query not related to the table finishes.
Also, after a query(s) is stuck, we can still run select/update/insert queries on that same table from another node script without issue.
We have no idea what is causing this or how to identify/fix the issue.
Any help would be much appreciated.
Thank your for your time.
Screenshot of pg_stat_activity showing stuck select statement and all rows in pg_locks. Couldn’t get the select,update combo that get stucks; for now seems to get eventually unstuck.
Select progress query:
select ((tn.progress1 + tn.progress2 + tn.progress3) * 100 / (3 * 100)) total_percent ,concat( 'Total Progress: ', ((tn.progress1 + tn.progress2 + tn.progress3) * 100 / (3 * 100)), E'n', 'Status: ', E'n', tn.message, E'n', E'nn' ) final_message ,(case coalesce(tn.state,0) <> [ERRORCODE] when TRUE then 0 else 1 end) error from table tn where tn.id = [id] and tn.guid1 = [guid1] and tn.guid2 = [guid2] group by tn.progress1, tn.progress2, tn.progress3, tn.message, tn.state;
We are now looking at docker + logs as having caused the issue. It’s not the database.