I have data-pulling functionality that grabs all new data from a Postgres table every 5 seconds, based on a modified_timestamp column. It works the following way:
1. SELECT * FROM my_table WHERE modified_timestamp > _some_persisted_timestamp
2. _some_persisted_timestamp = CURRENT_TIMESTAMP
3. Process the data received in step 1
4. Sleep for 5 s
5. Go to step 1
The modified_timestamp column is maintained by a trigger: after any row update, modified_timestamp becomes CURRENT_TIMESTAMP.
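For reference, such a trigger might look like the sketch below. The function and trigger names are my assumptions; only my_table and modified_timestamp come from the question.

```sql
-- Hypothetical sketch of the trigger described above.
CREATE OR REPLACE FUNCTION set_modified_timestamp() RETURNS trigger AS $$
BEGIN
    NEW.modified_timestamp := CURRENT_TIMESTAMP;  -- transaction start time!
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION requires Postgres 11+; use EXECUTE PROCEDURE on older versions.
CREATE TRIGGER trg_set_modified_timestamp
BEFORE UPDATE ON my_table
FOR EACH ROW EXECUTE FUNCTION set_modified_timestamp();
```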
It worked fine until I noticed that CURRENT_TIMESTAMP in Postgres is in fact the transaction *start* timestamp, and some of the updates are lost. Why are they lost? Quite simply: at the moment I execute

SELECT * FROM my_table WHERE modified_timestamp > _some_persisted_timestamp

some of the changes have already been written, but their modified_timestamp is earlier than the updated _some_persisted_timestamp, because the writing transaction is still in progress (not yet committed, and therefore not yet visible).
This problem could easily be solved by assigning, in step 2, the timestamp at which an update becomes visible to other transactions (in other words, the transaction commit timestamp) instead of CURRENT_TIMESTAMP or clock_timestamp().
I have read the documentation but found nothing related to the transaction commit timestamp. Could you kindly suggest something?
Btw, I’m aware of logical decoding, and I know that mechanism would in theory suit my needs better, but certain practical problems prevent me from using it.
> This problem could easily be solved by assigning, in step 2, the timestamp when the update becomes visible to other transactions (the transaction commit timestamp, in other words) instead of CURRENT_TIMESTAMP or clock_timestamp().
This is logically impossible. Postgres writes new row versions before it commits; only the commit makes them visible. Writing the commit timestamp into the row would require prophetic capabilities, since that timestamp is still unknown at the time the row is written.
However, you can get commit timestamps from a different source: since Postgres 9.5 there is a GUC setting, track_commit_timestamp, that makes the server record commit timestamps globally.
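To illustrate, enabling it could look like this. Note that track_commit_timestamp is a server-level setting: changing it requires a restart, and only transactions committed after it is on have their timestamps recorded.

```sql
-- Enable commit timestamp tracking (needs a server restart to take effect).
ALTER SYSTEM SET track_commit_timestamp = on;

-- Verify after the restart:
SHOW track_commit_timestamp;
```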
Then you can get commit timestamps with the utility function
pg_xact_commit_timestamp(xid). Your query could look like:
SELECT * FROM my_table t WHERE pg_xact_commit_timestamp(t.xmin) > _some_persisted_timestamp;
Be aware that commit timestamps are not kept around forever. After two billion transactions (2^31), transaction IDs are “frozen”. Freezing does not delete the commit timestamp right away, but after four billion transactions the information is certainly gone. That is a big number of transactions, and only very busy databases burn through that many over a lifetime. But programming errors can burn through transaction IDs faster than expected …
So your step 2 and step 3 trade positions, and in the new step 3 you record the commit timestamp instead of CURRENT_TIMESTAMP. Alternatively, record the xmin of any freshly updated row and derive the commit timestamp from it with pg_xact_commit_timestamp() once more.
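Putting it together, the revised polling step might look like the sketch below. The commit_ts alias is my invention; _some_persisted_timestamp is the placeholder from the question, standing in for a parameter supplied by the application.

```sql
-- Revised step 1: pull rows whose writing transaction committed after the
-- last persisted timestamp, and expose the commit timestamp so that the
-- application can persist max(commit_ts) as the new watermark.
SELECT t.*,
       pg_xact_commit_timestamp(t.xmin) AS commit_ts
FROM   my_table t
WHERE  pg_xact_commit_timestamp(t.xmin) > _some_persisted_timestamp;
```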
But I am not completely sure I understand your task. Maybe you need a queuing tool, or to process rows one by one as discussed here:

- How do I write a Postgres SQL command based on metadata of the tables themselves?