Question :
I have data-pulling functionality that every 5 seconds grabs all the data from a Postgres table based on the modified_timestamp column. It works the following way:
1. SELECT * FROM my_table WHERE modified_timestamp > _some_persisted_timestamp
2. _some_persisted_timestamp = CURRENT_TIMESTAMP
3. Process the data received in step 1
4. Sleep for 5s
5. Go to step 1
Here modified_timestamp is maintained by a trigger (after any row update, modified_timestamp is set to CURRENT_TIMESTAMP).
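The trigger might look something like this (the function and trigger names here are my assumptions; note that to modify the row before it is stored, the trigger has to fire BEFORE UPDATE, even though it runs "after" the update statement in the sense of the question):

```sql
-- Assumed trigger function: stamp every updated row with the current time.
CREATE OR REPLACE FUNCTION set_modified_timestamp()
  RETURNS trigger
  LANGUAGE plpgsql AS
$$
BEGIN
   NEW.modified_timestamp := CURRENT_TIMESTAMP;  -- transaction start time!
   RETURN NEW;
END
$$;

-- Fire before every row UPDATE on my_table.
CREATE TRIGGER my_table_set_modified
BEFORE UPDATE ON my_table
FOR EACH ROW EXECUTE FUNCTION set_modified_timestamp();
```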
It worked fine until I noticed that CURRENT_TIMESTAMP in Postgres is in fact the transaction start timestamp, and some of the updates are lost. Why are they lost? That's quite simple: at the moment I execute the query SELECT * FROM my_table WHERE modified_timestamp > _some_persisted_timestamp, some of the changes have already occurred, but their modified_timestamp is earlier than the updated _some_persisted_timestamp because the transaction is still in progress.
This problem could easily be solved by assigning, in step 2, the timestamp when the update becomes visible to other transactions (the transaction commit timestamp, in other words) instead of CURRENT_TIMESTAMP or clock_timestamp().
I have read the documentation but found nothing related to the transaction commit timestamp. Could you kindly suggest something?
Btw, I'm aware of logical decoding, and I know that this mechanism suits my needs better in theory, but certain practical problems prevent me from using it.
Answer :
This problem could easily be solved by assigning, in step 2, the timestamp when the update becomes visible to other transactions (the transaction commit timestamp, in other words) instead of CURRENT_TIMESTAMP or clock_timestamp().
This is logically impossible: Postgres writes new row versions before it finally commits to make them visible. It would require prophetic capabilities to write a future timestamp that is still unknown at the time of writing.
However, you can get commit timestamps from a different source: since Postgres 9.5 there is the GUC setting track_commit_timestamp to start logging commit timestamps globally.
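Enabling it might look like this (a superuser command; the setting only takes effect after a server restart, and only transactions committed while it is on get a timestamp recorded):

```sql
-- Enable commit timestamp tracking cluster-wide (requires a restart):
ALTER SYSTEM SET track_commit_timestamp = on;

-- After restarting the server, verify:
SHOW track_commit_timestamp;
```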
Then you can get commit timestamps with the utility function pg_xact_commit_timestamp(xid). Your query could look like:
SELECT * FROM my_table t
WHERE pg_xact_commit_timestamp(t.xmin) > _some_persisted_timestamp;
Be aware that commit timestamps are not kept around forever. After about two billion transactions (2^31), transaction IDs are "frozen". Freezing does not delete the commit timestamp right away, but after four billion transactions the information is gone for certain. That's a big number of transactions, and only very busy databases burn that many over a lifetime. But there can be programming errors burning through transaction IDs more quickly than expected …
Your steps 2 and 3 trade positions, and you record the commit timestamp instead of CURRENT_TIMESTAMP – or the xmin from any freshly updated row, to derive the commit timestamp with pg_xact_commit_timestamp() once more.
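Sketched as a query, the revised step 1 might look like this (the commit_ts alias and the idea of persisting its maximum are my additions, not part of the original setup):

```sql
-- Pull rows committed after the last persisted commit timestamp:
SELECT t.*, pg_xact_commit_timestamp(t.xmin) AS commit_ts
FROM   my_table t
WHERE  pg_xact_commit_timestamp(t.xmin) > _some_persisted_timestamp;

-- After processing, persist the greatest commit_ts seen
-- as the new _some_persisted_timestamp.
```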
More:
- How do I write a Postgres SQL command based on metadata of the tables themselves?
- https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.5#Commit_timestamp_tracking
About xmin:
But I am not completely sure I understand your task. Maybe you need a queuing tool, or to process rows one by one as discussed here: