Merge two rows letting the second one override the first in PostgreSQL

Posted on

Question :

Say that you want to build an append only table in Postgres, where each modification is actually just a new row added to the table. Say that you have the below simple, and disregard all questions about performance, table of users.

create table user_revisions (id uuid, name text, password text, version bigserial);
insert into user_revisions (id, name, password)
values ('743ccdf0-e9d8-4268-b6d7-0645eb70feb9', 'Bengan', 'nthusaeo');
insert into user_revisions (id, name, password)
values ('743ccdf0-e9d8-4268-b6d7-0645eb70feb9', 'Bengan', 'åäö');

Getting the latest version of a user would look something like:

select * from user_revisions e1
order by version desc limit 1;

But then you come to the part where you want to start modifying the users. Each request to the database always has the uuid of the user being modified and some subset of all the fields that are otherwise present in the user_revisions table.
You want to make an insert with the last revision of the user as a set of default values? That is, do the equivalent of the Clojure code (merge a b) or Python {**a, **b}, where the two associative data structures a and b are merged and if there are any keys in a that also exist in b the values held in b will override those of a.

Below are some failed attempts at writing a select that produces the output that we in turn wants to insert as the latest revision.

with temp (password) as (values ('snthsnth'))
select * from user_revisions e1 natural left join temp

with temp (password) as (values ('snthsnth'))
select * from user_revisions e1 natural inner join temp

with temp (password) as (values ('snthsnth'))
select * from user_revisions e1 natural right join temp

The desired output would be:

 password  |                  id                  |  name  
 snthsnth  | 743ccdf0-e9d8-4268-b6d7-0645eb70feb9 | Bengan

This output would then be inserted into the table as the latest version of the user with the id 743ccdf0-e9d8-4268-b6d7-0645eb70feb9.

Answer :

It depends if you need something generic or if it can work with a “small” number of columns.

In your specific example, this query:

WITH new_data (id, name, password) AS (values ('743ccdf0-e9d8-4268-b6d7-0645eb70feb9'::uuid, NULL, 'snthsnth'))
COALESCE(new_data.password, current_data.password) AS password, 
FROM user_revisions current_data
INNER JOIN new_data USING (id)
WHERE id='743ccdf0-e9d8-4268-b6d7-0645eb70feb9'
ORDER BY current_data.version DESC LIMIT 1;

would produce the expected:

id                                      name    password    version
743ccdf0-e9d8-4268-b6d7-0645eb70feb9    Bengan  snthsnth    2

You can test things in this SQLFiddle:!17/c4700/5/0

I am quite confident it could be made generic if needed, but it may be enough already for your needs?

However it will not work correctly if you need to store NULL values, and if you need to handle transition such as “some data value => NULL”

In such cases there would be a specific need to realy test the existence of each field and just take its value if present, including if it is NULL

In passing I would also say like @Lennart that it is better to separate archive data from live data for many reasons, one would be performance, to compute the “next” version you will only have to read one row (the live one) and then doing something as above, instead of having to do things like order by version desc limit 1 each time, which will be more costly, even with an index on version.

Leave a Reply

Your email address will not be published. Required fields are marked *