Merge duplicates on condition

Posted on

Question :

I have the following table in a PostgreSQL db

id | value_1 | value_2
-----------------------------
1  | 100     | 10
1  | NULL    | 20
2  | 100     | 20
3  | 100     | 12
4  | 100     | 16
5  | NULL    | 200
5  | 400     | 200
5  | 580     | 200

I would like to retain the record with the max value given in value_1 and remove all other duplicated records (have the same id) so that the query will reduce the table to:

id | value_1 | value_2
-----------------------------
1  | 100     | 10
2  | 100     | 20
3  | 100     | 12
4  | 100     | 16
5  | 580     | 200

Answer :

This can easily be done using PostgreSQL’s (proprietary) distinct on () operator:

select distinct on (id) *
from the_table
order by id, value_1 desc nulls last;

Assuming:

  • You want to actually DELETE duplicates from the table.
  • id is defined NOT NULL.
  • There is at least one not-null value_1 per distinct id.
  • Only few duplicated to be deleted.
DELETE FROM tbl t0
WHERE  EXISTS (
   SELECT FROM tbl t1
   WHERE  t1.id = t0.id
   AND   (t1.value_1 > t0.value_1  -- a row with bigger value exists
       OR t0.value_1 IS NULL)      -- or NULL 
   );

An index on (id) or even (id, value_1) would be helpful in this case. Related:

After cleaning up, a UNIQUE index on (id) would keep it that way. (Plus making it NOT NULL.)

If any of my assumptions do not hold (be more specific!), a different solution may be in order.

Leave a Reply

Your email address will not be published. Required fields are marked *