Question :
I have the following table in a PostgreSQL db
id | value_1 | value_2
-----------------------------
1 | 100 | 10
1 | NULL | 20
2 | 100 | 20
3 | 100 | 12
4 | 100 | 16
5 | NULL | 200
5 | 400 | 200
5 | 580 | 200
I would like to retain the record with the max value given in value_1
and remove all other duplicated records (have the same id
) so that the query will reduce the table to:
id | value_1 | value_2
-----------------------------
1 | 100 | 10
2 | 100 | 20
3 | 100 | 12
4 | 100 | 16
5 | 580 | 200
Answer :
This can easily be done using PostgreSQL’s (proprietary) distinct on ()
operator:
select distinct on (id) *
from the_table
order by id, value_1 desc nulls last;
Assuming:
- You want to actually
DELETE
duplicates from the table. id
is definedNOT NULL
.- There is at least one not-null
value_1
per distinctid
. - Only few duplicated to be deleted.
DELETE FROM tbl t0
WHERE EXISTS (
SELECT FROM tbl t1
WHERE t1.id = t0.id
AND (t1.value_1 > t0.value_1 -- a row with bigger value exists
OR t0.value_1 IS NULL) -- or NULL
);
An index on (id)
or even (id, value_1)
would be helpful in this case. Related:
After cleaning up, a UNIQUE
index on (id)
would keep it that way. (Plus making it NOT NULL
.)
If any of my assumptions do not hold (be more specific!), a different solution may be in order.