Question :
Assume I have a PET table with three columns: Name, Type, and Date. I want to order the three columns by Date but only display the ONE most recent of a specific particular type (“Dog” for this example), all for the rest. For example, my raw list of data:
Name Type Date
Nemo Fish June 1, 2016
Fido Dog January 1, 2016
Felix Dog February 1, 2016
Whiskers Cat April 1, 2016
Marlin Fish August 1, 2016
Shifu Cat March 3, 2016
Would look like this when the query was applied.
Felix Dog February 1, 2016
Shifu Cat March 3, 2016
Whiskers Cat April 1, 2016
Nemo Fish June 1, 2016
Marlin Fish August 1, 2016
I’m sure there are multiple ways to do this, but only a few that are the most optimal from an efficiency perspective.
Answer :
Solution 1
CREATE TABLE pet ( name text, type text, date date);
INSERT INTO pet VALUES
('Nemo', 'Fish', 'June 1, 2016'),
('Fido', 'Dog', 'January 1, 2016'),
('Felix', 'Dog', 'February 1, 2016'),
('Whiskers', 'Cat', 'April 1, 2016'),
('Marlin', 'Fish', 'August 1, 2016'),
('Shifu', 'Cat', 'March 3, 2016');
Using PostgreSQL-specific DISTINCT ON clause we can do this in a single table scan:
SELECT name, type, date
FROM (
SELECT DISTINCT ON (
type,
CASE WHEN type<>'Dog' THEN date END
)
name, type, date
FROM pet
ORDER BY
type,
CASE WHEN type<>'Dog' THEN date END,
date DESC
) AS subq1
ORDER BY date;
Solution 2
Using more ANSI-compatible SQL, like UNION and LIMIT. It will work on MySQL, DB2 and some others. Similar solution can be done on Oracle, just replace LIMIT with ROWNUM.
(
SELECT * FROM pet
WHERE type = 'Dog'
ORDER BY date DESC
LIMIT 1
) UNION ALL (
SELECT * FROM pet
WHERE type <> 'Dog'
)
ORDER BY date;
Notes on performance
With small tables (less than 1000000 rows), any solution will work. DISTINCT ON
is slightly faster:
Iterations: 100000
Query: /*q1*/ SELECT name, type, date FROM ( SELECT DISTINCT ON ( type, CASE WHEN type<>'Dog' THEN date END ) name, type, date FROM pet ORDER BY type, CASE WHEN type<>'Dog' THEN date END, date DESC ) AS subq1 ORDER BY date;
Time: 24.834 s (31%)
Average: 0.248 ms
Rows: 500000
Winner: 80914 times (80%)
Query: /*q2*/ ( SELECT * FROM pet WHERE type = 'Dog' ORDER BY date DESC LIMIT 1 ) UNION ALL ( SELECT * FROM pet WHERE type <> 'Dog' ) ORDER BY date;
Time: 26.778 s (33%)
Average: 0.268 ms
Rows: 500000
Winner: 12881 times (12%)
Query: /*q3*/ select name, type, date from ( select pet.*, case when type = 'Dog' then rank() over (partition by type order by date desc) else 1 end as rnk from pet ) as x where rnk = 1 order by date;
Time: 27.490 s (34%)
Average: 0.275 ms
Rows: 500000
Winner: 6205 times (6%)
With “big data”, make sure your query does only one, indexed, table scan.
As a complement to @filiprems solution, here’s one using a conditional window function:
test=# select name, type, date from (
select pet.*, case when type <> 'Dog'
then 1
else rank() over (partition by type
order by date desc) end as rnk
from pet
) as x
where rnk = 1
order by date;
name | type | date
----------+------+------------
Felix | Dog | 2016-02-01
Shifu | Cat | 2016-03-03
Whiskers | Cat | 2016-04-01
Nemo | Fish | 2016-06-01
Marlin | Fish | 2016-08-01
(5 rows)
I don’t expect it to better in any regard, it just demonstrates another way of doing things.