Question :
I guess this is an easy and stupid question. Consider this MySQL purchase
table (where p_id
is autoincremented):
+---------+-------------+---------------+---------+
| p_id | item_id | user_id | count | ...
+---------+-------------+---------------+---------+
| 1 | 4 | 1 | 22|
| 2 | 4 | 2 | 1|
| 3 | 1 | 1 | 1|
| 4 | 0 | 3 | 1|
| 5 | 3 | 1 | 182|
| 6 | 0 | 4 | 1|
| 7 | 3 | 2 | 7|
| 8 | 3 | 2 | 14|
+---------+-------------+---------------+---------+
What’s an appropriate way (query) to get the rows where the users whose ids are 1 and 2 have purchased the same items, and only retrieved their last purchase of those items?
The result should be something like this:
+---------+-------------+---------------+---------+
| p_id | item_id | user_id | count | ...
+---------+-------------+---------------+---------+
| 1 | 4 | 1 | 22|
| 2 | 4 | 2 | 1|
| 5 | 3 | 1 | 182|
| 8 | 3 | 2 | 14|
+---------+-------------+---------------+---------+
Answer :
Assuming:
p_id
is unique- ‘last purchase’ is defined by highest
p_id
MySQL 5.5.32 Schema Setup:
create table t(p_id integer, item_id integer, user_id integer, cnt integer);
insert into t(p_id, item_id, user_id, cnt) values(1,4,1,22);
insert into t(p_id, item_id, user_id, cnt) values(2,4,2,1);
insert into t(p_id, item_id, user_id, cnt) values(3,1,1,1);
insert into t(p_id, item_id, user_id, cnt) values(4,0,3,1);
insert into t(p_id, item_id, user_id, cnt) values(5,3,1,182);
insert into t(p_id, item_id, user_id, cnt) values(6,0,4,1);
insert into t(p_id, item_id, user_id, cnt) values(7,3,2,7);
insert into t(p_id, item_id, user_id, cnt) values(8,3,2,14);
Query 1:
select *
from( select *
from t tt
where user_id=1 and p_id=( select max(p_id)
from t
where user_id=1 and item_id=tt.item_id )
union all
select *
from t tt
where user_id=2 and p_id=( select max(p_id)
from t
where user_id=2 and item_id=tt.item_id ) ) u
where item_id in(select item_id from t where user_id=1)
and item_id in(select item_id from t where user_id=2)
| P_ID | ITEM_ID | USER_ID | CNT |
|------|---------|---------|-----|
| 1 | 4 | 1 | 22 |
| 5 | 3 | 1 | 182 |
| 2 | 4 | 2 | 1 |
| 8 | 3 | 2 | 14 |
For lack of information, I make the same assumptions as @Jack:
p_id
is unique.- ‘last purchase’ is defined by highest
p_id
.
SELECT t.*
FROM t
JOIN (
SELECT min(p_id) AS p_id1
,max(p_id) AS p_id2
FROM (
SELECT user_id, item_id, max(p_id) AS p_id
FROM t
WHERE user_id IN (1,2)
GROUP BY user_id, item_id
) sub1
GROUP BY item_id
HAVING count(*) = 2 -- for two user_ids
) sub2 ON t.p_id = sub2.p_id1
OR t.p_id = sub2.p_id2
ORDER BY t.p_id;
This form avoids correlated subqueries, which should therefore perform better. It works well for two IDs at a time, but not for more.
Could be much simpler if MySQL supported CTEs or window functions like other modern RDBMS. Or even DISTINCT ON
(Postgres specific).
The query that you are looking for, might be similar to this:
SELECT
pp1 . *
FROM
(SELECT
*
FROM
(SELECT
*
FROM
purchase
WHERE
user_id IN (1 , 2)
ORDER BY p_id DESC) pp
GROUP BY pp.user_id , pp.item_id
ORDER BY pp.p_id) pp1
INNER JOIN
(SELECT
*
FROM
(SELECT
*
FROM
purchase
WHERE
user_id IN (1 , 2)
ORDER BY p_id DESC) pp
GROUP BY pp.user_id , pp.item_id
ORDER BY pp.p_id) pp2 ON pp1.item_id = pp2.item_id
AND pp1.user_id != pp2.user_id
ORDER BY pp1.p_id;