Question :
Disclaimer: While the basis of my question—the greatest-n-per-group question—has been asked and also answered so many times before, I am facing a seemingly hard issue due to extra conditions that I have to apply.
In case I really missed an answer to my exact question that was given somewhere, I am sorry. I searched, read, and tried a lot, and I have already spent half my day on this.
I tried to describe the abstract problem in the title: for any other field A value, get the row with some special field B value, and at the same time, respect further conditions.
Now, my concrete problem is as follows…
I have the following table:
CREATE TABLE IF NOT EXISTS `mdl_datasets` (
`id` bigint(10) NOT NULL AUTO_INCREMENT,
`cmid` bigint(10) NOT NULL DEFAULT '0',
`userid` bigint(10) NOT NULL DEFAULT '0',
`timecreated` bigint(10) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
For each user (i.e., userid) in one specific module (i.e., cmid), I want to get the most recent row (i.e., maximum timecreated) before a given timestamp (i.e., again timecreated). The latter part makes it hard.
I don’t see how I can do this without using at least three (sub)queries, and even then I’m unable to make it work properly in any case.
So far, I only managed to get it partially working, for example, only for users with at least two rows. This is wrong, of course. If there is a user with only a single row, which has a correct timecreated value, then I want to get this row as a result. For any other user, I want to get the one row with the greatest timecreated value that is below the given maximum date (as Unix epoch timestamp).
Is there anything else that I should mention? Tell me.
Answer :
I think this is exactly the max-per-group case, just with some extra conditions, and it can be solved with a single subquery:
SELECT IF( b.maxts IS NULL, NULL, a.id ) AS id  -- an opposite to COALESCE()
     , a.cmid
     , a.userid
     , b.maxts                                   -- can be NULL
FROM mdl_datasets AS a
LEFT JOIN ( -- if you want no rows for users that have no rows before
            -- the given timestamp, replace the LEFT JOIN
            -- by a plain JOIN
            SELECT cmid
                 , userid
                 , MAX(timecreated) AS maxts
            FROM mdl_datasets
            WHERE cmid = given_cmid
              AND timecreated < given_timecreated  -- additional condition
            GROUP BY cmid, userid                  -- every non-aggregated column is grouped
          ) AS b ON b.userid = a.userid
                AND b.cmid = a.cmid
                AND b.maxts = a.timecreated
WHERE a.cmid = given_cmid
;
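This query produces a result for each userid in the module, including those that have no rows before given_timecreated; for them, maxts and id are set to NULL. Note that with the LEFT JOIN, a user's other rows (those that are not the latest one before the timestamp) are returned as well, also with maxts and id set to NULL.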
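If only the latest row per user is wanted, the plain JOIN variant mentioned in the query comment could look roughly like the sketch below. The literals 42 and 1700000000 are made-up stand-ins for given_cmid and given_timecreated, and if a user has two rows with the same timecreated, both would come back.

```sql
-- Sketch: one row per user, the most recent one before the cutoff.
-- 42 and 1700000000 are hypothetical values for given_cmid / given_timecreated.
SELECT a.id
     , a.cmid
     , a.userid
     , a.timecreated
FROM mdl_datasets AS a
JOIN ( SELECT userid
            , MAX(timecreated) AS maxts
       FROM mdl_datasets
       WHERE cmid = 42
         AND timecreated < 1700000000
       GROUP BY userid
     ) AS b ON b.userid = a.userid
           AND b.maxts = a.timecreated
WHERE a.cmid = 42
;
```

A user with only a single row before the cutoff is still returned, because MAX over one row is that row's timecreated.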
Here the condition WHERE cmid = given_cmid is used twice to reduce the size of the subquery result.
You will certainly need an index on (cmid, userid, timecreated) for acceptable performance. The order of the columns in the index can differ depending on the specific data you have.
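As a minimal sketch, such an index could be created like this. The index name is hypothetical, and the column order shown is just the one suggested above, not a measured optimum.

```sql
-- Sketch: composite index covering the cmid filter and the per-user MAX(timecreated).
-- The name idx_cmid_userid_timecreated is made up for illustration.
CREATE INDEX idx_cmid_userid_timecreated
    ON mdl_datasets (cmid, userid, timecreated);
```

Running EXPLAIN on the query before and after creating the index should show whether it is actually used.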
Try that:
select userid, max(timecreated) from mdl_datasets
where cmid = yourmoduleid and timecreated < yourtime
group by userid;
MAX will return the most recent timecreated before yourtime for each user.
This works in PostgreSQL, but I am not entirely sure about MySQL, because MySQL's GROUP BY sometimes acts like DISTINCT ON (column) in PostgreSQL: for columns that are neither grouped nor aggregated, it can return values from an arbitrary row (PostgreSQL rejects such a query unless you wrap those columns in an aggregate such as MAX). But I think that, because every selected column here is either grouped or wrapped in MAX, MySQL will give the same result as PostgreSQL.
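The query above yields only the timestamp, not the rest of the row. One way to get the whole row without relying on GROUP BY behavior is a window function; this is a different technique from the one in this answer, sketched here assuming MySQL 8.0 or later (it also runs on PostgreSQL). The literals 42 and 1700000000 are hypothetical stand-ins for yourmoduleid and yourtime.

```sql
-- Sketch: rank each user's rows by recency and keep the top-ranked one.
-- Requires window function support (MySQL 8.0+ or PostgreSQL).
SELECT id, cmid, userid, timecreated
FROM ( SELECT id, cmid, userid, timecreated
            , ROW_NUMBER() OVER (PARTITION BY userid
                                 ORDER BY timecreated DESC, id DESC) AS rn
       FROM mdl_datasets
       WHERE cmid = 42
         AND timecreated < 1700000000
     ) AS ranked
WHERE rn = 1
;
```

The extra id DESC in the ORDER BY only breaks ties, so exactly one row per user is returned even when two rows share the same timecreated.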