Disclaimer: Although the basis of my question (the greatest-n-per-group problem) has been asked and answered many times before, I am facing a seemingly hard issue because of extra conditions that I have to apply.
If my exact question has already been answered somewhere and I missed it, I apologize. I have searched, read, and tried a lot, and I have already spent half my day on this.
I tried to describe the abstract problem in the title: for each value of field A, get the row with some special value of field B (here, the greatest one), while also respecting further conditions.
Now, my concrete problem is as follows…
I have the following table:
CREATE TABLE IF NOT EXISTS `mdl_datasets` (
  `id` bigint(10) NOT NULL AUTO_INCREMENT,
  `cmid` bigint(10) NOT NULL DEFAULT '0',
  `userid` bigint(10) NOT NULL DEFAULT '0',
  `timecreated` bigint(10) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
For each user (i.e., userid) in one specific module (i.e., cmid), I want to get the most recent row (i.e., maximum timecreated) before a given timestamp (again compared against timecreated). The latter part makes it hard.
I don’t see how I can do this without using at least three (sub)queries, and even then I cannot make it work correctly in every case.
So far, I have only managed to get it partially working, for example only for users with at least two rows. This is wrong, of course. If a user has only a single row and its timecreated value is below the given timestamp, I want that row in the result. For every other user, I want the one row with the greatest timecreated value that is below the given maximum date (a Unix epoch timestamp).
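To pin down the expected result independently of any SQL, here is a brute-force reference in Python. It is a minimal sketch assuming the rows fit in memory; the `Row` type, the `latest_before` function, and the sample data are hypothetical and only for illustration. The cutoff is treated as exclusive, matching "before a given timestamp".

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One row of mdl_datasets (only the columns the question uses)."""
    id: int
    cmid: int
    userid: int
    timecreated: int


def latest_before(rows: list[Row], cmid: int, cutoff: int) -> dict[int, Row]:
    """For each userid in module `cmid`, return the row with the greatest
    timecreated strictly below `cutoff`. Users with no such row are omitted."""
    best: dict[int, Row] = {}
    for row in rows:
        if row.cmid != cmid or row.timecreated >= cutoff:
            continue  # wrong module, or not before the cutoff
        current = best.get(row.userid)
        # Strict '>' keeps the first row seen if two rows share a timestamp.
        if current is None or row.timecreated > current.timecreated:
            best[row.userid] = row
    return best


rows = [
    Row(1, 7, 100, 10),
    Row(2, 7, 100, 20),
    Row(3, 7, 100, 30),  # after the cutoff, ignored
    Row(4, 7, 200, 15),  # user 200 has only a single row
    Row(5, 8, 100, 25),  # different module, ignored
]
result = latest_before(rows, cmid=7, cutoff=25)
print(sorted(r.id for r in result.values()))  # [2, 4]
```

User 200 shows the single-row case: its only row is before the cutoff, so it is returned just like user 100's latest qualifying row.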
Is there anything else I should mention? Please tell me.
I think this is exactly the max-per-group case, just with some extended conditions, and it can be solved with a single subquery:
SELECT MAX(IF(b.maxts IS NULL, NULL, a.id)) AS id  -- the opposite of COALESCE()
     , a.cmid
     , a.userid
     , MAX(b.maxts) AS maxts  -- can be NULL
FROM mdl_datasets AS a
LEFT JOIN (  -- to return no rows for users that have no rows before
             -- the given timestamp, replace LEFT JOIN with a plain JOIN
    SELECT userid
         , MAX(timecreated) AS maxts
    FROM mdl_datasets
    WHERE cmid = given_cmid
      AND timecreated < given_timecreated  -- the additional condition
    GROUP BY userid
) AS b
  ON b.userid = a.userid
 AND b.maxts = a.timecreated
WHERE a.cmid = given_cmid
GROUP BY a.cmid, a.userid  -- one row per user; MAX() skips the non-matching NULLs
;
This query produces a result for each userid in the module (cmid = given_cmid), including users that have no rows before given_timecreated; for those users, id and maxts are NULL.

The condition cmid = given_cmid is used twice (in the subquery and in the outer query) to reduce the size of the subquery result.
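A variant of the query above can be tried out without a MySQL server. This sketch uses Python's built-in sqlite3 module as a stand-in (an assumption: SQLite, not MySQL), so IF() is written as CASE, the placeholders given_cmid and given_timecreated become the hypothetical named parameters :cmid and :cutoff, and the sample rows are made up. It uses an outer GROUP BY a.cmid, a.userid so that each user appears exactly once, with MAX() skipping the NULLs produced by non-matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mdl_datasets ("
    " id INTEGER PRIMARY KEY,"
    " cmid INTEGER NOT NULL DEFAULT 0,"
    " userid INTEGER NOT NULL DEFAULT 0,"
    " timecreated INTEGER NOT NULL DEFAULT 0)"
)
# Hypothetical sample data: (id, cmid, userid, timecreated)
conn.executemany(
    "INSERT INTO mdl_datasets (id, cmid, userid, timecreated) VALUES (?, ?, ?, ?)",
    [(1, 7, 100, 10), (2, 7, 100, 20), (3, 7, 100, 30),
     (4, 7, 200, 15), (5, 7, 300, 40), (6, 8, 100, 25)],
)

QUERY = """
SELECT MAX(CASE WHEN b.maxts IS NULL THEN NULL ELSE a.id END) AS id,
       a.cmid,
       a.userid,
       MAX(b.maxts) AS maxts  -- NULL if the user has no row before the cutoff
FROM mdl_datasets AS a
LEFT JOIN (
    SELECT userid, MAX(timecreated) AS maxts
    FROM mdl_datasets
    WHERE cmid = :cmid AND timecreated < :cutoff
    GROUP BY userid
) AS b
  ON b.userid = a.userid AND b.maxts = a.timecreated
WHERE a.cmid = :cmid
GROUP BY a.cmid, a.userid
ORDER BY a.userid
"""

rows = conn.execute(QUERY, {"cmid": 7, "cutoff": 25}).fetchall()
for row in rows:
    print(row)
# (2, 7, 100, 20)
# (4, 7, 200, 15)
# (None, 7, 300, None)
```

User 300 only has a row after the cutoff, so it comes back with id and maxts set to NULL (None in Python); switching the LEFT JOIN to a plain JOIN would drop that user from the result.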
Of course, you need an index on (cmid, userid, timecreated) for acceptable performance. The best order of the columns in the index may differ depending on your data.
select userid, max(timecreated)
from mdl_datasets
where cmid = yourmoduleid
  and timecreated < yourtime
group by userid;
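MAX(timecreated) returns the timestamp of the very last row before “yourtime” for each user. Note that it returns only the timestamp, not the whole row; to get the row's id as well, join this result back to the table on userid and timecreated.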
This works in PostgreSQL, and it works the same way in MySQL, because the query selects only the grouped column and an aggregate. Be careful if you add other columns, though: with ONLY_FULL_GROUP_BY disabled, MySQL's GROUP BY then acts somewhat like DISTINCT ON (column) in PostgreSQL and returns values from an arbitrary row of each group, with no guarantee that it is the row holding the maximum timecreated. PostgreSQL rejects such a query unless every selected column is grouped, aggregated, or functionally dependent on the grouped columns, and MySQL does the same when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5).
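The difference between the aggregate and the full row can be seen with another sqlite3 sketch (SQLite as a stand-in for MySQL or PostgreSQL, made-up data, and :cmid and :cutoff as hypothetical parameter names): the GROUP BY query yields only (userid, timestamp) pairs, and joining that result back to the table recovers the rows themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mdl_datasets (id INTEGER PRIMARY KEY, cmid INTEGER, "
    "userid INTEGER, timecreated INTEGER)"
)
# Hypothetical sample data: (id, cmid, userid, timecreated)
conn.executemany(
    "INSERT INTO mdl_datasets VALUES (?, ?, ?, ?)",
    [(1, 7, 100, 10), (2, 7, 100, 20), (3, 7, 100, 30), (4, 7, 200, 15)],
)
params = {"cmid": 7, "cutoff": 25}

# Step 1: the aggregate only yields (userid, latest timestamp), not the row.
latest = conn.execute(
    "SELECT userid, MAX(timecreated) FROM mdl_datasets "
    "WHERE cmid = :cmid AND timecreated < :cutoff GROUP BY userid ORDER BY userid",
    params,
).fetchall()
print(latest)  # [(100, 20), (200, 15)]

# Step 2: join the aggregate back to the table to fetch the full rows.
full_rows = conn.execute(
    """
    SELECT d.id, d.userid, d.timecreated
    FROM mdl_datasets AS d
    JOIN (
        SELECT userid, MAX(timecreated) AS maxts
        FROM mdl_datasets
        WHERE cmid = :cmid AND timecreated < :cutoff
        GROUP BY userid
    ) AS m ON m.userid = d.userid AND m.maxts = d.timecreated
    WHERE d.cmid = :cmid
    ORDER BY d.userid
    """,
    params,
).fetchall()
print(full_rows)  # [(2, 100, 20), (4, 200, 15)]
```

If two rows of the same user share the maximum timestamp, the join-back returns both of them; a tie-breaker (for example, the greater id) would be needed to get exactly one row per user.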