For each value in field A, select the row with some special value in field B, while respecting further criteria

Question :

Disclaimer: While the basis of my question—the greatest-n-per-group question—has been asked and also answered so many times before, I am facing a seemingly hard issue due to extra conditions that I have to apply.

In case I missed an answer to my exact question somewhere, I am sorry. I have searched, read, and tried a lot, and I have already spent half my day on this.


I tried to describe the abstract problem in the title: for each field A value, get the row with some special field B value, and at the same time, respect further conditions.

Now, my concrete problem is as follows…
I have the following table:

CREATE TABLE IF NOT EXISTS `mdl_datasets` (
  `id` bigint(10) NOT NULL AUTO_INCREMENT,
  `cmid` bigint(10) NOT NULL DEFAULT '0',
  `userid` bigint(10) NOT NULL DEFAULT '0',
  `timecreated` bigint(10) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

For each user (i.e., userid) in one specific module (i.e., cmid), I want to get the most recent row (i.e., maximum timecreated), before a given timestamp (i.e., again timecreated). The latter part makes it hard.

I don’t see how I can do this without using at least three (sub)queries, and even then I’m unable to make it work correctly in all cases.

So far, I have only managed to get it partially working, for example only for users with at least two rows. This is wrong, of course. If a user has only a single row, and its timecreated value is before the given timestamp, then I want to get that row as the result. For every other user, I want the one row with the greatest timecreated value that is below the given maximum date (as a Unix epoch timestamp).

Is there anything else that I should mention? Tell me.

Answer :

I think this is exactly the max-per-group case, just with some additional conditions, and it can be solved with a single subquery:

SELECT IF( b.userid IS NULL, NULL, a.id ) AS id -- the opposite of COALESCE()
     , a.cmid
     , a.userid
     , b.maxts -- can be NULL
  FROM mdl_datasets AS a
  LEFT JOIN ( SELECT cmid -- if you want no rows for users that have no rows
                          -- before the given timestamp, replace the
                          -- LEFT JOIN with a plain JOIN
                   , userid
                   , MAX(timecreated) AS maxts
                     -- id is not selected here: it is neither grouped
                     -- nor aggregated, so its value would be arbitrary
                FROM mdl_datasets
               WHERE cmid = given_cmid
                 AND timecreated < given_timecreated -- additional condition
               GROUP BY cmid, userid
            ) AS b ON b.userid = a.userid
                  AND b.cmid   = a.cmid
                  AND b.maxts  = a.timecreated
  WHERE a.cmid = given_cmid
;

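With the LEFT JOIN, this query returns every row of the given module: the row holding a user's latest timecreated before given_timecreated carries its id and maxts, while all other rows (including every row of a user who has no rows before given_timecreated) come back with id and maxts set to NULL. With a plain JOIN only the matching rows remain, which is one row per user unless two rows of the same user share the same timecreated.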

The condition cmid = given_cmid is used twice: inside the subquery it reduces the size of the subquery result, and in the outer query it restricts the rows of a to the same module.
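
To see what the two join variants return, here is a minimal sketch that runs a lightly adapted form of the query against a few made-up rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, an assumption made only so the example is self-contained. The sample rows, the module id 7, the cutoff 250, and the CASE expression (used instead of MySQL's IF(), and grouping by cmid, userid without selecting id in the subquery) are illustrative choices, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mdl_datasets ("
    " id INTEGER PRIMARY KEY, cmid INTEGER, userid INTEGER, timecreated INTEGER)"
)
# Made-up sample rows: (id, cmid, userid, timecreated)
conn.executemany(
    "INSERT INTO mdl_datasets VALUES (?, ?, ?, ?)",
    [
        (1, 7, 1, 100),  # user 1: older row
        (2, 7, 1, 200),  # user 1: latest row before the cutoff
        (3, 7, 1, 300),  # user 1: after the cutoff
        (4, 7, 2, 150),  # user 2: single row, before the cutoff
        (5, 7, 3, 400),  # user 3: only row is after the cutoff
        (6, 8, 1, 120),  # different module, must be ignored
    ],
)

# {join} is filled in with either "JOIN" or "LEFT JOIN" below.
QUERY = """
SELECT CASE WHEN b.userid IS NULL THEN NULL ELSE a.id END AS id,
       a.cmid, a.userid, b.maxts
  FROM mdl_datasets AS a
  {join} ( SELECT cmid, userid, MAX(timecreated) AS maxts
             FROM mdl_datasets
            WHERE cmid = :cmid AND timecreated < :cutoff
            GROUP BY cmid, userid
         ) AS b ON b.userid = a.userid
               AND b.cmid   = a.cmid
               AND b.maxts  = a.timecreated
 WHERE a.cmid = :cmid
 ORDER BY a.id
"""
params = {"cmid": 7, "cutoff": 250}

inner_rows = conn.execute(QUERY.format(join="JOIN"), params).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN"), params).fetchall()

print(inner_rows)  # one row per user that has a row before the cutoff
print(left_rows)   # every row of module 7, NULLs where nothing matched
```

With this data the plain JOIN yields the rows with id 2 and id 4, including the user who has only a single row. The LEFT JOIN yields all five rows of module 7, three of them with id and maxts set to NULL.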

You will certainly need an index on (cmid, userid, timecreated) for acceptable performance. The best order of the columns in the index may differ depending on your data.
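
As a rough illustration of the index suggestion, the sketch below creates such an index and asks the database which access path it picks for the grouped subquery. It again uses Python's built-in sqlite3 module rather than MySQL, so the planner output is SQLite's and only indicative. The index name idx_cmid_userid_timecreated is hypothetical. In MySQL the same CREATE INDEX statement applies, and EXPLAIN would be the tool to check the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mdl_datasets ("
    " id INTEGER PRIMARY KEY,"
    " cmid INTEGER NOT NULL,"
    " userid INTEGER NOT NULL,"
    " timecreated INTEGER NOT NULL)"
)
# Hypothetical index name; the column order follows the suggestion above.
conn.execute(
    "CREATE INDEX idx_cmid_userid_timecreated"
    " ON mdl_datasets (cmid, userid, timecreated)"
)

# Ask SQLite how it would execute the grouped subquery.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT userid, MAX(timecreated) FROM mdl_datasets "
    "WHERE cmid = ? AND timecreated < ? GROUP BY userid",
    (7, 250),
).fetchall()

# The human-readable description is the fourth column of each plan row.
plan_details = [row[3] for row in plan]
for detail in plan_details:
    print(detail)
```

If the plan mentions the index name, the equality on cmid is being resolved through the index instead of a full table scan.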

Try this:

select userid, max(timecreated) from mdl_datasets
where  cmid = yourmoduleid and timecreated < yourtime
group by userid;

max() returns the latest timecreated before “yourtime” for each user (the value, not the whole row).
This works in PostgreSQL, and it is also fine in MySQL, because every selected column is either listed in the GROUP BY or wrapped in an aggregate. The two only differ once you select further columns such as id. PostgreSQL rejects a column that is neither grouped nor aggregated. MySQL with ONLY_FULL_GROUP_BY disabled accepts it and returns the value from an arbitrary row of the group, a bit like DISTINCT ON (column) in PostgreSQL without an ORDER BY. max() does not make MySQL pick the matching row, so to get the full row you have to join this result back to the table on userid and timecreated.
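
A small sketch of what this grouped query returns, run through Python's built-in sqlite3 module as a self-contained stand-in for MySQL or PostgreSQL. The sample rows, the module id 7 and the cutoff 250 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mdl_datasets ("
    " id INTEGER PRIMARY KEY, cmid INTEGER, userid INTEGER, timecreated INTEGER)"
)
# Made-up sample rows: (id, cmid, userid, timecreated)
conn.executemany(
    "INSERT INTO mdl_datasets VALUES (?, ?, ?, ?)",
    [
        (1, 7, 1, 100),
        (2, 7, 1, 200),
        (3, 7, 1, 300),  # after the cutoff
        (4, 7, 2, 150),  # user 2 has a single row
        (5, 7, 3, 400),  # user 3 has no row before the cutoff
    ],
)

latest = conn.execute(
    "SELECT userid, MAX(timecreated) AS maxts"
    "  FROM mdl_datasets"
    " WHERE cmid = :cmid AND timecreated < :cutoff"
    " GROUP BY userid"
    " ORDER BY userid",
    {"cmid": 7, "cutoff": 250},
).fetchall()

print(latest)  # (userid, latest timecreated before the cutoff) pairs
```

The result contains a pair for user 1 (200) and for the single-row user 2 (150), and nothing for user 3. It holds timestamps only, so the id of the matching row is not part of it.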
