Question :
How can I simplify the code of the query shown below? Requirements are always the same as I describe as follows:
(select * from questions where points = 3 and type = 1 order by rand() asc limit 6)
union all
(select * from questions where points = 2 and type = 1 order by rand() asc limit 4)
union all
(select * from questions where points = 1 and type = 1 order by rand() asc limit 2)
union all
(select * from questions where points = 3 and type = 0 order by rand() asc limit 10)
union all
(select * from questions where points = 2 and type = 0 order by rand() asc limit 6)
union all
(select * from questions where points = 1 and type = 0 order by rand() asc limit 4)
Is there any way to do it without UNION
?
I am just curious if there is a faster/more efficient way of doing it. SQL Server or MySQL, I haven’t decided yet, because I am at design stage right now.
Answer :
You can use the ROW_NUMBER()
analytical function to generate row numbers for each partition of (points, type)
:
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
FROM
questions
And then you can compare rownum
to a certain value that depends on points
and type
:
WHERE
rownum <= some_expression
Of course, you will need to use nesting in order to be able to reference rownum
like that:
SELECT
...
FROM
(
SELECT
...
ROW_NUMBER() ... AS rownum
FROM
...
) AS derived
WHERE
rownum <= ...
And you need to keep in mind that because of the nesting the rownum
column becomes part of the derived table’s column set – so, specifying *
as the main SELECT’s column list will include the rownum
column as well. If you want it out of the result, you will have to list each questions
column explicitly:
SELECT
question_id, /* or whatever the PK column is going to be called */
points,
type,
... /* other "questions" columns */
FROM
(
...
Taking all of the above points into account (the last one in particular), I am unsure if the resulting query can be called a simplification of the original, but here is my attempt anyway:
SELECT
question_id,
points,
type,
...
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
FROM
questions
) AS derived
WHERE
rownum <= CASE
WHEN points = 3 AND type = 1 THEN 6
WHEN points = 2 AND type = 1 THEN 4
WHEN points = 3 AND type = 1 THEN 2
WHEN points = 3 AND type = 0 THEN 10
WHEN points = 2 AND type = 0 THEN 6
WHEN points = 1 AND type = 0 THEN 4
END
;
The CASE expression returns a row limit for each required partition and a NULL for any other, thus effectively providing a filter to eliminate all irrelevant rows (the rows where points
and type
have values other than those mentioned in the conditions). This kind of filter, however, is not pushed down to the underlying questions
table and, therefore, does not prevent the query from generating row numbers for all combinations of (points, type)
before filtering out those that are not needed. You can avoid the redundant work by adding an explicit filter to the questions
table like this:
...
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
FROM
questions
WHERE
points IN (1, 2, 3)
AND type IN (0, 1)
) AS derived
WHERE
...
Alternative
Alternatively you can rewrite the entire query in a different way, such that rows are filtered and row limits assigned at the same time. If you use a virtual table like this:
(
SELECT 3, 1, 6 UNION ALL
SELECT 2, 1, 4 UNION ALL
SELECT 1, 1, 2 UNION ALL
SELECT 3, 0, 10 UNION ALL
SELECT 2, 0, 6 UNION ALL
SELECT 1, 0, 4
) AS limits (points, type, rowlimit)
you can join questions
to it and thus have both a filter and a way to assign row limits to each partition. The full query will then look like this:
SELECT
question_id,
points,
type,
...
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
FROM
questions AS q
INNER JOIN
(
SELECT 3, 1, 6 UNION ALL
SELECT 2, 1, 4 UNION ALL
SELECT 1, 1, 2 UNION ALL
SELECT 3, 0, 10 UNION ALL
SELECT 2, 0, 6 UNION ALL
SELECT 1, 0, 4
) AS limits (points, type, rowlimit)
USING (points, type)
) AS derived
WHERE
rownum <= rowlimit
;
You can then go further and make the limits
dataset an actual table in your database. You have mentioned that requirements are not going to alter, but perhaps knowing that you can configure the requirements in the table will change that perspective.
Product-related notes
You mention that you have not decided in which SQL product this is going to be used. Here are some product-related notes to consider.
-
Both variations of the solution will work in MySQL 8.0 or newer version. Version 8.0 is the minimum because that is where support for
ROW_NUMBER()
was first introduced in MySQL. -
The other product you mentioned, SQL Server, has supported
ROW_NUMBER()
since version 2005 but other aspects of the solution will require changes in order to make it work in SQL Server.2.1. You are using
ORDER BY RAND()
to randomise the rows selected. In SQL Server, theRAND()
function is a runtime constant†, therefore you will need to use a different method;ORDER BY NEWID()
could be one:ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY NEWID()) AS rownum
2.2. The
JOIN ... USING
syntax is not supported in SQL Server. You will need to replace it with theJOIN ... ON
one. Because of this switch, you will also need to rewrite the nested SELECT column list. The reason is,USING
automatically suppresses duplication of the same-name columns used for joining,points
andtype
, when you are using*
in the SELECT clause. TheON
operator has no such effect, therefore you will need to rewrite the derived query like this:( SELECT q.*, limits.rowlimit, ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY NEWID()) AS rownum FROM questions AS q INNER JOIN ( ... ) AS limits (points, type, rowlimit) ON q.points = limits.points AND q.type = limits.type ) AS derived
2.3. While not a required change per se, you can take advantage of SQL Server’s support of the
VALUES
constructor as a means of defining a derived table to make thelimits
definition more compact:( VALUES (3, 1, 6), (2, 1, 4), (1, 1, 2), (3, 0, 10), (2, 0, 6), (1, 0, 4) ) AS limits (points, type, rowlimit)
†The result of the function will not change for the duration of the SELECT statement’s execution.