# Simplifying the “repetitive” code of a query involving UNION ALL, WHERE and LIMIT

Posted on

### Question :

How can I simplify the code of the query shown below? Requirements are always the same as I describe as follows:

``````(select * from questions where points = 3 and type = 1 order by rand() asc limit 6)
union all
(select * from questions where points = 2 and type = 1 order by rand() asc limit 4)
union all
(select * from questions where points = 1 and type = 1 order by rand() asc limit 2)
union all
(select * from questions where points = 3 and type = 0 order by rand() asc limit 10)
union all
(select * from questions where points = 2 and type = 0 order by rand() asc limit 6)
union all
(select * from questions where points = 1 and type = 0 order by rand() asc limit 4)
``````

Is there any way to do it without `UNION` ?

I am just curious if there is a faster/more efficient way of doing it. SQL Server or MySQL, I haven’t decided yet, because I am at design stage right now.

You can use the `ROW_NUMBER()` analytical function to generate row numbers for each partition of `(points, type)`:

``````SELECT
*,
ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
FROM
questions
``````

And then you can compare `rownum` to a certain value that depends on `points` and `type`:

``````WHERE
rownum <= some_expression
``````

Of course, you will need to use nesting in order to be able to reference `rownum` like that:

``````SELECT
...
FROM
(
SELECT
...
ROW_NUMBER() ... AS rownum
FROM
...
) AS derived
WHERE
rownum <= ...
``````

And you need to keep in mind that because of the nesting the `rownum` column becomes part of the derived table’s column set – so, specifying `*` as the main SELECT’s column list will include the `rownum` column as well. If you want it out of the result, you will have to list each `questions` column explicitly:

``````SELECT
question_id,  /* or whatever the PK column is going to be called */
points,
type,
...  /* other "questions" columns */
FROM
(
...
``````

Taking all of the above points into account (the last one in particular), I am unsure if the resulting query can be called a simplification of the original, but here is my attempt anyway:

``````SELECT
question_id,
points,
type,
...
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
FROM
questions
) AS derived
WHERE
rownum <= CASE
WHEN points = 3 AND type = 1 THEN 6
WHEN points = 2 AND type = 1 THEN 4
WHEN points = 3 AND type = 1 THEN 2
WHEN points = 3 AND type = 0 THEN 10
WHEN points = 2 AND type = 0 THEN 6
WHEN points = 1 AND type = 0 THEN 4
END
;
``````

The CASE expression returns a row limit for each required partition and a NULL for any other, thus effectively providing a filter to eliminate all irrelevant rows (the rows where `points` and `type` have values other than those mentioned in the conditions). This kind of filter, however, is not pushed down to the underlying `questions` table and, therefore, does not prevent the query from generating row numbers for all combinations of `(points, type)` before filtering out those that are not needed. You can avoid the redundant work by adding an explicit filter to the `questions` table like this:

``````  ...
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
FROM
questions
WHERE
points IN (1, 2, 3)
AND type IN (0, 1)
) AS derived
WHERE
...
``````

### Alternative

Alternatively you can rewrite the entire query in a different way, such that rows are filtered and row limits assigned at the same time. If you use a virtual table like this:

``````(
SELECT 3, 1,  6 UNION ALL
SELECT 2, 1,  4 UNION ALL
SELECT 1, 1,  2 UNION ALL
SELECT 3, 0, 10 UNION ALL
SELECT 2, 0,  6 UNION ALL
SELECT 1, 0,  4
) AS limits (points, type, rowlimit)
``````

you can join `questions` to it and thus have both a filter and a way to assign row limits to each partition. The full query will then look like this:

``````SELECT
question_id,
points,
type,
...
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
FROM
questions AS q
INNER JOIN
(
SELECT 3, 1,  6 UNION ALL
SELECT 2, 1,  4 UNION ALL
SELECT 1, 1,  2 UNION ALL
SELECT 3, 0, 10 UNION ALL
SELECT 2, 0,  6 UNION ALL
SELECT 1, 0,  4
) AS limits (points, type, rowlimit)
USING (points, type)
) AS derived
WHERE
rownum <= rowlimit
;
``````

You can then go further and make the `limits` dataset an actual table in your database. You have mentioned that requirements are not going to alter, but perhaps knowing that you can configure the requirements in the table will change that perspective.

### Product-related notes

You mention that you have not decided in which SQL product this is going to be used. Here are some product-related notes to consider.

1. Both variations of the solution will work in MySQL 8.0 or newer version. Version 8.0 is the minimum because that is where support for `ROW_NUMBER()` was first introduced in MySQL.

2. The other product you mentioned, SQL Server, has supported `ROW_NUMBER()` since version 2005 but other aspects of the solution will require changes in order to make it work in SQL Server.

2.1. You are using `ORDER BY RAND()` to randomise the rows selected. In SQL Server, the `RAND()` function is a runtime constant, therefore you will need to use a different method; `ORDER BY NEWID()` could be one:

``  ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY NEWID()) AS rownum``

2.2. The `JOIN ... USING` syntax is not supported in SQL Server. You will need to replace it with the `JOIN ... ON` one. Because of this switch, you will also need to rewrite the nested SELECT column list. The reason is, `USING` automatically suppresses duplication of the same-name columns used for joining, `points` and `type`, when you are using `*` in the SELECT clause. The `ON` operator has no such effect, therefore you will need to rewrite the derived query like this:

``````  (
SELECT
q.*,
limits.rowlimit,
ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY NEWID()) AS rownum
FROM
questions AS q
INNER JOIN ( ... ) AS limits (points, type, rowlimit)
ON q.points = limits.points AND q.type = limits.type
) AS derived
``````

2.3. While not a required change per se, you can take advantage of SQL Server’s support of the `VALUES` constructor as a means of defining a derived table to make the `limits` definition more compact:

``````  (
VALUES
(3, 1,  6),
(2, 1,  4),
(1, 1,  2),
(3, 0, 10),
(2, 0,  6),
(1, 0,  4)
) AS limits (points, type, rowlimit)
``````

The result of the function will not change for the duration of the SELECT statement’s execution.