Using session variables to generate GROUP information

Posted on

Question :

We want to know whether the given approach is correct or not. I understand that there might other and presumably better and more optimized ways of accomplishing the task. But it is a curiosity that I want to fulfill. Consider it an academic exercise.
Consider the following two tables :

    TABLE-ONE                   TABLE-TWO
+----+-----------+         +----+-------+-------+
| id | key       |         | id | fk_id | value |
+----+-----------+         +----+-------+-------+
|  1 | k1        |         |  1 | 1     | v1    |
|  2 | k2        |         |  2 | 1     | v2    |
|  3 | k3        |         |  3 | 1     | v3    |
+----+-----------+         |  4 | 2     | v1    |
                           |  5 | 2     | v2    |
                           |  6 | 3     | v1    |
                           |  7 | 3     | v2    |
                           |  8 | 3     | v3    |
                           |  9 | 3     | v4    |
                           | 10 | 3     | v5    |
                           +----+-------+-------+

We need to JOIN these two tables and ORDER the result on TABLE-ONE.id
The resultant table should look like :

+----+-------+-------+----------------+
| id | key   | value | is_first_entry |
+----+-------+-------+----------------+
|  1 | k1    | v1    |  1             |
|  1 | k1    | v2    |  0             |
|  1 | k1    | v3    |  0             |
|  2 | k2    | v1    |  1             |
|  2 | k2    | v2    |  0             |
|  3 | k3    | v1    |  1             |
|  3 | k3    | v2    |  0             |
|  3 | k3    | v3    |  0             |
|  3 | k3    | v4    |  0             |
|  3 | k3    | v5    |  0             |
+----+-------+-------+----------------+

The is_first_entry column should return 1 if it is the first entry for that key in the result-set. Else it should return zero.
One way of achieving this is by via an INNER JOIN. We were exploring the option of achieving the result without the self join
by using session-variables.

To achieve this, we first created a function

CREATE FUNCTION `is_first_entry`( _id INT) RETURNS int(11)
BEGIN

IF @current_id is NULL
THEN
  SET @current_id = _id;
  RETURN 1;
END IF;

IF @current_id = _id
THEN
  RETURN 0;
ELSE 
  SET @current_id = _id;
  RETURN 1;
END IF;

END

The final query that we created to achieve the result.

SELECT one.id, one.key, two.value, is_first_entry(one.id) as is_first_entry
FROM TABLE-ONE one
INNER JOIN TABLE-TWO two
ON one.id = two.fk_id;
INNER JOIN (SELECT @current_id := 1 as v) t
on 1 = 1
ORDER BY one.id;

Even though it is working for small data set.
Will it give correct result on scale ?
Can someone think of a scenario/example where this technique might break. It is okay, if this query executes slowly.
For the time being, we are only concerned about the correctness and not about optimization.

Answer :

You asked if I had a counter point, yes I have.

Your list is unsorted and therefore gives a false result. if a later value comes to the mix.

Only when you don’t let latecomers into your table, it would hold.

To solve the problem with the late comers, you first have to get is_first value from a sorted table-two and inner join t1 later.

CREATE TABLE `TABLE-TWO`   (
  `id` INTEGER,
  `fk_id` INTEGER,
  `value` VARCHAR(2)
);

INSERT INTO `TABLE-TWO`  
  (`id`, `fk_id`, `value`)
VALUES
  ('1', '1', 'v1'),
  ('2', '1', 'v2'),
  ('3', '1', 'v3'),
  ('4', '2', 'v1'),
  ('5', '2', 'v2'),
  ('6', '3', 'v1'),
  ('7', '3', 'v2'),
  ('8', '3', 'v3'),
  ('9', '3', 'v4'),
  ('10', '3', 'v5'),
  ('11', '2', 'v3');

CREATE TABLE `TABLE-ONE` (
  `id` INTEGER,
  `key` VARCHAR(2)
);

INSERT INTO `TABLE-ONE`
  (`id`, `key`)
VALUES
  ('1', 'k1'),
  ('2', 'k2'),
  ('3', 'k3');
✓

✓

✓

✓
CREATE FUNCTION `is_first_entry`( _id INT) RETURNS int(11)
BEGIN

IF @current_id is NULL
THEN
  SET @current_id = _id;
  RETURN 1;
END IF;

IF @current_id = _id
THEN
  RETURN 0;
ELSE 
  SET @current_id = _id;
  RETURN 1;
END IF;

END
SELECT 
  t2a.id
    ,t1.key
    ,t2a.value,IF(t2b.minid = t2a.id,1,0) is_first_entry
FROM
`TABLE-TWO` t2a 
INNER JOIN (SELECT MIN(id) minid,fk_id FROM `TABLE-TWO` GROUP BY fk_id) t2b ON t2a.fk_id = t2b. fk_id 
INNER JOIN `TABLE-ONE` t1 ON t1.id = t2a.fk_id
ORDER By id
id | key | value | is_first_entry
-: | :-- | :---- | -------------:
 1 | k1  | v1    |              1
 2 | k1  | v2    |              0
 3 | k1  | v3    |              0
 4 | k2  | v1    |              1
 5 | k2  | v2    |              0
 6 | k3  | v1    |              1
 7 | k3  | v2    |              0
 8 | k3  | v3    |              0
 9 | k3  | v4    |              0
10 | k3  | v5    |              0
11 | k2  | v3    |              0
SELECT one.id, one.key, two.value, is_first_entry(one.id) as is_first_entry
FROM `TABLE-ONE` one
INNER JOIN `TABLE-TWO` two
ON one.id = two.fk_id;
INNER JOIN (SELECT @current_id := 1 as v) t
on 1 = 1
ORDER BY one.id;
id | key | value | is_first_entry
-: | :-- | :---- | -------------:
 1 | k1  | v1    |              1
 1 | k1  | v2    |              0
 1 | k1  | v3    |              0
 2 | k2  | v1    |              1
 2 | k2  | v2    |              0
 3 | k3  | v1    |              1
 3 | k3  | v2    |              0
 3 | k3  | v3    |              0
 3 | k3  | v4    |              0
 3 | k3  | v5    |              0
 2 | k2  | v3    |              1

db<>fiddle here

As id is your only indicator for your first selection you can do it simply by choosing the minimum of id

SELECT 
    t2a.id
    ,t1.key
    ,t2a.value,IF(t2b.minid = t2a.id,1,0) is_first_entry
FROM
`TABLE-TWO` t2a 
INNER JOIN (SELECT MIN(id) minid,fk_id FROM `TABLE-TWO` GROUP BY fk_id) t2b ON t2a.fk_id = t2b. fk_id 
INNER JOIN `TABLE-ONE` t1 ON t1.id = t2a.fk_id
ORDER By id
CREATE TABLE `TABLE-TWO`   (
  `id` INTEGER,
  `fk_id` INTEGER,
  `value` VARCHAR(2)
);

INSERT INTO `TABLE-TWO`  
  (`id`, `fk_id`, `value`)
VALUES
  ('1', '1', 'v1'),
  ('2', '1', 'v2'),
  ('3', '1', 'v3'),
  ('4', '2', 'v1'),
  ('5', '2', 'v2'),
  ('6', '3', 'v1'),
  ('7', '3', 'v2'),
  ('8', '3', 'v3'),
  ('9', '3', 'v4'),
  ('10', '3', 'v5');

CREATE TABLE `TABLE-ONE` (
  `id` INTEGER,
  `key` VARCHAR(2)
);

INSERT INTO `TABLE-ONE`
  (`id`, `key`)
VALUES
  ('1', 'k1'),
  ('2', 'k2'),
  ('3', 'k3');
✓

✓

✓

✓
SELECT 
  t2a.id
  ,t1.key
  ,t2a.value,IF(t2b.minid = t2a.id,1,0) is_first_entry
FROM
`TABLE-TWO` t2a 
INNER JOIN (SELECT MIN(id) minid,fk_id FROM `TABLE-TWO` GROUP BY fk_id) t2b ON t2a.fk_id = t2b. fk_id 
INNER JOIN `TABLE-ONE` t1 ON t1.id = t2a.fk_id
ORDER By id
id | key | value | is_first_entry
-: | :-- | :---- | -------------:
 1 | k1  | v1    |              1
 2 | k1  | v2    |              0
 3 | k1  | v3    |              0
 4 | k2  | v1    |              1
 5 | k2  | v2    |              0
 6 | k3  | v1    |              1
 7 | k3  | v2    |              0
 8 | k3  | v3    |              0
 9 | k3  | v4    |              0
10 | k3  | v5    |              0

db<>fiddle here

Leave a Reply

Your email address will not be published. Required fields are marked *