Question :
We want to know whether the given approach is correct or not. I understand that there might other and presumably better and more optimized ways of accomplishing the task. But it is a curiosity that I want to fulfill. Consider it an academic exercise.
Consider the following two tables :
TABLE-ONE TABLE-TWO
+----+-----------+ +----+-------+-------+
| id | key | | id | fk_id | value |
+----+-----------+ +----+-------+-------+
| 1 | k1 | | 1 | 1 | v1 |
| 2 | k2 | | 2 | 1 | v2 |
| 3 | k3 | | 3 | 1 | v3 |
+----+-----------+ | 4 | 2 | v1 |
| 5 | 2 | v2 |
| 6 | 3 | v1 |
| 7 | 3 | v2 |
| 8 | 3 | v3 |
| 9 | 3 | v4 |
| 10 | 3 | v5 |
+----+-------+-------+
We need to JOIN these two tables and ORDER the result on TABLE-ONE.id
The resultant table should look like :
+----+-------+-------+----------------+
| id | key | value | is_first_entry |
+----+-------+-------+----------------+
| 1 | k1 | v1 | 1 |
| 1 | k1 | v2 | 0 |
| 1 | k1 | v3 | 0 |
| 2 | k2 | v1 | 1 |
| 2 | k2 | v2 | 0 |
| 3 | k3 | v1 | 1 |
| 3 | k3 | v2 | 0 |
| 3 | k3 | v3 | 0 |
| 3 | k3 | v4 | 0 |
| 3 | k3 | v5 | 0 |
+----+-------+-------+----------------+
The is_first_entry column should return 1 if it is the first entry for that key in the result-set. Else it should return zero.
One way of achieving this is by via an INNER JOIN. We were exploring the option of achieving the result without the self join
by using session-variables.
To achieve this, we first created a function
CREATE FUNCTION `is_first_entry`( _id INT) RETURNS int(11)
BEGIN
IF @current_id is NULL
THEN
SET @current_id = _id;
RETURN 1;
END IF;
IF @current_id = _id
THEN
RETURN 0;
ELSE
SET @current_id = _id;
RETURN 1;
END IF;
END
The final query that we created to achieve the result.
SELECT one.id, one.key, two.value, is_first_entry(one.id) as is_first_entry
FROM TABLE-ONE one
INNER JOIN TABLE-TWO two
ON one.id = two.fk_id;
INNER JOIN (SELECT @current_id := 1 as v) t
on 1 = 1
ORDER BY one.id;
Even though it is working for small data set.
Will it give correct result on scale ?
Can someone think of a scenario/example where this technique might break. It is okay, if this query executes slowly.
For the time being, we are only concerned about the correctness and not about optimization.
Answer :
You asked if I had a counter point, yes I have.
Your list is unsorted and therefore gives a false result. if a later value comes to the mix.
Only when you don’t let latecomers into your table, it would hold.
To solve the problem with the late comers, you first have to get is_first value from a sorted table-two and inner join t1 later.
CREATE TABLE `TABLE-TWO` ( `id` INTEGER, `fk_id` INTEGER, `value` VARCHAR(2) ); INSERT INTO `TABLE-TWO` (`id`, `fk_id`, `value`) VALUES ('1', '1', 'v1'), ('2', '1', 'v2'), ('3', '1', 'v3'), ('4', '2', 'v1'), ('5', '2', 'v2'), ('6', '3', 'v1'), ('7', '3', 'v2'), ('8', '3', 'v3'), ('9', '3', 'v4'), ('10', '3', 'v5'), ('11', '2', 'v3'); CREATE TABLE `TABLE-ONE` ( `id` INTEGER, `key` VARCHAR(2) ); INSERT INTO `TABLE-ONE` (`id`, `key`) VALUES ('1', 'k1'), ('2', 'k2'), ('3', 'k3');
✓ ✓ ✓ ✓
CREATE FUNCTION `is_first_entry`( _id INT) RETURNS int(11) BEGIN IF @current_id is NULL THEN SET @current_id = _id; RETURN 1; END IF; IF @current_id = _id THEN RETURN 0; ELSE SET @current_id = _id; RETURN 1; END IF; END
✓
SELECT t2a.id ,t1.key ,t2a.value,IF(t2b.minid = t2a.id,1,0) is_first_entry FROM `TABLE-TWO` t2a INNER JOIN (SELECT MIN(id) minid,fk_id FROM `TABLE-TWO` GROUP BY fk_id) t2b ON t2a.fk_id = t2b. fk_id INNER JOIN `TABLE-ONE` t1 ON t1.id = t2a.fk_id ORDER By id
id | key | value | is_first_entry -: | :-- | :---- | -------------: 1 | k1 | v1 | 1 2 | k1 | v2 | 0 3 | k1 | v3 | 0 4 | k2 | v1 | 1 5 | k2 | v2 | 0 6 | k3 | v1 | 1 7 | k3 | v2 | 0 8 | k3 | v3 | 0 9 | k3 | v4 | 0 10 | k3 | v5 | 0 11 | k2 | v3 | 0
SELECT one.id, one.key, two.value, is_first_entry(one.id) as is_first_entry FROM `TABLE-ONE` one INNER JOIN `TABLE-TWO` two ON one.id = two.fk_id; INNER JOIN (SELECT @current_id := 1 as v) t on 1 = 1 ORDER BY one.id;
id | key | value | is_first_entry -: | :-- | :---- | -------------: 1 | k1 | v1 | 1 1 | k1 | v2 | 0 1 | k1 | v3 | 0 2 | k2 | v1 | 1 2 | k2 | v2 | 0 3 | k3 | v1 | 1 3 | k3 | v2 | 0 3 | k3 | v3 | 0 3 | k3 | v4 | 0 3 | k3 | v5 | 0 2 | k2 | v3 | 1
db<>fiddle here
As id is your only indicator for your first selection you can do it simply by choosing the minimum of id
SELECT
t2a.id
,t1.key
,t2a.value,IF(t2b.minid = t2a.id,1,0) is_first_entry
FROM
`TABLE-TWO` t2a
INNER JOIN (SELECT MIN(id) minid,fk_id FROM `TABLE-TWO` GROUP BY fk_id) t2b ON t2a.fk_id = t2b. fk_id
INNER JOIN `TABLE-ONE` t1 ON t1.id = t2a.fk_id
ORDER By id
CREATE TABLE `TABLE-TWO` ( `id` INTEGER, `fk_id` INTEGER, `value` VARCHAR(2) ); INSERT INTO `TABLE-TWO` (`id`, `fk_id`, `value`) VALUES ('1', '1', 'v1'), ('2', '1', 'v2'), ('3', '1', 'v3'), ('4', '2', 'v1'), ('5', '2', 'v2'), ('6', '3', 'v1'), ('7', '3', 'v2'), ('8', '3', 'v3'), ('9', '3', 'v4'), ('10', '3', 'v5'); CREATE TABLE `TABLE-ONE` ( `id` INTEGER, `key` VARCHAR(2) ); INSERT INTO `TABLE-ONE` (`id`, `key`) VALUES ('1', 'k1'), ('2', 'k2'), ('3', 'k3');
✓ ✓ ✓ ✓
SELECT t2a.id ,t1.key ,t2a.value,IF(t2b.minid = t2a.id,1,0) is_first_entry FROM `TABLE-TWO` t2a INNER JOIN (SELECT MIN(id) minid,fk_id FROM `TABLE-TWO` GROUP BY fk_id) t2b ON t2a.fk_id = t2b. fk_id INNER JOIN `TABLE-ONE` t1 ON t1.id = t2a.fk_id ORDER By id
id | key | value | is_first_entry -: | :-- | :---- | -------------: 1 | k1 | v1 | 1 2 | k1 | v2 | 0 3 | k1 | v3 | 0 4 | k2 | v1 | 1 5 | k2 | v2 | 0 6 | k3 | v1 | 1 7 | k3 | v2 | 0 8 | k3 | v3 | 0 9 | k3 | v4 | 0 10 | k3 | v5 | 0
db<>fiddle here