joining tables without duplicate rows / NULL values [closed]

Posted on

Question :

so I have multiple tables which I have to merge when query but both join and union doesn’t seems to achieve what my object is

Union:

mysql>
SELECT  NAME, currentstatus, NULL AS username, NULL AS status
    FROM  statustable
    WHERE  Name like '%ahsan%'
UNION  ALL 
SELECT  NULL AS NAME, NULL AS currentstatus, username, status
    from  awsaccount
    WHERE  username like '%ahsan%';
+--------------+-------------------+----------+--------------------------------+
| NAME         | currentstatus     | status   | username                         |
+--------------+-------------------+----------+--------------------------------+
| Ahsan Naseem | activated         | NULL     | NULL                           |
| NULL         | NULL              | ACTIVE   | ahsan.naseem       |
| NULL         | NULL              | INACTIVE | ahsantest                      |
| NULL         | NULL              | ACTIVE   | ahsantest2                     |
| NULL         | NULL              | ACTIVE   | ahsantestuserprofileupdate     |
| NULL         | NULL              | INACTIVE | ahsantestuserprofileupdate_cli |
| NULL         | NULL              | INACTIVE | decent.ahsan         |
| NULL         | NULL              | INACTIVE | decent.ahsan_cli               |
+--------------+-------------------+----------+--------------------------------+
8 rows in set (0.00 sec)

Join:

mysql>
select  *
    from ( SELECT * from  statustable where     Name like '%ahsan%' ) as ab
    JOIN ( SELECT * from  awsaccount  where username like '%ahsan%' ) as asb;

    +--------------+-------------------+--------------------------------+----------+
    | Name         | currentstatus | username                       | status   |
    +--------------+-------------------+--------------------------------+----------+
    | Ahsan Naseem | activated         | ahsan.naseem                    | ACTIVE   |
    | Ahsan Naseem | activated         | ahsantest                      | INACTIVE |
    | Ahsan Naseem | activated         | ahsantest2                     | ACTIVE   |
    | Ahsan Naseem | activated         | ahsantestuserprofileupdate     | ACTIVE   |
    | Ahsan Naseem | activated         | ahsantestuserprofileupdate_cli | INACTIVE |
    | Ahsan Naseem | activated         | decent.ahsan         | INACTIVE |
    | Ahsan Naseem | activated         | decent.ahsan_cli               | INACTIVE |
    +--------------+-------------------+--------------------------------+----------+
    7 rows in set (0.00 sec)

What I am trying to achieve is :

+--------------+-------------------+----------+--------------------------------+
| NAME         | currentstatus    | status   | username                       |
+--------------+-------------------+----------+--------------------------------+
| Ahsan Naseem | Active            | ACTIVE   | ahsan.naseem        |
| NULL         | NULL              | INACTIVE | ahsantest                      |
| NULL         | NULL              | ACTIVE   | ahsantest2                     |
| NULL         | NULL              | ACTIVE   | ahsantestuserprofileupdate     |
| NULL         | NULL              | INACTIVE | ahsantestuserprofileupdate_cli |
| NULL         | NULL              | INACTIVE | decent.ahsan        |
| NULL         | NULL              | INACTIVE | decent.ahsan_cli               |
+--------------+-------------------+----------+--------------------------------+

I am new to sql and will be grateful for your advice and suggestion

Answer :

First, I should note that this is basically a formatting requirement: you don’t want to repeat the values in the first two columns if they’re the same as they were in the previous column. If you have more CPU power available at the application end than on the SQL Server, then you will be likely to get better performance if you can allow the application to handle the formatting requirements.

That said, sometimes the SQL Server has more than enough capacity to handle the formatting, or doing it at the application end doesn’t make sense for some reason.

The following query should do what you want:

SET @PrevName := '';
SET @PrevStatus := '';
SET @RowNum := 0;

SELECT Name
      ,currentstatus
      ,`status`
      ,username
  FROM (
        SELECT IF(@PrevName = ab.Name, '', ab.Name) as Name
              ,IF(@PrevName = ab.Name AND @PrevStatus = ab.currentstatus, '', ab.currentstatus) as currentstatus
              ,asb.`status`
              ,asb.username
              ,@PrevName := ab.Name as ActualName
              ,@PrevStatus := ab.currentstatus as Act_curstatus
              ,@RowNum := @RowNum + 1 as RowOrder
          FROM statustable ab
                 CROSS JOIN awsaccount asb
         WHERE (    ab.Name LIKE '%ahsan%'
                AND asb.username LIKE '%ahsan%'
               )
         ORDER BY ActualName, Act_curstatus
       ) sq
 ORDER BY RowOrder
;

I tested this via db-fiddle.com; you can play with the query and test data here.

Quick explanation:

We use the variables to hold the value of Name and currentstatus from the previous row. If Name is different, we display both values; if Name is the same but currentstatus has changed, we leave Name blank but show the new status.

Establishing an order is critically important if we have multiple names or current statuses; if we don’t sort these, then rows with different names could be interleaved, meaning the same name would be repeated several times, as seen here:

...
Ahsan Naseem        | activated   | ACTIVE   | ahsan.naseem 
                    |             | INACTIVE | ahsantest
Bruce Banner        | activated   | ACTIVE   | mr.green
Ahsan Naseem        | activated   | ACTIVE   | ahsantest2
                    |             | ACTIVE   | ahsantestuserprofileupdate
...

Further, we must preserve the order in which the rows are processed, or else the line with the name might not be the first one, as here:

...
                    |             | INACTIVE | ahsantest
                    |             | ACTIVE   | ahsantest2
Ahsan Naseem        | activated   | ACTIVE   | ahsan.naseem 
                    |             | ACTIVE   | ahsantestuserprofileupdate
...

So, we use @RowNum to sequentially number the rows as we process them, and ultimately sort on that number. (I wasn’t sure this was needed; however, without it, I got the results back inverted, so I obviously did need it!)

We put our query in a subquery, both so we can force the processed order onto the results, and so we can leave out the Actual_Name, Act_currentstatus, and RowOrder columns.

If you really wanted NULL instead of empty string for the repeated values, just replace the empty string in the IF functions with NULL. As I’m assuming this is for display purposes, empty string is probably a better alternative, but you may have some other reason for doing this.

So if i understand this right you want to get all rows from both tables and where its possible you want to join them.

You can simulate a full outer join in MySQL by doing two joins with a UNION between them. You just have to switch use LEFT on one side and RIGHT on the other.

SELECT * FROM statustable t1
LEFT JOIN awsaccount t2 ON t1.statustable_id = t2.awsaccount_id
UNION
SELECT * FROM statustable t1
RIGHT JOIN awsaccount t2 ON t1.statustable_id = t2.awsaccount_id;

I reused @RDFozz DB Fiddle to simulate this: DB Fiddle

Leave a Reply

Your email address will not be published. Required fields are marked *