Select rows from two joined tables that are not in a third table

Question :

I have 3 tables (A, B, C) with the same columns and I'm trying to retrieve the 
user_id that are in table a and b but not c.

For instance:
Table A
user_id     closed acct
10102345    Yes
12456786    Yes

Table B
user_id     closed acct
10102345    Yes
12456786    Yes

Table C
user_id     closed acct
45345696    No
45698788    Yes
12456786    Yes

Here is the code that I tried, but it takes very long to execute:
SELECT user_id FROM TableA WHERE user_id NOT IN (SELECT DISTINCT user_id
FROM TableB) OR user_id NOT IN (SELECT DISTINCT ID FROM Table C)

Answer :

SELECT A.*, B.*
FROM A 
JOIN B ON A.user_id = B.user_id
WHERE NOT EXISTS ( SELECT NULL
                   FROM C
                   WHERE C.user_id = B.user_id )

or

SELECT A.*, B.*
FROM A 
JOIN B ON A.user_id = B.user_id
LEFT JOIN C ON C.user_id = B.user_id
WHERE C.user_id IS NULL
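
To check that the two queries above agree, here is a minimal sketch that runs both against the sample rows from the question. It makes several assumptions: SQLite (through Python's sqlite3 module) stands in for the real database, the second column is named closed_acct, user_id is the primary key, and only A.user_id is selected to keep the output short.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (user_id INTEGER PRIMARY KEY, closed_acct TEXT);
    CREATE TABLE B (user_id INTEGER PRIMARY KEY, closed_acct TEXT);
    CREATE TABLE C (user_id INTEGER PRIMARY KEY, closed_acct TEXT);
    -- Sample rows copied from the question.
    INSERT INTO A VALUES (10102345, 'Yes'), (12456786, 'Yes');
    INSERT INTO B VALUES (10102345, 'Yes'), (12456786, 'Yes');
    INSERT INTO C VALUES (45345696, 'No'), (45698788, 'Yes'), (12456786, 'Yes');
""")

# Anti-join written with NOT EXISTS.
not_exists_sql = """
    SELECT A.user_id
    FROM A
    JOIN B ON A.user_id = B.user_id
    WHERE NOT EXISTS (SELECT NULL FROM C WHERE C.user_id = B.user_id)
    ORDER BY A.user_id
"""

# The same anti-join written with LEFT JOIN ... IS NULL.
left_join_sql = """
    SELECT A.user_id
    FROM A
    JOIN B ON A.user_id = B.user_id
    LEFT JOIN C ON C.user_id = B.user_id
    WHERE C.user_id IS NULL
    ORDER BY A.user_id
"""

not_exists_ids = [row[0] for row in conn.execute(not_exists_sql)]
left_join_ids = [row[0] for row in conn.execute(left_join_sql)]
print(not_exists_ids)  # [10102345]
print(left_join_ids)   # [10102345]
```

Both forms keep only 10102345, because 12456786 also appears in C.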

Depending on the number of rows in TableA and TableB versus the number of rows in TableC, you can use the EXCEPT operator with SQL Server / T-SQL:

Version 1, corresponding to your first remark:

I’m trying to retrieve the user_id that are in table a and b but not
c.

WITH USERS (user_id) AS (SELECT user_id FROM TableA
                          UNION ALL
                         SELECT user_id FROM TableB)
SELECT DISTINCT user_id FROM USERS
EXCEPT 
SELECT user_id FROM TableC;

EDIT: Thanks to @ypercubeTM in the comments, here is the corresponding simplified SQL query:

( SELECT user_id FROM TableA 
  UNION 
  SELECT user_id FROM TableB
)
EXCEPT
  SELECT user_id FROM TableC ;
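
One point worth checking on data where TableA and TableB differ: UNION keeps every user_id found in either table, whereas the set-operator counterpart of "in both tables" is INTERSECT. The sketch below is an illustration only. The ids 1 to 4 are hypothetical, SQLite stands in for SQL Server, and the queries are written without the outer parentheses because the operators are applied left to right here anyway.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in; the small ids are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (user_id INTEGER PRIMARY KEY);
    CREATE TABLE TableB (user_id INTEGER PRIMARY KEY);
    CREATE TABLE TableC (user_id INTEGER PRIMARY KEY);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (2), (3), (4);
    INSERT INTO TableC VALUES (3);
""")

# (TableA UNION TableB) EXCEPT TableC: ids in either table, minus TableC.
union_sql = """
    SELECT user_id FROM TableA
    UNION
    SELECT user_id FROM TableB
    EXCEPT
    SELECT user_id FROM TableC
"""

# (TableA INTERSECT TableB) EXCEPT TableC: ids in both tables, minus TableC.
intersect_sql = """
    SELECT user_id FROM TableA
    INTERSECT
    SELECT user_id FROM TableB
    EXCEPT
    SELECT user_id FROM TableC
"""

in_either = sorted(row[0] for row in conn.execute(union_sql))
in_both = sorted(row[0] for row in conn.execute(intersect_sql))
print(in_either)  # [1, 2, 4]
print(in_both)    # [2]
```

On the sample rows from the question, TableA and TableB hold the same ids, so both variants would return the same single row there.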

See note 1 below for the DISTINCT clause.

Version 2, corresponding to your SQL statement:

SELECT user_id FROM TableA WHERE user_id NOT IN (SELECT DISTINCT
user_id FROM TableB) OR user_id NOT IN (SELECT DISTINCT ID FROM Table
C)

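(This does not have the same meaning as your first remark. The query below returns the user_ids from TableA that exist in neither TableB nor TableC; note that the OR in your statement also keeps user_ids that are missing from only one of those two tables.)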

WITH USERS (user_id) AS (SELECT user_id FROM TableB
                          UNION ALL
                         SELECT user_id FROM TableC)
SELECT user_id FROM TableA
EXCEPT 
SELECT DISTINCT user_id FROM USERS;

EDIT: And again, the corresponding simplified SQL query:

SELECT user_id FROM TableA
EXCEPT
( SELECT user_id FROM TableC
  UNION
  SELECT user_id FROM TableB
) ;
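
To see how version 2 relates to the NOT IN ... OR ... statement from the question, the following sketch runs both on a few hypothetical ids. Assumptions: SQLite stands in for SQL Server, the ids 1 to 4 are made up, the question's ID column is read as user_id, and the UNION is wrapped in a derived table (aliased excluded, a name chosen for illustration) so that the grouping is explicit.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in; the small ids are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (user_id INTEGER PRIMARY KEY);
    CREATE TABLE TableB (user_id INTEGER PRIMARY KEY);
    CREATE TABLE TableC (user_id INTEGER PRIMARY KEY);
    INSERT INTO TableA VALUES (1), (2), (3), (4);
    INSERT INTO TableB VALUES (2), (4);
    INSERT INTO TableC VALUES (3), (4);
""")

# The statement from the question: missing from TableB OR missing from TableC.
or_sql = """
    SELECT user_id FROM TableA
    WHERE user_id NOT IN (SELECT user_id FROM TableB)
       OR user_id NOT IN (SELECT user_id FROM TableC)
"""

# Version 2: TableA minus everything found in TableB or TableC.
except_sql = """
    SELECT user_id FROM TableA
    EXCEPT
    SELECT user_id FROM (
        SELECT user_id FROM TableC
        UNION
        SELECT user_id FROM TableB
    ) AS excluded
"""

or_ids = sorted(row[0] for row in conn.execute(or_sql))
except_ids = sorted(row[0] for row in conn.execute(except_sql))
print(or_ids)      # [1, 2, 3]
print(except_ids)  # [1]
```

Only id 1 is absent from both TableB and TableC, while the OR form also returns ids 2 and 3, which are each missing from just one of them.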

See note 1 below for the DISTINCT clause.

If you can expect very few rows in:

  • (version 1) TableC relative to TableA + TableB
  • (version 2) TableB + TableC relative to TableA

then this approach will be very efficient performance-wise and not too costly in resources. I recommend the same approach on other RDBMSs (EXCEPT or MINUS operator) in these cases; when applicable, it is far better than NOT IN / NOT EXISTS / LEFT JOIN ... WHERE ... IS NULL.

In your case, as user_id is clearly the PK of these 3 tables, I chose UNION ALL over UNION in the WITH clause for performance. As we can’t tell whether a user_id could be in both TableA and TableB (version 1), or in both TableB and TableC (version 2), we have to apply DISTINCT (see note 1) in the SELECT FROM USERS. So there is nothing to gain from deduplicating them in the WITH clause.

EDIT: Note 1: As @ypercubeTM states in the comments, the DISTINCT is of no use in version 2, and would only have an effect in version 1 if some identical user_ids existed in both TableA and TableB combined with UNION ALL (DISTINCT is never needed with UNION). The DISTINCT clause could be removed from the queries; I left it in as a way to raise awareness of the data / DML on top of the schema / DDL.
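
As a rough check of the point about DISTINCT, the sketch below runs version 1 with and without it over a UNION ALL that does contain duplicates. Assumptions: SQLite stands in for SQL Server and the ids 1 to 3 are hypothetical. EXCEPT without ALL already returns distinct rows, so the final result should be the same either way.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in; the small ids are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (user_id INTEGER PRIMARY KEY);
    CREATE TABLE TableB (user_id INTEGER PRIMARY KEY);
    CREATE TABLE TableC (user_id INTEGER PRIMARY KEY);
    INSERT INTO TableA VALUES (1), (2);
    INSERT INTO TableB VALUES (1), (2), (3);
    INSERT INTO TableC VALUES (2);
""")

# Version 1 template; {distinct} is filled with "DISTINCT" or left empty.
template = """
    WITH USERS (user_id) AS (SELECT user_id FROM TableA
                             UNION ALL
                             SELECT user_id FROM TableB)
    SELECT {distinct} user_id FROM USERS
    EXCEPT
    SELECT user_id FROM TableC
"""

# Build plain lists (not sets) so that any duplicate row would stay visible.
with_distinct = sorted(
    row[0] for row in conn.execute(template.format(distinct="DISTINCT"))
)
without_distinct = sorted(
    row[0] for row in conn.execute(template.format(distinct=""))
)
print(with_distinct)     # [1, 3]
print(without_distinct)  # [1, 3]
```

Id 1 is in both TableA and TableB, so USERS holds it twice, yet it appears once in each result.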
