Question :
I have 3 tables (A, B, C) with the same columns, and I'm trying to retrieve the
user_ids that are in tables A and B but not in C.
For instance:
Table A
user_id    closed acct
10102345   Yes
12456786   Yes
Table B
user_id    closed acct
10102345   Yes
12456786   Yes
Table C
user_id    closed acct
45345696   No
45698788   Yes
12456786   Yes
Here is the code I tried, but it takes very long to execute:
SELECT user_id FROM TableA WHERE user_id NOT IN (SELECT DISTINCT user_id
FROM TableB) OR user_id NOT IN (SELECT DISTINCT ID FROM Table C)
Answer :
SELECT A.*, B.*
FROM A
JOIN B ON A.user_id = B.user_id
WHERE NOT EXISTS ( SELECT NULL
FROM C
WHERE C.user_id = B.user_id )
or
SELECT A.*, B.*
FROM A
JOIN B ON A.user_id = B.user_id
LEFT JOIN C ON C.user_id = B.user_id
WHERE C.user_id IS NULL
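As a quick check, the NOT EXISTS form above can be run against the sample rows from the question. This is only a minimal sketch: the table names A, B and C follow the queries above, while the column types and the column name closed_acct are assumptions, since the question gives no schema.

```sql
-- Assumed schema: the question gives no column types, so these are guesses.
CREATE TABLE A (user_id INT PRIMARY KEY, closed_acct VARCHAR(3));
CREATE TABLE B (user_id INT PRIMARY KEY, closed_acct VARCHAR(3));
CREATE TABLE C (user_id INT PRIMARY KEY, closed_acct VARCHAR(3));

-- Sample rows copied from the question.
INSERT INTO A (user_id, closed_acct) VALUES (10102345, 'Yes'), (12456786, 'Yes');
INSERT INTO B (user_id, closed_acct) VALUES (10102345, 'Yes'), (12456786, 'Yes');
INSERT INTO C (user_id, closed_acct) VALUES (45345696, 'No'), (45698788, 'Yes'), (12456786, 'Yes');

-- Users present in both A and B that have no matching row in C.
SELECT A.user_id
FROM A
JOIN B ON A.user_id = B.user_id
WHERE NOT EXISTS (SELECT NULL FROM C WHERE C.user_id = B.user_id);
-- Returns one row: 10102345 (12456786 is filtered out because it is also in C).
```

Selecting only A.user_id instead of A.*, B.* keeps the output to what the question asks for.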
Depending on the number of rows in TableA and TableB versus the number of rows in TableC, you can use the EXCEPT operator with SQL Server / T-SQL:
Version 1, corresponding to your first remark:
"I’m trying to retrieve the user_id that are in table a and b but not c."
WITH USERS (user_id) AS (SELECT user_id FROM TableA
UNION ALL
SELECT user_id FROM TableB)
SELECT DISTINCT user_id FROM USERS
EXCEPT
SELECT user_id FROM TableC;
EDIT: Thanks to @ypercubeTM in the comments, here is the corresponding simplified query:
( SELECT user_id FROM TableA
UNION
SELECT user_id FROM TableB
)
EXCEPT
SELECT user_id FROM TableC ;
See note 1 below regarding the DISTINCT clause.
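One caveat on version 1: UNION returns a user_id found in either TableA or TableB. If the requirement is strictly "in both A and B, but not in C", INTERSECT is the set operator that matches it. A sketch, assuming the same TableA / TableB / TableC names as above:

```sql
-- user_ids present in BOTH TableA and TableB, minus those present in TableC.
-- The parentheses make the evaluation order explicit.
( SELECT user_id FROM TableA
  INTERSECT
  SELECT user_id FROM TableB
)
EXCEPT
SELECT user_id FROM TableC;
```

With the sample data from the question this returns only 10102345. The UNION form happens to return the same row there, because TableA and TableB hold identical user_ids.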
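Version 2, corresponding to your SQL statement:
"SELECT user_id FROM TableA WHERE user_id NOT IN (SELECT DISTINCT user_id FROM TableB) OR user_id NOT IN (SELECT DISTINCT ID FROM Table C)"
(This does not have the same meaning as your first remark. As written with OR, the statement returns user_ids from TableA that are missing from TableB or missing from TableC. The queries below return the user_ids from TableA that exist in neither TableB nor TableC, which is what the statement would mean with AND.)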
WITH USERS (user_id) AS (SELECT user_id FROM TableB
UNION ALL
SELECT user_id FROM TableC)
SELECT user_id FROM TableA
EXCEPT
SELECT DISTINCT user_id FROM USERS;
EDIT: and again, the corresponding simplified query:
SELECT user_id FROM TableA
EXCEPT
( SELECT user_id FROM TableC
UNION
SELECT user_id FROM TableB
) ;
See note 1 below regarding the DISTINCT clause.
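For comparison, the set-based version 2 corresponds to the question's NOT IN statement with AND in place of OR. A sketch, assuming the same table names and that user_id is never NULL in TableB or TableC (NOT IN returns no rows at all when its subquery yields a NULL):

```sql
-- user_ids from TableA that appear in neither TableB nor TableC.
-- Assumes user_id is NOT NULL in TableB and TableC.
SELECT user_id
FROM TableA
WHERE user_id NOT IN (SELECT user_id FROM TableB)
  AND user_id NOT IN (SELECT user_id FROM TableC);
```

Apart from EXCEPT also removing duplicate rows from its result, the two forms should return the same user_ids under that assumption.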
If you can expect very few rows in:
- (version 1) TableC proportionally to TableA + TableB
- (version 2) TableB + TableC proportionally to TableA
then this approach will be very efficient in terms of performance and not too costly in resources. I recommend the same approach on other RDBMS (EXCEPT or MINUS operator) in these cases; when applicable, it is far better than NOT IN / NOT EXISTS / LEFT JOIN .. WHERE NULL.
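To check which form is actually faster on your own data, the variants can be run side by side. A minimal sketch for SQL Server using the session options SET STATISTICS IO and SET STATISTICS TIME; the table names are the ones used above, and the numbers you get depend entirely on your row counts and indexes:

```sql
-- Report logical reads and CPU / elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: set operators (version 2 above).
SELECT user_id FROM TableA
EXCEPT
( SELECT user_id FROM TableB
  UNION
  SELECT user_id FROM TableC
);

-- Variant 2: the same question expressed with NOT EXISTS.
SELECT a.user_id
FROM TableA AS a
WHERE NOT EXISTS (SELECT 1 FROM TableB AS b WHERE b.user_id = a.user_id)
  AND NOT EXISTS (SELECT 1 FROM TableC AS c WHERE c.user_id = a.user_id);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures appear in the Messages output; comparing the logical reads per table is usually more stable than comparing elapsed time.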
In your case, as user_id is clearly the PK of these 3 tables, I chose UNION ALL over UNION in the WITH clause for performance. As we can’t know whether a user_id could be in both TableA and TableB (version 1), or in both TableB and TableC (version 2), we have to apply DISTINCT (see note 1) in the SELECT FROM USERS. So there is no gain in deduplicating them in the WITH clause.
EDIT (note 1): As @ypercubeTM states in the comments, the DISTINCT is of no use in version 2. In version 1 it would only matter if the same user_id appeared in both TableA and TableB combined with UNION ALL (UNION already removes duplicates), and even then EXCEPT returns only distinct rows, so the final result is unchanged. The DISTINCT clause could be removed from the queries; I left it in as a way to raise awareness of the data / DML on top of the schema / DDL.
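EXCEPT itself returns only distinct rows, which is why the DISTINCT keyword is redundant in these queries. A small sketch with inline values (hypothetical data, using the table value constructor available in SQL Server 2008+ and PostgreSQL):

```sql
-- The left input contains user_id 1 twice; EXCEPT still returns it once.
SELECT user_id FROM (VALUES (1), (1), (2)) AS l(user_id)
EXCEPT
SELECT user_id FROM (VALUES (2)) AS r(user_id);
-- Returns a single row: 1
```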