Question :
I have a table named contacts
(primary key id
) which is referenced by a number of other tables via foreign keys. When a list of contacts is displayed in the application, I get their primary keys in an array and need to determine if these records could be deleted. This is just used for display, no actual deleting is taking place here.
Based on this answer, I can find out if one record can be deleted:
(SELECT 1 FROM case_contacts WHERE contact_id = 5000 LIMIT 1)
UNION ALL
(SELECT 1 FROM case_payments WHERE contact_id = 5000 LIMIT 1)
UNION ALL
(SELECT 1 FROM invoices WHERE contact_id = 5000 LIMIT 1)
-- etc
LIMIT 1;
I could run this query for every contact ID I’m given, but that seems very inefficient.
Is there a way to send a single query for multiple contact IDs (up to ~500 at once)? Any kind of result would be fine, including only the IDs that can be deleted, or only the IDs that cannot be deleted, or a result set of (contact_id, can_be_deleted)
.
EDIT: This is what I’m using until I find a more efficient solution:
SELECT c.id
FROM contacts c
WHERE c.id IN (1,10,20,1557,5000,15057)
AND NOT EXISTS (SELECT 1 FROM case_contacts WHERE contact_id = c.id)
AND NOT EXISTS (SELECT 1 FROM case_payments WHERE contact_id = c.id)
AND NOT EXISTS (SELECT 1 FROM invoices WHERE contact_id = c.id);
Answer :
You may need to alter this slightly for postgres (is is MS SQL syntax) but something like:
SELECT id
FROM contacts
LEFT OUTER JOIN FROM case_contacts ON case_contacts.contact_id = contacts.id
LEFT OUTER JOIN FROM case_payments ON case_payments.contact_id = contacts.id
LEFT OUTER JOIN FROM invoices ON invoices.contact_id = contacts.id
WHERE case_contacts.contact_id IS NULL
AND case_payments.contact_id IS NULL
AND invoices.contact_id IS NULL
or:
SELECT id
FROM contacts
WHERE id NOT IN (SELECT contact_id FROM case_contacts)
AND id NOT IN (SELECT contact_id FROM case_payments)
AND id NOT IN (SELECT contact_id FROM invoices)
should list the IDs that are not referred to in those three tables. The query planner *should * see these as equivalent and optimise accordingly, but if not then the former will be more efficient (the latter risks running the three sub-queries once for every row in contacts
).
For the simple result (just the id
), EXCEPT
might be simplest and fastest:
Also, since you write:
I get their primary keys in an array
Simply use unnest()
and don’t waste time joining to the contacts
table:
SELECT id
FROM unnest('{1,10,20,1557,5000,15057}'::int[]) id -- actual array (not list)
EXCEPT ALL SELECT contact_id FROM case_contacts
EXCEPT ALL SELECT contact_id FROM case_payments
EXCEPT ALL SELECT contact_id FROM invoices;
Unless you need to verify existence, then you need the join:
SELECT id
FROM unnest(arr) id
JOIN contacts c USING (id)
EXCEPT ALL ...
Overview over available techniques:
I would not suggest to use NOT IN (subselect)
, which is typically the slowest alternative and carries traps for NULL values. Your NOT EXISTS
variant looks perfectly fine.