Question :
Coming from a Rails background, I have a large production database that I have a copy of where I only need records from 4 of the 2000+ companies. I’m trying to delete all the rows except the ones belonging to the 4, and I know the way I have it is not optimal.
DELETE FROM appointments
WHERE (appointments.company_id NOT IN (6, 753, 785, 1611))
another example is when I have to delete records on a table where the company_id is on an associated table:
DELETE FROM mappings
WHERE mappings.id IN (SELECT mappings.id
FROM code_mappings
INNER JOIN codes ON codes.remote_id = mappings.code_remote_id
WHERE (codes.company_id NOT IN (6, 753, 785, 1611)))
Answer :
In relation to the first table, appointments
, make sure that you have an index on company_id
column.
In relation to the mappings
table, using EXISTS
rather than IN
may yield better performance. You can re-write your query as following:
DELETE FROM mappings AS m
WHERE EXISTS ( SELECT 1
FROM code_mappings AS cm
INNER JOIN codes AS c
ON c.remote_id = cm.code_remote_id
WHERE
(
c.company_id NOT IN (6, 753, 785, 1611)
AND cm.id = m.id
)
)
In the above query, you will also benefit from indexes on the mappings
and code
tables.
Documentation for creating indexes is @ https://www.postgresql.org/docs/current/static/sql-createindex.html. In your case, you can create indexes on the relevant tables as following:
CREATE INDEX company_id_idx ON appointments (company_id);
CREATE INDEX remote_id_company_id_idx ON codes (remote_id, company_id);
CREATE INDEX code_remote_id_id_idx ON code_mappings (code_remote_id, id);
-- If you don't already have a primary key OR index on `id` column in the `mappings` table, then create one:
ALTER TABLE mappings ADD PRIMARY KEY (id);
-- Choose primary key, or index: CREATE INDEX id_idx ON mappings (id);
I think you’re fine, honestly. You could always dig in and analyze the query plans for various options suggested here.
The performance benefit of the EXISTS predicate (or NOT EXISTS), for instance, comes from letting the query optimizer choose how to use indexes and it works best when the subquery is simplest:
EXISTS (select * from blah where outertable.key = key)
In this instance, you are dealing with very low selectivity. You’re essentially saying delete everything! Well, except these four, leave them alone.
The query optimizer isn’t going to choose to utilize your indexes, because there isn’t really any point. It’s going to do a table scan. If it did choose an index (it won’t), it would actually perform worse (probably).