Fast alternative for “NOT IN”

Posted on

Question :

I have a table A that got a field called id that is the primary key of this table. I also have a table called B which also have a field called id as primary key.

Now I want to get all rows from Table A where the id value is not present as value in any table B id field-value.

My first query looked like this:

SELECT a.id FROM a WHERE a.id NOT IN (SELECT DISTINCT b.id FROM b)

Then I build a query that looked like this to improve the speed:

SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL

Now I got 600k rows in table A and 400k rows in table B and things are getting very slow. Is there any better query to run for this kind of operation or is there perhaps a better way to solve the problem at all? Any hints or pointers?

Answer :

What you must do is increase the join_buffer_size. What is join_buffer_size ?

The minimum size of the buffer that is used for plain index scans, range index scans, and joins that do not use indexes and thus perform full table scans. Normally, the best way to get fast joins is to add indexes. Increase the value of join_buffer_size to get a faster full join when adding indexes is not possible. One join buffer is allocated for each full join between two tables. For a complex join between several tables for which indexes are not used, multiple join buffers might be necessary.

I highlighted that sentence because your query is as simple as it is going to get. You are doing a join on an integer which I assume is already indexed in both tables. If id in table b us not indexed, please index it ASAP.

I would suggest raising join_buffer_size to 4M.

You can change join_buffer_size in your session and rerun the query

SET join_buffer_size = 1024 * 1024 * 4;
SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL;

GIVE IT A TRY !!!

Leave a Reply

Your email address will not be published.