Question :
I have just read about N+1 queries problem. So I have the doubt about the performance of 2 ways.
The way 1: I have 1000 queries like that:
select * from topic where user_id = 1
select * from topic where user_id = 2
…
select * from topic where user_id = 1000
I supposed the topic table has 1.000.000 rows and each query uses binary search, so I think the Big O’s way is:
1000 * log2(1.000.000)
The way 2: I have 1 query:
select * from topic where user_id in (1,2,…, 1000)
I don’t know the algorithm of 2nd way. So can you explain me the algorithm of (select .. in) query. Is there any case the 1st way is better than 2nd way?
Answer :
to get the same result, you need to UNION
the result and you can’t union 10000 selects. the limit is 63 table references per query
Further IN
clause has also a limit of elements you can use.
The number of values in the IN() list is only limited by the max_allowed_packet value.
see manual
As you have many ids, you should try
select t1.*
from topic t1
INNER JOIN
(SELECT user_id FROM topic WHERE user_id <= 1000) t2 ON t1.user_id = t2.user_id
On big numbers for the IN
you will get faster results.
I have just read about N+1 queries problem.
This basically boils down to this paradigm:
To do a Thing takes a finite amount of time.
To do the same Thing a thousand times takes at least a thousand times as long.
Remember what’s happening each and every time you run a query:
- Your client [application] sends some SQL text and arguments to the database server,
- The database works out how best to “get at” the data you want.
This might involve parsing the query to work out how to do that or, more likely, re-using a cached execution plan, - The database goes and gets the data.
Any indexes that it needs will probably be in memory anyway, but it may have to go and pull some data pages from disk. - The database assembles the data, parcelling it up, ready to go back across the network,
- The database sends the data back to the client,
- The client does what it does with that data.
(Phew!)
It will take far, far longer to do all that a thousand times over, one row at a time, than it will to do the job just once, with a single query that pulls back a thousand rows of data, all in one go.
The other, big problem here is that there is absolutely nothing that the database (or, more importantly, the poor DBA on the receiving end of the complaints) can do to tune these queries or otherwise improve things.
In such cases, the database is often as happy as a lark, being asked to find a record based on a single value in a uniquely-indexed field, over and over again, in rapid succession. In fact, Life doesn’t get much better for a relational database, but the Users will be complaining like mad that the Application is “running” like a slug.
Applications operating in this [appalling] fashion simply have to be recoded.
Linear:
with a(id) as
(values(1),(2),(3))
select t.* from topic t join a on t.id=a.id