Is a join faster with 5 tables joined together Or 5 separate queries? [duplicate]

Posted on

Question :

I have confusion that I already have in my application 5 tables say a,b,c,d,e and have relationship with each other like a.b_id is primary key of b and chain goes on.

Now my question is that which approach will give me better performance/faster result?
5 queries
Or
A query join with 5 tables?

Thank you.

Answer :

Look at it this way… There is a lot of overhead in each SELECT:

  • Round trip between client and server
  • Allocate a thread to work on the query
  • Parse the SQL
  • Optimize it
  • Perform it (The meat!)
  • Send the results back

In some experiments, I have seen the “overhead” be 90% of the work. 5 queries takes nearly 5 times the overhead.

It depends!
on:

  • what is your bottleneck
    • database speed / computation power
    • transfer speed / bandwitdth
    • application computation power
    • application main memory
  • which indices are present
    • are the joins supported by them
    • are the selects supported by them
  • how is the data structured and what data does the application need of that data
    • do you need all the data anyway or does the join come with conditions that prunes away 90 percent of the data in all tables
  • can you do the selects in parallel or do they need to run sequentially
  • can you re-use the data from the selects to replace multiple join queries

In most cases, one select will be faster. 5 selects mean 5 times the overhead, as @RickJames said.

If your database is not on the same server your query is running on, this also means 5 times the network latency, which can add a lot to your overall time. (I’ve seen this a few times recently, when employees had an application that talks to the database directly; the application was OK when running at the office, with <= 1 ms network delay to the database server; when they ran the same application over VPN in their home office, with ~50 ms delay, it became almost impossible to work).

There are, however, some scenarios where splitting queries might make sense:

  • You’re doing a full join over all the tables. Something like select * from a, b, c, d, e without any conditions. Assuming each of the tables has only 10 rows, this will return 100000 result rows. Querying each of them separately needs just 50 rows, which will save a lot of memory in the database server, and a lot of traffic bandwidth. But this example is so pathological, if you’re doing that then you’re doing it wrong anyway.

  • Query optimizer getting it wrong. Imagine something like select * from a, b, c where a.x=b.x and b.y=c.y and a.z=c.z. This will give the optimizer 9 different options which table to start with and which table to use next. Sometimes, query optimizers get this wrong, requiring (almost) full table scans on some of the tables, and thus taking forever. In a case like this, it may be that select * from a followed by select * from b where <condition from a> followed by select * from c where <condition from a,b> is faster.
    However, using optimizer hints will help the database engine get the join order right, which will, again, help more than splitting the query.

So, if you can, let the database engine do the join. If it doesn’t get it right (explain select ... is your friend here), make sure you have your indexes right. If you do, use optimizer hints. And only if all that fails, give splitting the queries a try.

Leave a Reply

Your email address will not be published. Required fields are marked *