Multiple roundtrips vs cross joins

Posted on

Question :

I have a feeling the answer is “it depends on…”, but I’m wondering if there’s a concrete answer.

In Entity Framework, you build a query using C# code, and the framework converts to SQL and sends to the server to pull down the data. Suppose I want to retrieve a record from three tables. I’ve got at least three choices:

  1. Use straight SQL (manual ADO.NET), send three SELECT statements in the same command, and map the results one resultset at a time with a DataReader. This is clearly the best approach from the database perspective, but it’s the most work since I wouldn’t be able to use the Entity Framework approach.

  2. Send three separate commands from Entity Framework – this requires three roundtrips to the database server:

Person person1 = context.Persons.First(p => p.PersonID == 1);
Car car1 = context.Cars.First(c => c.CarID == 1);
House house1 = context.Houses.First(h => h.HouseID == 1);
// translates to the following SQL, one roundtrip at a time:
SELECT TOP(1) [p].[PersonID], [p].[PersonName] FROM [Person] AS [p] WHERE [p].[PersonID] = 1
SELECT TOP(1) [c].[CarID], [c].[CarName] FROM [Car] AS [c] WHERE [c].[CarID] = 1
SELECT TOP(1) [h].[HouseID], [h].[HouseName] FROM [House] AS [h] WHERE [h].[HouseID] = 1
  1. Make one big ugly statement which still uses Entity Framework with a single command and a single statement:
var query = (from p in context.Persons.Where(p => p.PersonID == 1)
             from c in context.Cars.Where(c => c.CarID == 1)
             from h in context.Houses.Where(h => h.HouseID == 1)
             select new { p, c, h }).First();
Person person2 = query.p;
Car car2 = query.c;
House house2 = query.h;
// which makes the following SQL:
SELECT TOP(1) [p].[PersonID], [p].[PersonName], [t].[CarID], [t].[CarName], [t0].[HouseID], [t0].[HouseName]
FROM [Person] AS [p]
    SELECT [c].[CarID], [c].[CarName] FROM [Car] AS [c] WHERE [c].[CarID] = 1
) AS [t]
    SELECT [h].[HouseID], [h].[HouseName] FROM [House] AS [h] WHERE [h].[HouseID] = 1
) AS [t0]
WHERE [p].[PersonID] = 1             

The cross joins end up just flattening out the result since there’s only one record returned per table, and the right indexes fire, so I would think the database would be happy with it – fewer roundtrips but the same amount of data returned.

The code in #3 is uglier, but that’s not what I’m asking – I’m just asking from a performance perspective. Do the cross joins in the third example cause any potential issues? Maybe locking problems or extra CPU or memory that wouldn’t otherwise be needed by using individual commands?

Answer :

It’s worth mentioning up front that those two queries might not return the same results. Three separate queries by ID will always return your three rows. In the CROSS JOIN approach, if PersonID = 1 doesn’t exist, the information about the other two entities won’t be returned. So your application has to be certain that all of these IDs exist, or you might get unexpected results.

As long as your tables each have a unique clustered index (or primary key) on the ID column (or a covering nonclustered index – the point is a seek to the correct row is possible), the performance difference between the two approaches, from a SQL Server perspective, is negligible.

Both approaches will result in 3 seeks against the clustered index. Neither should go parallel, or require sorting, and the cross join should use nested loops – so no memory grants would be needed in either case.

The three batches approach is pretty obvious:

Screenshot of SSMS execution plan showing 3 seeks in 3 batches

The CROSS JOIN approach looks like this:

Screenshot of SSMS execution plan show 3 seeks in two cross joins

The way this plan operates, each seek operator executes, at most, once:

  • The seek on PK_House returns one row to the Nested Loops operator
  • This triggers execution of the seek on PK_Person, which returns one row to the same Nested Loops operator
  • This “flattened” row is passed to the next Nested Loops operator…
  • …which triggers execution of the seek on PK_Car, returning one row to that second Nested Loops operator
  • This single flattened row is returned to the top operator, which satisfies its request for one row, so no more rows are requested

If you want to not write cross joins all over your app (potentially confusing to other developers), you can avoid the overhead of multiple roundtrips by using one of the Entity Framework extensions that provides “future” queries. These let you queue up multiple queries, and they will all be sent and resolved in one batch when the result of any of them is needed.

One such extension is Entity Framework Plus – query-future.

Leave a Reply

Your email address will not be published. Required fields are marked *