What is the best practice for selecting from two tables with multiple relations?

Posted on

Question :

I have two tables with a structure like below:
enter image description here
and am selecting the data with below query:

SELECT
(SELECT Name FROM Management_EnumerableItem ME WHERE ME.ID = DN) DN,
(SELECT Name FROM Management_EnumerableItem ME WHERE ME.ID = PN) PN,
(SELECT Name FROM Management_EnumerableItem ME WHERE ME.ID = Winder) Winder,
(SELECT Name FROM Management_EnumerableItem ME WHERE ME.ID = CouplingType) CouplingType,
(SELECT Name FROM Management_EnumerableItem ME WHERE ME.ID = Type) Type,
(SELECT Name FROM Management_EnumerableItem ME WHERE ME.ID = ILayer) ILayer,
(SELECT Name FROM Management_EnumerableItem ME WHERE ME.ID = OLayer) OLayer
FROM Product_Coupling_Serial

So,what is the best practice for the best performance of selects like that (fastest way)?

Answer :

Create a covering index on Id column and Include Name on the table Management_EnumerableItem.

CREATE INDEX IDX_Name on Management_EnumerableItem (Id) Include (Name) with (fillfactor=90);

and write it like this.

SELECT A.NAME,
B.NAME,
C.NAME,
D.NAME,
E.NAME,
F.NAME,
G.NAME,

FROM Product_Coupling_Serial P
LEFT OUTER JOIN Management_EnumerableItem A
ON A.ID = P.DN
LEFT OUTER JOIN Management_EnumerableItem B
ON B.ID = P.PN
LEFT OUTER JOIN Management_EnumerableItem C
ON C.ID = P.Winder
LEFT OUTER JOIN Management_EnumerableItem D
ON D.ID = P.CouplingType
LEFT OUTER JOIN Management_EnumerableItem E
ON E.ID = P.[Type]
LEFT OUTER JOIN Management_EnumerableItem F
ON F.ID = P.[ILayer]
LEFT OUTER JOIN Management_EnumerableItem G
ON G.ID = P.[OLayer]

Thanks

Please consider trying out this query, and let me know how it performs.

WITH EnumerableItemNames AS (
   SELECT
      PcsID = pcs.ID,
      v.ColName,
      ItemName = me.Name
   FROM
      dbo.Product_Coupling_Serial pcs
      CROSS APPLY (VALUES
         ('DN', pcs.DN),
         ('PN', pcs.PN),
         ('Winder', pcs.Winder),
         ('CouplingType', pcs.CouplingType),
         ('Type', pcs.[Type]),
         ('ILayer', pcs.ILayer),
         ('OLayer', pcs.OLayer)
      ) v (ColName, EnumerableItemID)
      LEFT JOIN dbo.Management_EnumerableItem me
         ON v.EnumerableItemID = me.ID
)
SELECT
   p.DN, p.PN, p.Winder, p.CouplingType, p.[Type], p.ILayer, p.OLayer
FROM
   EnumerableItemNames ein
   PIVOT (
      Max(ein.ItemName)
      FOR ein.ColName IN (DN, PN, Winder, CouplingType, [Type], ILayer, OLayer)
   ) p
;

While it could theoretically be a total failure in performance, I also think it could end up performing better. If it isn’t better, you still might be able to get it there by inserting the results of the CTE (without the LEFT JOIN to Management_EnumerableItem) into a temp table with some carefully-chosen indexes, then doing a second query using the temp table. It does depend a bit on how exactly you’re querying the table, how many rows it has, how wide they are, how many result rows there are, and so on.

If you need more columns from the Product_Coupling_Serial table, then you could try including those in the CTE, but that may affect the performance badly by bloating the memory required for the pivot operation, so you could also just join back to it at the end:

// CTE here...
SELECT
   pcs.Year,
   pcs.ProductCode,
   pcs.QRText,
   pcs.UniqueSerial,
   ...
   p.DN, p.PN, p.Winder, p.CouplingType, p.[Type], p.ILayer, p.OLayer
FROM
   EnumerableIDs ei
   PIVOT (
      Max(ei.ItemName)
      FOR ei.ColName IN (DN, PN, Winder, CouplingType, [Type], ILayer, OLayer)
   ) p
   INNER JOIN dbo.Product_Coupling_Serial pcs
      ON ei.PcsID = pcs.ID
;

Although you can rewrite the query with a string of left joins, the performance will be the same as SQL Server will optimise the subqueries in your code into joins. (This may depend on the SQL Server version, and also on what other complexity you’re adding into the final query. You can confirm using the execution plan.)

So the way you’ve done it is probably the best as it’s the most easily understood.

You’ve already defined the primary key on Management_EnumerableItem as ID, which will by default created the clustered index on that column. Since the query you’ve written can use the clustered index on that table, it has no cause to use a non-clustered index.

There could, however, be performance benefits to creating clustered indexes on each of the joined columns in the Product_Coupling_Serial table. So:

CREATE INDEX IX_DN ON Product_Coupling_Serial (DN);
CREATE INDEX IX_PN ON Product_Coupling_Serial (PN);
CREATE INDEX IX_Winder ON Product_Coupling_Serial (Winder);
CREATE INDEX IX_CouplingType ON Product_Coupling_Serial (CouplingType);
CREATE INDEX IX_Type ON Product_Coupling_Serial ([Type]);
CREATE INDEX IX_ILayer ON Product_Coupling_Serial (ILayer);
CREATE INDEX IX_OLayer ON Product_Coupling_Serial (OLayer);

However, it would be best just to run your original query through SSMS to view the actual execution plan, which will return any missing indexes, to verify that those indexes will be used. But if you’re feeling lucky you can just run the SQL I included above.

Next you’ll want to rewrite your query in minimal form. SQL Server is a set based interpreted language, and the query optimizer is written in such a way that the minimal solution is either best, or gets automatically interpreted to the best solution in most cases. In your case the minimal form is:

SELECT
E1.NAME as DN,
E2.NAME as PN,
E3.NAME as Winder,
E4.NAME as CouplingType,
E5.NAME as [Type],
E6.NAME as ILayer,
E7.NAME as OLayer
FROM Product_Coupling_Serial P
LEFT JOIN Management_EnumerableItem E1 ON E1.ID = P.DN
LEFT JOIN Management_EnumerableItem E2 ON E2.ID = P.PN
LEFT JOIN Management_EnumerableItem E3 ON E3.ID = P.Winder
LEFT JOIN Management_EnumerableItem E4 ON E4.ID = P.CouplingType
LEFT JOIN Management_EnumerableItem E5 ON E5.ID = P.[Type]
LEFT JOIN Management_EnumerableItem E6 ON E6.ID = P.ILayer
LEFT JOIN Management_EnumerableItem E7 ON E7.ID = P.OLayer;

This should evaluate to the best possible execution plan with minimal code. If it does not you’ll want to explore CTEs (Common Table Expressions) next, but it is unlikely you’ll find anything faster than what I wrote. If you can upgrade your server to 2012 another possibility is the use of a Columnstore index, but since you listed 2088 R2 (which I assume is a typo for 2008 R2) you’ll not be able to use that feature until you upgrade to 2012 or above.

You should create separate indexes on Product_Coupling_Serial for each column you use on your joins (DN, PN, Winder).Then both the subquery and the left joins will be faster.

You should consider the index size of the table and insert speed.

I don’t think SQL server will use a covering index in this scenario because each column is queried separately.

Leave a Reply

Your email address will not be published.