Question :
I have a query of the following form:
SELECT ...
FROM ColumnstoreTable cs
CROSS APPLY (
SELECT *
FROM (VALUES
('A', cs.DataA)
, ('B', cs.DataB)
, ('C', cs.DataC)
) x(Col0, Col1)
) someValues
This takes every row from a Columnstore-backed subquery (ColumnstoreTable
) and multiplies those rows. This is essentially an UNPIVOT
. The real query is larger than this. This part of the query feeds into other processing.
The problem here is that this CROSS APPLY
is implemented as a loop join which is a reasonable choice. Unfortunately, loop joins do not support batch mode.
This part of the query is very performance critical and I suspect that running it in batch mode could be very beneficial to performance.
How can I rewrite this query so that I do not transition out of batch mode?
I did try using a temporary table instead of VALUES
, but that did not change the fact that there is no equality join condition to hash join on.
Answer :
One approach might be to use a #temp table for the values and also introduce a dummy equijoin column to allow for a hash join. For example:
-- Create a #temp table with a dummy column to match the hash join
-- and the actual column you want
CREATE TABLE #values (dummy INT NOT NULL, Col0 CHAR(1) NOT NULL)
INSERT INTO #values (dummy, Col0)
VALUES (0, 'A'),
(0, 'B'),
(0, 'C')
GO
-- A similar query, but with a dummy equijoin condition to allow for a hash join
SELECT v.Col0,
CASE v.Col0
WHEN 'A' THEN cs.DataA
WHEN 'B' THEN cs.DataB
WHEN 'C' THEN cs.DataC
END AS Col1
FROM ColumnstoreTable cs
JOIN #values v
-- Join your dummy column to any numeric column on the columnstore,
-- multiplying that column by 0 to ensure a match to all #values
ON v.dummy = cs.DataA * 0
Performance and query plan
This approach yields a query plan like the following, and the hash match is performed in batch mode:
If I replace the SELECT
statement with a SUM
of the CASE
statement in order to avoid having to stream all those rows to the console and then run the query on a real 100MM row columnstore table I have lying around, I see fairly good performance to generate the requisite 300MM rows:
CPU time = 33803 ms, elapsed time = 4363 ms.
And the actual plan shows good parallelization of the hash join.
Notes about hash join parallelization when all rows have same value
The performance of this query depends heavily on each thread on the probe side of the join having access to the full hash table (as opposed to a hash partitioned version, which would map all rows to a single thread given that there is only one distinct value for the dummy
column).
Fortunately, this is true in this case (as we can see by the lack of a Parallelism
operator on the probe side) and should reliably be true because batch mode builds a single hash table that is shared across threads. Therefore, each thread can take their rows from the Columnstore Index Scan
and match them to that shared hash table. In SQL Server 2012, this functionality was much less predictable because a spill caused the operator to restart in Row mode, both losing the benefit of batch mode and also requiring a Repartition Streams
operator on the probe side of the join which would cause thread skew in this case. Allowing spills to remain in batch mode is a major improvement in SQL Server 2014.
To my knowledge, row mode does not have this shared hash table capability. However, in some cases, typically with an estimate of fewer than 100 rows on the build side, SQL Server will create a separate copy of the hash table for each thread (identifiable by the Distribute Streams
leading into the hash join). This can be very powerful, but is much less reliable than batch mode since it depends on your cardinality estimates and SQL Server is trying to evaluate the benefits versus the cost of building a full copy of the hash table for each thread.
UNION ALL: a simpler alternative
Paul White pointed out that another, and potentially simpler, option would be to use UNION ALL
to combine the rows for each value. This is likely your best bet assuming that it’s easy for you to build up this SQL dynamically. For example:
SELECT 'A' AS Col0, c.DataA AS Col1
FROM ColumnstoreTable c
UNION ALL
SELECT 'B' AS Col0, c.DataB AS Col1
FROM ColumnstoreTable c
UNION ALL
SELECT 'C' AS Col0, c.DataC AS Col1
FROM ColumnstoreTable c
This also yields a plan that is able to utilize batch mode and provides even better performance than the original answer. (Although in both cases the performance is fast enough that any selecting or writing the data to a table quickly becomes the bottlneck.) The UNION ALL
approach also avoids playing games like multiplying by 0. Sometimes it’s best to think simple!
CPU time = 8673 ms, elapsed time = 4270 ms.