How to make use of Batch Mode with an UNPIVOT (a loop join)?

Posted on

Question :

I have a query of the following form:

SELECT ...
FROM ColumnstoreTable cs
CROSS APPLY (
    SELECT *
    FROM (VALUES
        ('A', cs.DataA)
        , ('B', cs.DataB)
        , ('C', cs.DataC)
    ) x(Col0, Col1)
) someValues

This takes every row from a Columnstore-backed subquery (ColumnstoreTable) and multiplies those rows. This is essentially an UNPIVOT. The real query is larger than this. This part of the query feeds into other processing.

The problem here is that this CROSS APPLY is implemented as a loop join which is a reasonable choice. Unfortunately, loop joins do not support batch mode.

This part of the query is very performance critical and I suspect that running it in batch mode could be very beneficial to performance.

How can I rewrite this query so that I do not transition out of batch mode?

I did try using a temporary table instead of VALUES, but that did not change the fact that there is no equality join condition to hash join on.

Answer :

One approach might be to use a #temp table for the values and also introduce a dummy equijoin column to allow for a hash join. For example:

-- Create a #temp table with a dummy column to match the hash join
-- and the actual column you want
CREATE TABLE #values (dummy INT NOT NULL, Col0 CHAR(1) NOT NULL)
INSERT INTO #values (dummy, Col0)
VALUES (0, 'A'),
        (0, 'B'),
        (0, 'C')
GO

-- A similar query, but with a dummy equijoin condition to allow for a hash join
SELECT v.Col0, 
    CASE v.Col0 
        WHEN 'A' THEN cs.DataA 
        WHEN 'B' THEN cs.DataB 
        WHEN 'C' THEN cs.DataC
        END AS Col1
FROM ColumnstoreTable cs
JOIN #values v
    -- Join your dummy column to any numeric column on the columnstore,
    -- multiplying that column by 0 to ensure a match to all #values
    ON v.dummy = cs.DataA * 0

Performance and query plan

This approach yields a query plan like the following, and the hash match is performed in batch mode:

enter image description here

If I replace the SELECT statement with a SUM of the CASE statement in order to avoid having to stream all those rows to the console and then run the query on a real 100MM row columnstore table I have lying around, I see fairly good performance to generate the requisite 300MM rows:

CPU time = 33803 ms, elapsed time = 4363 ms.

And the actual plan shows good parallelization of the hash join.

enter image description here

Notes about hash join parallelization when all rows have same value

The performance of this query depends heavily on each thread on the probe side of the join having access to the full hash table (as opposed to a hash partitioned version, which would map all rows to a single thread given that there is only one distinct value for the dummy column).

Fortunately, this is true in this case (as we can see by the lack of a Parallelism operator on the probe side) and should reliably be true because batch mode builds a single hash table that is shared across threads. Therefore, each thread can take their rows from the Columnstore Index Scan and match them to that shared hash table. In SQL Server 2012, this functionality was much less predictable because a spill caused the operator to restart in Row mode, both losing the benefit of batch mode and also requiring a Repartition Streams operator on the probe side of the join which would cause thread skew in this case. Allowing spills to remain in batch mode is a major improvement in SQL Server 2014.

To my knowledge, row mode does not have this shared hash table capability. However, in some cases, typically with an estimate of fewer than 100 rows on the build side, SQL Server will create a separate copy of the hash table for each thread (identifiable by the Distribute Streams leading into the hash join). This can be very powerful, but is much less reliable than batch mode since it depends on your cardinality estimates and SQL Server is trying to evaluate the benefits versus the cost of building a full copy of the hash table for each thread.

UNION ALL: a simpler alternative

Paul White pointed out that another, and potentially simpler, option would be to use UNION ALL to combine the rows for each value. This is likely your best bet assuming that it’s easy for you to build up this SQL dynamically. For example:

SELECT 'A' AS Col0, c.DataA AS Col1
FROM ColumnstoreTable c
UNION ALL
SELECT 'B' AS Col0, c.DataB AS Col1
FROM ColumnstoreTable c
UNION ALL
SELECT 'C' AS Col0, c.DataC AS Col1
FROM ColumnstoreTable c

This also yields a plan that is able to utilize batch mode and provides even better performance than the original answer. (Although in both cases the performance is fast enough that any selecting or writing the data to a table quickly becomes the bottlneck.) The UNION ALL approach also avoids playing games like multiplying by 0. Sometimes it’s best to think simple!

CPU time = 8673 ms, elapsed time = 4270 ms.

enter image description here

Leave a Reply

Your email address will not be published. Required fields are marked *