Question :
I am not getting index seek in the following scenario.
Instead of inserting <Number>xxx</Number>
in xml column like in this post Why is the secondary selective index not used when the where clause filters on `value()`?, insert 100k rows with this xml <SomeText>NiceText</SomeText>
and similar amount of rows of this <SomeText>MoreText</SomeText>
. Doesnt need to be 100k. Just needs to be many.
Then add index
create selective xml index SIX_T on dbo.T(XMLDoc) for
(
pathXQUERY = '/SomeText' as xquery 'xs:string' maxlength(8) singleton
);
And secondary index
create xml index SIX_T_pathXQUERY on dbo.T(XMLDoc)
using xml index SIX_T for (pathXQUERY);
Then do a count
select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1;
Notice that it doesnt use index seek and is “slow”. Can take several seconds with millions of rows.
If i insert same values in a standard column and add an index to it and do a
select count(id)
from dbo.T as T where SomeTextColumn = 'MoreText'
i get results instantly.
All tests done on sql server 18.3.1
The question is, how can i make counting by xml as fast as counting by column?
Thanks
Answer :
Data differences
With a low amount of records to seek on, the optimizer is able to use the SIX_T_pathXQUERY
index:
and filter on moretext
with a seek predicate:
An interesting part here is that it executes the key lookup to get the path_1_id values that are not null.
Since that is a filter definition on the nonclustered xml index…
…while not being present in the index itself.
As a result, for the optimizer to consider using the index it knows that it has to complete these steps:
- Filter on the XML value with the secondary index on the internal
table - Match these returned values with the
SIX_T
clustered index on the
internal table and filter onpath_1_id is not null
sincepath_1_id
is
not included in the secondary index - Match these values with the actual table,
dbo.T
to return the ID to count on
Tipping points
When the expected rows to be returned are higher, it favours using the selective clustered XML Index, to be able to run a merge join & no key lookup instead:
with a residual predicate:
To filter on the xml column and path_1_id
Comparing the plans
You could (but you shouldn’t) use the USE PLAN
hint to force the plan with the seek on the XML column and see what would happen if we where to search on these values.
With the execution time =
CPU time = 218 ms, elapsed time = 215 ms.
And the execution time for the scan + merge join plan:
CPU time = 62 ms, elapsed time = 58 ms.
In short, I believe that the scan with residual predicate + merge join choice was the right choice by the optimizer.
No way around it
While I might be wrong, I don’t think there is a way to improve the count query with regular XML Indexes.
We also cannot change these internal tables or even query them:
SELECT * FROM
[sys].[xml_sxi_table_1463676262_256000];
It even gives a missing index hint on the merge join query plan, which you cannot create:
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [sys].[xml_sxi_table_1463676262_256000] ([pathXQUERY_1_value])
INCLUDE ([path_1_id])
To improve the queries, you would have to look into non-xml index solutions.
Edit
How do i use USE PLAN hint? You have example code for this query?
I would advise against this, but for testing purposes it would be fine.
Step 1: You would have to get the actual execution plan of the query and get
the xml:
If you don’t have the query with the low estimates, enter a value that does not exist like:
select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "bbb"]') = 1
Step 2: Replace all '
s with ''
s in the execution plan xml, we will need this later.
Step 3: Paste the plan between OPTION( USE PLAN '')
SELECT count(*)
FROM dbo.T as T
WHERE T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1
OPTION(USE PLAN
'');
Step 4: I had to change utf-16 to utf-8 to get the use plan hint to work
From:
OPTION(USE PLAN
'<?xml version="1.0" encoding="utf-16"?>
To:
OPTION(USE PLAN
'<?xml version="1.0" encoding="utf-8"?>
Step 5: Run the query.
My query now looks like this:
select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1
OPTION(USE PLAN
'<?xml version="1.0" encoding="utf-8"?>
<ShowPlanXML xmlns_xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns_xsd="http://www.w3.org/2001/XMLSchema" Version="1.481" Build="14.0.3223.3" >
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="1" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" CardinalityEstimationModelVersion="70" StatementSubTreeCost="0.00986014" StatementText="select count(*)
from dbo.T as T
where T.XMLDoc.exist(''/SomeText[. eq "bbb"]'') = 1" StatementType="SELECT" QueryHash="0x412154B6AD55BBFC" QueryPlanHash="0x280B174BF20902E3" RetrievedFromCache="true" SecurityPolicyApplied="false">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan DegreeOfParallelism="1" CachedPlanSize="40" CompileTime="394" CompileCPU="301" CompileMemory="648">
<MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="419404" EstimatedPagesCached="52425" EstimatedAvailableDegreeOfParallelism="2" MaxCompileMemory="3655432" />
<QueryTimeStats CpuTime="0" ElapsedTime="0" />
<RelOp AvgRowSize="11" EstimateCPU="0" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.00986014">
<OutputList>
<ColumnReference Column="Expr1017" />
</OutputList>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1017" />
<ScalarOperator ScalarString="CONVERT_IMPLICIT(int,[Expr1020],0)">
<Convert DataType="int" Style="0" Implicit="true">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1020" />
</Identifier>
</ScalarOperator>
</Convert>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="11" EstimateCPU="1.1E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Aggregate" NodeId="1" Parallel="false" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="0.00986014">
<OutputList>
<ColumnReference Column="Expr1020" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="1" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
</RunTimeInformation>
<StreamAggregate>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1020" />
<ScalarOperator ScalarString="Count(*)">
<Aggregate AggType="countstar" Distinct="false" />
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="9" EstimateCPU="1E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Aggregate" NodeId="3" Parallel="false" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="0.00985904">
<OutputList />
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
</RunTimeInformation>
<StreamAggregate>
<DefinedValues />
<GroupBy>
<ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
</GroupBy>
<RelOp AvgRowSize="11" EstimateCPU="4.18E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Inner Join" NodeId="4" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.00985804">
<OutputList>
<ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
</RunTimeInformation>
<NestedLoops Optimized="false">
<OuterReferences>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
</OuterReferences>
<RelOp AvgRowSize="16" EstimateCPU="4.18E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Inner Join" NodeId="5" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.00657038">
<OutputList>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
</RunTimeInformation>
<NestedLoops Optimized="false">
<OuterReferences>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
</OuterReferences>
<RelOp AvgRowSize="15" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" EstimatedRowsRead="1" LogicalOp="Index Seek" NodeId="6" Parallel="false" PhysicalOp="Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="150002">
<OutputList>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="1" ActualLogicalReads="3" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
</RunTimeInformation>
<IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
</DefinedValue>
</DefinedValues>
<Object Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Index="[SIX_T_pathXQUERY]" Filtered="true" Alias="[SomeText:1]" IndexKind="SecondarySelectiveXML" Storage="RowStore" />
<SeekPredicates>
<SeekPredicateNew>
<SeekKeys>
<Prefix ScanType="EQ">
<RangeColumns>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pathXQUERY_1_value" />
</RangeColumns>
<RangeExpressions>
<ScalarOperator ScalarString="N''bbb''">
<Const ConstValue="N''bbb''" />
</ScalarOperator>
</RangeExpressions>
</Prefix>
</SeekKeys>
</SeekPredicateNew>
</SeekPredicates>
</IndexScan>
</RelOp>
<RelOp AvgRowSize="461" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Clustered Index Seek" NodeId="8" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="150002">
<OutputList />
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="0" ActualExecutions="0" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="0" ActualLogicalReads="0" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
</RunTimeInformation>
<IndexScan Lookup="true" Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues />
<Object Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Index="[SIX_T]" Alias="[SomeText:1]" TableReferenceId="-1" IndexKind="SelectiveXML" Storage="RowStore" />
<SeekPredicates>
<SeekPredicateNew>
<SeekKeys>
<Prefix ScanType="EQ">
<RangeColumns>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
</RangeColumns>
<RangeExpressions>
<ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[pk1] as [SomeText:1].[pk1]">
<Identifier>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
</Identifier>
</ScalarOperator>
<ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[row_id] as [SomeText:1].[row_id]">
<Identifier>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
</Identifier>
</ScalarOperator>
</RangeExpressions>
</Prefix>
</SeekKeys>
</SeekPredicateNew>
</SeekPredicates>
<Predicate>
<ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[path_1_id] as [SomeText:1].[path_1_id] IS NOT NULL">
<Logical Operation="IS NOT NULL">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="path_1_id" />
</Identifier>
</ScalarOperator>
</Logical>
</ScalarOperator>
</Predicate>
</IndexScan>
</RelOp>
</NestedLoops>
</RelOp>
<RelOp AvgRowSize="11" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" EstimatedRowsRead="1" LogicalOp="Clustered Index Seek" NodeId="9" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="200002">
<OutputList>
<ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="0" ActualExecutions="0" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="0" ActualLogicalReads="0" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
</RunTimeInformation>
<IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
</DefinedValue>
</DefinedValues>
<Object Database="[adventureworks]" Schema="[dbo]" Table="[T]" Index="[PK__T__3214EC27EF7043C5]" Alias="[T]" IndexKind="Clustered" Storage="RowStore" />
<SeekPredicates>
<SeekPredicateNew>
<SeekKeys>
<Prefix ScanType="EQ">
<RangeColumns>
<ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
</RangeColumns>
<RangeExpressions>
<ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[pk1] as [SomeText:1].[pk1]">
<Identifier>
<ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
</Identifier>
</ScalarOperator>
</RangeExpressions>
</Prefix>
</SeekKeys>
</SeekPredicateNew>
</SeekPredicates>
</IndexScan>
</RelOp>
</NestedLoops>
</RelOp>
</StreamAggregate>
</RelOp>
</StreamAggregate>
</RelOp>
</ComputeScalar>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>');
The question is, how can i make counting by xml as fast as counting by
column?
Depending on how flexible you need to be with your xPath expression you could use property promotion.
create function dbo.GetSomeText(@X xml) returns varchar(8) with schemabinding as
begin
return @X.value('(/SomeText/text())[1]', 'varchar(8)');
end;
go
alter table dbo.T add SomeTextColumn as dbo.GetSomeText(XMLDoc);
go
create index T_IX_SomeText on dbo.T(SomeTextColumn);
go
select count(T.ID)
from dbo.T as T
where T.SomeTextColumn = 'MoreText';