select hash_id, hash_value, substring(T.cri, 1, 7), substring(T.cri, 9, D2.Pos-9), substring(T.cri, D2.Pos + 1, D3.Pos - D2.Pos - 1), substring(T.cri, D3.Pos + 1, D4.Pos - D3.Pos - 1), substring(T.cri, D4.Pos+1, len(T.cri) - 6 - D4.Pos), substring(right(cri,5),1,1), substring(right(cri,3),1,1), substring(right(cri,1),1,1), ref_count, compartment_id, cg_id, cri_version_id from hash_key as T WITH(index(IX_hashkey_hashid_covering_refcount),nolock) cross apply (select charindex('?', T.cri, 31)) as D2(Pos) cross apply (select charindex('?', T.cri, D2.Pos+1)) as D3(Pos) cross apply (select charindex('?', T.cri, D3.Pos+1)) as D4(Pos) where hash_id between 1 and 10000
create table hash_key_binary ( hash_id bigint not null identity, hash_value nvarchar(100) not null, cri nvarchar(100) not null, ref_count_checksum int not null default(0), compartment_id int , cg_id int , cri_version_id int, primary key (hash_value, cg_id), constraint fk_hash_compartment foreign key (compartment_id) references compartment(compartment_id) on delete no action, constraint fk_hash_compartmentgroup foreign key (cg_id) references compartment_group(group_id) on delete no action )
These are the indexes present on this table :
CREATE NONCLUSTERED INDEX IX_hashkey_hashid_covering_refcount ON hash_key (hash_id ASC) INCLUDE (ref_count) CREATE NONCLUSTERED INDEX IX_hashkey_cri ON hash_key (cri ASC)
I think your biggest issue is the schema.
- you should try to see if you can reduce the row footprint. For example, is nvarchar really needed? Using Varchar will cut the size in half. This may cut your processing time in half too. Is BigInt really needed for HashID? you have only 65M rows, a regular INT is more than enough and again half the size. Each byte you save will gain you 65Mb of data.
- you should see if you could use separate fields instead of using text manipulations, that would simplify the query, improves integrity and most probably speed.
- primary key on a nvarchar(100) is not a great idea, that should be on your identity column instead. And then make it clustered. I’m sure that will remove the need for the index hint.
- instead of making (hash_value, cg_id) a primary key, set it as an unique index
- make sure you have a good maintenance plan to reduce index fragmentation
If this query is the main concern for your system, you should be clustering on
hash_id and use a secondary index to enforce the primary key.
However, that is a suboptimisation. Likely, you are burning a lot of CPU cycles doing all those string operations. There might be a way to pre+calculate the results of the different values for
cri if the input values all fall into a small subset.
Also, depending on the distribution of the hash key, you may not be getting perfect parallelism on a query like this. A simple rewrite could dramatically change how many cores you can utilise in parallel.
If you have the opportunity to add columns to the table, why not create persisted, computed columns containing the values for D1, D2, D3? They can be calculated very fast at
INSERT time, which will help you amortise the cost of the
SELECT batch request.
Please add more context to my questions above and I will add details on potential rewrites of this query.