What is the cost of having many columns as included columns in a nonclustered index?


Question :

What is the cost of having many columns as included columns in a nonclustered index in SQL Server?

Answer :

The more columns you include in an index, the larger the index becomes. This affects all operations involving the index (inserts, updates, selects). It takes more space in the buffer pool, and whenever the index is used, more data must be processed (including during maintenance tasks such as integrity checks and backups). Also, when you update a column that is included in the index, SQL Server must update the index too, which means more reads, more writes, and even more locks.
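To make the trade-off concrete, here is a hedged sketch (table `dbo.Orders` and its columns are hypothetical, not from the question):

```sql
-- Hypothetical example: both indexes serve queries filtering on CustomerID,
-- but the second one also stores OrderDate, Status and Notes in its leaf pages.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID);

-- Covering variant: every insert, and every update of OrderDate, Status
-- or Notes, must now also maintain this larger index.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
    ON dbo.Orders (CustomerID)
    INCLUDE (OrderDate, Status, Notes);
```

The second index can satisfy more queries without key lookups, but it pays for that coverage on every write that touches the included columns.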

Nikita has covered almost everything in her answer.

However, the answer also depends on how many columns, and of what data types, you decide to add.

If the columns you are adding use data types that occupy little space, the overall cost may not look that harmful. Please note I am only saying this because we do not know how many columns you are talking about.

But if there are columns like nvarchar(max) that store an entire story about something, you can imagine that adding those as included columns can make things go wrong quickly, and it should probably be avoided.
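One way to see what those included columns actually cost on disk is to compare per-index sizes with the standard catalog and partition-stats views; a minimal sketch, assuming the hypothetical table `dbo.Orders`:

```sql
-- Rough per-index size check; sys.dm_db_partition_stats and sys.indexes
-- exist in any SQL Server database (pages are 8 KB each).
SELECT  i.name                 AS index_name,
        ps.used_page_count * 8 AS used_kb,
        ps.row_count
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.indexes               AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id
WHERE   ps.object_id = OBJECT_ID(N'dbo.Orders');  -- hypothetical table name
```

A wide covering index with large included columns will usually stand out immediately in `used_kb` next to a narrow key-only index on the same table.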

Now, for performing this kind of analysis, a handy and great tool is sp_BlitzIndex, which you can download from brentozar.com.

He has great posts discussing how to use it as a starting point and analyze index usage. That will give you the space consumed, in addition to overall index utilization, which will help you narrow down the search and get a better answer on what the cost is.
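A typical invocation looks like the following (sp_BlitzIndex is part of Brent Ozar's First Responder Kit; the database, schema, and table names here are placeholders):

```sql
-- Survey index health and usage across one database:
EXEC dbo.sp_BlitzIndex @DatabaseName = N'YourDatabase';

-- Or focus the analysis on a single table:
EXEC dbo.sp_BlitzIndex @DatabaseName = N'YourDatabase',
                       @SchemaName   = N'dbo',
                       @TableName    = N'Orders';
```

Check the procedure's own documentation for the full parameter list, since it changes between releases.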

“How can I look at the overall cost?”

The above was in a comment, and I think the question deserves some explicit attention. But I don’t think David will like the “answer”…

What do you mean by cost? Something we can quantify, I take it. Which leads to the question of what we want to measure, and in what unit. Response time? I/O? CPU? Memory usage? Log records produced? Size of backups? Money?

Assuming we can give a simple answer to the above, we then face the next challenge:

Do we have a repeatable load? Without something we can run over and over, how can we measure the cost with versus without these additional columns? That load should reflect your real workload, or it won’t be relevant for your installation.

Then we have the next question. Should that load include everything? Every component of your application that accesses your data? The day-to-day work? The batch operations you run nightly? Reindex? DBCC CHECKDB? Etc.

Say we do indeed have answers to all of the above; then it is time to pick whichever tool we want to use. Options include Performance Monitor, Extended Events tracing, sp_BlitzIndex, and various third-party applications and scripts. Which to choose depends partly on the answers to the questions above.
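As one lightweight starting point before any of the heavier tooling, the built-in usage DMV already contrasts how often an index is read with how often it must be maintained; a sketch, again assuming the hypothetical `dbo.Orders` table:

```sql
-- Reads versus maintenance writes per index, from the standard
-- sys.dm_db_index_usage_stats DMV (counters reset at instance restart):
SELECT  i.name                                          AS index_name,
        us.user_seeks + us.user_scans + us.user_lookups AS reads,
        us.user_updates                                 AS writes
FROM    sys.dm_db_index_usage_stats AS us
JOIN    sys.indexes                 AS i
        ON  i.object_id = us.object_id
        AND i.index_id  = us.index_id
WHERE   us.database_id = DB_ID()
  AND   us.object_id   = OBJECT_ID(N'dbo.Orders');  -- hypothetical table name
```

A wide covering index that shows many writes and few reads is a strong candidate for trimming its included columns.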

Quite a challenge!
