I work on a team where we have millions of records of data, but not 100s of millions anything crazy.
We have had recent DB issues where we have hung connections due to Page Latch IOs, where the DB is loading data from disk into memory. This is crashing out DBs, and this has happened a few times in the past year.
A little bit about that way we have our data:
We have a “Carton” table and “Items” table. Items are contained inside cartons, so the carton is the parent entity and items are child (one to many relationship). These tables both have many attributes associated with them.
Okay, now moving forward. We look at the query optimizer and it tells us to create missing indexes. Here is an example:
CREATE INDEX missing_index_69 ON dbo.items (carton_number) INCLUDE (item_number, entered_upc, item_desc, department, item_qty, item_cost_amt, cost_multi_qty, item_retail_amt, retail_multi_qty, item_sell_amt, sell_multi_qty, mdse_div_nbr, sams_msde_cat_nbr, base_retail_uom_cd, item_sale_type_co, scan_cnt, donation_item_id, price_embedded, weighted, cvp, gtin, wm_discretion, category, sub_category, created_date, last_change_date, created_user_id, last_change_user_id, replenished_by_unit, uom, deleted, qty_prev, qty_delta, bucket_id, bucket_color, hazmat, hazwaste, cvp_value, claim_value, wmtd_claim_eligible, smart_label_id)
This issue arose when we did a outer fetch to get the items associated with the carton, and we looked for the items via the carton_number (Carton_number is primary key for Carton table, and is a Foreign-Key for the items, also with an index on it!)
Yes, we do Select All so that is why it is requesting to add all those fields, but I digress.
My problem is, this assumes that every single time I add a new column to the Item table, that I will have to update the index to add it in the include column. We have many queries where we search by different criteria (by smart_label_id, created_date, carton_state, etc.) For each one of those queries with indexes on them, do I really have to include all the columns, and update EACH ONE of those queries anytime I add a new column?? That seems extremely excessive, especially since we have many queries and adding a new column would require a LOT of refactoring.
Is this normal? It is my understanding that SQL server can handle billions of records with a lot of attributes, and I am surprised that we are running into this issue with the amount of data we have.
The missing index code is extremely simple in what recommendations in suggests. One thing is that it doesn’t care about selectivity.
The reason to include columns is to cover a query. I.e., so you don’t have to do a bookmark lookup for each row. If you have a rather high selectivity, then these lookups don’t matter much, and it is extreme overkill to include bunch and bunch of columns. I.e., skip the included columns and just focus on the WHERE cause and the index keys.
So, don’t give those “missing index” recommendations much credit. At all. But they can serve as starting points for your own analysis.
To cluster the “child table” om the FK is another often forgotten trick. As already mentioned by Laughing Vergil.