We have a data warehouse with a fairly large record count (10-20 million rows) and often run queries that count records between certain dates, or count records with certain flags, e.g.
SELECT f.IsFoo, COUNT(*) AS WidgetCount FROM Widgets AS w JOIN Flags AS f ON f.FlagId = w.FlagId WHERE w.Date >= @startDate GROUP BY f.IsFoo
The performance isn’t awful, but can be relatively sluggish (perhaps 10 seconds on a cold cache).
Recently I discovered that I can use
GROUP BY in indexed views and so tried out something similar to the following
CREATE VIEW TestView WITH SCHEMABINDING AS SELECT Date, FlagId, COUNT_BIG(*) AS WidgetCount FROM Widgets GROUP BY Date, FlagId; GO CREATE UNIQUE CLUSTERED INDEX PK_TestView ON TestView ( Date, FlagId );
As a result the performance of my first query is now < 100ms, and the resulting view & index is < 100k (although our row count is large, the range of dates and flag IDs means that this view only contains 1000-2000 rows).
I thought that perhaps this would criple the performance of writes to the Widget table, but no – the performance of inserts and updates into this table is pretty much unaffected as far as I could tell (plus, being a data warehouse this table is updated infrequently anyway)
To me, this seems way too good to be true – is it? What do I need to be careful with when using indexed views in this way?
As you’ve noted, the view itself only materializes a small number of rows – so even if you update the whole table, the additional I/O involved with updating the view is negligible. You probably already felt the biggest pain you’re going to feel when you created the view. The next closest will be if you add a gazillion rows to the base table with a bunch of new IDs that require new rows in the view.
This isn’t too good to be true. You’re using indexed views exactly how they were meant to be used – or at least one of the most effective ways: to pay for future query aggregations at write time. This works best when the result is much smaller than the source and of course when the aggregations are requested more often than the underlying data is updated (more common in DW than OLTP, generally).
Unfortunately many people think indexing a view is magic – an index won’t make all views more efficient, especially views that simply join tables and/or produce the same number of rows as the source (or even multiply). In these cases the I/O from the view is the same or even worse than the original query, not only because there are the same or more rows, but often they are storing and materializing more columns, too. So materializing those in advance doesn’t provide any gains, since – even with SSDs – I/O, network, and client processing/rendering still remain the primary bottlenecks in returning large resultsets to the client. The savings you get in avoiding the join at runtime just aren’t measurable in comparison to all the other resources you’re still using.
Like non-clustered indexes, just be careful to not over-do it. If you add 10 different indexed views to one table, you’re going to see more impact to the write portion of your workload, especially if the grouping column(s) are not (in) the clustering key.
Gosh, I’ve been meaning to blog about this topic.
Aarons answers covered this question well. Two things to add:
- Aggregation indexed views can lead to cross-row contention and deadlocks. Normally, two inserts do not deadlock (except for rather rare conditions such as lock escalation or lock hash collisions). But if both inserts address the same group in the view they will contend. The same point stands for anything else that takes locks (DML, lock hints).
- Indexed views that do not aggregate can be useful as well. They allow you to index on columns from multiple tables. That way you can efficiently filter on one table and order by a column from a joined table. That pattern can convert full-table join to tiny constant-time queries.
I have used both aggregation and join views with extreme benefit.
All in all your use case seems like a perfect case. Indexed views are a technique far underutilized.