We need to store a lot of logs in MS SQL Server 2016 Database for 1-1.5 year.
Income: 50 Gb of logs per month (~ 50 millions rows per month), we store all data in 6 months and shrink 50% of this data for older records.
- Required different indexes for last 1st month, 2nd month and others.
- Required data compression for big XML/Text data older than 2 months.
- For now – Required data shrink for data older than 6 months (just remove 50-60% records with big data (XML/Text) from table), but it will be better just heavily compress rows and don’t remove it.
Main Use-Case: search for 100-200 (sometime 1000) rows by some IDs or string filters in SQL IN manner (sometime LIKE).
Exist two approaches for partition:
- “Old style” – Separated Tables for each period (I think without common View in our case), month as example (Log_0117, Log_0217, …).
- Pros: Fully independent management and schema (for evolution), independent indexes and query/plans caching, as expected – more performance and efficiency, simple to implement, less magic and more controlable.
- Cons: Many-many tables (8 for every period), complex queries (especially for cross-period queries).
- “Modern style” –
Partitioned tables and indexes.
- Pros: Easy management, just one table for all periods, in theory simpler queries.
- Cons (as I found by some articles): Strong requirements for partitioned indexes (by normal, all indexes must contain partition-column (therefore more memory required), such indexes require rebuild after partition switching, hard to change schema/indexes for different periods, more magic and less controlable.
My questions for community:
- Is it really builtin Partitioned tables/indexes make developing more simple?
- Does it have really inconvenient limitations now in MS SQL Server 2016 (as Enterprise, as or everything not so bad?
Columnstore Indexes – another interesting option from @DavidBrowne, but as described at MSDN:
Columnstore indexes give high performance gains for queries that use full table scans, and are not well-suited for queries that seek into the data, searching for a particular value.
Unfortunately, we mostly search for 100-200 records in logs, rather than calculate complex statistics, but I save it in the mind for future, maybe will be helpfull.
UPD2: @DanGuzman, I meant Aligned indexes when said about some requirements – https://technet.microsoft.com/en-us/library/ms187526(v=sql.105).aspx:
Partitioning NonClustered Indexes
When partitioning a unique nonclustered index,
the index key must contain the partitioning column. When partitioning
a nonunique, nonclustered index, SQL Server adds the partitioning
column by default as a nonkey (included) column of the index to make
sure the index is aligned with the base table. SQL Server does not add
the partitioning column to the index if it is already present in the
It’s also heavily recommended not to have Non-Aligned indexes 1) partitioning big tables – indexes , 2) Performance Impacts of Partitioned Aligned Indexes.
It means – if we want to make queries faster just for last month, we need to: 1) cover all table by this index (but it requires really more disk/memory) or 2) organize long maintenance window for partition switching and indexes rebuilding.
Is it really builtin Partitioned tables/indexes make developing more
Compared to developing against independent tables that do not have view abstraction, partitioned tables are much easier to query because partitioning is transparent to application queries.
Partitioned table sliding window maintenance involves partition
SWITCH in the pre-SQL 2016 world) plus a partition function
SPLIT/MERGE. Older data/index partitions can be compressed during the maintenance. One should also update partition stats after a
SWITCH. DDL scripts for these operations are easily automated and less complex than scripts that create/drop/compress separate tables and indexes dynamically.
Does it have really inconvenient limitations now in MS SQL Server 2016
(as Enterprise, as or everything not so bad?
The biggest limitation with table partitioning is that the partitioning column must be part of all unique indexes and constraints. As long as the tables you plan to partition table fit within these considerations, partitioning provides a number of manageability benefits without changes to app code. Note that Microsoft recommends partitioned tables over partitioned views when possible. Beginning with SQL Server 2016 SP1, partitioned tables are available in Standard Edition.
Columnstore compression often provides much higher compression that page or row compression on a rowstore. You can also create partitioned non-clustered indexes (compressed or not) on a partitioned columnstore to improve performance of queries that return relatively few rows. I suggest you consider both table partitioning and columnstore.
I’m not sure I fully understand why you would want to make queries faster only for last month. I’m guessing you just don’t want the space overhead for indexes on data that are infrequently used. Given the low cost of storage nowadays, I wouldn’t bother introducing additional complexity to save some index space. I suspect compression, especially columnstore, will eliminate the need to jump through such hoops.
If you are still unsure of the path forward, I suggest you develop prototypes of the candidate designs and choose the one that best addresses your needs with the least complexity.