I’m trying to work out the best indexing strategy for a slightly weird pattern I’m forced to use with Azure SQL.
- Tables are dropped, re-created and loaded with new data daily from ADF.
- Tables are read-only once loaded.
- Tables can be 20 million+ rows and 50+ columns wide.
- Table schema may change each day, hence the drop and re-create.
As I don’t know the full schema in advance, I’m adding a surrogate identity column to each table and creating a clustered primary key on that column.
However, key creation takes up to 15 minutes and the table is locked for the duration – this is a problem…
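For context, the current pattern looks roughly like this (the table and column names here are made up for illustration):

```sql
-- Hypothetical sketch of the daily pattern.
IF OBJECT_ID('dbo.DailyLoad', 'U') IS NOT NULL
    DROP TABLE dbo.DailyLoad;

CREATE TABLE dbo.DailyLoad
(
    RowId   INT IDENTITY(1, 1) NOT NULL,  -- surrogate key
    -- ... 50+ columns whose definitions vary day to day ...
    SomeCol NVARCHAR(100) NULL
);

-- ADF loads 20 million+ rows into the heap here, then:
ALTER TABLE dbo.DailyLoad
    ADD CONSTRAINT PK_DailyLoad PRIMARY KEY CLUSTERED (RowId);
-- Building the clustered index rewrites the whole table:
-- this is the step that takes up to 15 minutes and locks the table.
```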
So – maybe I leave these tables as heaps? Forget the surrogate key and create non-clustered indexes instead, since their creation won’t lock the table.
Any nasty implications of heaps I’m missing? Can I do without a clustered index/key?
I hope this makes sense – it’s a little difficult to get across without making the question 2 pages long….
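The alternative I have in mind would look something like this (index and column names are hypothetical; `ONLINE = ON` is available in Azure SQL Database and lets readers keep querying while the index builds):

```sql
-- Sketch: leave the table as a heap and add only the
-- non-clustered indexes the queries actually need.
CREATE NONCLUSTERED INDEX IX_DailyLoad_SomeCol
    ON dbo.DailyLoad (SomeCol)
    WITH (ONLINE = ON);  -- avoids blocking the table during creation
```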
By dropping the table, you avoid the known heap weakness where SQL Server in some cases doesn’t deallocate storage after a DELETE.
By not doing UPDATEs, you avoid the other heap weakness: forwarding pointers, which are created when an updated row no longer fits on its original page.
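Since the table is never updated after the load, the forwarded-record count should stay at zero, and you can verify that if you want (the table name here is hypothetical):

```sql
-- index_id = 0 addresses the heap itself.
SELECT forwarded_record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.DailyLoad'), 0, NULL, 'DETAILED');
```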
That leaves the potential advantages for your SELECT queries of making it a clustered table instead of a heap. To address that topic, we would need the queries and indexes that you have (or are considering). There’s no generic rule here; it is all about the access pattern.

As far as I can see, there’s no “magic” general advantage to making it a clustered table when it comes to querying, so you could work this case by case (SELECT by SELECT, handling performance tuning for each through index creation etc.).
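As a sketch of what case-by-case tuning could look like (all names and the query shape are hypothetical), a covering non-clustered index on the heap serves a given SELECT without any RID lookups, so the missing clustered index costs you nothing for that query:

```sql
-- Target query: SELECT ColA, ColB FROM dbo.DailyLoad WHERE ColC = @x;
CREATE NONCLUSTERED INDEX IX_DailyLoad_ColC
    ON dbo.DailyLoad (ColC)
    INCLUDE (ColA, ColB)  -- covers the SELECT list: no lookups into the heap
    WITH (ONLINE = ON);
```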
I might be forgetting something obvious, of course…