Question:
I have 30 million rows in a SQL Server database to which I've added a simple tagging column: one character, allowing NULL. I also added an index on the column.
I use it to keep track of which rows have been processed for a data extraction. During initial development I will be restarting the extraction process from scratch (several times) and need to reset the value to '1'.
Being a neophyte database developer (who am I kidding…), my first attempt used a simple update script to reset the value '2' (meaning "I've processed this row") to '1'. The first time I ran the script it executed in about 5 minutes. The last time I ran it, it took 45 minutes. There has to be a better way.
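Presumably the reset was a single statement along these lines (a sketch; the original script isn't shown, and the table and column names are taken from the update below):

UPDATE acmewidgets
SET LegacyExtract = '1'
WHERE LegacyExtract = '2';

Rewriting tens of millions of rows, plus their index entries, in one transaction is exactly the kind of statement whose runtime can balloon as the log and index churn grow.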
Is it possible for me to speed this up by completely removing the column and recreating it with a default value of ‘1’? Will the speed stay consistent?
— UPDATE —
The next time I need to do this I’m going to try the following:
ALTER TABLE acmewidgets DROP COLUMN LegacyExtract;
ALTER TABLE acmewidgets
ADD LegacyExtract CHAR NOT NULL DEFAULT (1);
Answer:
I used the following, and it executes in 41 seconds!
USE AcmeDatabase;
GO
-- Drop the index and default constraint that reference the column, then the column itself.
DROP INDEX IF EXISTS IX_LegacyExtract ON AcmeWidgetTable;
ALTER TABLE AcmeWidgetTable DROP CONSTRAINT IF EXISTS DF_Legacy_Extract;
ALTER TABLE AcmeWidgetTable DROP COLUMN IF EXISTS LegacyExtract;
GO
-- Re-add the column with its default, then rebuild the index.
ALTER TABLE AcmeWidgetTable ADD LegacyExtract CHAR NOT NULL CONSTRAINT DF_Legacy_Extract DEFAULT (1);
GO
CREATE NONCLUSTERED INDEX [IX_LegacyExtract] ON [dbo].[AcmeWidgetTable]
(
[LegacyExtract] ASC
)
Use a database snapshot. For a single column this may be excessive. For multiple, potentially complex reversions it can work well.
When the DB is in the desired initial state, take a snapshot. This appears to consumers as a new database: a read-only copy of the data as it was when the snapshot was created. Initially it holds no pages of its own, so creation is quick and takes little space. As the base DB is modified, the "before" images of changed pages are written to the snapshot, so the snapshot always serves the data as it was at creation time while the base DB holds the new data. The size of the snapshot therefore depends on the volume of changes. At the end of the session, revert the base DB to the snapshot, re-initializing the data.
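A minimal sketch, reusing the database name from this question; the logical file name and snapshot path are illustrative:

USE master;
GO
-- Take the snapshot while the base DB is in the desired initial state.
CREATE DATABASE AcmeDatabase_Snap
ON (NAME = AcmeDatabase,  -- logical name of the base DB's data file (assumed)
    FILENAME = 'D:\Snapshots\AcmeDatabase_Snap.ss')
AS SNAPSHOT OF AcmeDatabase;
GO
-- Later, revert the base DB to its state as of snapshot creation.
RESTORE DATABASE AcmeDatabase
FROM DATABASE_SNAPSHOT = 'AcmeDatabase_Snap';
GO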
Since the snapshot is readable it is also useful for verifying the changes by comparing the “before” (snapshot) data to the “after” (base DB) data.
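For instance, rows changed since the snapshot was taken could be listed with something like this (table name taken from the answer above, snapshot name from the sketch):

-- Rows as they stood at snapshot time that no longer match the base DB.
SELECT * FROM AcmeDatabase_Snap.dbo.AcmeWidgetTable
EXCEPT
SELECT * FROM AcmeDatabase.dbo.AcmeWidgetTable;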
It depends on the type of data you have and whether it's write-intensive, but I've used the following approach a lot. (The statements below use MySQL syntax.)
- SELECT MAX(id) FROM TABLEX;
- CREATE TABLE TABLEX_2 LIKE TABLEX;
- INSERT INTO TABLEX_2 SELECT COLUMN1, COLUMN2, …, MYNEWVALUE FROM TABLEX WHERE id <= MAX_FROM_STEP_1;
- RENAME TABLE TABLEX TO TABLEX_OLD, TABLEX_2 TO TABLEX;
- INSERT INTO TABLEX SELECT COLUMN1, COLUMN2, …, MYNEWVALUE FROM TABLEX_OLD WHERE id > MAX_FROM_STEP_1;
Optionally, you can also run ALTER TABLE TABLEX_2 DISABLE KEYS before the big insert and ENABLE KEYS after it.
The benefit of this approach is that RENAME TABLE is atomic, so consumers never see the table missing mid-swap; a fuller sketch follows.
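A minimal end-to-end sketch in MySQL syntax, assuming an id primary key; the payload and tag columns are illustrative stand-ins for COLUMN1, …, MYNEWVALUE:

-- 1. Remember the current high-water mark.
SELECT MAX(id) INTO @max_id FROM TABLEX;

-- 2. Create an empty structural copy of the table.
CREATE TABLE TABLEX_2 LIKE TABLEX;

-- 3. Bulk-copy the rows, rewriting the tag column on the way in.
INSERT INTO TABLEX_2 (id, payload, tag)
SELECT id, payload, '1' FROM TABLEX WHERE id <= @max_id;

-- 4. Swap the tables in one atomic step.
RENAME TABLE TABLEX TO TABLEX_OLD, TABLEX_2 TO TABLEX;

-- 5. Catch up any rows that arrived during the copy.
INSERT INTO TABLEX (id, payload, tag)
SELECT id, payload, '1' FROM TABLEX_OLD WHERE id > @max_id;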