Question :
A database in SQL Server 2014 has one table that is consuming around 17-20 GB of disk space each day. Assuming that level of usage stays fairly consistent, and not accounting for lower usage on weekends and holidays,
that means I am looking at roughly 7.3 TB of space to hold one year of table data. I want to figure out what the long-term strategy for this should be, including archiving (when/where/how). The table currently contains just over 1 billion rows.
I am trying to write a stored procedure to archive the data into another table, but my code only works once: SELECT ... INTO creates the target table, so to run it again I would have to create a new table each time.
CREATE PROC [dbo].[ArchiveData_auditlog1]
(
    @CutOffDate datetime = NULL
)
AS
BEGIN
    SET NOCOUNT ON;

    IF @CutOffDate IS NULL
    BEGIN
        SET @CutOffDate = DATEADD(mm, -8, CURRENT_TIMESTAMP);
    END
    ELSE IF @CutOffDate > DATEADD(mm, -3, CURRENT_TIMESTAMP)
    BEGIN
        RAISERROR ('Cannot archive data from the last three months', 16, 1);
        RETURN -1;
    END

    BEGIN TRAN;

    -- SELECT ... INTO creates the target table, so this only works the first time
    SELECT *
    INTO [dbo].[archive_auditlog2]
    FROM [dbo].[AuditLog_test]
    WHERE [timestamp] < @CutOffDate;

    IF @@ERROR <> 0
    BEGIN
        ROLLBACK TRAN;
        RAISERROR ('Error occurred while copying data to dbo.archive_auditlog2', 16, 1);
        RETURN -1;
    END

    DELETE [dbo].[AuditLog_test]
    WHERE [timestamp] < @CutOffDate;

    IF @@ERROR <> 0
    BEGIN
        ROLLBACK TRAN;
        RAISERROR ('Error occurred while deleting data from dbo.AuditLog_test', 16, 1);
        RETURN -1;
    END

    IF @@TRANCOUNT > 0
    BEGIN
        COMMIT TRAN;
        RETURN 0;
    END
END
My question is: what is the best way to fix this problem? The table has no dependencies on other tables and has a primary key on Id. I want to archive data based on the timestamp column.
If I have to use partitioning, how can I partition this big table, and how can I migrate the data to an archive server?
Answer :
What do you mean by archive? Do you want to delete the data?
Moving it to another database or server will still take up the same space. It may improve performance, but you're just moving the problem around.
Do you need the data in its original form? Can you aggregate the data to reduce the number of rows?
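For instance, if only summary counts are needed for older data, aggregation can collapse millions of rows into a few per day. This is a hedged sketch: the dbo.AuditLog_daily table and the EventType column are assumptions for illustration, not part of the original schema.

```sql
-- Assumed summary table: one row per day per event type.
-- CREATE TABLE dbo.AuditLog_daily (EventDate date, EventType varchar(50), EventCount int);
INSERT INTO dbo.AuditLog_daily (EventDate, EventType, EventCount)
SELECT CAST([timestamp] AS date),   -- collapse to day granularity
       EventType,
       COUNT(*)
FROM dbo.AuditLog_test
WHERE [timestamp] < @CutOffDate
GROUP BY CAST([timestamp] AS date), EventType;
```

After verifying the aggregates, the detailed rows before @CutOffDate can be deleted, which is where the real space saving comes from.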
If you do move the data to another table, you may find a clustered columnstore index gives better compression and saves disk space.
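A sketch of such an archive table follows. The column list is illustrative and should be matched to the real dbo.AuditLog_test schema; note that SQL Server 2014 columnstore indexes do not support (n)varchar(max) columns, so wide LOB columns would need a different approach.

```sql
-- Archive table compressed with a clustered columnstore index (SQL Server 2014+).
-- Column list is an assumption; align it with the source table before use.
CREATE TABLE dbo.AuditLog_archive
(
    Id          bigint        NOT NULL,
    EventType   varchar(50)   NULL,
    EventData   nvarchar(400) NULL,
    [timestamp] datetime      NOT NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_AuditLog_archive
    ON dbo.AuditLog_archive;
```

Columnstore compression typically works best when rows are inserted in large batches, which fits an archiving workload well.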
Personally I would use SSIS rather than a stored procedure over linked servers. Since you have a primary key, I'd use it to identify which rows have already been inserted into the archive and can therefore be deleted from the original table. An index on the column you use for the date range would be extremely beneficial. I'd also consider smaller batches, such as one day's worth of data at a time.
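If you stay with T-SQL on the same server, the batching idea can be sketched as below. This assumes an archive table (dbo.AuditLog_archive) already exists with the same column layout as dbo.AuditLog_test; using INSERT-style population of an existing table avoids the one-shot SELECT INTO problem, and DELETE ... OUTPUT moves each batch atomically while keeping transaction log growth manageable.

```sql
-- Assumed supporting index on the date-range column:
-- CREATE INDEX IX_AuditLog_timestamp ON dbo.AuditLog_test ([timestamp]);

DECLARE @CutOffDate datetime = DATEADD(mm, -8, CURRENT_TIMESTAMP);
DECLARE @BatchSize  int      = 50000;

WHILE 1 = 1
BEGIN
    BEGIN TRAN;

    -- Delete one batch and route the deleted rows into the archive table
    -- in the same statement, so no row is lost between copy and delete.
    DELETE TOP (@BatchSize)
    FROM dbo.AuditLog_test
    OUTPUT deleted.* INTO dbo.AuditLog_archive
    WHERE [timestamp] < @CutOffDate;

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRAN;
        BREAK;  -- nothing left to archive
    END

    COMMIT TRAN;
END
```

Because each batch commits separately, the procedure can be stopped and rerun at any point without leaving the tables inconsistent.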