How should deletions be handled in the database?

Posted on

Question :

I’d like to implement an “undelete” feature in a web application such that a user can change her mind and recover a deleted record. Thoughts on how to implement this? Some options I’ve considered are actually deleting the record in question and storing the changes in a separate audit table, or not deleting the record and using a boolean “deleted” column to mark it as deleted. The latter solution would require additional application logic to ignore the “deleted” records under normal circumstances, but would make it much easier to implement recovering the records on the application side.

Answer :

Yeah, I would definitely go for the second option, but I would add one more field a date field.

So you add :

delete       boolean
delete_date  timestamp

It would let you give a time for the undelete action.

If time is less than an hour one can undelete.

To really delete the entry deleted just create a stored procedure that will clean every entry with delete set to true and time greater than one hour and put it as a cron tab that runs every 24hours

The hour is just an example.

In our applications we don’t really delete anything at a users request anyway (our clients are in regulated environments where deleting anything can potentially lead to legal issues).

We keep the older versions in a separate audit table (so for the table some_table where is also a table called some_table_audit) which is identical apart from having an additional version identifier (a timestamp if your DB suports time values granular enough, an integer version number or UUID that is a foreign key to a general audit table, or so on), and update the audit table automatically by trigger (so we don’t need to make all code that updates the records aware of the audit requirement).

This way:

  • the delete operation is just a simple delete – no need to add any extra code to that (though you might want to record who requested what rows to be deleted, even if they are not actually deleted)
  • inserts and updates are similarly simple
  • you can implement undelete or revert by just returning the “normal” row to an old version (the audit trigger will fire again so the audit trail table will reflect this change too)
  • you can offer the chance to review or revert to any past version not just undelete the last one
  • you do not have to add “is marked as deleted?” checks to every code point that refers to the table in question, or “update audit copy” logic to every code point that deletes/updates rows (though you need to decide what to do with deleted rows in the audit table: we do have a deleted/not flag for each version there so there isn’t a hole in the history if records are deleted and later undeleted)
  • keeping the audit copies in a separate table means you can partition them off into different filegroups easily.

If using a timestamp instead of (or as well as) an integer version number, you can use this to delete the older copies after a set amount of time if needed. But disk space is relatively cheap these days so unless we have reason to drop old data (i.e. data protection regulations that say you should delete client data after X months/years) we wouldn’t.

This answer has been around a few years and a couple of key things that could affect this sort of planning have changed since then. I’ll not go into massive detail, but breifly for the benefit of people reading this today:

  • SQL Server 2016 introduced “system versioned temporal tables” which do a lot of this work for you, and more besides as some nice syntactic sugar is provided to make historic queries easier to construct & maintain, and they coordinate a subset of schema changes between the base and history tables. They are not without their caveats, but they are a powerful tool for this sort of purpose. Similar features are also available in other DB systems.

  • Changes to data protection legislation, particulaly the introduction of GDPR, can significantly alter the matter of when data should be hard deleted. You have to weigh up the balance of not deleting data that might be useful (or, indeed, legally required) for auditing purposes at a later date against needing to respect peoples rights (both generally and as specifically set out in relevant legislation) when considering your designs. This can be an issue with system versioned temporal tables as you can’t modify the history to purge personal data without schema short term changes to turn off the history tracking while you make changes.

With a boolean deleted column , you’ll start to have problems if your table starts to grow and gets really big . I suggest you move deleted columns once a week ( more or less depending on your specs ) to a different table . That way you have a nice small active table and a big one containing all records gathered over time.

I’d go with the separate table. Ruby on Rails has an acts_as_versioned plugin, which basically saves a row to another table with the postfix _version before it updates it. While you don’t need that exact behavior, it should also work for your case (copy before deleting).

Like @Spredzy I’d also recommend adding a delete_date column to be able to programatically purge records that haven’t been restored after X hours/days/whatever.

The solution we use internally for this matter is to have a status column with some hard coded values for some specific states of the object: Deleted, Active, Inactive, Open, Closed, Blocked – each status with some meaning used in the application. From db point of view we don’t remove objects, we just change the status and keep history for each change in the object table.

When you say that “The latter solution would require additional application logic to ignore the ‘deleted’ records”, the simple solution is to have a view which filters them out.

Leave a Reply

Your email address will not be published. Required fields are marked *