We are tidying up some aspects of our database, and looking to delete approximately 25% of the data (by volume).
I am aware that disk space will not be reclaimed unless we unload/reload.
So my question is will this purge improve backup times, or is the backup process based on the file size of the database on disk (which won't have changed in this case)?
asked 29 Aug '15, 18:52
Purging old data is large and (often) difficult topic.
If the purge process creates empty pages in the *.db file, then an archive backup may run faster: "By default, archive backups skip some free pages, which can result in smaller and potentially faster backups. Free page elimination has no affect on the back up of transaction log files because transaction log files do not contain free pages. So, databases with large transaction log files may not benefit as much from free page elimination as databases with small transaction log files."
An image backup (by far the most popular kind) will not benefit from empty pages in the *.db file.
Caveat Emptor: I have not done any comparative performance tests of archive-versus-image backups... but you should, if backup speed is a concern. That's a big if... folks sometimes apply their efforts to bottlenecks they understand rather than bottlenecks that exist :)
The purge process will inflate the transaction log, adversely affecting both image and archive backups... that effect can be mitigated for future runs by telling the backup process to rename-and-recreate the transaction log.
In some cases, leaving the empty pages alone for future re-growth is a good strategy, since the process of growing the *.db file in little chunks takes time and can result in immediate file fragmentation.
Purging is often a good idea, and it is just as often implemented poorly (present company excepted, of course :)
In addition to Breck's excellent summary (sailor cap or not), it should be noted that a 25% purge of the data does not directly guarantee 25% of the database will consist of free pages. You would need to reorganize (at least) the largest tables first (after the purge) to approximate (as in no guarantee) that.