We are tidying up some aspects of our database, and looking to delete approximately 25% of the data (by volume).

I am aware that disk space will not be reclaimed unless we unload/reload.

So my question is will this purge improve backup times, or is the backup process based on the file size of the database on disk (which won't have changed in this case)?

asked 29 Aug '15, 18:52

alexryder's gravatar image

alexryder
1163410
accept rate: 0%

If space and/or time required for backup are an issue, you might also have a closer look at the WITH CHECKPOINT LOG clause options of the BACKUP statement (or the according dbbackup -k options) - in case you happen to backup a database while transactions are running, the effect of the varions options might make a difference.

FWIW: I do not know whether the pages of an empty checkpoint log (i.e. once a checkpoint has been made and no further modifications have been made so far) are added to the database backup. If so, then I guess the named options would have an influence here, whether or not there are running transactions while the backup is in progress.

(31 Aug '15, 03:25) Volker Barth

Purging old data is large and (often) difficult topic.

If the purge process creates empty pages in the *.db file, then an archive backup may run faster: "By default, archive backups skip some free pages, which can result in smaller and potentially faster backups. Free page elimination has no affect on the back up of transaction log files because transaction log files do not contain free pages. So, databases with large transaction log files may not benefit as much from free page elimination as databases with small transaction log files."

An image backup (by far the most popular kind) will not benefit from empty pages in the *.db file.

Caveat Emptor: I have not done any comparative performance tests of archive-versus-image backups... but you should, if backup speed is a concern. That's a big if... folks sometimes apply their efforts to bottlenecks they understand rather than bottlenecks that exist :)

The purge process will inflate the transaction log, adversely affecting both image and archive backups... that effect can be mitigated for future runs by telling the backup process to rename-and-recreate the transaction log.

In some cases, leaving the empty pages alone for future re-growth is a good strategy, since the process of growing the *.db file in little chunks takes time and can result in immediate file fragmentation.

Purging is often a good idea, and it is just as often implemented poorly (present company excepted, of course :)

permanent link

answered 30 Aug '15, 08:44

Breck%20Carter's gravatar image

Breck Carter
25.7k428592852
accept rate: 20%

edited 30 Aug '15, 08:50

1

Just to add something possibly too obvious: Of course the freed space could speed up later database file backups, in case new data will be stored on free pages and so will not need to be stored on additional pages that would increase the file size and therefore increase the time required for a full backup...

(31 Aug '15, 03:12) Volker Barth
Replies hidden

Missed that... Does it mean have to turn in my Captain Obvious hat? :)...

(31 Aug '15, 07:53) Breck Carter

Thanks for the info

(31 Aug '15, 16:44) alexryder

In addition to Breck's excellent summary (sailor cap or not), it should be noted that a 25% purge of the data does not directly guarantee 25% of the database will consist of free pages. You would need to reorganize (at least) the largest tables first (after the purge) to approximate (as in no guarantee) that.

permanent link

answered 31 Aug '15, 11:15

Nick%20Elson%20SAP%20SQL%20Anywhere's gravatar image

Nick Elson S...
7.3k35107
accept rate: 31%

edited 31 Aug '15, 11:17

Can you please eloborate on what do you mean by reorganize the table

(31 Aug '15, 16:46) alexryder
Replies hidden
1

Databases can become internally fragmented and disorganized. This is different from external file fragmentation that "defragmenting" utilities can deal with.

Internal disorganization affects table pages and index pages in different ways. The REORGANIZE statement can be used to reorganize individual tables and indexes, and the database rebuild process can recreate the entire database in a state of perfect organization; the latter does a better job than the former but the difference is probably moot.

For the purposes of this discussion, reorganizing a table with a lot of deleted rows will (more or less) pack all the data into fewer pages, turning (much of) the free space into completely free (empty) pages... which the archive backup will ignore.

The row deletion process can wreak havoc on the internal organization of both tables and indexes (the data for which are stored in different sets of pages), so reorganization can be a big deal after a massive purge.

Or not... the disorganization may or may not affect performance, there are no absolutes. For example, if you re-insert a lot of rows, SQL Anywhere may do a pretty good job of restoring the inner goodness of the data layout.

Disclaimer: The vagueness and equivocation is intended... or as Nick said, "no guarantee" :)

(31 Aug '15, 20:33) Breck Carter
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×72

question asked: 29 Aug '15, 18:52

question was seen: 401 times

last updated: 01 Sep '15, 01:53