We're experimenting with developing our own 'data warehouse' with SQL Anywhere (v 12 right now) using sqla native sql for our ETL process. We looked into using the Infomaker pipeline, but feeling nervous about the future of the product so we wrote our own "pipeline". Our database is growing quite large, and I'm not sure if it's totally because we continue to add new data to the db as we develop or if part of it is because of the method we are using each night to refresh. Basically, this is our "pipeline". Loop through a table on the warehouse containing names and locations of tables to be refreshed:

  1. Drop table if exists [proxy table or view name name] from remote database.
  2. Create existing table [proxy table or view name] at [remote name].
  3. Truncate 'local' warehouse table.
  4. Insert into warehouse table from proxy table.
  5. Drop proxy table.

We assumed that more often that not, empty space from the truncated table would be used but is it possible that it's looking for new space with each new run? We're up to 10gb but actually that is coming from 6 other databases and I'm not totally shocked at that size. The process runs quite fast so we're happy with the performance, but don't want to fill up the server unnecessarily. Of course we will continue to fine tune what data we're bringing over but if this is creating bad use of space, we may need to re-think our pipeline process. tia.

Becky Snyder Bradley University

asked 25 Nov '14, 13:56

rsnyder's gravatar image

rsnyder
436121429
accept rate: 0%

edited 25 Nov '14, 18:01

Volker%20Barth's gravatar image

Volker Barth
40.1k361549819

1

When you say "we are up to 10GB" - is this is database size (.db file) or the transaction log size (.log file)? If it is mostly the transaction log size, you could setup a backup strategy that includes transaction log renames and deleting renamed transaction log files from the original database once they have been backed up.

(26 Nov '14, 10:09) Ian McHardy

So basically the question is whether a daily repetition of truncate table/insert statements will lead to undesired fragmentation? - If so, I would think that SQL Anywhere should reclaim most of the free space from deleted rows for new ones. Of course you can monitor that with builtin tools like the sa_table_page_usage system procedure and/or properties like "FreeSpace" and the like.


FWIW: You might also be able to store the remote data in a local not transactional temporary table and then use a MERGE statement to update the "real" warehouse table. That way you might have much less modifications on the real table (preventing fragmentation in the system dbspace) at the cost of a possibly more intense usage of the temporary dbspace...

IMHO, it's a common approach to store remote data in a temporary table before it is applied/joined with local data simply because joins etc. on remote data are often very slow and inefficient. If that is the main reason you have chosen to always truncate and re-insert remote data, I'd guess a "temporary local copy" would be more efficient.


In case the remote data is coming from different database systems, you might also consider the "data import from MS SQL Server test series" Breck has published on his great blog a while ago:

In-Memory Downloading from SQL Server (5)

permanent link

answered 25 Nov '14, 17:46

Volker%20Barth's gravatar image

Volker Barth
40.1k361549819
accept rate: 34%

edited 25 Nov '14, 18:00

Comment Text Removed
1

Yes, I should have specified that some data is coming from different database systems. Thanks for your comments and suggestions. Off to find Breck's blog entry!

(26 Nov '14, 08:16) rsnyder
Replies hidden

FWIW, in case comparing "old" and "new" data from remote systems is part of your ETL steps, the following questions might be helpful:

(27 Nov '14, 04:28) Volker Barth

Off to find Breck's blog entry!

Just in order to prevent a - probably unlikely - misunderstanding: The link to Breck's blog series is contained in the answer above and the linked article leads to its prequels...

(01 Dec '14, 09:44) Volker Barth
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×15

question asked: 25 Nov '14, 13:56

question was seen: 1,688 times

last updated: 01 Dec '14, 09:44