A large number of rows are to be copied from one SQL Anywhere 11.0.1.2276 database to another database in the same folder on the same computer via simple INSERT t SELECT * FROM proxy_t statements. Millions of rows, gigabytes of data, hours of processing time using two SQL Anywhere engines.

What are the performance pros and cons of starting both databases in one SQL Anywhere engine?

asked 29 Sep '10, 12:25

Breck%20Carter's gravatar image

Breck Carter
25.8k428592853
accept rate: 20%

... compared to what - starting both databases in their own db engine? IMHO, I don't think the latter has any advantage - but I haven't had such a scenario yet:)

(30 Sep '10, 12:12) Volker Barth
2

As Proxy table access has been improved with v12: Any chance to use a SA 12 engine to do the copying (without reloading the SA 11 dbs, of course)?

(30 Sep '10, 12:14) Volker Barth

@Volker(1): yes, as opposed to each database in its own engine.

(02 Oct '10, 11:22) Breck Carter

One major con to using a single server would probably be with respect to the number of workers available. Since you are using 11.0.1, the number of workers is fixed (i.e. there is no MPL support prior to SA 12). Now consider the following:

  1. you connect to the server and begin the insert-select
  2. the local connection gets a worker assigned to it to process the request and that worker will remain assigned to the connection for the duration of the request (i.e. for the entire insert-select)
  3. the remote data access layer now makes a remote connection and executes the select portion
  4. the remote connection will now get its own worker assigned whenever it is active (i.e. whenever it returns the next block of rows)
  5. if the server decides to execute the "select" portion in parallel, then it could utilize several workers to fetch the rows from the "remote"

So, for the duration of the insert-select, the "remote query" will utilize at least 2 workers and possibly many more than 2 workers. Having the remote database on a separate server will change things so that the local server only utilizes one worker for the entire insert and the remote server would utilize 1 or more workers for the select. If the local server is also servicing other connections, then pushing the select to a different server and leaving the additional workers available for the other "local" connections would be very beneficial. At the same time, if the select portion of the query can be executed in parallel, then having a remote server that is not as busy as the local server and having a full set of workers available to handle the select would also be very beneficial.

So, my opinion is that having two servers with each having its own set of workers (and cache etc.) is probably the better approach here; even if both servers are running on the same machine.


Breck says: A wonderful discussion! Now... what if there are no client-server connections to the local database at all; i.e., what if the entire process is being executed by a DatabaseStart event, and client-server connections are forbidden? (this is a "batch" process that is part of an embedded database upgrade process).

permanent link

answered 04 Oct '10, 14:57

Karim%20Khamis's gravatar image

Karim Khamis
5.6k53870
accept rate: 40%

edited 04 Oct '10, 18:56

Comment Text Removed

Ping! ( I've asked a further question in your answer :)

(04 Oct '10, 17:04) Breck Carter

The event would still use a temporary connection locally and a "real" connection would still get created to the remote. With respect to workers, the event connection would still tie up a worker and the remote connection would also tie up one or more workers depending on whether the select could be executed in parallel or not. So I am not sure much changes other than the fact that the connections would both get dropped once the event completes.

(04 Oct '10, 17:10) Karim Khamis

My biggest concern with respect to workers is the fact that because one worker is guaranteed to be tied up by the local connection for the entire duration of the insert-select, I am worried that the server may decide that it does not have enough workers to run the select in parallel when in other cases it would run the query in parallel.

(04 Oct '10, 17:12) Karim Khamis

By the way, when you say client connections are forbidden, do you mean to both databases? If so, then things won't work at all since as I said above, the connection to the remote database is and must be a "real" client connection. If that is one of the restrictions, then you have no choice but to go to two different servers. Unless ofcourse, I am missing the point entirely.

(04 Oct '10, 17:21) Karim Khamis

Yes, technically speaking the proxy connection from local to remote is a "client connection" and it is allowed. In fact, the mechanism for preventing client-server connections is [gosh, I can't remember]... but hey, the code works, I'm looking for performance improvements :)

(05 Oct '10, 04:49) Breck Carter

BTW, you have 1000+ reputation points now, you can edit your reply if you want. My edit of your reply is an experiment, to see what an in-place conversation looks like... until now, editing of other people's stuff has been rare on SQLA.

(05 Oct '10, 04:51) Breck Carter
More comments hidden
showing 4 of 6 show all flat view

Both DBs in the same folder will probably lead to IO as your main problem. Spreading IO accross multiple devices for this task will bring you probably more, than thinking about optimisations regarding the db engine. E.g. use for each db file its own (SAN, Raid, ...) And by the way, increase the DB file size of the target before starting the copy ;-)

permanent link

answered 30 Sep '10, 09:38

Martin's gravatar image

Martin
8.6k116151237
accept rate: 14%

Comment Text Removed

@Martin: This is an embedded application where the new database must end up in the same folder as the previous one. There is no "DBA control" over this process, and no knowledge of drives etc on the host computer; in particular, there is no guaranteed that a second physical drive even exists. Plus, I am guessing that the savings gained by using separate physical drives would be lost by the required final move of the new database back to the original folder (or the initial move of the old database to some other drive).

(02 Oct '10, 11:28) Breck Carter

@Martin: Is "separate physical drive" important on a good quality SAN or NAS? Perhaps this is a question for SQLA... :)

(02 Oct '10, 11:29) Breck Carter

@Martin: Thanks VERY much for the "increase DB file size" reminder. I preach that from the pulpit all the time, kinda embarrassing to forget about it :)

(02 Oct '10, 11:32) Breck Carter
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×242

question asked: 29 Sep '10, 12:25

question was seen: 690 times

last updated: 04 Oct '10, 18:56