A large number of rows are to be copied from one SQL Anywhere 11.0.1.2276 database to another database in the same folder on the same computer via simple INSERT t SELECT * FROM proxy_t statements. Millions of rows, gigabytes of data, hours of processing time using two SQL Anywhere engines. What are the performance pros and cons of starting both databases in one SQL Anywhere engine? asked 29 Sep '10, 12:25 Breck Carter |
One major con to using a single server would probably be with respect to the number of workers available. Since you are using 11.0.1, the number of workers is fixed (i.e. there is no MPL support prior to SA 12). Now consider the following:
So, for the duration of the insert-select, the "remote query" will utilize at least 2 workers and possibly many more than 2 workers. Having the remote database on a separate server will change things so that the local server only utilizes one worker for the entire insert and the remote server would utilize 1 or more workers for the select. If the local server is also servicing other connections, then pushing the select to a different server and leaving the additional workers available for the other "local" connections would be very beneficial. At the same time, if the select portion of the query can be executed in parallel, then having a remote server that is not as busy as the local server and having a full set of workers available to handle the select would also be very beneficial. So, my opinion is that having two servers with each having its own set of workers (and cache etc.) is probably the better approach here; even if both servers are running on the same machine. Breck says: A wonderful discussion! Now... what if there are no client-server connections to the local database at all; i.e., what if the entire process is being executed by a DatabaseStart event, and client-server connections are forbidden? (this is a "batch" process that is part of an embedded database upgrade process). answered 04 Oct '10, 14:57 Karim Khamis Comment Text Removed
Ping! ( I've asked a further question in your answer :) The event would still use a temporary connection locally and a "real" connection would still get created to the remote. With respect to workers, the event connection would still tie up a worker and the remote connection would also tie up one or more workers depending on whether the select could be executed in parallel or not. So I am not sure much changes other than the fact that the connections would both get dropped once the event completes. My biggest concern with respect to workers is the fact that because one worker is guaranteed to be tied up by the local connection for the entire duration of the insert-select, I am worried that the server may decide that it does not have enough workers to run the select in parallel when in other cases it would run the query in parallel. By the way, when you say client connections are forbidden, do you mean to both databases? If so, then things won't work at all since as I said above, the connection to the remote database is and must be a "real" client connection. If that is one of the restrictions, then you have no choice but to go to two different servers. Unless ofcourse, I am missing the point entirely. Yes, technically speaking the proxy connection from local to remote is a "client connection" and it is allowed. In fact, the mechanism for preventing client-server connections is [gosh, I can't remember]... but hey, the code works, I'm looking for performance improvements :) BTW, you have 1000+ reputation points now, you can edit your reply if you want. My edit of your reply is an experiment, to see what an in-place conversation looks like... until now, editing of other people's stuff has been rare on SQLA. More comments hidden
|
Both DBs in the same folder will probably lead to IO as your main problem. Spreading IO accross multiple devices for this task will bring you probably more, than thinking about optimisations regarding the db engine. E.g. use for each db file its own (SAN, Raid, ...) And by the way, increase the DB file size of the target before starting the copy ;-) answered 30 Sep '10, 09:38 Martin Comment Text Removed
@Martin: This is an embedded application where the new database must end up in the same folder as the previous one. There is no "DBA control" over this process, and no knowledge of drives etc on the host computer; in particular, there is no guaranteed that a second physical drive even exists. Plus, I am guessing that the savings gained by using separate physical drives would be lost by the required final move of the new database back to the original folder (or the initial move of the old database to some other drive). @Martin: Is "separate physical drive" important on a good quality SAN or NAS? Perhaps this is a question for SQLA... :) @Martin: Thanks VERY much for the "increase DB file size" reminder. I preach that from the pulpit all the time, kinda embarrassing to forget about it :) |
... compared to what - starting both databases in their own db engine? IMHO, I don't think the latter has any advantage - but I haven't had such a scenario yet:)
As Proxy table access has been improved with v12: Any chance to use a SA 12 engine to do the copying (without reloading the SA 11 dbs, of course)?
@Volker(1): yes, as opposed to each database in its own engine.