The work table is this:

SELECT
  client_ref AS account_ref
INTO #latest_account
FROM accounts
WHERE client_ref<>'';

The ultimate query (simplified into performing COUNTs instead of listing the fields) is this:

SELECT
  count(ci2.COLLECTIONS_ITEM_ID) as ci_count,
  count(vcig.ALLOCATED_GROUP_ID) as vcig_count
FROM #latest_account AS l
JOIN collections_item AS ci2 ON ci2.collections_account_id=l.account_ref
JOIN v_collections_item_group AS vcig on vcig.collections_item_id = ci2.collections_item_id

The view is defined as such:

SELECT
  CI.COLLECTIONS_ITEM_ID,
  SLAG.AUTOKEY as ALLOCATED_GROUP_ID,
  SLAG.COMPONENT_BALANCE as DEBT_BALANCE,
  SLAG.INTEREST_BALANCE as INTEREST_BALANCE,
  SLAG.GROUP_BALANCE as TOTAL_BALANCE
FROM COLLECTIONS_ITEM as CI
JOIN DBA.EVENT_TRANS as ET on ET.PROCEDURE_CODE = 'DR' and ET.JOB_NO = CI.COLLECTIONS_ACCOUNT_ID and ET.SOURCE_ITEM_ID = CI.COLLECTIONS_ITEM_ID
JOIN SL_ALLOCATED_GROUP as SLAG on SLAG.ORIGINAL_TRANS_ID = ET.AUTOKEY
WHERE ET.TRANS_TYPE = 'C'

If I change the first query to TOP xxx I get these timings:

 20,000 = 1.81secs, ci_count=88,494
 30,000 = 2.67secs, ci_count=124,607
 31,000 = 2.61secs, ci_count=127,804
 32,000 = aborted after 5 minutes
 31,800 = 2.75secs, ci_count=130,126
 31,825 = aborted after 5 minutes
I had to run the complete job yesterday for the end-users report (278,000 records in #latest_account) - here is my log: 18/08/2014 09:02:20 !Executing: SELECT client_ref AS account_ref...
18/08/2014 09:02:42 !277,896 rows affected.
18/08/2014 09:02:42 !Executing: CREATE INDEX ix ON #latest_account (CLIENT_REF, ACCOUNT_REF)
18/08/2014 09:02:43 !277,896 rows affected.
18/08/2014 09:02:43 !Executing: DROP TABLE IF EXISTS #report
18/08/2014 09:02:43 !-1 rows affected.
18/08/2014 09:02:43 !Executing: SELECT a.client_ref AS ... INTO #report
19/08/2014 03:45:26 !524,073 rows affected.

Does anyone have any ideas why this is taking so long?

asked 19 Aug '14, 08:14

da69plc's gravatar image

da69plc
36347
accept rate: 0%

edited 19 Aug '14, 08:57

Mark%20Culp's gravatar image

Mark Culp
22.7k9129266

If I change the first query to TOP xxx

When you add the TOP xxx clause, what ORDER BY do you add? (A TOP clause without an ORDER BY should at least issue a warning...)

Does the same variation in execution time appear when you enhance the TOP xxx clause with a START AT yyy (say, what happens when you use "TOP 1000 START AT 31000"?

Do you know whether the desired data is usually cached or not? - If you query data that is already cached vs. data that has to be read via I/O, that might explain different access plans... - have you looked at those?

(19 Aug '14, 08:28) Volker Barth

Please show us the SELECT TOP query with the ORDER BY... if you don't have an ORDER BY, don't use TOP.

If you still experience this behavior with TOP and ORDER BY, capture the Graphical Plan With Statistics for the fast and slow queries and send the two .saplan files to me at my dot name at gmail dot com.

(19 Aug '14, 12:17) Breck Carter

TOP part is completely irrelevant.
I reason I added the TOP clause was me trying to debug...I thought the problem was a run-away JOIN but I ended up finding a "magic" number at which the query would run (<31825 records) - the query I want to run is without a TOP and is for 524,073 records!

Re START AT/cached data:
As I can't even begin to understand why an extra 25 records would make a difference, I did try outputting TOP 100000 into latest_account and then DELETE TOP 65000 to see if it was a particular record/set of records but it still performed the same, me having to abort after 5 minutes. The whole .db is 13gb and the server has 16gb of RAM, running ANYWHERE 11.01 on Windows 2008

I will email the .saplan files as soon as I can Breck...

Thanks guys.

(23 Aug '14, 17:45) da69plc
Replies hidden

> TOP part is completely irrelevant.

Indeed... the query with the TOP is "the first query", and it's the second query that has the problem... when posting a complex question, it is always a good idea to write carefully and clearly to avoid confusion... pretend you're a professor speaking to dimwitted students :)

(23 Aug '14, 21:25) Breck Carter

(see the attached plans fast.saplan and slow.saplan)

It looks like the optimizer has picked two different plans based on the estimated number of rows in #latest_account. When the estimated number of rows is 32,000 it puts #latest_account (l) higher in the tree, and does a full sequential scan on ci2... that's bad, because ci2 has 1141845 rows.

When the estimated number of rows in #latest_account is 31,000 it pushed that table down to the bottom, does a sequential scan on that table (which is OK because it is small) and does an index scan on ci2... much better than a sequential scan.

The optimizer picks plans dynamically, when the statement is executed, so the numbers of rows in different tables can have a big effect. Sometimes, it does the wrong thing.

One BRUTE FORCE technique that sometimes works is to force the optimizer to always use an index that it uses in the fast plan.

If you click on the ci2 box in the fast plan, you will see it says this...

Index Scan

Scan ci2 using index COLLECTIONS_ACCOUNT

Try changing this:

JOIN collections_item AS ci2

to this, to force that index to always be used:

JOIN collections_item AS ci2 WITH ( FORCE INDEX ( COLLECTIONS_ACCOUNT ) )

That technique is dangerous, because you might be telling it to do a bad thing in other cases (like maybe when #latest_account has millions of rows).

Other problems...

  1. Your plans show only the estimates, not the actual results. Try clicking on the Statistics level dropdown to pick Detailed and node statistics in the future.

  2. Your query seems to be using a view. It's hard to understand the plan without seeing the view. ...you did show us the view earlier :)

  3. It is also hard to understand the plan without seeing the tables. There might be a completely different way to write the query, to make it always run fast, maybe even faster than the fast query.

  4. A later EBF of 11.0.1 might help, or better yet, a newer version of SQL Anywhere... if you upgrade, be sure try taking out the FORCE INDEX clause, it may be making things worse.

fast
----
DROP TABLE IF EXISTS #latest_account;
SELECT TOP 31000
  file_no AS account_ref
INTO #latest_account
FROM collections_account
WHERE client_ref<>'';

SELECT
  count(ci2.COLLECTIONS_ITEM_ID) as ci_count,
  count(vcig.ALLOCATED_GROUP_ID) as vcig_count
FROM #latest_account AS l
JOIN collections_item AS ci2 ON ci2.collections_account_id=l.account_ref
JOIN v_collections_item_group AS vcig on vcig.collections_item_id = ci2.collections_item_id;


slow
----
DROP TABLE IF EXISTS #latest_account;
SELECT TOP 32000
  file_no AS account_ref
INTO #latest_account
FROM collections_account
WHERE client_ref<>'';

SELECT
  count(ci2.COLLECTIONS_ITEM_ID) as ci_count,
  count(vcig.ALLOCATED_GROUP_ID) as vcig_count
FROM #latest_account AS l
JOIN collections_item AS ci2 ON ci2.collections_account_id=l.account_ref
JOIN v_collections_item_group AS vcig on vcig.collections_item_id = ci2.collections_item_id
permanent link

answered 23 Aug '14, 21:15

Breck%20Carter's gravatar image

Breck Carter
27.4k424585836
accept rate: 21%

edited 23 Aug '14, 21:20

FORCE doesn't work - it gives this error... Could not execute statement.
Syntax error near 'FORCE' on line 5
SQLCODE=-131, ODBC 3 State="42000"
Line 1, column 1

As I've never forced hints before, I have tried the WITH before and after the ON and even without an ON...

Here is a bit more information about the tables involved:

DESCRIBE collections_item:
Column,Type,Nullable,Primary Key
'COLLECTIONS_ITEM_ID','integer',0,1
'COLLECTIONS_ACCOUNT_ID','integer',0,0
'INVOICE_REF','varchar(30)',1,0
'SOURCE_REF','varchar(30)',1,0
'SOURCE_DATE','date',1,0
'SOURCE_TIME','time',1,0
'INVOICE_DATE','date',0,0
'ORDER_REF','varchar(30)',1,0
'NARRATIVE','varchar(50)',1,0
'NET_VALUE','numeric(19,4)',1,0
'VAT_RATE','numeric(18,2)',1,0
'VAT_VALUE','numeric(19,4)',1,0
'DELAY_DAYS','integer',1,0
'EFFECTIVE_DATE','date',1,0
'VAT_CODE','varchar(5)',1,0
'GROSS_VALUE','numeric(19,4)',0,0
'TIMESTAMP','timestamp',1,0
'DISPUTED','char(1)',1,0
'DISPUTE_CODE','varchar(10)',1,0
'STATUS','varchar(12)',1,0
'REMOVE_BALANCE','char(1)',1,0
'LITIGATION_SWITCH_ADDED','char(1)',1,0
'REMOVE_DEBT','numeric(19,4)',1,0
'REMOVE_INTEREST','numeric(19,4)',1,0
'DUE_DATE','date',1,0
'DISPUTE_DATE','date',1,0
'DISPUTE_NOTE','varchar(150)',1,0
'EXPECTED_CLIENT_BALANCE','numeric(19,4)',1,0
'EXPECTED_CLIENT_BALANCE_DATE','date',1,0

1,141,845 rows with index (FOREIGN) COLLECTIONS_ACCOUNT ON collections_account_id

DESCRIBE event_trans;
Column,Type,Nullable,Primary Key
'AUTOKEY','integer',0,1
'EVENT_HISTORY','integer',0,0
'PROCEDURE_CODE','varchar(3)',0,0
'JOB_NO','integer',0,0
'REFERENCE','varchar(25)',1,0
'NARRATIVE','varchar(45)',0,0
'DEBIT','numeric(18,2)',1,0
'CREDIT','numeric(18,2)',1,0
'TRANS_TYPE','varchar(3)',0,0
'NEW_BALANCE','numeric(19,4)',1,0
'ALLOC_METHOD_USED','varchar(3)',1,0
'PAYMENT_TYPE','varchar(4)',1,0
'SOURCE_CURRENCY','varchar(3)',1,0
'SOURCE_DEBIT','numeric(19,4)',1,0
'SOURCE_CREDIT','numeric(19,4)',1,0
'TIMESTAMP','timestamp',1,0
'PRINCIPAL_BALANCE','numeric(19,4)',1,0
'FEES_BALANCE','numeric(19,4)',1,0
'COSTS_BALANCE','numeric(19,4)',1,0
'INTEREST_BALANCE','numeric(19,4)',1,0
'EXCHANGE_RATE','numeric(18,6)',1,0
'SORT_CODE','varchar(8)',1,0
'BANK_ACCOUNT','varchar(8)',1,0
'ALLOCATION_DONE','char(1)',1,0
'IMPORT_EXPORT_STATUS','varchar(1)',1,0
'PAYER_NAME','varchar(30)',1,0
'PAYMENT_IN_FULL','char(1)',1,0
'EFFECTIVE_DATE','date',1,0
'SOURCE_ITEM_ID','integer',1,0
'APPLY_TO','integer',1,0
'INVOICE_DATE','date',1,0
'CREDIT_TO_TRANS_TYPE','varchar(3)',1,0
'DISPUTED','char(1)',1,0
'CREATE_NEW_GROUP','varchar(1)',1,0
'REMIT_DONE','char(1)',1,0
'REMIT_COMMISSIONABLE','numeric(19,4)',1,0
'REMIT_COMMISSION_RATE','numeric(18,2)',1,0
'REMIT_DATE','date',1,0
'REMIT_COMMISSION','numeric(19,4)',1,0
'REMIT_AMOUNT','numeric(19,4)',1,0
'REMIT_RUN_ID','integer',1,0
'REMIT_VAT_RATE','numeric(18,2)',1,0
'REMIT_VAT_ON_COMMISSION','numeric(19,4)',1,0
'REMIT_AMOUNT_GROSS','numeric(19,4)',1,0
'REMIT_AMOUNT_NET','numeric(19,4)',1,0
'REMIT_RESIDUAL_INVOICE','numeric(19,4)',1,0
'MANUALLY_ALLOCATED','char(1)',1,0
'SKIP_BILL_CHARGES','char(1)',1,0
'INPUT_TRANS_ID','integer',1,0
'INCLUDES_VAT_RATE','numeric(18,2)',1,0
'INCLUDES_VAT_AMOUNT','numeric(19,4)',1,0
'INTEREST_RATE_CODE','varchar(5)',1,0
'COLLECTIONS_ONLY_FLAG','char(1)',1,0
'LAST_COLLECTOR','varchar(3)',1,0
'LAST_COLLECTOR_ACTION_DATE','date',1,0

9,097,271 rows with index (FOREIGN) job ON procedure_code,job_no and INDEX i_source_item_id ON source_item_id

latest_account will currently have 650k rows but will continue to grow I am using 2960 EBF which AFAIK is the last one for 11.01.

I am trying to generate better plans (Details and node statistics) and will post then when it eventually comes back...

(26 Aug '14, 04:15) da69plc
Replies hidden

When you tell about a syntax error, it may be of help if you show the exact statement that raises the error (and possibly those variants that you have tried).

Given that, I guess Breck's sample has somewhat mixed up the FORCE INDEX vs. WITH hint clauses and should read:

JOIN collections_item AS ci2 FORCE INDEX (COLLECTIONS_ACCOUNT)

or

JOIN collections_item AS ci2 WITH (INDEX (COLLECTIONS_ACCOUNT))
(26 Aug '14, 04:39) Volker Barth

I am using 2960 EBF which AFAIK is the last one for 11.01.

Nope, there are several newer ones, up to 11.0.1.3158 (for Windows) - cf. Breck's EBF overview here.

(26 Aug '14, 04:42) Volker Barth

Re Syntax error - sorry I ran out of characters! Didn't think to post a second comment - doh...but you were right - using FORCE INDEX executed the statement - but I'm still waiting for the result set (15 mins so far) - looks like I going to need to re-write it!

Re EBFS - I have tried to get access to the latest patches for 11.0.1 to see if there are any that may help but the website is refusing me access even though I registered with service.sap.com ages ago...

PS. I am also still waiting for "Detailed and node statistics to finish"... (4 hours so far)

(26 Aug '14, 08:53) da69plc

I have found this helps: CREATE INDEX ix ON #latest_account (ACCOUNT_REF)

without the need to FORCE. Maybe the optimizer floated the temp table back to the top since it had an index or it could join to it . Nevertheless, it has reduced the time taken down from 18 hours to ~14 mins and then on the second run, 3.5 mins!).

I am interested in getting the seeing the patch change logs / updating our server to the latest one though. What do I need to do to do this?

(26 Aug '14, 10:12) da69plc
Replies hidden

CREATE INDEX ix ON #latest_account (ACCOUNT_REF)

Hm, that shows we have not yet really discussed the contents of that one-column table: As you are simply selecting without grouping etc. given your original description (copied below), will the temporary table contain duplicate values?

SELECT
  client_ref AS account_ref
INTO #latest_account
FROM accounts
WHERE client_ref<>'';

If so, it might be of help to just copy unique values, and possibly to pre-order them with the help of an ORDER BY in the SELECT, and you could then use a "real" PK definition instead of an index...

(26 Aug '14, 10:40) Volker Barth

I am interested in getting...

If you have problems to access the SAP Marketplace or the EBF page, I'd recommend to have a look at FAQs here on the tag "ebf" or to ask a separate question...

(26 Aug '14, 10:43) Volker Barth

The one-column #latest_account is actually a 2 column table...I cut/simplified the query down to the bare minimum to illustrate the slowness problem - here's the proper query in production:

-- FOR A CLIENT (WHO HAS MULTIPLE CLIENT CODES), PULL OUT THE 'LATEST' ACCOUNT BY THEIR REF
SELECT
  o.CLIENT_REF,
  -- PREFER OPEN ACCOUNTS OVER CLOSED OR LAST CLOSED ONE
  (SELECT TOP 1 OUR_REF FROM collections_account WHERE CLIENT_REF=o.CLIENT_REF AND COLLECTIONS_CLIENT_CODE IN ('2412','2455','2456','2426','2427','2428','2441','2442','2443','2451','2448') ORDER BY IF DATE_CLOSED IS NULL THEN 0 ELSE 1 ENDIF, FILE_NO DESC) AS ACCOUNT_REF
INTO #latest_account
FROM collections_account AS o
WHERE ISNULL(o.CLIENT_REF,'')<>'' AND o.COLLECTIONS_CLIENT_CODE IN ('2412','2455','2456','2426','2427','2428','2441','2442','2443','2451','2448') AND o.STATUS<>'CL6'
GROUP BY CLIENT_REF

...so as you can see, I do use GROUP BY... I couldn't add a PRIMARY index to a TEMPORARY TABLE and I don't bother with the ORDER BY as AFAIK that is only used to display the end-result set and wouldn't be needed as later, I use it as the main FROM table with an ORDER BY (again I simplified the second part into a simple COUNT query to illustrate the problem).

(27 Aug '14, 03:57) da69plc
Replies hidden

Ah, I see. Well, the question whether account_ref is unique is still open - if so, you might add an UNIQUE index instead of a normal one which will give the optimizer more clues.

FWIW, you are right that ORDER BY is primarily used for display purposes but if you use SELECT ... INTO (or INSERT ... SELECT) it helps to insert new values in sorted order which may (or may not) lead to a clustered storage of these new rows - without having to use a further index to do so.

I couldn't add a PRIMARY index to a TEMPORARY TABLE

Of course you are correct, that is not possible for local temporary tables (including those created automatically by SELECT INTO #temp).

(27 Aug '14, 04:27) Volker Barth
Comment Text Removed

> Breck's sample has somewhat mixed up the FORCE INDEX vs. WITH hint clauses

I do that EVERY time... why isn't it in the list of characteristic errors? :)

Plus, posting code without testing it is an open invitation to Murphy's Law... sigh.

(27 Aug '14, 08:33) Breck Carter

why isn't it in the list of characteristic errors?

Possibly as we all have learnt (particularly from Glenn, I bet) that query hints should better be avoided, if possible:

Two different syntactical variants of index hints help to assure you have to look after the exact syntax every time and will ask yourself if you reaaaaaally want/need to use them:)

(27 Aug '14, 08:49) Volker Barth

Volker, thanks for your comments...

with CREATE INDEX it takes 3m 25s to SELECT INTO the final work file with CREATE UNIQUE INDEX it takes 3m 23s but as (only I would know) the account_ref is unique I'll use a UNIQUE index...

re: ...may or may not lead to a clustered storage...without having to use a further index... As it's not a definite yes, I'll stick with my CREATE INDEX [that takes 1 second] ;)

(29 Aug '14, 08:59) da69plc
More comments hidden
showing 5 of 12 show all flat view
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×241
×13

question asked: 19 Aug '14, 08:14

question was seen: 5,274 times

last updated: 29 Aug '14, 08:59