The work table is this:
The ultimate query (simplified into performing COUNTs instead of listing the fields) is this:
The view is defined as such:
If I change the first query to TOP xxx I get these timings: 20,000 = 1.81secs, ci_count=88,494 30,000 = 2.67secs, ci_count=124,607 31,000 = 2.61secs, ci_count=127,804 32,000 = aborted after 5 minutes 31,800 = 2.75secs, ci_count=130,126 31,825 = aborted after 5 minutesI had to run the complete job yesterday for the end-users report (278,000 records in #latest_account) - here is my log: 18/08/2014 09:02:20 !Executing: SELECT client_ref AS account_ref... 18/08/2014 09:02:42 !277,896 rows affected. 18/08/2014 09:02:42 !Executing: CREATE INDEX ix ON #latest_account (CLIENT_REF, ACCOUNT_REF) 18/08/2014 09:02:43 !277,896 rows affected. 18/08/2014 09:02:43 !Executing: DROP TABLE IF EXISTS #report 18/08/2014 09:02:43 !-1 rows affected. 18/08/2014 09:02:43 !Executing: SELECT a.client_ref AS ... INTO #report 19/08/2014 03:45:26 !524,073 rows affected. Does anyone have any ideas why this is taking so long? |
(see the attached plans fast.saplan and slow.saplan) It looks like the optimizer has picked two different plans based on the estimated number of rows in #latest_account. When the estimated number of rows is 32,000 it puts #latest_account (l) higher in the tree, and does a full sequential scan on ci2... that's bad, because ci2 has 1141845 rows. When the estimated number of rows in #latest_account is 31,000 it pushed that table down to the bottom, does a sequential scan on that table (which is OK because it is small) and does an index scan on ci2... much better than a sequential scan. The optimizer picks plans dynamically, when the statement is executed, so the numbers of rows in different tables can have a big effect. Sometimes, it does the wrong thing. One BRUTE FORCE technique that sometimes works is to force the optimizer to always use an index that it uses in the fast plan. If you click on the ci2 box in the fast plan, you will see it says this... Index Scan Scan ci2 using index COLLECTIONS_ACCOUNT Try changing this: JOIN collections_item AS ci2 to this, to force that index to always be used: JOIN collections_item AS ci2 WITH ( FORCE INDEX ( COLLECTIONS_ACCOUNT ) ) That technique is dangerous, because you might be telling it to do a bad thing in other cases (like maybe when #latest_account has millions of rows). Other problems...
fast ---- DROP TABLE IF EXISTS #latest_account; SELECT TOP 31000 file_no AS account_ref INTO #latest_account FROM collections_account WHERE client_ref<>''; SELECT count(ci2.COLLECTIONS_ITEM_ID) as ci_count, count(vcig.ALLOCATED_GROUP_ID) as vcig_count FROM #latest_account AS l JOIN collections_item AS ci2 ON ci2.collections_account_id=l.account_ref JOIN v_collections_item_group AS vcig on vcig.collections_item_id = ci2.collections_item_id; slow ---- DROP TABLE IF EXISTS #latest_account; SELECT TOP 32000 file_no AS account_ref INTO #latest_account FROM collections_account WHERE client_ref<>''; SELECT count(ci2.COLLECTIONS_ITEM_ID) as ci_count, count(vcig.ALLOCATED_GROUP_ID) as vcig_count FROM #latest_account AS l JOIN collections_item AS ci2 ON ci2.collections_account_id=l.account_ref JOIN v_collections_item_group AS vcig on vcig.collections_item_id = ci2.collections_item_id FORCE doesn't work - it gives this error...
Could not execute statement. As I've never forced hints before, I have tried the WITH before and after the ON and even without an ON... Here is a bit more information about the tables involved:
1,141,845 rows with index (FOREIGN) COLLECTIONS_ACCOUNT ON collections_account_id
9,097,271 rows with index (FOREIGN) job ON procedure_code,job_no and INDEX i_source_item_id ON source_item_id latest_account will currently have 650k rows but will continue to grow I am using 2960 EBF which AFAIK is the last one for 11.01. I am trying to generate better plans (Details and node statistics) and will post then when it eventually comes back...
(26 Aug '14, 04:15)
da69plc
Replies hidden
When you tell about a syntax error, it may be of help if you show the exact statement that raises the error (and possibly those variants that you have tried). Given that, I guess Breck's sample has somewhat mixed up the FORCE INDEX vs. WITH hint clauses and should read:
or
(26 Aug '14, 04:39)
Volker Barth
Nope, there are several newer ones, up to 11.0.1.3158 (for Windows) - cf. Breck's EBF overview here.
(26 Aug '14, 04:42)
Volker Barth
Re Syntax error - sorry I ran out of characters! Didn't think to post a second comment - doh...but you were right - using FORCE INDEX executed the statement - but I'm still waiting for the result set (15 mins so far) - looks like I going to need to re-write it! Re EBFS - I have tried to get access to the latest patches for 11.0.1 to see if there are any that may help but the website is refusing me access even though I registered with service.sap.com ages ago... PS. I am also still waiting for "Detailed and node statistics to finish"... (4 hours so far)
(26 Aug '14, 08:53)
da69plc
I have found this helps: CREATE INDEX ix ON #latest_account (ACCOUNT_REF) without the need to FORCE. Maybe the optimizer floated the temp table back to the top since it had an index or it could join to it . Nevertheless, it has reduced the time taken down from 18 hours to ~14 mins and then on the second run, 3.5 mins!). I am interested in getting the seeing the patch change logs / updating our server to the latest one though. What do I need to do to do this?
(26 Aug '14, 10:12)
da69plc
Replies hidden
Hm, that shows we have not yet really discussed the contents of that one-column table: As you are simply selecting without grouping etc. given your original description (copied below), will the temporary table contain duplicate values? SELECT client_ref AS account_ref INTO #latest_account FROM accounts WHERE client_ref<>''; If so, it might be of help to just copy unique values, and possibly to pre-order them with the help of an ORDER BY in the SELECT, and you could then use a "real" PK definition instead of an index...
(26 Aug '14, 10:40)
Volker Barth
If you have problems to access the SAP Marketplace or the EBF page, I'd recommend to have a look at FAQs here on the tag "ebf" or to ask a separate question...
(26 Aug '14, 10:43)
Volker Barth
The one-column #latest_account is actually a 2 column table...I cut/simplified the query down to the bare minimum to illustrate the slowness problem - here's the proper query in production:
...so as you can see, I do use GROUP BY... I couldn't add a PRIMARY index to a TEMPORARY TABLE and I don't bother with the ORDER BY as AFAIK that is only used to display the end-result set and wouldn't be needed as later, I use it as the main FROM table with an ORDER BY (again I simplified the second part into a simple COUNT query to illustrate the problem).
(27 Aug '14, 03:57)
da69plc
Replies hidden
Ah, I see. Well, the question whether account_ref is unique is still open - if so, you might add an UNIQUE index instead of a normal one which will give the optimizer more clues. FWIW, you are right that ORDER BY is primarily used for display purposes but if you use SELECT ... INTO (or INSERT ... SELECT) it helps to insert new values in sorted order which may (or may not) lead to a clustered storage of these new rows - without having to use a further index to do so.
Of course you are correct, that is not possible for local temporary tables (including those created automatically by SELECT INTO #temp).
(27 Aug '14, 04:27)
Volker Barth
Comment Text Removed
> Breck's sample has somewhat mixed up the FORCE INDEX vs. WITH hint clauses I do that EVERY time... why isn't it in the list of characteristic errors? :) Plus, posting code without testing it is an open invitation to Murphy's Law... sigh.
(27 Aug '14, 08:33)
Breck Carter
Possibly as we all have learnt (particularly from Glenn, I bet) that query hints should better be avoided, if possible:
Two different syntactical variants of index hints help to assure you have to look after the exact syntax every time and will ask yourself if you reaaaaaally want/need to use them:)
(27 Aug '14, 08:49)
Volker Barth
Volker, thanks for your comments... with CREATE INDEX it takes 3m 25s to SELECT INTO the final work file with CREATE UNIQUE INDEX it takes 3m 23s but as (only I would know) the account_ref is unique I'll use a UNIQUE index... re: ...may or may not lead to a clustered storage...without having to use a further index... As it's not a definite yes, I'll stick with my CREATE INDEX [that takes 1 second] ;)
(29 Aug '14, 08:59)
da69plc
More comments hidden
|
When you add the TOP xxx clause, what ORDER BY do you add? (A TOP clause without an ORDER BY should at least issue a warning...)
Does the same variation in execution time appear when you enhance the TOP xxx clause with a START AT yyy (say, what happens when you use "TOP 1000 START AT 31000"?
Do you know whether the desired data is usually cached or not? - If you query data that is already cached vs. data that has to be read via I/O, that might explain different access plans... - have you looked at those?
Please show us the SELECT TOP query with the ORDER BY... if you don't have an ORDER BY, don't use TOP.
If you still experience this behavior with TOP and ORDER BY, capture the Graphical Plan With Statistics for the fast and slow queries and send the two .saplan files to me at my dot name at gmail dot com.
TOP part is completely irrelevant.
I reason I added the TOP clause was me trying to debug...I thought the problem was a run-away JOIN but I ended up finding a "magic" number at which the query would run (<31825 records) - the query I want to run is without a TOP and is for 524,073 records!
Re START AT/cached data:
As I can't even begin to understand why an extra 25 records would make a difference, I did try outputting TOP 100000 into latest_account and then DELETE TOP 65000 to see if it was a particular record/set of records but it still performed the same, me having to abort after 5 minutes. The whole .db is 13gb and the server has 16gb of RAM, running ANYWHERE 11.01 on Windows 2008
I will email the .saplan files as soon as I can Breck...
Thanks guys.
> TOP part is completely irrelevant.
Indeed... the query with the TOP is "the first query", and it's the second query that has the problem... when posting a complex question, it is always a good idea to write carefully and clearly to avoid confusion... pretend you're a professor speaking to dimwitted students :)