Actually, set sTemp='' consumes 13.6% of exec time and if line 86.4% asked 01 Dec '09, 18:26 Zote |
I have rewrited using your tips. Thank you all!
answered 08 Dec '09, 12:19 Zote 1
Did you run some tests and get give us a hint of the performance improvements ?
(19 Jan '12, 07:03)
Thomas Dueme...
Replies hidden
...a fine suggestion, though possibly a bit late...:)
(19 Jan '12, 07:30)
Volker Barth
|
As usual, there are few "right" answers, only tradeoffs. The direction to move to optimize this function depends on both your requirements and knowledge of the domain of input strings the function is to process. For example, if a large proportion of the inputs are 100% numeric, and punctuation is not permitted, then a cheap way to optimize the execution is
But the isnumeric() function only tests if conversion to a numeric is possible, not an integer - instead you can use a REGEXP search on the string to look for anything that is not the digits 0-9. Another cheap test would be to attempt the conversion to a BIGINT if the input string is shorter than 19 characters (assuming a single-byte charset). Attempt the CAST, if it succeeds return val, otherwise process the string as you are. Another possibility is to not consider the string character-by-character. If non-numeric characters are rare, use a substring search to find the next "chunk" of numbers; this will result in fewer concatentation operations, which will mean fewer memory allocations. answered 01 Dec '09, 20:23 Glenn Paulley we have a similar function but with a test like: substring(val,iCount,1) like '[0123456789]' we find it pretty fast. Defintely worth looking at the profiling stats in real usage to see what takes the time. @Glenn: Besides the other obvious optimizations: Is "IN( '0','1','2','3','4','5','6','7','8','9')" cheaper than "LIKE [0-9]"? - I would think the latter is more readable at least. I don't have regex, its sa-9. @Zote: See the PATINDEX function for LIKE-wise tests outside of queries - it has been there since SA 5 (at least). |
You could try to find the next occurrence of an unwanted character with If nothing unwanted is found in the string you can return immediately. HTH answered 02 Dec '09, 11:45 Thomas Dueme... |
Another approach is to convert the string's characters into a set of row values and use a WHERE clause to eliminate the non-numeric ones, then put the result back together using the LIST aggregate function.
answered 08 Dec '09, 19:08 Bruce Hay |
I tried Bruce's variant with the LIST aggregate and I was surprised that it is actually faster (with my test data):
I timed the two variants using fetchtst on 11.0.1.2001 using some fabricated strings from systabcol:
The following file was used by fetchtst -ga:
Calling the function 63180 times took 39.2s for the onlyNumber() implementation and 3.2s for the BH_onlyNumber() variant. These results may not hold if your data is distributed in a different way or if you are not calling many times from a single query. answered 11 Mar '10, 18:05 Ivan T. Bowman |
I missed a couple days so didn't see this. I think the patindex is definitely the way to go. Although I don't like the loop at the end.
This might change based upon whether we expected lots of non numerics and whether non numerics might show up grouped together or not which is what I designed for above. If we're just looking at something like SSN 999-99-9999 then the set iCurPos should just be set iCurPos = iCurPos + iNonNum; Jon answered 10 Dec '09, 17:02 Jon Greisz |
Similar to Bruce's approach, this solution may prove useful or inspire solutions to other problems: CREATE OR REPLACE FUNCTION onlyNumber(IN val LONG VARCHAR) RETURNS LONG VARCHAR DETERMINISTIC BEGIN DECLARE ret LONG VARCHAR; SELECT LIST( IF isNumeric(SUBSTRING(val,row_num,1)) = 1 THEN SUBSTRING(val,row_num,1) //preserve digit ELSE '' //remove non-digit END IF ,'' ORDER BY row_num) INTO ret FROM sa_rowgenerator(1,LENGTH(val)); //1 "row" for each character RETURN ret; END; answered 18 Jan '12, 15:41 Tyson Lewis |
Are the number characters always together, or is there something between number chars? What kind of input will the procedure get? You might be able to use a regular expression to extract the numbers.