Using SA 11.0.1.2299, I want to strip some characters from a string - independent of their position. An example: I want to get rid of several punction characters like hyphen, slash and the like in formatted numbers to get the "raw numbers" - say for telephone numbers. Of course, I can do that with several calls of the REPLACE function like
But this is not very readable (though I could put it in a function). Question: Are there better ways - e.g. by means of REGEXP_SUBSTR()? |
I think Jon is on the right track. Depending on your input, there could be some performance gains by filtering out all bad characters which are similar as it is found in the string by using the replace() function. Doing so limits the number of times through the loop to the number of distinct characters in your character set (as opposed to the length of your string). The other optimization that I would do is to strip off the initial characters from the input string that are known to be valid so that they are not looked at again. Note that the database server has optimizations so that long strings are not actually copied when such operations are performed and hence this operation is not as expensive as you might think. Here is a revised version which also generalizes the set of valid characters that are desired:
Example usage:
|
Is it only stripping out non-numerics from a number, if so then the solution we worked on here would probably work. Here is my version from that thread.
This might change based upon whether we expected lots of non numerics and whether non numerics might show up grouped together or not which is what I designed for above. If we're just looking at something like SSN 999-99-9999 then the set iCurPos should just be set iCurPos = iCurPos + iNonNum; Thanks for the VERY quick response! I guess this would work though I will have to leave more than just digits. Guess I could add "[^0-9A-Z]" as pattern. - However, I would still get to know if regular expressions can't do that better... |
I personally like the use of regex for string matching. But ASA is missing a full blown REGEXP_REPLACE function. At least im not able to come up with a regular expression that would do it with REGEXP_SUBSTRING in one run. But in a loop it is possbile to strip the string. I have prepared a iSQL script for copy an paste. I think it should be easy to build a Function out of it.
Output of this script is
I think as always it depends on the string and the pattern which solution is faster. One remark you have to avoid that the normal pattern Thanks for that nice proposal - it underlines my assumption that REGEXP_SUBSTR() is generally suited for this task but needs to be looped in order to replace/strip more than one occurrence of the chars to filter out. - For the quite simple task discussed here, Mark's suggestion seems easier to understand. But for different tasks (like formatting/checking e-mail addresses), the RegExp search seems a lot more flexible. 1
@TDuemesnil: please post a new "question" entitled "Product Suggestion: Please implement REGEXP_REPLACE". Include a business case for why you want it. FWIW here's a description of the Oracle version: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions130.htm#SQLRF06302 |
To clarify: I appreciate the hint to the "onlyNumbers" question but would still like an answer if regular expressions could be used to strip characters, too - I think with some kind of assertion.