Using SA 11.0.1.2299, I want to strip some characters from a string - independent of their position.

An example: I want to get rid of several punction characters like hyphen, slash and the like in formatted numbers to get the "raw numbers" - say for telephone numbers.

Of course, I can do that with several calls of the REPLACE function like

DECLARE strNr varchar(30);
SET strNr = '+49-2345/67 89';
SELECT REPLACE(REPLACE(REPLACE(REPLACE(strNr, ' ', ''), '-', ''), '/', ''), '+', '');

But this is not very readable (though I could put it in a function).

Question: Are there better ways - e.g. by means of REGEXP_SUBSTR()?

asked 19 Jan '10, 20:24

Volker%20Barth's gravatar image

Volker Barth
29.6k293444650
accept rate: 32%

To clarify: I appreciate the hint to the "onlyNumbers" question but would still like an answer if regular expressions could be used to strip characters, too - I think with some kind of assertion.

(20 Jan '10, 08:44) Volker Barth

I think Jon is on the right track. Depending on your input, there could be some performance gains by filtering out all bad characters which are similar as it is found in the string by using the replace() function. Doing so limits the number of times through the loop to the number of distinct characters in your character set (as opposed to the length of your string).

The other optimization that I would do is to strip off the initial characters from the input string that are known to be valid so that they are not looked at again. Note that the database server has optimizations so that long strings are not actually copied when such operations are performed and hence this operation is not as expensive as you might think.

Here is a revised version which also generalizes the set of valid characters that are desired:

CREATE FUNCTION FilterChars( in @sInput long varchar,
                             in @valid  char(256) default '0-9' )
RETURNS long varchar
DETERMINISTIC
BEGIN
    declare @pattern   long varchar;
    declare @iNotValid integer;
    declare @sOutput   long varchar;

set @pattern   = '%[^' || @valid || ']%';
    set @sOutput   = '';
    set @iNotValid = patindex( @pattern, @sInput );

while @iNotValid > 0 loop
        // pick up everything to the left of the non numeric
        set @sOutput = @sOutput || left( @sInput, @iNotValid - 1 );

// strip off the invalid character and any similar chars in the string
        set @sInput = replace( substr( @sInput, @iNotValid+1 ),
                               substr( @sInput, @iNotValid, 1 ), '' );

// find the next invalid character
        set @iNotValid = patindex( @pattern, @sInput );
    end loop;

// return our string plus any remaining chars
    return @sOutput || @sInput;
END;

Example usage:

select FilterChars( 'abc123def' )  // returns '123'
select FilterChars( 'abc123def', 'a-z' )  // returns 'abcdef'
permanent link

answered 20 Jan '10, 18:51

Mark%20Culp's gravatar image

Mark Culp
22.6k9129265
accept rate: 40%

Is it only stripping out non-numerics from a number, if so then the solution we worked on here would probably work. Here is my version from that thread.

 ALTER FUNCTION onlyNumber(in val long varchar)     
    RETURNS long varchar     
    DETERMINISTIC  
 BEGIN         
    declare sOut long varchar;         
    declare iCurPos integer;         
    declare iNonNum integer;

set sOut    = '';     
    set iCurPos = 1;     
    set iNonNum = patindex('%[^0-9]%', val);

while iNonNum <> 0 loop        
       // pick up everything to the left of the non numeric        
       set sOut = sOut || substr(val, iCurPos, iNonNum - 1);

// check starting at next numeric        
       set iCurPos = iCurPos + iNonNum + patindex('%[0-9]%', substr(val, iNonNum + 1)) - 1;        
       set iNonNum = patindex('%[^0-9]%', substr(val, iCurPos));     
    end loop;

// return our string plus any remaining chars     
    return sOut || substr(val, iCurPos);   
    END;

This might change based upon whether we expected lots of non numerics and whether non numerics might show up grouped together or not which is what I designed for above. If we're just looking at something like SSN 999-99-9999 then the set iCurPos should just be set iCurPos = iCurPos + iNonNum;

permanent link

answered 19 Jan '10, 20:32

Jon%20Greisz's gravatar image

Jon Greisz
571979
accept rate: 0%

Thanks for the VERY quick response! I guess this would work though I will have to leave more than just digits. Guess I could add "[^0-9A-Z]" as pattern. - However, I would still get to know if regular expressions can't do that better...

(19 Jan '10, 20:39) Volker Barth

Hi, see this question!

permanent link

answered 20 Jan '10, 00:59

Zote's gravatar image

Zote
1.7k364050
accept rate: 43%

Thanks for the hint - it's the one Jon's response is refering to, too.

(20 Jan '10, 08:32) Volker Barth

I personally like the use of regex for string matching. But ASA is missing a full blown REGEXP_REPLACE function.

At least im not able to come up with a regular expression that would do it with REGEXP_SUBSTRING in one run.

But in a loop it is possbile to strip the string. I have prepared a iSQL script for copy an paste. I think it should be easy to build a Function out of it.

BEGIN 
    declare cSearch     varchar(1024);
    declare cPattern    varchar(512);
    declare cResult     varchar(1024);
    declare cFound      varchar(1024);
    declare nFound      integer;
    declare nOccurrence integer;

set cSearch = '+49-2345/67 89';
    set cPattern = '(?<=(^|[^[:digit:]+]))[[:digit:]+]+';
    set nOccurrence = 1;
    set cFound = REGEXP_SUBSTR( cSearch, cPattern, 1, nOccurrence );

message 'Loop ', nOccurrence, ' ', cFound type info to client ;
    while cFound is not null and length(cFound) > 0 loop
        set cResult = string( cResult, cFound );
        set nOccurrence = nOccurrence + 1;
        set cFound = REGEXP_SUBSTR( cSearch, cPattern, 1, nOccurrence );
        message 'in Loop ', nOccurrence, ' ', cFound type info to client ;
    end loop;

message 'Final ', cResult, '<!' type info to client;

END

Output of this script is

Loop 1 +49
Loop 2 2345
Loop 3 67
Loop 4 89
Loop 5 
Final +4923456789<!
Execution time: 0.016 seconds

I think as always it depends on the string and the pattern which solution is faster.

One remark you have to avoid that the normal pattern '[[:digit:]+]+' to match to fast. So you have to add a assertion that only makes sure you are at the begin of the string '^' or after a not matched character '[^[:digit:]+]'.

permanent link

answered 21 Jan '10, 12:36

Thomas%20Duemesnil's gravatar image

Thomas Dueme...
2.5k213460
accept rate: 15%

Thanks for that nice proposal - it underlines my assumption that REGEXP_SUBSTR() is generally suited for this task but needs to be looped in order to replace/strip more than one occurrence of the chars to filter out. - For the quite simple task discussed here, Mark's suggestion seems easier to understand. But for different tasks (like formatting/checking e-mail addresses), the RegExp search seems a lot more flexible.

(21 Jan '10, 13:49) Volker Barth
1

@TDuemesnil: please post a new "question" entitled "Product Suggestion: Please implement REGEXP_REPLACE". Include a business case for why you want it. FWIW here's a description of the Oracle version: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions130.htm#SQLRF06302

(21 Jan '10, 15:06) Breck Carter
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×18
×3

question asked: 19 Jan '10, 20:24

question was seen: 13,996 times

last updated: 21 Jan '10, 12:36