The forum will experience an outage sometime between February 10 at 7:00pm EST and February 12 at 11:59 EST for installation of security updates. The actual time and duration of the outage are unknown but attempts will be made to minimize the downtime. We apologize for any inconvenience.
create function bludata.onlyNumber(in val long varchar)
returns long varchar
begin
  declare sTemp long varchar;
  declare iCount integer;
  set iCount=1;
  set sTemp='';
  while iCount <= length(val) loop
    if substring(val,iCount,1) in( '0','1','2','3','4','5','6','7','8','9') then
      set sTemp=sTemp+substring(val,iCount,1)
    end if;
    set iCount=iCount+1
  end loop;
  return sTemp
end

Actually, set sTemp='' consumes 13.6% of exec time and if line 86.4%
I need to improve because this stored function is executed more than 10k times/minute.

asked 01 Dec '09, 18:26

Zote's gravatar image

Zote
1.7k364050
accept rate: 43%

Are the number characters always together, or is there something between number chars? What kind of input will the procedure get? You might be able to use a regular expression to extract the numbers.

(01 Dec '09, 20:10) Ben S

I have rewrited using your tips. Thank you all!

ALTER FUNCTION "bludata"."onlyNumber"(in val long varchar)
RETURNS long varchar
DETERMINISTIC
BEGIN
    declare sTemp long varchar;
    declare iCount integer;
    declare currentSize integer;
    declare currentChar char(1);

if patindex('%[^0-9]%', val) = 0 then
        return val;
    end if;

set iCount = 1;
    set sTemp = '';
    set currentSize = length(val);

while iCount <= currentSize loop
        set currentChar = substring(val, iCount, 1);
        if isNumeric(currentChar) = 1 then
            set sTemp = sTemp + currentChar;
        end if;

set iCount = iCount + 1;
    end loop;

return sTemp;
END
permanent link

answered 08 Dec '09, 12:19

Zote's gravatar image

Zote
1.7k364050
accept rate: 43%

1

Did you run some tests and get give us a hint of the performance improvements ?

(19 Jan '12, 07:03) Thomas Dueme...
Replies hidden

...a fine suggestion, though possibly a bit late...:)

(19 Jan '12, 07:30) Volker Barth

As usual, there are few "right" answers, only tradeoffs.

The direction to move to optimize this function depends on both your requirements and knowledge of the domain of input strings the function is to process.

For example, if a large proportion of the inputs are 100% numeric, and punctuation is not permitted, then a cheap way to optimize the execution is

create function bludata.onlyNumber(in val long varchar)
returns long varchar
begin
  declare sTemp long varchar;
  declare iCount integer;
  if isnumeric(val) = 1 then return val endif;
  set iCount=1;
  set sTemp='';
  while iCount <= length(val) loop
    if substring(val,iCount,1) in( '0','1','2','3','4','5','6','7','8','9') then
      set sTemp=sTemp+substring(val,iCount,1)
    end if;
    set iCount=iCount+1
  end loop;
  return sTemp
end

But the isnumeric() function only tests if conversion to a numeric is possible, not an integer - instead you can use a REGEXP search on the string to look for anything that is not the digits 0-9.

Another cheap test would be to attempt the conversion to a BIGINT if the input string is shorter than 19 characters (assuming a single-byte charset). Attempt the CAST, if it succeeds return val, otherwise process the string as you are.

Another possibility is to not consider the string character-by-character. If non-numeric characters are rare, use a substring search to find the next "chunk" of numbers; this will result in fewer concatentation operations, which will mean fewer memory allocations.

permanent link

answered 01 Dec '09, 20:23

Glenn%20Paulley's gravatar image

Glenn Paulley
10.7k568104
accept rate: 43%

we have a similar function but with a test like:

substring(val,iCount,1) like '[0123456789]'

we find it pretty fast. Defintely worth looking at the profiling stats in real usage to see what takes the time.

(01 Dec '09, 22:27) Justin Willey

@Glenn: Besides the other obvious optimizations: Is "IN( '0','1','2','3','4','5','6','7','8','9')" cheaper than "LIKE [0-9]"? - I would think the latter is more readable at least.

(02 Dec '09, 08:57) Volker Barth

I don't have regex, its sa-9.

(02 Dec '09, 10:43) Zote

@Zote: See the PATINDEX function for LIKE-wise tests outside of queries - it has been there since SA 5 (at least).

(02 Dec '09, 12:37) Volker Barth

You could try to find the next occurrence of an unwanted character with PATINDEX( '%[^0-9]%', val) instead of examin each charater one after another.

If nothing unwanted is found in the string you can return immediately.

HTH

permanent link

answered 02 Dec '09, 11:45

Thomas%20Duemesnil's gravatar image

Thomas Dueme...
2.4k203357
accept rate: 16%

Nice tip! Thank you

(08 Dec '09, 12:14) Zote

Another approach is to convert the string's characters into a set of row values and use a WHERE clause to eliminate the non-numeric ones, then put the result back together using the LIST aggregate function.

create function bludata.onlyNumber( val long varchar )
returns long varchar
begin
    declare @result long varchar;
    if isnumeric( val ) = 1 and patindex('%[^0-9]%',val) = 0 then
        return val;
    end if;
    select list( ch,'' order by row_num ) into @result
    from (
        select substr(val,row_num,1) as ch,row_num
        from sa_rowgenerator(1,length(val)) 
        where ch between '0' and '9') as T;
    return @result;
end
permanent link

answered 08 Dec '09, 19:08

Bruce%20Hay's gravatar image

Bruce Hay
2.6k1510
accept rate: 48%

I tried Bruce's variant with the LIST aggregate and I was surprised that it is actually faster (with my test data):

create function BH_onlyNumber(in val long varchar)
returns long varchar
begin
    declare @result long varchar;
    if isnumeric( val ) = 1 and patindex('%[^0-9]%',val) = 0 then
        return val;
    end if;
    select list( ch,'' order by row_num ) into @result
    from (
        select substr(val,row_num,1) as ch,row_num
        from sa_rowgenerator(1,length(val)) 
        where ch between '0' and '9') as T;
    return @result;
end

I timed the two variants using fetchtst on 11.0.1.2001 using some fabricated strings from systabcol:

drop table if exists  TestStrings;
create table TestStrings( s long varchar );
insert into TestStrings
select column_name s from systabcol
union all select column_name||object_id from systabcol
union all select object_id||column_name from systabcol;
commit;

The following file was used by fetchtst -ga:

select max( OnlyNumber( string(s,row_num) ) )
from TestStrings, sa_rowgenerator(1,10)
go
select max( BH_OnlyNumber( string(s,row_num) ) )
from TestStrings, sa_rowgenerator(1,10)
go

Calling the function 63180 times took 39.2s for the onlyNumber() implementation and 3.2s for the BH_onlyNumber() variant. These results may not hold if your data is distributed in a different way or if you are not calling many times from a single query.

permanent link

answered 11 Mar '10, 18:05

Ivan%20T.%20Bowman's gravatar image

Ivan T. Bowman
2.8k22732
accept rate: 39%

I missed a couple days so didn't see this. I think the patindex is definitely the way to go. Although I don't like the loop at the end.

  ALTER FUNCTION onlyNumber(in val long varchar)
     RETURNS long varchar
     DETERMINISTIC
  BEGIN    
     declare sOut long varchar;    
     declare iCurPos integer;    
     declare iNonNum integer;

set sOut    = '';
     set iCurPos = 1;
     set iNonNum = patindex('%[^0-9]%', val);

while iNonNum <> 0 loop
        // pick up everything to the left of the non numeric
        set sOut = sOut || substr(val, iCurPos, iNonNum - 1);

// check starting at next numeric
        set iCurPos = iCurPos + iNonNum + patindex('%[0-9]%', substr(val, iNonNum + 1)) - 1;
        set iNonNum = patindex('%[^0-9]%', substr(val, iCurPos));

end loop;

// return our string plus any remaining chars
     return sOut || substr(val, iCurPos);

END

This might change based upon whether we expected lots of non numerics and whether non numerics might show up grouped together or not which is what I designed for above. If we're just looking at something like SSN 999-99-9999 then the set iCurPos should just be set iCurPos = iCurPos + iNonNum;

Jon

permanent link

answered 10 Dec '09, 17:02

Jon%20Greisz's gravatar image

Jon Greisz
571979
accept rate: 0%

Similar to Bruce's approach, this solution may prove useful or inspire solutions to other problems:

CREATE OR REPLACE FUNCTION onlyNumber(IN val LONG VARCHAR)
RETURNS LONG VARCHAR
DETERMINISTIC
BEGIN
    DECLARE ret LONG VARCHAR;
    SELECT LIST(
        IF isNumeric(SUBSTRING(val,row_num,1)) = 1 THEN 
            SUBSTRING(val,row_num,1) //preserve digit
        ELSE 
            '' //remove non-digit
        END IF
    ,'' ORDER BY row_num) INTO ret 
    FROM sa_rowgenerator(1,LENGTH(val)); //1 "row" for each character
    RETURN ret;
END;
permanent link

answered 18 Jan '12, 15:41

Tyson%20Lewis's gravatar image

Tyson Lewis
2.2k1641
accept rate: 22%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×238
×90
×26

question asked: 01 Dec '09, 18:26

question was seen: 1,920 times

last updated: 19 Jan '12, 07:30