I have one column that have the following values.

 a,b,c,d,e,f,g,h,i,j
 1,2,3,4,5,6,7,8,9,10
 r4,32,13,49,k1,67,10,56,12,11

I need to return 7th element for those rows

the result should be:

1st row: g
2nd row: 7
3rd row: 10
...

but I found it hard to do in sybase IQ/Sybase ASE. I don't think sybase supports regular expression. What is the best way to handle the situation?

thanks

asked 21 Mar '14, 10:47

sqlgeek's gravatar image

sqlgeek
1477713
accept rate: 0%

edited 21 Mar '14, 11:09

Mark%20Culp's gravatar image

Mark Culp
22.7k9129266


Try using sa_split_list(). E.g.

select row_value
  from sa_split_list( 'a,b,c,d,e,f,g,h,i,j' )
where line_num = 7

returns 'g'.

You can wrap this in a UDF like this:

create function nth_item_in_list( in list long varchar, in num int )
returns long varchar
begin
    declare @selected long varchar;
    select row_value
      into @selected
      from sa_split_list( list )
    where line_num = num;
    return @selected;
end;

and then you can use:

select nth_item_in_list( mytablecol, 7 )
  from mytable;

HTH

permanent link

answered 21 Mar '14, 11:05

Mark%20Culp's gravatar image

Mark Culp
22.7k9129266
accept rate: 40%

edited 21 Mar '14, 11:07

You could also use OPENSTRING to advantage here. In my testing with ~10K rows, it executed in 0.145s compared to 3.313s for the regexp_substr approach.

drop table if exists T_S;
create table T_S(x int, str long varchar);
insert into T_S(x,str)
select row_num / 10 x, list(row_num) as str
from sa_rowgenerator(0,100000)
group by x
order by 1;
commit;

Then to use the OPENSTRING:

select T_S.*, c7
from T_S cross apply 
openstring( value T_S.str )
with ( 
    c1 varchar(255) 
    , c2 varchar(255) 
    , c3 varchar(255) 
    , c4 varchar(255) 
    , c5 varchar(255) 
    , c6 varchar(255) 
    , c7 varchar(255) 
) D
order by x

The above formulation relies on us getting exactly the 7th column (known at query time) and if there are additional columns beyond 7 they are just ignored. If you don't know ahead of time how many columns there are or which column you want to retrieve, you could instruct OPENSTRING that rows are delimited by ',' (and that columns are delimited by some out of band character), then use ROWID(D) to select which element from the list you want.

I find that OPENSTRING generally performs better than sa_split_list but it is not a drop in replacement in all cases.

You could still wrap the above in a UDF if you like but you would lose some of the performance benefits -- even if inlined, it would be a subquery expression instead of a join. If you write a procedure carefully you could use it in the FROM clause with CROSS APPLY or OUTER APPLY and expect it to be inlined.

permanent link

answered 22 Mar '14, 14:26

Ivan%20T.%20Bowman's gravatar image

Ivan T. Bowman
2.8k22732
accept rate: 39%

edited 22 Mar '14, 14:27

While I would prefer Mark's solution with sa_split_list, in case you want to use a regex, the following should do:

select val_list, regexp_substr(value_list, '(?<=([^,]*,){6})[^,]*') as seventh_element
from 
(select list(row_num) as value_list from sa_rowgenerator(-10, 30, 4)) as test

The sa_rowgenerator will build a list of each 4th integer from -10 to 30, and the (rather funky) pattern will search for a string that is preceded with a sixfold group of text delimited by a comma, where the pattern is limited by the next comma.

The result set is:

value_list, seventh_element
'-10,-6,-2,2,6,10,14,18,22,26,30','14'

permanent link

answered 21 Mar '14, 12:00

Volker%20Barth's gravatar image

Volker Barth
30.0k294448654
accept rate: 32%

1

Thanks Volker for posting the regexp solution - I figured there was a way of doing it using regexp_substr!

I was curious to know how regexp_substr performed as compared to my sa_split_list solution and it would appear that using regexp_substr is a winner by a long shot.

I tested the performance of each using the following sequence on a recent Linux SQLA v16 production build:

create table foo( i int default autoincrement, s long varchar );

create function random_list( in num_el int )
returns long varchar
not deterministic
begin
    declare @list long varchar;
    select list( substr( 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ', cast( rand() * 62 as int ), 1 ) )
      into @list
      from sa_rowgenerator( 1, num_el, 1 ) ;
    return @list;
end;

insert into foo( s )
select random_list( 10 )
  from sa_rowgenerator( 1, 10000, 1 );
commit;

create function nth_item_in_list( in list long varchar, in num int )
returns long varchar
begin
    declare @selected long varchar;
    select row_value
      into @selected
      from sa_split_list( list )
    where line_num = num;
    return @selected;
end;

create function nth_item_in_list_re( in list long varchar, in num int )
returns long varchar
begin
    declare @selected long varchar;
    set @selected = regexp_substr( list, '(?<=([^,]*,){' || (num-1) || '})[^,]*' );
    return @selected;
end;

create function nth_item_in_list_re2( in list long varchar, in num int )
returns long varchar
begin
    declare @selected long varchar;
    set @selected = regexp_substr( list, '(?<=([^,]*,){' || (num-1) || '})[^,]*' );
    set @selected = '' || @selected || '';  -- stop inlining of this function
    return @selected;
end;

commit;
begin
   declare @tlen int;
   declare @t1 timestamp;
   declare @t2 timestamp;
  set @t1 = now(*);

select sum( length(
-- pick one of the following:
     nth_item_in_list( foo.s, 7 )
--   regexp_substr( foo.s, '(?<=([^,]*,){6})[^,]*' )
--   nth_item_in_list_re( foo.s, 7 )
--   nth_item_in_list_re2( foo.s, 7 )
    ))
    into @tlen
    from foo;

set @t2 = now(*);
  select @t1, @t2, datediff( millisecond, @t1, @t2 ) as millisec, @tlen
end;

Using regexp_substr() or nth_item_in_list_re() gave roughly the same performance - which is what I expected since the UDF will be inlined - and in my testing is 100 times faster!! Nice!

Using the nth_item_in_list_re2() function (which uses regexp_substr but will not be inlined) still performed roughly 3 times faster than the sa_split_list solution.

The difference in performance between the '_re' and '_re2' test runs gives an indication of the cost of calling a UDF 10000 times and the benefits of inlining whereever possible.

(21 Mar '14, 14:00) Mark Culp
Replies hidden

That's a very helpful experiment! Though I would think that the sa_split_list solution is surely way easier to understand - at least it took me a while to put the regex pattern together:)

(21 Mar '14, 15:02) Volker Barth

Thanks guys!

permanent link

answered 21 Mar '14, 13:20

sqlgeek's gravatar image

sqlgeek
1477713
accept rate: 0%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×34
×12

question asked: 21 Mar '14, 10:47

question was seen: 3,435 times

last updated: 22 Mar '14, 14:27