SAP SQL Anywhere

Question

I have a tab delimited nvarchar column with many values. The problem is that we sometimes have duplicates in that column. How can I update the column and remove all duplicates?

Today it look like this:

ID--Material--Information--
---------------------------
1--Test1------Engine 12b \x09 Diam 32 \x09 Rear left \x09 Engine 12b
2--Test1------Engine 24b \x09 Diam 32 \x09 Rear right \x09 Diam 32 \x09 Rear right  
3--Test2------Engine 36b \x09 Diam 32 \x09 Rear left \x09 Rear left

I wanna remove duplicates in column informations so it look like this:


ID--Material--Information--
---------------------------
1--Test1------Engine 12b \x09 Diam 32 \x09 Rear left
2--Test1------Engine 24b \x09 Diam 32 \x09 Rear right   
3--Test2------Engine 36b \x09 Diam 32 \x09 Rear left

Accepted Answer

Here's a complete sample (FWIW, it's nice if you provide a simple test schema by yourself...):

create table Test
(
    ID int not null default autoincrement primary key,
    Material varchar(30) not null,
    Information varchar(1000) not null
);
insert Test values
   (1, 'Test1', 'Engine 12b \t Diam 32 \t Rear left \t Engine 12b'),
   (2, 'Test1', 'Engine 24b \t Diam 32 \t Rear right \t Diam 32 \t Rear right'),  
   (3, 'Test2', 'Engine 36b \t Diam 32 \t Rear left \t Rear left');

select * from Test;

Now use sa_split_list to build a query that turns the "blanks-and-tab-delimited" column into separate rows (while preserving the order) - note that cross apply is handy to use a column as parameter for a procedure:

select distinct id, line_num, row_value
from Test cross apply sa_split_list(Information, ' \t ')
order by 1, 2;

-- will return
id,line_num,row_value
1,1,Engine 12b
1,2,Diam 32
1,3,Rear left
1,4,Engine 12b
2,1,Engine 24b
2,2,Diam 32
2,3,Rear right
2,4,Diam 32
2,5,Rear right
3,1,Engine 36b
3,2,Diam 32
3,3,Rear left
3,4,Rear left

Now, use that as a derived query (dt1) to remove duplicate values (and still preserving the order, i.e. for duplicate values, the first appearance is preserved):

select id, row_value, min(line_num)
from
   (select distinct id, line_num, row_value
    from Test cross apply sa_split_list(Information, ' \t ')) dt
group by id, row_value
order by 1, 3;

-- will return 
id,row_value,min_line_num
1,Engine 12b,1
1,Diam 32,2
1,Rear left,3
2,Engine 24b,1
2,Diam 32,2
2,Rear right,3
3,Engine 36b,1
3,Diam 32,2
3,Rear left,3

Now, use that again as a derived query dt2 to re-build the list:

select id, list(row_value, ' \t ' order by min_line_num) as Deduplicated_Information
from
   (select id, row_value, min(line_num) as min_line_num
    from
       (select distinct id, line_num, row_value
        from Test cross apply sa_split_list(Information, ' \t ')) dt1
    group by id, row_value) dt2
group by id
order by 1;

-- will return
id,Deduplicated_Information
1,Engine 12b \t Diam 32 \t Rear left
2,Engine 24b \t Diam 32 \t Rear right
3,Engine 36b \t Diam 32 \t Rear left

Now, build a join with the original table to check the results:

select Test.*, Deduplicated_Information
from Test inner join
   (select id, list(row_value, ' \t ' order by min_line_num) as Deduplicated_Information
    from
      (select id, row_value, min(line_num) as min_line_num
       from
          (select distinct id, line_num, row_value
           from Test cross apply sa_split_list(Information, ' \t ')) dt1
       group by id, row_value) dt2
    group by id) dt3
   on Test.ID = dt3.id
order by 1;

-- will return
ID,Material,Information,Deduplicated_Information
1,Test1,Engine 12b \t Diam 32 \t Rear left \t Engine 12b,Engine 12b \t Diam 32 \t Rear left
2,Test1,Engine 24b \t Diam 32 \t Rear right \t Diam 32 \t Rear right,Engine 24b \t Diam 32 \t Rear right
3,Test2,Engine 36b \t Diam 32 \t Rear left \t Rear left,Engine 36b \t Diam 32 \t Rear left

and use that join to update the real table:

update Test
set Information = Deduplicated_Information
from Test inner join
   (select id, list(row_value, ' \t ' order by min_line_num) as Deduplicated_Information
    from
      (select id, row_value, min(line_num) as min_line_num
       from
          (select distinct id, line_num, row_value
           from Test cross apply sa_split_list(Information, ' \t ')) dt1
       group by id, row_value) dt2
    group by id) dt3
   on Test.ID = dt3.id;

select * from Test;

-- will return
ID,Material,Information
1,Test1,Engine 12b \t Diam 32 \t Rear left
2,Test1,Engine 24b \t Diam 32 \t Rear right
3,Test2,Engine 36b \t Diam 32 \t Rear left

Answer 2

1

Before giving you this answer, I honestly stole it from this page. I do not want to sound rude, but this was my search request.

Ok, returning to the proposal. Here it is:

SELECT DISTINCT(row_value)
FROM sa_split_list('Engine 12b \x09 Diam 32 \x09 Rear left \x09 Engine 12b', ' \x09 ');

Result:

row_value
-----------
Engine 12b
Diam 32
Rear left

I think, this will be a quite incorrect to do everything on the SQL server. I would suggest you to change your software code so it will use SET collections instead of LIST, when you serialize tokens (or deserialize) to/from the database. Plus, this kind of normalization could be implemented once for the column, so you will never have such question anymore.

At least this is how I would solve this task.

permanent link

answered 13 Oct '15, 03:38

Vlad
2.5k●9●11●27
accept rate: 16%

1

Just to add:

With LIST() you can turn the distinct rows into the single column "Information" again (though that might violate normalization rules, as Vlad has pointed out).

Based on that, you can certainly combine sa_split_list() and list() to update the columns containing duplicate tokens in one single update statement.

(13 Oct '15, 04:07) Volker Barth

A good suggestion, I forgot to search for the function LIST. I am always thinking that sometimes it is better to fix everything in the application code, rather than to use SQL.

(13 Oct '15, 04:17) Vlad

flat view

SAP SQL Anywhere

Remove duplicates in a tab delimited string?

Follow this question

Related questions