Hi all,

I got a couple of customer records that need to bechecked for duplicate house numbers in the address / street field. Due to a frontend error, several entries have been created that have duplicate house numbers like this:

'Sesame Street 312 312' or 'ArlingtonRoad 7 7' or 'Queens Boulevard 22 22'

My task is to identify those entries with a duplicate street number like the ones above. Tried several combinations with substr, stuff, charindex but did not find a solution...

Thanks! Markus

asked 07 Nov '14, 06:20

Kusie's gravatar image

Kusie
16112
accept rate: 0%


This might not be the easiest way, and it requires v11 or above as it does make use of regular expressions, particularly of REGEX_SUBSTR.

I'm assuming the house numbers consist of digits only and follow any other non-digit parts, otherwise you will have to adapt the pattern.

The REGEXP pattern "\s[0-9]+" looks for a blank followed by one or more digits, and the "positive lookahead zero-width assertion" expression "(?=\s[0-9]+)" makes sure there's one more pattern following the current position.

The first WHERE condition finds all entries with at least 2 trailing blocks of digits (whether identical or not) and is intended to speed up the query (though I don't know if it really does), the second condition compares both blocks. You might omit the fist condition.

select *,
   regexp_substr(Streetname, '\s[0-9]+$') as LastNumberBlock,
   regexp_substr(Streetname, '\s[0-9]+(?=\s[0-9]+$)') as SecondLastNumberBlock
from MyTable
where Streetname regexp '.*(\s[0-9]+){2}' 
   and regexp_substr(Streetname, '\s[0-9]+$')
     = regexp_substr(Streetname, '\s[0-9]+(?=\s[0-9]+$)')

That's a starting point, I'm sure a regex expert can certainly do better...

permanent link

answered 07 Nov '14, 08:42

Volker%20Barth's gravatar image

Volker Barth
29.5k291441646
accept rate: 32%

edited 07 Nov '14, 08:46

Hi,

many thanks for taking the time to look into my problem. Unfortunately we´re using Sybase IQ (Sybase IQ/15.4.0.3027) and regular expressions or regexp functions are not supported - at leats that´s what I was told by our DBAs. Sorry, should have given that info in the first place...

Markus

(07 Nov '14, 08:54) Kusie
Replies hidden
1

This forum concentrates on SQL Anywhere. You might have move luck with IQ specific question here: SAP IQ SCN

(07 Nov '14, 09:59) Reimer Pods

Solve the task in the dumb and direct way - create the application that reads the entire table and checks every record.

By the way, an idea came to my mind. Can you split strings by the space character, and compare last two tokens. If they are equal - mark the row.

Does SAP IQ have sa_split_list or something similar?

Or simply find all row with more than 2 spaces in that column. If the list is not big enough, give it to your DBAs.

(08 Nov '14, 08:16) Vlad

Here is some sample data:

create table Taddr( id int, address long varchar);
insert into Taddr values(1,'Sesame Street 312 312')
   , (2,'Queens Boulevard 22 22')
   , (3,'Quinpool Road 512');
commit;

Provided that you know that the house number is numeric and last in the address, you could use something like the following:

select id, address, locate(address,' ',-1 ) loc_last_space
    , substr( address, loc_last_space ) last_num
    , locate( address, last_num ) loc_first_num
from TAddr 
where loc_last_space > loc_first_num and isnumeric(last_num)=1

The above finds the last space in the string (locate with negative offset returns last occurrence), uses that to find the last word in the address, then finds the first occurrence of the last word. If the first occurrence is earlier, then there is a duplicate. I check too that the last word is numeric. You could refine this to check that the first occurrence immediately precedes the last space so that you don't incorrectly identify the same number earlier in the string as a duplicate.

If your data doesn't necessarily have the house number as the last word, it is a harder problem. The following would work with SQLA (since about 9.0 I believe) by first splitting words on space boundaries (using openstring; sa_split_list works well too) then looking for duplicated numbers:

select *
from TAddr
where exists ( 
    select 1 from openstring( value address ) 
    with ( num long varchar ) 
    options( delimited by '\0' row delimited by ' ') O
    where isnumeric(num)=1
    group by num
    having count(*) > 1 );

I suspect this would work in IQ but perhaps slowly as OPENSTRING is implemented only in the system dbspace and the query would likely run in partial passthrough mode. Depending on the size of the data it might be satisfactory, or you might get better performance by copying the table or a portion of it to the system dbspace.

permanent link

answered 20 Nov '14, 08:31

Ivan%20T.%20Bowman's gravatar image

Ivan T. Bowman
2.8k22732
accept rate: 39%

edited 20 Nov '14, 08:34

Fantastic, that set me on the right track! Thanks for taking the time to answer and help with my problem. best regards, Markus

(21 Nov '14, 08:46) Kusie
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×12

question asked: 07 Nov '14, 06:20

question was seen: 839 times

last updated: 21 Nov '14, 08:46