Hi all, I got a couple of customer records that need to bechecked for duplicate house numbers in the address / street field. Due to a frontend error, several entries have been created that have duplicate house numbers like this: 'Sesame Street 312 312' or 'ArlingtonRoad 7 7' or 'Queens Boulevard 22 22' My task is to identify those entries with a duplicate street number like the ones above. Tried several combinations with substr, stuff, charindex but did not find a solution... Thanks! Markus |
This might not be the easiest way, and it requires v11 or above as it does make use of regular expressions, particularly of REGEX_SUBSTR. I'm assuming the house numbers consist of digits only and follow any other non-digit parts, otherwise you will have to adapt the pattern. The REGEXP pattern "\s[0-9]+" looks for a blank followed by one or more digits, and the "positive lookahead zero-width assertion" expression "(?=\s[0-9]+)" makes sure there's one more pattern following the current position. The first WHERE condition finds all entries with at least 2 trailing blocks of digits (whether identical or not) and is intended to speed up the query (though I don't know if it really does), the second condition compares both blocks. You might omit the fist condition.
That's a starting point, I'm sure a regex expert can certainly do better... Hi, many thanks for taking the time to look into my problem. Unfortunately we´re using Sybase IQ (Sybase IQ/15.4.0.3027) and regular expressions or regexp functions are not supported - at leats that´s what I was told by our DBAs. Sorry, should have given that info in the first place... Markus
(07 Nov '14, 08:54)
Kusie
Replies hidden
1
This forum concentrates on SQL Anywhere. You might have move luck with IQ specific question here: SAP IQ SCN
(07 Nov '14, 09:59)
Reimer Pods
Solve the task in the dumb and direct way - create the application that reads the entire table and checks every record. By the way, an idea came to my mind. Can you split strings by the space character, and compare last two tokens. If they are equal - mark the row. Does SAP IQ have sa_split_list or something similar? Or simply find all row with more than 2 spaces in that column. If the list is not big enough, give it to your DBAs.
(08 Nov '14, 08:16)
Vlad
|
Here is some sample data:
Provided that you know that the house number is numeric and last in the address, you could use something like the following:
The above finds the last space in the string (locate with negative offset returns last occurrence), uses that to find the last word in the address, then finds the first occurrence of the last word. If the first occurrence is earlier, then there is a duplicate. I check too that the last word is numeric. You could refine this to check that the first occurrence immediately precedes the last space so that you don't incorrectly identify the same number earlier in the string as a duplicate. If your data doesn't necessarily have the house number as the last word, it is a harder problem. The following would work with SQLA (since about 9.0 I believe) by first splitting words on space boundaries (using openstring; sa_split_list works well too) then looking for duplicated numbers:
I suspect this would work in IQ but perhaps slowly as OPENSTRING is implemented only in the system dbspace and the query would likely run in partial passthrough mode. Depending on the size of the data it might be satisfactory, or you might get better performance by copying the table or a portion of it to the system dbspace. Fantastic, that set me on the right track! Thanks for taking the time to answer and help with my problem. best regards, Markus
(21 Nov '14, 08:46)
Kusie
|