At the recent excellent Technical Summit held in Waterloo, I was challenged to make the case for a BinaryReplace() built-in function.
This would be a function that replaced a character string with another character string without reference to collation semantics etc (unlike the existing Replace() function - see this question for details of how Replace() interprets data. ByteReplace() might be a better name or it could perhaps be implemented as an extension to Replace().
What is is the problem?
Imagine you have a database containing data imported from another system, which uses a different character set or is from a different locale. You find that a particular character is incorrectly represented - a frequent example in the UK is the Pound Sterling sign - £. This often appears as ú. No problem you think - write a little routine that runs through all the affected tables and columns doing something like
update person set notes=replace(notes,'ú','£')
This is q£ite fast, b£t oh dear, yo£ soon find yo£ have q£ite a problem on yo£r hands. ALL yo£r £s (letter after t) have been changed to £s.
So what you (I've rolled back) have to do is write a letter by letter comparison user defined function (not very elegant example below) which now takes hours and hours to run. The same issue arises with trying to replace capital letters with lower case or vice versa (assuming a case insensitive database) and any operation involving accented characters that are treated alike by a collation.
There may be a very obvious solution I'm missing - possibly involving casting everything to binary beforehand (which can work for making case sensitive comparisons) - but it has so far evaded me. When I have had to deal with very large scale problems it's been faster to dump the data out, use a grep tool and read it all in again.
create function BinaryReplace(in x long varchar,in targetascii smallint, in replacementascii smallint) returns long varchar deterministic begin declare rv long varchar; declare l integer; declare i integer; declare c char(1); set l=length(x); set i=0; set rv=''; while i < l loop set i=i+1; set c=substr(x,i,1); if ascii(c) = targetascii then set c=char(replacementascii) end if; set rv=rv+c end loop; return rv end;
asked 09 Dec '14, 13:30
Your request was noted at the summit and added into our set of enhancement requests. I can't make promises but it would be nice if it made it into Zermatt. I also noticed that there is no binary version of LOCATE either.
answered 09 Dec '14, 13:40