AFAIK, SQL Anywhere can handle strings both as binary and character data and cast them accordingly. From the docs:
Sometimes it's handy to cast character data to binary in order to use binary comparison (which can also be achieved by other means, say collation tailoring and the like). Question: How do the built-in string functions handle binary arguments? Problem: The parameter declarations in the documentation usually just use the word "string" and do not state a datatype. Datatypes are only specified for return values, and some functions return character and/or binary types, for example SUBSTRING:
Other functions return only character data, such as REPLACE.
Can I assume these functions will cast their arguments to character data (and therefore use collation semantics and the like) when the function lists its parameters as (long) varchar/(long) nvarchar rather than as (long) varbinary? E.g. REPLACE() seems to treat character and binary arguments identically (and follows character semantics - here in a case-insensitive manner, because the database is case-insensitive):
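(To make the question concrete, here is a minimal, hypothetical sketch of what I mean - the literals are made up, and it assumes a case-insensitive database:)

```sql
-- Hypothetical sketch: in a case-insensitive database, REPLACE seems to
-- match 'A' against 'a' regardless of whether the inputs are passed as
-- character or as binary data:
select replace( 'abc', 'A', 'x' ),
       replace( cast( 'abc' as long binary ),
                cast( 'A'   as long binary ),
                cast( 'x'   as long binary ) );
```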
First let me say that I agree that the documentation is a little light on the details of the parameters accepted by each function and on each function's behaviour as it relates to the types of its input parameters. I have discussed this with the documentation team in the past, and last fall I spent several days building a method to semi-automatically compile a report of all input combinations to all of the functions and the resulting outputs (although the information about the exact translation that occurs between inputs and outputs is not generated). The issue is complex, and so far we have not found a simple way to represent the necessary information in the documentation. The answer to the question "How do string functions work with binary data?" is "It depends on which function is being used" - i.e. there is no one-answer-fits-all-functions answer. There are basically two classes of functions (hence two answers) though:
When binary input is given where a char-type input is expected, the binary data is automatically cast to the char type (using an identity mapping - i.e. no conversion of the data is performed). For example, providing binary inputs to REPLACE results in the binary data being treated as character data in the database CHAR character set, and characters are replaced one by one accordingly. As such, what you get out will depend entirely on the character set being used (and of course on the inputs). E.g.:

    begin
        declare @bin long binary;
        declare @fr long binary;
        declare @to long binary;
        set @bin = hextobin( '0102030405060708090a0b0c0d0e0f' );
        set @fr = hextobin( '01' );
        set @to = hextobin( '40' );
        select bintohex( @bin ), bintohex( replace( @bin, @fr, @to ) )
    end;

The output from the above is:

    '0102030405060708090A0B0C0D0E0F','4002030405060708090A0B0C0D0E0F'

and is what I would have expected, since I am using a single-byte character set. I would agree that a binary replace function (e.g. byte_replace()) could be useful... although I'm sure a crafty programmer might be able to come up with a solution using the existing capabilities. This is left as an exercise for the reader! ;-)
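(Taking up that exercise, here is one possible, untested sketch of such a byte_replace() - the name and implementation are my own invention, not a SQL Anywhere built-in, and it assumes byte_length(), byte_substr() and || concatenation behave byte-wise on LONG BINARY:)

```sql
create function byte_replace(
    in src long binary,
    in pat long binary,
    in rep long binary )
returns long binary
begin
    declare res  long binary;
    declare i    integer;
    declare n    integer;
    declare plen integer;
    set res  = hextobin( '' );
    set n    = byte_length( src );
    set plen = byte_length( pat );
    set i    = 1;
    while i <= n loop
        -- binary-to-binary comparison, so no collation is involved
        if i + plen - 1 <= n
           and byte_substr( src, i, plen ) = pat then
            set res = res || rep;
            set i   = i + plen;
        else
            set res = res || byte_substr( src, i, 1 );
            set i   = i + 1;
        end if;
    end loop;
    return res;
end;
```

Being an interpreted loop, this will of course be far slower than a built-in equivalent would be.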
Many thanks for that comprehensive answer - it is very useful. "This is left as an exercise for the reader! ;-)" - indeed, that's what we have done, but as a user-defined function it is gruesomely slow compared to a built-in equivalent. For occasional usage that's not a problem, but when you are converting millions of rows of data, a lot of coffee gets drunk!
(05 Jun '14, 18:33)
Justin Willey
Thanks for that thorough information - maybe it could be used to add a separate topic to the docs, linked from "String functions"? As a side note: so for built-in functions, SQL Anywhere has a concept of overloading/polymorphism, whereas user-defined functions and procedures do not support that? Otherwise it would be hard to understand how the same function (name) can handle data of different types...
(06 Jun '14, 03:43)
Volker Barth
To help understand the business case, for what purpose are you doing binary data manipulation in the server and how often do you do it?
(06 Jun '14, 07:47)
Mark Culp
Regarding your side note: Correct - most built-in functions have multiple binding rules; the server runs through the list, picks the first rule that fits the input types, and then calls the routine associated with that rule.
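(A hypothetical illustration of that rule selection, assuming SUBSTRING has both a char and a binary binding rule, as the return types in the docs suggest:)

```sql
-- Hypothetical: the same function name resolves to different binding
-- rules depending on the argument types.
select substring( 'abcdef', 2, 3 );                  -- char rule: character result
select substring( hextobin( '0102030405' ), 2, 3 );  -- binary rule: binary result
```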
(06 Jun '14, 07:52)
Mark Culp
This normally happens when we are trying to sort out data imported from other systems where there are character-translation issues, e.g. an accented character appears as something else, or the GBP symbol has been replaced with something else. The problem with using REPLACE() is that in many cases the collation settings mean it can't be used, e.g. (a real example): update person set notes = replace( notes, 'ú', '£' ) can't be used, because every 'u' in the field will also be changed, as the collation sequence helpfully reckons that 'ú' is the same as 'u'. One can of course code a UDF that goes through each string character by character - which is fine for updating a single value, but en masse it's very slow. An option on REPLACE() to ignore the collation would do the trick nicely.
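(A hypothetical sketch of the problem, reusing the person/notes example and assuming an accent-insensitive collation:)

```sql
-- With an accent-insensitive collation, 'ú' also matches plain 'u',
-- so this statement would mangle every 'u' in notes (hypothetical):
update person
   set notes = replace( notes, 'ú', '£' );

-- Casting to binary does not help either, since REPLACE casts binary
-- inputs back to character semantics, as described in the answer above:
update person
   set notes = cast( replace( cast( notes as long binary ),
                              cast( 'ú'   as long binary ),
                              cast( '£'   as long binary ) ) as long varchar );
```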
(06 Jun '14, 08:00)
Justin Willey
In my case, it's usually been an (occasional) attempt to do a binary replace, i.e. where the character semantics would find undesired matches (differing in case/accents (German umlauts) and the like). That is what raised the current question... Whereas it's - well, not easy, but doable - to use different kinds of comparisons via collation tailoring and the like, string manipulation under different collations seems hard.
(06 Jun '14, 08:01)
Volker Barth
Just on the frequency question - it's not something we would expect to use as part of regular production code - it's a bit too scary for that. More something used in migrations, clean-up scripts, imports etc.
(06 Jun '14, 10:40)
Justin Willey
A BinaryReplace() function would be really helpful.