SAP SQL Anywhere

Question

AFAIK, SQL Anywhere can handle strings both as binary and character data and cast them accordingly. From the docs:

CHAR, NCHAR, and BINARY data types
SQL Anywhere internals do not distinguish between fixed- and varying-length string types (CHAR, NCHAR, or BINARY).

Sometimes it's handy to cast character data to binary in order to use binary comparison (which can also be done by other means, say collation tailoring and the like).

Question: How do the builtin string functions handle binary arguments?

Problem: The parameter descriptions in the documentation usually just use the word "string" and do not state the data type. Data types are only specified for return values, and some functions return character and/or binary types, for example SUBSTRING:

Returns
* LONG BINARY
* LONG VARCHAR
* LONG NVARCHAR

Other functions, such as REPLACE, return only character data:

Returns
* LONG VARCHAR
* LONG NVARCHAR

Can I assume that a function casts binary arguments to character data (and therefore uses collation semantics and the like) when its documentation lists only (long) varchar/(long) nvarchar types and does not list a binary type?

E.g. REPLACE() seems to treat character and binary arguments identically (and follows character semantics, here in a case-insensitive manner because the database is case-insensitive):

select
  replace(cast('abc' as varchar),   cast('A' as varchar),   cast('X' as varchar))   as char_replace,
  replace(cast('abc' as varbinary), cast('A' as varbinary), cast('X' as varbinary)) as bin_replace

 -- returns (and therefore treats 'a' like 'A')
 -- char_replace,bin_replace
 -- Xbc,Xbc

Accepted Answer

First let me say that I agree that the documentation is a little light on the details of the parameters that are accepted by each function and on the behaviour of each function as it relates to the types of its input parameters. I have discussed this with the documentation team in the past, and I spent several days last fall developing a method to semi-automatically compile a report of all input combinations to all of the functions and the resultant outputs (but information about the exact translation that occurs between inputs and outputs is not generated). The issue is complex and so far we have not found a simple way to represent the necessary information in the documentation.

The answer to the question "How do string functions work with binary data?" is "It depends on which function is being used". I.e. there is not a one-answer-fits-all-functions answer.

There are basically two classes of functions (hence two answers) though:

The first class consists of functions that explicitly handle binary data. These include length(), len(), byte_length(), substr(), substring(), byte_substr(), csconvert(), to_char(), to_nchar(), hash(), encrypt(), decrypt(), isencrypted(), compress(), decompress(), base64_encode(), string(), patindex(), ... and maybe some more that I missed. These functions have explicit code that determines whether the input is a binary or a char type and then does something (that hopefully makes sense to you) based on that information.
The second class consists of functions that only handle [n][var]char data as input. These functions assume that the input is in the database character collation (or the database nchar collation) and therefore process the [n]char inputs according to the corresponding collation rules (e.g. they step through the strings character by character, where each 'character' is defined by the character set being used).
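One way to probe which class a given function falls into is to ask the server for the result type of a call with binary arguments. The following is a minimal sketch that assumes the EXPRTYPE function is available in your version; the literals are only illustrative, and the results have not been verified here.

```sql
-- EXPRTYPE( select-statement-as-string, column-position ) reports the
-- data type of the given column of the given query.
select
  -- substring() is in the first class (explicit binary handling)
  exprtype( 'select substring( cast( ''abc'' as varbinary ), 1, 1 )', 1 )
    as substring_type,
  -- replace() is in the second class (character input only)
  exprtype( 'select replace( cast( ''abc'' as varbinary ),
                             cast( ''a''   as varbinary ),
                             cast( ''x''   as varbinary ) )', 1 )
    as replace_type
```

Going by the return types quoted in the question, one would expect a binary type for the first column and a character type for the second, but the report is only a hint: it shows the result type, not how the inputs were interpreted.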

When binary input is given where a char type input is expected, the binary data is automatically cast to the char type (using an identity mapping, i.e. no conversion of the data is performed).
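The identity mapping can be observed by casting a value in both directions. This is a small sketch that assumes the database CHAR character set is single-byte and ASCII-compatible; the literal values are illustrative only.

```sql
select
  -- char -> binary: the bytes of 'abc' are kept as they are
  bintohex( cast( 'abc' as varbinary ) )  as char_to_bin,
  -- binary -> char: the bytes 0x61 0x62 0x63 are reinterpreted as characters
  cast( hextobin( '616263' ) as varchar ) as bin_to_char
```

Under that assumption the expected result is 616263 and abc. With a multi-byte character set the byte values still pass through unchanged, but how many characters they form depends on the character set.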

For example, providing binary inputs to REPLACE results in the binary data being treated as character data in the database CHAR character set and characters are replaced one-by-one accordingly. As such what you get out will depend entirely on the character set being used (and of course the inputs). E.g.

begin
  declare @bin long binary;
  declare @fr  long binary;
  declare @to  long binary;

  -- 15 bytes of input: 0x01 through 0x0f
  set @bin = hextobin( '0102030405060708090a0b0c0d0e0f' );

  -- replace the byte 0x01 with the byte 0x40
  set @fr = hextobin( '01' );
  set @to = hextobin( '40' );

  select bintohex( @bin ), bintohex( replace( @bin, @fr, @to ) );
end;

The output from the above is:

'0102030405060708090A0B0C0D0E0F','4002030405060708090A0B0C0D0E0F'

and is what I would have expected since I am using a single-byte character set.

I would agree that a binary replace (e.g. byte_replace()) function could be useful... although I'm sure a crafty programmer might be able to come up with a solution using the existing capabilities. This is left as an exercise for the reader! ;-)
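For readers who want a starting point for that exercise, here is one possible sketch. It is an untested assumption, not a built-in or an officially recommended approach: byte_replace and its parameter names are hypothetical, and the sketch relies on byte_substr() and byte_length() working byte by byte and on comparisons between binary values ignoring collation rules.

```sql
-- Hypothetical helper; byte_replace() is NOT a built-in function.
create or replace function byte_replace(
    in src          long binary,
    in search_bytes long binary,
    in new_bytes    long binary )
returns long binary
begin
    declare out_bytes long binary;
    declare pos       integer;
    declare src_len   integer;
    declare pat_len   integer;

    set src_len = byte_length( src );
    set pat_len = byte_length( search_bytes );

    -- Nothing to search in or search for: return the input unchanged.
    if src is null or pat_len is null or pat_len = 0 then
        return src;
    end if;

    set out_bytes = cast( '' as long binary );
    set pos       = 1;

    while pos <= src_len loop
        -- Compare as binary so that no collation rules apply.
        if cast( byte_substr( src, pos, pat_len ) as long binary ) = search_bytes then
            set out_bytes = out_bytes || new_bytes;
            set pos       = pos + pat_len;
        else
            -- No match at this offset: copy one byte and move on.
            set out_bytes = out_bytes
                            || cast( byte_substr( src, pos, 1 ) as long binary );
            set pos       = pos + 1;
        end if;
    end loop;

    return out_bytes;
end;

-- Example call
select bintohex( byte_replace( hextobin( '0102030401' ),
                               hextobin( '01' ),
                               hextobin( '40' ) ) );
```

If the function behaves as intended, the example call yields 4002030440. The loop copies one byte at a time, so it is slow for large values; it is meant to show the idea rather than to be used as is.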

How do builtin string functions work with binary data?
