SAP Anywhere

Our database is of character encoding 1252 Latin (Western Europe) for CHAR and UTF-8 for NCHAR character sets, everything gets stored just fine and retrieved on our Powerbuilder application, but we also have an interface that exports XML files - if we try to export a record with some UTF-8 characters in them, we get the following error:

There was an error reading the results of SQL Statement. The displayed results may be incorrect or incomplete. Control character in XML output SQLCODE=-911, ODBC 3 State = "HY000"

Our output statement is as follows:

EXECUTE IMMEDIATE 'UNLOAD SELECT XMLGEN( xml version="1.0" encoding="UTF-8" Root{$x}/Root,
   for xml auto, elements) AS x ) TO ''' + filepath_filename + ''' ENCODING ''UTF-8'' format ASCII quotes off escapes off';

Outputs of database queries which do not retrieve UTF characters are working fine.

Any help appreciated.


asked 14 Sep, 11:36

PcrMember's gravatar image

accept rate: 0%

XML datatype appears to use the CHAR datatype so yours will be encoded in 1252. https://dcx.sap.com/index.html#sqla170/en/html/819b6cda6ce2101488ce846acd462106.html*loio819b6cda6ce2101488ce846acd462106

permanent link

answered 14 Sep, 12:00

John%20Smirnios's gravatar image

John Smirnios
accept rate: 38%

That following discussion with Nick Elson might give some clues how to proceed...

(15 Sep, 04:15) Volker Barth

Thanks, but I think there's a misunderstanding - we are not storing the XML at all as a datatype in any table, we are just retrieving text/numeric fields of 2 tables, with some of the fields being NVARCHAR and LONG NVARCHAR, others being mostly CHAR and LONGS, as the XML files - just pure text files with XML markup notation.

And this works fine as long as there are no UTF characters to be retrieved, but as soon as we get some, we are getting the above message?

Our "unload SELECT XMLGEN" line clearly states that encoding should be UTF-8:

EXECUTE IMMEDIATE 'UNLOAD SELECT XMLGEN( <?xml version="1.0" encoding="UTF-8" ?>

and the last output file settings specifies it as well:

ENCODING UTF-8 format ASCII quotes off escapes off'

Here is another thing - when I tried it on a database which has even CHAR as UTF encoding set, everything works just fine. It's only on this database which has CHAR as Latin1252 and NCHAR as UTF-8 which is giving us this problem.

(15 Sep, 05:35) PcrMember
Replies hidden

That's exactly the point: The XML functions like XMLGEN() return XML as data type which is based on VARCHAR - and as such, is restricted to your CHAR char set. Therefore it works with UTF8 as CHAR char set but will fail with single-byte char for non-fitting code points...

(15 Sep, 06:11) Volker Barth

I see, thanks. So there is simply no way for me to fix it like that?

In that case, I will convert the database to both CHAR and NCHAR encoding UTF-8. Are there any implications of it I should know about? I was a bit surprised UTF-8 wasn't a default encoding set for CHAR, but Latin 1252. Is there any reason for that, anything I should be cautious about?

(15 Sep, 06:32) PcrMember

I've just read that initial discussion in a bit more detail down the thread. I can't see immediately how I could change the XML data type to LONG NVARCHAR? I can see the XML domain in SYS.SYSDOMAIN, but not sure how to change it from CHAR to LONG NVARCHAR? Is this even worth trying? :-)

(15 Sep, 06:46) PcrMember

I don't think you can change the XML data type. If you rely on cp1252 as CHAR charset (like we do...), another approach would be to create the XML document and then use csconvert() to convert it to NCHAR (like we have done regularly). You would still be restricted to the cp1252 charset as the XML functions can only handle that (and therefore won't support, say eastern European characters) but would then encode the result accordingly.

This should resolve the issue if your client expects UTF-8 or other encodings.

(15 Sep, 08:00) Volker Barth
Comment Text Removed

I see - that wouldn't really solve the problem. I think I might swtich CHAR encoding set to be UTF-8, unless that would have any repercussions about stability or performance?

Thanks very much :-)

(15 Sep, 09:40) PcrMember

Handling multibyte characters always has a bit of a performance impact relative to singlebyte characters but it is probably very minor. It might even help you end-to-end if your web clients are UTF-8 since you will skip a charset translation from 1252 to UTF-8 for all data going to the client.

You will need to choose between UTF8BIN (smaller performance impact but less flexible sorting/comparisons [for example, you cannot get accent insensitivity]) and UCA (more impact but has linguistic sorting). NCHAR uses UCA by default; I recommend using that for CHAR too.

(15 Sep, 09:50) John Smirnios

Thank you John

(15 Sep, 10:48) PcrMember
showing 2 of 9 show all flat view
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:


question asked: 14 Sep, 11:36

question was seen: 115 times

last updated: 15 Sep, 10:48