Hi all,
we have some problems converting UTF-8 to the database charset.
Here are our db settings:
- SQL Anywhere v12.0.1.3942
- CHAR Collation sequence is ISO_BINENG / ISO_8859-1:1987
- NCHAR Collation sequence is UCA / UTF-8

We use the following (example) code :

begin
declare DeliveredChars   varchar(120);
declare ConvertedChars   nvarchar(120);

set DeliveredChars = 'ä / é / â / ü / ö / Ã-';  // problematic UTF-8 chars

// this line used to convert the problematic chars from the client charset UTF-8 to the db charset -
// but now it doesn't work anymore...

set ConvertedChars = DeliveredChars;  // translation from UTF-8 to db charset !?

// SQL Anywhere v12.0.1.3942
// CHAR Collation sequence is ISO_BINENG / ISO_8859-1:1987
// NCHAR Collation sequence is UCA / UTF-8

select ConvertedChars,  DeliveredChars from dummy;

// NO TRANSLATION
//         ConvertedChars                 | DeliveredChars
// returns ä / é / â / ü / ö / Ã-   | ä / é / â /  ü / ö / Ã- 
end

Do you have any solutions for this problem?

Kind regards Frank

asked 04 Mar '14, 06:04


Frank_V
accept rate: 0%

edited 04 Mar '14, 07:19

Hello again, does nobody have a solution for converting special UTF-8 characters to the db charset (in this case to German umlaut characters like ü, ä, ö ...)?

(05 Mar '14, 05:40) Frank_V

I guess it would be easier to help here if the chars were specified via their code points rather than their graphical representation; that might make it easier to reproduce the problem.

That's just my 2 cents; I'm no UTF-8 expert (although I surely have to cope with German umlauts every day...)

(05 Mar '14, 06:06) Volker Barth

I agree even more strongly with Volker: The graphical representation of special characters as (a) captured by unspecified client software for display and then (b) delivered for our viewing pleasure via a long trail of forum, interweb and browser products, makes wild guesswork on our part necessary.

Please tell us, in actual hexadecimal byte values, exactly what the input characters are, what the (incorrect) output characters are, and what the correct output characters should be. We're geeks, we don't care about umlauts, we care about bits :)

(06 Mar '14, 09:14) Breck Carter
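For what it's worth, one quick way to obtain such exact byte values outside the database is a small script (a Python sketch for illustration only; any hex dump tool works just as well):

```python
# Illustration only (Python, not SQL Anywhere): dump the exact
# UTF-8 byte values of a character under test.
text = "ä"
print(text.encode("utf-8").hex())  # -> c3a4
```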

And what is the character set of the client application's connection? How was this SQL sent to the database server? dbisql?

(08 Mar '14, 16:05) John Smirnios

translation from UTF-8 to db charset

SQL strings are always first interpreted in the database's character set before being executed. See: SQL statements and character sets

SQL Anywhere Server character set conversion causes all SQL statements to be converted to the database character set before parsing and execution. A side-effect of this conversion is that any characters in the SQL statement that cannot be converted to the database character set are converted to a substitution character. A SQL statement with an arbitrary Unicode character can be executed in one of the following ways:

  • Use the UNISTR function to specify the Unicode character values
  • Use a host variable to specify the Unicode character values
  • Use UTF-8 as the database character set

I believe that in your test as posted, you are really testing the individual characters Ã, ¤, etc.
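The substitution effect described above can be sketched outside the database (Python used purely for illustration, not SQL Anywhere itself): the two UTF-8 bytes of 'ä', when read as ISO-8859-1, show up as the two separate characters 'Ã' and '¤'.

```python
# The two-byte UTF-8 encoding of 'ä' is 0xC3 0xA4. Interpreted as
# ISO-8859-1, those same bytes render as 'Ã' followed by '¤'.
utf8_bytes = "ä".encode("utf-8")            # b'\xc3\xa4'
as_latin1 = utf8_bytes.decode("iso-8859-1")
print(as_latin1)  # -> Ã¤
```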

does nobody have a solution for converting special UTF-8 characters to the db charset

Does CSCONVERT do what you need here?

 SELECT CAST(CSCONVERT('ä / é / â / ü / ö / Ã-', 'ISO_8859-1','UTF-8') AS VARCHAR) as result;

 result
 'ä / é / â / ü / ö / í'
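The repair CSCONVERT performs here can be sketched in Python (illustration only, assuming single-level mojibake): the mojibake string's ISO-8859-1 bytes are really UTF-8, so reinterpreting them recovers the original characters.

```python
# Encode the mojibake string back to its ISO-8859-1 bytes (which
# are really UTF-8 data), then decode those bytes as UTF-8.
mojibake = "Ã¤ / Ã© / Ã¢"
repaired = mojibake.encode("iso-8859-1").decode("utf-8")
print(repaired)  # -> ä / é / â
```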

answered 05 Mar '14, 11:01


Jeff Albion
accept rate: 24%

Hello Jeff and Volker,
thanks for answering - we've analysed this solution.
Yes, you're right that CSCONVERT() is the function to use, but this solution doesn't work if you read the problematic chars from a text file:

Example:
1) Create a new text file with an editor and save the problematic characters to c:\temp\test.txt ==>> 'ä / é / â'
2) Then read the text file with xp_read_file and use the CSCONVERT() function...

--
-- create a text file with the problematic chars - for example ä / é / â
--
select cast( csconvert(xp_read_file('c:\\Temp\\test.txt'),'db_charset','utf8') as long varchar) from dummy;    
-- it doesn't work !!!
select cast( csconvert('ä / é / â','db_charset','utf8') as long varchar) from dummy;
-- this works fine !!!

Kind regards

Frank

(06 Mar '14, 09:05) Frank_V

When you created the text file with your editor, what character set did the editor save the text as? When you use xp_read_file to read file contents, the data within the file is not converted to the db charset - it remains in whatever format/charset the file contained. Therefore the CSCONVERT() 'from' charset needs to be the charset the editor used when it saved the file.

(06 Mar '14, 09:35) Mark Culp
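Mark's point can be sketched outside the database (a Python illustration; xp_read_file itself returns LONG BINARY): read the raw bytes and decode them with the charset the editor actually used.

```python
import os
import tempfile

# Simulate an editor saving the problematic characters as UTF-8.
data = "ä / é / â"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(data.encode("utf-8"))
    path = f.name

# Like xp_read_file: the file content comes back as raw bytes,
# with no conversion to any database charset.
with open(path, "rb") as f:
    raw = f.read()

# The CSCONVERT 'from' charset must match what the editor used.
decoded = raw.decode("utf-8")
assert decoded == data
os.remove(path)
```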

Just to add: That's documented for xp_read_file() well enough in the docs, IMHO:

  • The return value is a LONG BINARY value (not a character data type), so there's no built-in charset conversion.

If the data file is in a different character set, you can use the CSCONVERT function to convert it.

You can also use the CSCONVERT function to address character set conversion requirements you have when using the xp_read_file system procedure.

(07 Mar '14, 03:39) Volker Barth

last updated: 08 Mar '14, 16:05