Hi I found that upper case umlauts in a SELECT does not lead to a result (set). Following SELECT works as expected, i. e. results in some rows displayed. SELECT name FROM persons WHERE name LIKE 'Müller%' Changing only the lower case ü to an upper case Ü, and no rows are displayed any more. Any idea why this behaviour? Regards, Robert SQL Anywhere 17.0.4.2129 macOS 10.12.5 Java 1.8.0_131 asked 14 Jun '17, 11:21 robert Volker Barth |
The SQL Anywhere collation "UTF8" (not to be confused with the UTF8 character set which is used to encode the characters) is very similar to UTF8BIN and is a SQL Anywhere legacy collation. It can only do case equivalency on single-byte characters. To sort UTF-8 encoded characters correctly, you must use UCA. If you don't need Unicode / UTF-8 then you might use a collation such as windows-1252 but that, of course, limits what characters you can store to just the 256 characters in windows-1252. answered 20 Jun '17, 10:36 John Smirnios |
I can't tell about that particular umlaut/case issue, however, you can use a fitting collation independent of your global database setting with the help of the builtin COMPARE function and the "collation tailoring" feature: create table test (ch varchar(5)); insert into test values ('a'); insert into test values ('A'); insert into test values ('ä'); -- add accented char (umlaut) insert into test values ('Á'); -- lists all four chars in a case-insensitive database select * from test where ch = 'A'; -- Test for equality based on UCA collation - lists just 'A' select * from test where compare(ch, 'A', 'UCA(locale=en;case=respect;accent=respect)') = 0; -- Test for equality based on default 1252LATIN1 with case respect -- - list 'A' and 'Ä' here (i.e. does ignore accents) select * from test where compare(ch, 'A', '1252LATIN1(case=respect)') = 0; answered 14 Jun '17, 11:34 Volker Barth Hi Volker Thanks for your answer. I recognise I must be more specific. What I mean is that if I have names written as 'MÜLLER' as well as 'Müller' in the database, I do not get both of them with the WHERE clause like 'Müller'. If I use 'MÜLLER', I get all names written in capitals, but not the ones written as CamelCase. And vice à versa. The database settings are (N)CHAR case sensitivity: Ignore (see screenshot). I tried with your suggestion (your example works of course), but I can't figure out how to apply it on my SELECT, giving all 'Müller' as result, however there are stored in the database. But what bothers me is that although the database is set to case sensitivity ignore, an upper case Umlaut does not "translate" in a lower case Umlaut.
(15 Jun '17, 05:29)
robert
Replies hidden
Please answer the comments to your question. If you are using, for example, UTF8BIN collation rather than UCA, the behaviour is expected. See http://dcx.sybase.com/index.html#sqla170/en/html/3bccdcfe6c5f10148f50801e84c733a2.html
(15 Jun '17, 05:50)
John Smirnios
John, still on sybase.com? :) So you basically say, one needs to use the Unicode Collation Algorithm (UCA) to get proper case conversion for characters beyond the 7-bit English characters, like the umlauts, right?
(15 Jun '17, 06:53)
Volker Barth
Hi John I hopefully did answer it with the above? Datatype correction of my answer: The datatype is VARCHAR(255). Thanks and regards, Robert
(20 Jun '17, 09:28)
robert
The "legacy" collations for SQL Anywhere can do case conversion for single-byte characters. That includes all characters for collations such as windows-1252. In UTF8, accented characters are multibyte characters and require ICU to do case conversion.
(20 Jun '17, 10:31)
John Smirnios
Ah yes, I should've known that, since I have never ever had problems converting umlauts with v5.5 and above using those legacy collations and (VAR)CHAR data types, and of course Germans use those a lot...:) So my statement would only be true for UTF8.
(20 Jun '17, 10:37)
Volker Barth
|
Try waving a dead chicken over the keyboard... in this case, LCASE ( ... ) Alas, I can't reproduce this error on a new V17 database on Windows; all the result sets are the same: CREATE TABLE persons ( name VARCHAR ( 10 ) ); INSERT persons VALUES ( 'Müller' ); INSERT persons VALUES ( 'MÜLLER' ); COMMIT; SELECT @@VERSION, name FROM persons WHERE name LIKE 'Müller%' SELECT @@VERSION, name FROM persons WHERE name LIKE 'MÜLLER%' SELECT @@VERSION, name FROM persons WHERE name LIKE 'MÜller%' SELECT @@VERSION, name FROM persons WHERE name LIKE LCASE ( 'MÜller%' ); SELECT @@VERSION, name FROM persons WHERE LCASE ( name ) LIKE LCASE ( 'MÜller%' ); @@VERSION,name '17.0.4.2053','Müller' '17.0.4.2053','MÜLLER' answered 15 Jun '17, 05:49 Breck Carter LCASE may also not work -- depending on the collation being used. UTF8BIN cannot do case conversion beyond the ASCII range.
(15 Jun '17, 05:51)
John Smirnios
Replies hidden
Hi Breck With SELECT @@VERSION, name FROM persons WHERE name LIKE 'Müller%' I expected to get the 2 records with 'Müller' and 'MÜLLER' but only get 'Müller'.
(20 Jun '17, 09:26)
robert
|
How did you initialize your database? What character set, collation, case sensitivity, etc? Run dbinfo on your database and post the output.
Also, what is the datatype of 'name': CHAR or NCHAR?
Hi Mark
Excuse me for being so late. I was away. I run dbinfo and the following is the header part I got as result:
Best regards, Robert
Hi John
The datatype is CHAR(255)
best regards, Robert
Additional Info about set options: sort_collation: 51 (default: Internal)
and, but has nothing to do with the problem I assume: date_format: dd.MM.yyyy date_order: dmy
Regards, Robert
FWIW, I tested with a fresh v17 database with (almost) identical settings, particularly using CHAR with UTF-8 and UCA collation:
and then used Breck's test script, and it always listed both "Müller" and "MÜLLER", as expected for UCA collations...
What I'm not aware of is the default for "AccentSensitivity" and "PunctuationSensitivity" for the UCA CHAR collation, although I choose the default values, and that are "Ignore" resp. "Primary" - in your DBINFO, those properties are not listed. Don't know whether that makes a difference, but as stated originally, you can easily use collation tailoring to test different collations without to have to create or rebuild databases.
EDIT:
Sorry, I fully missed the difference in the CHAR collation (CHAR collation sequence: UTF8 vs. UCA). So that tells that John's answer w.r.t. "UTF8BIN" does apply here. Ingore my previous paragraph about UCA tailoring...
You can rebuild the database based on a newly created database with a CREATE DATABASE statement like
possibly adding collation tailoring options for your (German?) locale...
As John has told, in case you can do with just 256 different characters (not uncommon with German), you can also use COLLATION 'UCA' ENCODING 'CP1252'.
Hi Volker
Thanks for the dbinfo. That difference seems to be the solution. Do you know hot to add the UCA(...) information to my database(s)? I am not aware I specified it at creation time. Is it possible to add it when rebuilding the database?
Regards, Robert
Thanks for pointing to the 'UCA' possibilities when creating a new database. Will try that. And yes,it's a German locale — Swiss German to be precise;-)
Ah, so you even don't need to store the 'ß', right? :)
Aside: A few days ago German official orthography introduced the uppercase esszet (sharp s), so one might need to try out whether UCA already does handle that new char conversion...
Hm, the forum page apparently cannot yet display that character:)
We hava a complicated developer life with Umlaute ;-)