Upper case umlauts not recognised

I found that upper case umlauts in a SELECT does not lead to a result (set). Following SELECT works as expected, i. e. results in some rows displayed.

SELECT name
FROM persons
WHERE name LIKE 'Müller%'

Changing only the lower case ü to an upper case Ü, and no rows are displayed any more. Any idea why this behaviour?

Regards, Robert

SQL Anywhere 17.0.4.2129 macOS 10.12.5 Java 1.8.0_131

collation sa-17 character-set

asked 14 Jun '17, 11:21

robert
853●47●54●68
accept rate: 0%

edited 14 Jun '17, 11:37

Volker Barth
40.5k●365●556●827

How did you initialize your database? What character set, collation, case sensitivity, etc? Run dbinfo on your database and post the output.

(14 Jun '17, 11:27) Mark Culp

Replies hidden

Also, what is the datatype of 'name': CHAR or NCHAR?

(14 Jun '17, 11:28) John Smirnios

Hi Mark

Excuse me for being so late. I was away. I run dbinfo and the following is the header part I got as result:

SQL Anywhere Information Utility Version 17.0.4.2129
Database  : /Applications/SQLAnywhere17/databases/Hades.db
Log file  : /Applications/SQLAnywhere17/databases/Hades.log
Log mirror: none
Free pages: 5445
Page size : 4096
Encrypted : No
Strings padded with blanks for comparisons: No
CHAR collation sequence: UTF8(CaseSensitivity=Ignore)
CHAR character set encoding: UTF-8
NCHAR collation sequence: UCA(CaseSensitivity=Ignore;AccentSensitivity=Ignore;PunctuationSensitivity=Primary)
NCHAR character set encoding: UTF-8
Database checksums enabled: Yes
Encrypted tables supported: No
Current timeline GUID: 2403a5a0-7023-4916-a7d0-220512338fa3

Best regards, Robert

(20 Jun '17, 09:10) robert

Hi John

The datatype is CHAR(255)

best regards, Robert

(20 Jun '17, 09:11) robert

Additional Info about set options: sort_collation: 51 (default: Internal)

and, but has nothing to do with the problem I assume: date_format: dd.MM.yyyy date_order: dmy

Regards, Robert

(20 Jun '17, 09:34) robert

FWIW, I tested with a fresh v17 database with (almost) identical settings, particularly using CHAR with UTF-8 and UCA collation:

QL Anywhere Information Utility Version 17.0.7.3382
Database  : d:\data\Chartest\Chartest.db
Log file  : d:\data\Chartest\CharTest.log
Log mirror: none
Page size : 4096
Encrypted : No
Strings padded with blanks for comparisons: No
CHAR collation sequence: UCA(CaseSensitivity=Ignore;AccentSensitivity=Ignore;PunctuationSensitivity=Primary)
CHAR character set encoding: UTF-8
NCHAR collation sequence: UCA(CaseSensitivity=Ignore;AccentSensitivity=Ignore;PunctuationSensitivity=Primary)
NCHAR character set encoding: UTF-8
Database checksums enabled: Yes
Encrypted tables supported: No
Current timeline GUID: 520ce3ac-df3b-4948-8cea-cae5409731c6

and then used Breck's test script, and it always listed both "Müller" and "MÜLLER", as expected for UCA collations...

What I'm not aware of is the default for "AccentSensitivity" and "PunctuationSensitivity" for the UCA CHAR collation, although I choose the default values, and that are "Ignore" resp. "Primary" - in your DBINFO, those properties are not listed. Don't know whether that makes a difference, but as stated originally, you can easily use collation tailoring to test different collations without to have to create or rebuild databases.

EDIT:

Sorry, I fully missed the difference in the CHAR collation (CHAR collation sequence: UTF8 vs. UCA). So that tells that John's answer w.r.t. "UTF8BIN" does apply here. Ingore my previous paragraph about UCA tailoring...

You can rebuild the database based on a newly created database with a CREATE DATABASE statement like

CREATE DATABASE ...
COLLATION 'UCA'
NCHAR COLLATION 'UCA' ...

possibly adding collation tailoring options for your (German?) locale...

As John has told, in case you can do with just 256 different characters (not uncommon with German), you can also use COLLATION 'UCA' ENCODING 'CP1252'.

(20 Jun '17, 11:14) Volker Barth

Hi Volker

Thanks for the dbinfo. That difference seems to be the solution. Do you know hot to add the UCA(...) information to my database(s)? I am not aware I specified it at creation time. Is it possible to add it when rebuilding the database?

Regards, Robert

(23 Jun '17, 10:31) robert

Thanks for pointing to the 'UCA' possibilities when creating a new database. Will try that. And yes,it's a German locale — Swiss German to be precise;-)

(04 Jul '17, 23:55) robert

Replies hidden

Ah, so you even don't need to store the 'ß', right? :)

(05 Jul '17, 01:03) Volker Barth

Aside: A few days ago German official orthography introduced the uppercase esszet (sharp s), so one might need to try out whether UCA already does handle that new char conversion...

Hm, the forum page apparently cannot yet display that character:)

(05 Jul '17, 07:14) Volker Barth

We hava a complicated developer life with Umlaute ;-)

(12 Jul '17, 03:22) robert

showing 3 of 11 show all flat view

3 Answers:

active answers oldest answers newest answers popular answers

The SQL Anywhere collation "UTF8" (not to be confused with the UTF8 character set which is used to encode the characters) is very similar to UTF8BIN and is a SQL Anywhere legacy collation. It can only do case equivalency on single-byte characters. To sort UTF-8 encoded characters correctly, you must use UCA.

If you don't need Unicode / UTF-8 then you might use a collation such as windows-1252 but that, of course, limits what characters you can store to just the 256 characters in windows-1252.

permanent link

answered 20 Jun '17, 10:36

John Smirnios
12.0k●3●96●166
accept rate: 37%

flat view

I can't tell about that particular umlaut/case issue, however, you can use a fitting collation independent of your global database setting with the help of the builtin COMPARE function and the "collation tailoring" feature:

create table test (ch varchar(5));

insert into test values ('a');
insert into test values ('A');
insert into test values ('ä'); -- add accented char (umlaut)
insert into test values ('Á');

-- lists all four chars in a case-insensitive database
select * from test
where ch = 'A';

-- Test for equality based on UCA collation - lists just 'A'
select * from test
where compare(ch, 'A', 'UCA(locale=en;case=respect;accent=respect)') = 0;

-- Test for equality based on default 1252LATIN1 with case respect
-- - list 'A' and 'Ä' here (i.e. does ignore accents)
select * from test
where compare(ch, 'A', '1252LATIN1(case=respect)') = 0;

permanent link

answered 14 Jun '17, 11:34

Volker Barth
40.5k●365●556●827
accept rate: 34%

edited 14 Jun '17, 11:36

Hi Volker

Thanks for your answer. I recognise I must be more specific. What I mean is that if I have names written as 'MÜLLER' as well as 'Müller' in the database, I do not get both of them with the WHERE clause like 'Müller'. If I use 'MÜLLER', I get all names written in capitals, but not the ones written as CamelCase. And vice à versa. The database settings are (N)CHAR case sensitivity: Ignore (see screenshot).

I tried with your suggestion (your example works of course), but I can't figure out how to apply it on my SELECT, giving all 'Müller' as result, however there are stored in the database.

But what bothers me is that although the database is set to case sensitivity ignore, an upper case Umlaut does not "translate" in a lower case Umlaut.

(15 Jun '17, 05:29) robert

Replies hidden

Please answer the comments to your question. If you are using, for example, UTF8BIN collation rather than UCA, the behaviour is expected.

See http://dcx.sybase.com/index.html#sqla170/en/html/3bccdcfe6c5f10148f50801e84c733a2.html

(15 Jun '17, 05:50) John Smirnios

John, still on sybase.com? :)

So you basically say, one needs to use the Unicode Collation Algorithm (UCA) to get proper case conversion for characters beyond the 7-bit English characters, like the umlauts, right?

(15 Jun '17, 06:53) Volker Barth

Hi John

I hopefully did answer it with the above?

Datatype correction of my answer: The datatype is VARCHAR(255).

Thanks and regards, Robert

(20 Jun '17, 09:28) robert

The "legacy" collations for SQL Anywhere can do case conversion for single-byte characters. That includes all characters for collations such as windows-1252. In UTF8, accented characters are multibyte characters and require ICU to do case conversion.

(20 Jun '17, 10:31) John Smirnios

Ah yes, I should've known that, since I have never ever had problems converting umlauts with v5.5 and above using those legacy collations and (VAR)CHAR data types, and of course Germans use those a lot...:)

So my statement would only be true for UTF8.

(20 Jun '17, 10:37) Volker Barth

showing 1 of 6 show all flat view

Try waving a dead chicken over the keyboard... in this case, LCASE ( ... )

Alas, I can't reproduce this error on a new V17 database on Windows; all the result sets are the same:

CREATE TABLE persons ( name VARCHAR ( 10 ) );
INSERT persons VALUES ( 'Müller' );
INSERT persons VALUES ( 'MÜLLER' );
COMMIT;
SELECT @@VERSION, name FROM persons WHERE name LIKE 'Müller%'
SELECT @@VERSION, name FROM persons WHERE name LIKE 'MÜLLER%'
SELECT @@VERSION, name FROM persons WHERE name LIKE 'MÜller%'
SELECT @@VERSION, name FROM persons WHERE name LIKE LCASE ( 'MÜller%' );
SELECT @@VERSION, name FROM persons WHERE LCASE ( name ) LIKE LCASE ( 'MÜller%' );

@@VERSION,name
'17.0.4.2053','Müller'
'17.0.4.2053','MÜLLER'

permanent link

answered 15 Jun '17, 05:49

Breck Carter
32.5k●541●727●1050
accept rate: 20%

LCASE may also not work -- depending on the collation being used. UTF8BIN cannot do case conversion beyond the ASCII range.

(15 Jun '17, 05:51) John Smirnios

Replies hidden

Hi Breck

With

SELECT @@VERSION, name FROM persons WHERE name LIKE 'Müller%'

I expected to get the 2 records with 'Müller' and 'MÜLLER' but only get 'Müller'.

(20 Jun '17, 09:26) robert

flat view

Your answer

toggle preview

community wiki:

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

*italic* or _italic_
**bold** or __bold__
link:[text](http://url.com/ "title")
image?![alt text](/path/img.jpg "title")
numbered list: 1. Foo 2. Bar
to add a line break simply add two spaces to where you would like the new line to be.
basic HTML tags are also supported

learn more about Markdown

Question tags:

sa-17 ×248
collation ×17
character-set ×15

question asked: 14 Jun '17, 11:21

question was seen: 3,563 times

last updated: 12 Jul '17, 03:22

SAP SQL Anywhere

Upper case umlauts not recognised

Follow this question

Related questions