Hi

I found that upper case umlauts in a SELECT does not lead to a result (set). Following SELECT works as expected, i. e. results in some rows displayed.

SELECT name
FROM persons
WHERE name LIKE 'Müller%'

Changing only the lower case ü to an upper case Ü, and no rows are displayed any more. Any idea why this behaviour?

Regards, Robert

SQL Anywhere 17.0.4.2129 macOS 10.12.5 Java 1.8.0_131

asked 14 Jun, 11:21

robert's gravatar image

robert
578283548
accept rate: 0%

edited 14 Jun, 11:37

Volker%20Barth's gravatar image

Volker Barth
30.6k304455662

How did you initialize your database? What character set, collation, case sensitivity, etc? Run dbinfo on your database and post the output.

(14 Jun, 11:27) Mark Culp
Replies hidden

Also, what is the datatype of 'name': CHAR or NCHAR?

(14 Jun, 11:28) John Smirnios

Hi Mark

Excuse me for being so late. I was away. I run dbinfo and the following is the header part I got as result:

SQL Anywhere Information Utility Version 17.0.4.2129
Database  : /Applications/SQLAnywhere17/databases/Hades.db
Log file  : /Applications/SQLAnywhere17/databases/Hades.log
Log mirror: none
Free pages: 5445
Page size : 4096
Encrypted : No
Strings padded with blanks for comparisons: No
CHAR collation sequence: UTF8(CaseSensitivity=Ignore)
CHAR character set encoding: UTF-8
NCHAR collation sequence: UCA(CaseSensitivity=Ignore;AccentSensitivity=Ignore;PunctuationSensitivity=Primary)
NCHAR character set encoding: UTF-8
Database checksums enabled: Yes
Encrypted tables supported: No
Current timeline GUID: 2403a5a0-7023-4916-a7d0-220512338fa3

Best regards, Robert

(2 days ago) robert

Hi John

The datatype is CHAR(255)

best regards, Robert

(2 days ago) robert

Additional Info about set options: sort_collation: 51 (default: Internal)

and, but has nothing to do with the problem I assume: date_format: dd.MM.yyyy date_order: dmy

Regards, Robert

(2 days ago) robert

FWIW, I tested with a fresh v17 database with (almost) identical settings, particularly using CHAR with UTF-8 and UCA collation:

QL Anywhere Information Utility Version 17.0.7.3382
Database  : d:\data\Chartest\Chartest.db
Log file  : d:\data\Chartest\CharTest.log
Log mirror: none
Page size : 4096
Encrypted : No
Strings padded with blanks for comparisons: No
CHAR collation sequence: UCA(CaseSensitivity=Ignore;AccentSensitivity=Ignore;PunctuationSensitivity=Primary)
CHAR character set encoding: UTF-8
NCHAR collation sequence: UCA(CaseSensitivity=Ignore;AccentSensitivity=Ignore;PunctuationSensitivity=Primary)
NCHAR character set encoding: UTF-8
Database checksums enabled: Yes
Encrypted tables supported: No
Current timeline GUID: 520ce3ac-df3b-4948-8cea-cae5409731c6

and then used Breck's test script, and it always listed both "Müller" and "MÜLLER", as expected for UCA collations...


What I'm not aware of is the default for "AccentSensitivity" and "PunctuationSensitivity" for the UCA CHAR collation, although I choose the default values, and that are "Ignore" resp. "Primary" - in your DBINFO, those properties are not listed. Don't know whether that makes a difference, but as stated originally, you can easily use collation tailoring to test different collations without to have to create or rebuild databases.

(2 days ago) Volker Barth
showing 1 of 6 show all flat view

The SQL Anywhere collation "UTF8" (not to be confused with the UTF8 character set which is used to encode the characters) is very similar to UTF8BIN and is a SQL Anywhere legacy collation. It can only do case equivalency on single-byte characters. To sort UTF-8 encoded characters correctly, you must use UCA.

If you don't need Unicode / UTF-8 then you might use a collation such as windows-1252 but that, of course, limits what characters you can store to just the 256 characters in windows-1252.

permanent link

answered 2 days ago

John%20Smirnios's gravatar image

John Smirnios
8.9k377110
accept rate: 40%

I can't tell about that particular umlaut/case issue, however, you can use a fitting collation independent of your global database setting with the help of the builtin COMPARE function and the "collation tailoring" feature:

create table test (ch varchar(5));

insert into test values ('a');
insert into test values ('A');
insert into test values ('ä'); -- add accented char (umlaut)
insert into test values ('Á');

-- lists all four chars in a case-insensitive database
select * from test
where ch = 'A';

-- Test for equality based on UCA collation - lists just 'A'
select * from test
where compare(ch, 'A', 'UCA(locale=en;case=respect;accent=respect)') = 0;

-- Test for equality based on default 1252LATIN1 with case respect
-- - list 'A' and 'Ä' here (i.e. does ignore accents)
select * from test
where compare(ch, 'A', '1252LATIN1(case=respect)') = 0;
permanent link

answered 14 Jun, 11:34

Volker%20Barth's gravatar image

Volker Barth
30.6k304455662
accept rate: 32%

edited 14 Jun, 11:36

Hi Volker

Thanks for your answer. I recognise I must be more specific. What I mean is that if I have names written as 'MÜLLER' as well as 'Müller' in the database, I do not get both of them with the WHERE clause like 'Müller'. If I use 'MÜLLER', I get all names written in capitals, but not the ones written as CamelCase. And vice à versa. The database settings are (N)CHAR case sensitivity: Ignore (see screenshot).

I tried with your suggestion (your example works of course), but I can't figure out how to apply it on my SELECT, giving all 'Müller' as result, however there are stored in the database.

But what bothers me is that although the database is set to case sensitivity ignore, an upper case Umlaut does not "translate" in a lower case Umlaut.

(15 Jun, 05:29) robert
Replies hidden

Please answer the comments to your question. If you are using, for example, UTF8BIN collation rather than UCA, the behaviour is expected.

See http://dcx.sybase.com/index.html#sqla170/en/html/3bccdcfe6c5f10148f50801e84c733a2.html

(15 Jun, 05:50) John Smirnios

John, still on sybase.com? :)


So you basically say, one needs to use the Unicode Collation Algorithm (UCA) to get proper case conversion for characters beyond the 7-bit English characters, like the umlauts, right?

(15 Jun, 06:53) Volker Barth

Hi John

I hopefully did answer it with the above?

Datatype correction of my answer: The datatype is VARCHAR(255).

Thanks and regards, Robert

(2 days ago) robert

The "legacy" collations for SQL Anywhere can do case conversion for single-byte characters. That includes all characters for collations such as windows-1252. In UTF8, accented characters are multibyte characters and require ICU to do case conversion.

(2 days ago) John Smirnios

Ah yes, I should've known that, since I have never ever had problems converting umlauts with v5.5 and above using those legacy collations and (VAR)CHAR data types, and of course Germans use those a lot...:)

So my statement would only be true for UTF8.

(2 days ago) Volker Barth
showing 1 of 6 show all flat view

Try waving a dead chicken over the keyboard... in this case, LCASE ( ... )

Alas, I can't reproduce this error on a new V17 database on Windows; all the result sets are the same:

CREATE TABLE persons ( name VARCHAR ( 10 ) );
INSERT persons VALUES ( 'Müller' );
INSERT persons VALUES ( 'MÜLLER' );
COMMIT;
SELECT @@VERSION, name FROM persons WHERE name LIKE 'Müller%'
SELECT @@VERSION, name FROM persons WHERE name LIKE 'MÜLLER%'
SELECT @@VERSION, name FROM persons WHERE name LIKE 'MÜller%'
SELECT @@VERSION, name FROM persons WHERE name LIKE LCASE ( 'MÜller%' );
SELECT @@VERSION, name FROM persons WHERE LCASE ( name ) LIKE LCASE ( 'MÜller%' );

@@VERSION,name
'17.0.4.2053','Müller'
'17.0.4.2053','MÜLLER'
permanent link

answered 15 Jun, 05:49

Breck%20Carter's gravatar image

Breck Carter
25.7k428592852
accept rate: 20%

LCASE may also not work -- depending on the collation being used. UTF8BIN cannot do case conversion beyond the ASCII range.

(15 Jun, 05:51) John Smirnios
Replies hidden

Hi Breck

With

SELECT @@VERSION, name FROM persons WHERE name LIKE 'Müller%'

I expected to get the 2 records with 'Müller' and 'MÜLLER' but only get 'Müller'.

(2 days ago) robert
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×85
×14
×14

question asked: 14 Jun, 11:21

question was seen: 109 times

last updated: 2 days ago