Please be aware that the SAP SQL Anywhere Forum will be shut down on August 29th, 2024 when all it's content will be migrated to the SAP Community.

Hi

I found that upper case umlauts in a SELECT does not lead to a result (set). Following SELECT works as expected, i. e. results in some rows displayed.

SELECT name
FROM persons
WHERE name LIKE 'Müller%'

Changing only the lower case ü to an upper case Ü, and no rows are displayed any more. Any idea why this behaviour?

Regards, Robert

SQL Anywhere 17.0.4.2129 macOS 10.12.5 Java 1.8.0_131

asked 14 Jun '17, 11:21

robert's gravatar image

robert
853475468
accept rate: 0%

edited 14 Jun '17, 11:37

Volker%20Barth's gravatar image

Volker Barth
40.5k365556827

How did you initialize your database? What character set, collation, case sensitivity, etc? Run dbinfo on your database and post the output.

(14 Jun '17, 11:27) Mark Culp
Replies hidden

Also, what is the datatype of 'name': CHAR or NCHAR?

(14 Jun '17, 11:28) John Smirnios

Hi Mark

Excuse me for being so late. I was away. I run dbinfo and the following is the header part I got as result:

SQL Anywhere Information Utility Version 17.0.4.2129
Database  : /Applications/SQLAnywhere17/databases/Hades.db
Log file  : /Applications/SQLAnywhere17/databases/Hades.log
Log mirror: none
Free pages: 5445
Page size : 4096
Encrypted : No
Strings padded with blanks for comparisons: No
CHAR collation sequence: UTF8(CaseSensitivity=Ignore)
CHAR character set encoding: UTF-8
NCHAR collation sequence: UCA(CaseSensitivity=Ignore;AccentSensitivity=Ignore;PunctuationSensitivity=Primary)
NCHAR character set encoding: UTF-8
Database checksums enabled: Yes
Encrypted tables supported: No
Current timeline GUID: 2403a5a0-7023-4916-a7d0-220512338fa3

Best regards, Robert

(20 Jun '17, 09:10) robert

Hi John

The datatype is CHAR(255)

best regards, Robert

(20 Jun '17, 09:11) robert

Additional Info about set options: sort_collation: 51 (default: Internal)

and, but has nothing to do with the problem I assume: date_format: dd.MM.yyyy date_order: dmy

Regards, Robert

(20 Jun '17, 09:34) robert

FWIW, I tested with a fresh v17 database with (almost) identical settings, particularly using CHAR with UTF-8 and UCA collation:

QL Anywhere Information Utility Version 17.0.7.3382
Database  : d:\data\Chartest\Chartest.db
Log file  : d:\data\Chartest\CharTest.log
Log mirror: none
Page size : 4096
Encrypted : No
Strings padded with blanks for comparisons: No
CHAR collation sequence: UCA(CaseSensitivity=Ignore;AccentSensitivity=Ignore;PunctuationSensitivity=Primary)
CHAR character set encoding: UTF-8
NCHAR collation sequence: UCA(CaseSensitivity=Ignore;AccentSensitivity=Ignore;PunctuationSensitivity=Primary)
NCHAR character set encoding: UTF-8
Database checksums enabled: Yes
Encrypted tables supported: No
Current timeline GUID: 520ce3ac-df3b-4948-8cea-cae5409731c6

and then used Breck's test script, and it always listed both "Müller" and "MÜLLER", as expected for UCA collations...


What I'm not aware of is the default for "AccentSensitivity" and "PunctuationSensitivity" for the UCA CHAR collation, although I choose the default values, and that are "Ignore" resp. "Primary" - in your DBINFO, those properties are not listed. Don't know whether that makes a difference, but as stated originally, you can easily use collation tailoring to test different collations without to have to create or rebuild databases.


EDIT:

Sorry, I fully missed the difference in the CHAR collation (CHAR collation sequence: UTF8 vs. UCA). So that tells that John's answer w.r.t. "UTF8BIN" does apply here. Ingore my previous paragraph about UCA tailoring...

You can rebuild the database based on a newly created database with a CREATE DATABASE statement like

CREATE DATABASE ...
COLLATION 'UCA'
NCHAR COLLATION 'UCA' ...

possibly adding collation tailoring options for your (German?) locale...

As John has told, in case you can do with just 256 different characters (not uncommon with German), you can also use COLLATION 'UCA' ENCODING 'CP1252'.

(20 Jun '17, 11:14) Volker Barth

Hi Volker

Thanks for the dbinfo. That difference seems to be the solution. Do you know hot to add the UCA(...) information to my database(s)? I am not aware I specified it at creation time. Is it possible to add it when rebuilding the database?

Regards, Robert

(23 Jun '17, 10:31) robert

Thanks for pointing to the 'UCA' possibilities when creating a new database. Will try that. And yes,it's a German locale — Swiss German to be precise;-)

(04 Jul '17, 23:55) robert
Replies hidden

Ah, so you even don't need to store the 'ß', right? :)

(05 Jul '17, 01:03) Volker Barth

Aside: A few days ago German official orthography introduced the uppercase esszet (sharp s), so one might need to try out whether UCA already does handle that new char conversion...

Hm, the forum page apparently cannot yet display that character:)

(05 Jul '17, 07:14) Volker Barth

We hava a complicated developer life with Umlaute ;-)

(12 Jul '17, 03:22) robert
showing 3 of 11 show all flat view

The SQL Anywhere collation "UTF8" (not to be confused with the UTF8 character set which is used to encode the characters) is very similar to UTF8BIN and is a SQL Anywhere legacy collation. It can only do case equivalency on single-byte characters. To sort UTF-8 encoded characters correctly, you must use UCA.

If you don't need Unicode / UTF-8 then you might use a collation such as windows-1252 but that, of course, limits what characters you can store to just the 256 characters in windows-1252.

permanent link

answered 20 Jun '17, 10:36

John%20Smirnios's gravatar image

John Smirnios
12.0k396166
accept rate: 37%

I can't tell about that particular umlaut/case issue, however, you can use a fitting collation independent of your global database setting with the help of the builtin COMPARE function and the "collation tailoring" feature:

create table test (ch varchar(5));

insert into test values ('a');
insert into test values ('A');
insert into test values ('ä'); -- add accented char (umlaut)
insert into test values ('Á');

-- lists all four chars in a case-insensitive database
select * from test
where ch = 'A';

-- Test for equality based on UCA collation - lists just 'A'
select * from test
where compare(ch, 'A', 'UCA(locale=en;case=respect;accent=respect)') = 0;

-- Test for equality based on default 1252LATIN1 with case respect
-- - list 'A' and 'Ä' here (i.e. does ignore accents)
select * from test
where compare(ch, 'A', '1252LATIN1(case=respect)') = 0;
permanent link

answered 14 Jun '17, 11:34

Volker%20Barth's gravatar image

Volker Barth
40.5k365556827
accept rate: 34%

edited 14 Jun '17, 11:36

Hi Volker

Thanks for your answer. I recognise I must be more specific. What I mean is that if I have names written as 'MÜLLER' as well as 'Müller' in the database, I do not get both of them with the WHERE clause like 'Müller'. If I use 'MÜLLER', I get all names written in capitals, but not the ones written as CamelCase. And vice à versa. The database settings are (N)CHAR case sensitivity: Ignore (see screenshot).

I tried with your suggestion (your example works of course), but I can't figure out how to apply it on my SELECT, giving all 'Müller' as result, however there are stored in the database.

But what bothers me is that although the database is set to case sensitivity ignore, an upper case Umlaut does not "translate" in a lower case Umlaut.

(15 Jun '17, 05:29) robert
Replies hidden

Please answer the comments to your question. If you are using, for example, UTF8BIN collation rather than UCA, the behaviour is expected.

See http://dcx.sybase.com/index.html#sqla170/en/html/3bccdcfe6c5f10148f50801e84c733a2.html

(15 Jun '17, 05:50) John Smirnios

John, still on sybase.com? :)


So you basically say, one needs to use the Unicode Collation Algorithm (UCA) to get proper case conversion for characters beyond the 7-bit English characters, like the umlauts, right?

(15 Jun '17, 06:53) Volker Barth

Hi John

I hopefully did answer it with the above?

Datatype correction of my answer: The datatype is VARCHAR(255).

Thanks and regards, Robert

(20 Jun '17, 09:28) robert

The "legacy" collations for SQL Anywhere can do case conversion for single-byte characters. That includes all characters for collations such as windows-1252. In UTF8, accented characters are multibyte characters and require ICU to do case conversion.

(20 Jun '17, 10:31) John Smirnios

Ah yes, I should've known that, since I have never ever had problems converting umlauts with v5.5 and above using those legacy collations and (VAR)CHAR data types, and of course Germans use those a lot...:)

So my statement would only be true for UTF8.

(20 Jun '17, 10:37) Volker Barth
showing 1 of 6 show all flat view

Try waving a dead chicken over the keyboard... in this case, LCASE ( ... )

Alas, I can't reproduce this error on a new V17 database on Windows; all the result sets are the same:

CREATE TABLE persons ( name VARCHAR ( 10 ) );
INSERT persons VALUES ( 'Müller' );
INSERT persons VALUES ( 'MÜLLER' );
COMMIT;
SELECT @@VERSION, name FROM persons WHERE name LIKE 'Müller%'
SELECT @@VERSION, name FROM persons WHERE name LIKE 'MÜLLER%'
SELECT @@VERSION, name FROM persons WHERE name LIKE 'MÜller%'
SELECT @@VERSION, name FROM persons WHERE name LIKE LCASE ( 'MÜller%' );
SELECT @@VERSION, name FROM persons WHERE LCASE ( name ) LIKE LCASE ( 'MÜller%' );

@@VERSION,name
'17.0.4.2053','Müller'
'17.0.4.2053','MÜLLER'
permanent link

answered 15 Jun '17, 05:49

Breck%20Carter's gravatar image

Breck Carter
32.5k5417271050
accept rate: 20%

LCASE may also not work -- depending on the collation being used. UTF8BIN cannot do case conversion beyond the ASCII range.

(15 Jun '17, 05:51) John Smirnios
Replies hidden

Hi Breck

With

SELECT @@VERSION, name FROM persons WHERE name LIKE 'Müller%'

I expected to get the 2 records with 'Müller' and 'MÜLLER' but only get 'Müller'.

(20 Jun '17, 09:26) robert
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×248
×17
×15

question asked: 14 Jun '17, 11:21

question was seen: 3,563 times

last updated: 12 Jul '17, 03:22