
Preface: This came up in the context of this question on UTF encodings...

"DBINIT -l" and "DBINIT -le" (and the latter's variant "DBINIT -le+") can be used to list recommended collation sequences and available character set encodings, respectively.

Since the terms collation/encoding/code page tend to get mixed up (at least by me!), it would be nice if the different outputs of these commands were explained in a bit more detail.

Are my assumptions correct?

  1. "DBINIT -l" is meant to list those collations that can be used as the CHAR collation with DBINIT -z. (If so, how does it indicate the default collation, "1252LATIN1" in my case?)
  2. "DBINIT -le" is meant to list the encoding names understood by SA. These are used for different purposes and are not collation names, so they are not usable with DBINIT -z or -zn, but possibly with -ze.
  3. "DBINIT -le+" has the same purpose as "DBINIT -le" and lists the same encodings, but also lists further alternative labels for them (without telling what kind of label taxonomy it uses...)
  4. The "-le" output implies that any value from the same row can be used to specify the same encoding.

For example, here's the excerpt on 1252 and UTF-16 from all three commands run with 12.0.1.4216:

  1. DBINIT -l:
    ...
    1252BIN Code Page 1252, Windows Latin 1, binary ordering
    1252LATIN1 Code Page 1252, Windows Latin 1, Western
    ...
    [no entry for UTF-16, meaning it cannot be used to store CHAR data]

  2. DBINIT -le:
    SA, IANA, MIME, ICU, JAVA, ASE
    ...
    UTF-16BE, UTF-16BE, UTF-16BE, UTF-16BE, UTF-16BE,
    UTF-16LE, UTF-16LE, UTF-16LE, UTF-16LE, UTF-16LE, utf16
    ...
    windows-1252, windows-1252, , ibm-5348_P100-1997, windows-1252, cp1252

  3. DBINIT -le+:
    SA, IANA, MIME, ICU, JAVA, ASE, ...
    ...
    UTF-16BE, UTF-16BE, UTF-16BE, UTF-16BE, UTF-16BE, , UTF-16OE
    UTF-16LE, UTF-16LE, UTF-16LE, UTF-16LE, UTF-16LE, utf16, UTF-16, UTF-16PE, csUnicode, ucs-2, unicode
    ...
    windows-1252, windows-1252, , ibm-5348_P100-1997, windows-1252, cp1252, ibm-5348

I guess these are just a few starting questions, I18N is such a huge issue... - I hope John will share his deep insight:)

asked 10 Mar '15, 13:46


Volker Barth


[image]

(10 Mar '15, 15:22) Breck Carter

Is that a picture of me, Breck?

(10 Mar '15, 15:30) Nick Elson S...

It's how I feel when faced with collations and character sets.

(10 Mar '15, 15:54) Breck Carter

I'm not sure what deep insight I can provide: your assumptions sound correct to me!

Collations/encodings/code pages all have distinct, precise meanings, as do the related concepts of character set & glyph. Wikipedia can give you the gory details but, in short, "collation" refers to how things are sorted, "encoding" refers to the binary representation of characters, "character set" refers to what logical characters can be represented, a "code page" is essentially a character set and its encoding together, and "glyph" refers to how a character appears when printed (kind of like a font). For example, UTF8 is one encoding of the Unicode character set and it can be sorted with the UCA (Unicode Collation Algorithm) collation. Often, encoding and character set go hand-in-hand as well. For example, windows-1252 defines both a set of characters and how they are encoded.

SQLA has historically been imprecise when using these terms and that can lead to confusion. For example, all "database collations" in SQLA other than UCA also imply an encoding and a character set: the 1252LATIN1 collation has a well-defined sort order (collation), and its data is always stored in the windows-1252 encoding, which can represent the Latin alphabet character set (plus numerous other characters).
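The character-set side of that last point can be seen in any language with codec support (a generic Python illustration I've added, not SA behavior): windows-1252 simply cannot represent characters outside its set, while UTF-8 can represent any Unicode character.

```python
# All of "café" falls inside the windows-1252 character set:
print("café".encode("windows-1252"))  # b'caf\xe9'

# Greek letters are outside the windows-1252 set, so encoding fails:
try:
    "αβγ".encode("windows-1252")
except UnicodeEncodeError as e:
    print("not representable in windows-1252:", e.object)

# UTF-8 encodes the full Unicode character set:
print("αβγ".encode("utf-8"))  # b'\xce\xb1\xce\xb2\xce\xb3'
```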


answered 10 Mar '15, 16:17


John Smirnios

SQLA has historically been imprecise when using these terms and that can lead to confusion. For example, all "database collations" in SQLA other than UCA also imply an encoding and a character set.

I guess that's one of the main points of my previous confusion. Fortunately, as soon as I tried to ask the question, some more insight was gained - so thanks to Nick and you for the clarification and the historical background!

(11 Mar '15, 05:44) Volker Barth

-ze is used to set the character set used to store character data (i.e. the encoding), and -z is used to set the collation (i.e. the ordering used for sorting and, more generally, for comparisons).

-le and -l map to those concepts roughly as you would expect (-le+ and -l+ merely list a larger set of possible values, though some may be historical). They list the collations and character sets supported by the software, and the output seems to include entries supported by both the server and the client library, not just those supported by the server.

As such, some of the encodings listed cannot be used as database character sets, only as client-side settings for conversion/translation to the client. The 16-bit UTF transforms listed there are examples of that; i.e. they cannot be used to create the database. [I admit it would have been less confusing for dbinit to limit the values shown to just those supported in the creation of the database.]

Some of those can also be used with -zn for NCHAR data, but again the only Unicode encoding supported is UTF-8 ... even for NCHAR.
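The BE/LE suffixes in those UTF-16 rows are just byte order; a quick Python illustration of the difference (again my own sketch, not dbinit output):

```python
# UTF-16 stores a BMP character in two bytes; BE/LE fix the byte order.
print("A".encode("utf-16-be"))  # b'\x00A' : big-endian, high byte first
print("A".encode("utf-16-le"))  # b'A\x00' : little-endian, low byte first

# Plain "utf-16" prepends a byte-order mark (BOM) so a reader can tell which.
bom_encoded = "A".encode("utf-16")
print(bom_encoded[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True: starts with a BOM
```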

For reference: code pages refer to character sets or encodings, mostly historical OEM and ANSI encodings, and are often tied to a specific combination of locale/language and one hardware/OS vendor (Apple, Microsoft, IBM, HP, ...). They are still in use and will be for the foreseeable future, but switching to Unicode leaps over that history, as you already know.

These do show up in the -le output. An expanded mapping to other equivalent names for the same encodings shows up in the -le+ output [along with a few additional lines].

For completeness only: I believe -l+ just shows deprecated collations in addition to the supported ones listed by -l. You should probably avoid those.

I hope that helps add some clarity to what can be a difficult topic for many.


answered 10 Mar '15, 15:29


Nick Elson S...

[I admit it would have been less confusing for dbinit to limit the values shown to just those supported in the creation of the database.]

Yes, that may be the main point of my confusion - but the picture is starting to get clearer:)

(11 Mar '15, 04:52) Volker Barth