Is there a simple way to remove accented characters from a string?

For example àéêöhello! needs to be converted to aeeohello!

In SQL server I would use Collate to accomplish this in one line. However, I am not able to find a solution that is working other than using multiple replace statements or something of the sort.

This is in a database where multiple languages are used and stored. I need to convert Portuguese to English when the data is being exported, but don't have the budget for a translator.

Your help is appreciated.

asked 07 Oct '15, 15:57

warfdpn's gravatar image

warfdpn
26112
accept rate: 0%


I guess the CSCONVERT function may be of help here.

UPDATE:

Say, converting to ASCII will turn accented characters into their unaccented base version - however it may also replace other characters by probably undesired results:

select cast(csconvert('àéêöhello!', 'ascii') as varchar)

returns "aeeohello!" as desired but a test with German unlauts reveals a misfit for the "ß":

select cast(csconvert('aäboöuüssßÄÖÜ', 'ascii') as varchar)

returns "aaboouussAOU" (note: The character before the upper "A" is ASCII 26 and is not dispalyed here as a non-printable char).


Aside: What I had originally in mind with my suggestion was some kind of "collation tailoring" with csconvert(), i.e. using something like "AccentSensitivity=ignore" in the target charset.

For example you can use collation tailoring to ignore or respect accents when doing comparisons:

select compare('àéêöhello!', 'aeeohello!', 'uca(AccentSensitivity=ignore)'), compare('àéêöhello!', 'aeeohello!', 'uca(AccentSensitivity=Respect)')

returns 0 (= identical) vs. 1.

However, this will not work here as csconvert() uses a charset (and not a collation) as its arguments.

permanent link

answered 07 Oct '15, 16:31

Volker%20Barth's gravatar image

Volker Barth
29.6k294444650
accept rate: 32%

edited 12 Oct '15, 03:32

Quite an interesting solution. Here was my request:

select 'aáéíóúb' as a, TO_CHAR(CSCONVERT(a, 'ASCII'))

And this was the response: aáéíóúb,ab

What is funny is that the characters between a and b are displayed as spaces (indeed they are a\x1a\x1a\x1a\x1a\x1ab). Internet suggests a more generic solution: http://stackoverflow.com/questions/4024072/how-to-remove-accents-and-all-chars-a-z-in-sql-server

(09 Oct '15, 05:28) Vlad

Hm, I like your answer. But I do not know, why when I use TO_CHAR, I get a different result. Should I use CAST instead?

(12 Oct '15, 16:07) Vlad
Replies hidden

Well, as you do use the TO_CHAR() function without its second parameter, as to the docs, it should exactly do the same as a cast (though to char instead to varchar but that should not matter) - from the docs:

If source-charset-name is not specified, then this function is equivalent to:

CAST( string-expression AS CHAR );

Possibly we both do use a different database charset ("Windows-1252" in my case)?

(13 Oct '15, 02:22) Volker Barth

Just a very wild guess:

AFAIK collation tailoring is based on the ICU library used within SQL Anywhere (dbicu.dll).

As ICU does also allow to transform data (including decomposition of characters), you might be able to make use of the ICU library yourself. That may be more reliable than a home-brewn list of "special characters to convert".

permanent link

answered 13 Oct '15, 03:17

Volker%20Barth's gravatar image

Volker Barth
29.6k294444650
accept rate: 32%

edited 13 Oct '15, 07:37

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×13
×10

question asked: 07 Oct '15, 15:57

question was seen: 617 times

last updated: 13 Oct '15, 07:37