I'm trying to make SOAP calls to a third-party web service that expects the request data in UTF-8 encoding. I'm using SQL Anywhere with a database whose default (CHAR) charset is "windows-1252" and whose NCHAR charset is "UTF-8".

For simple SOAP parameters (say, UUIDs), this works when I declare the parameters as NVARCHAR, such as

create function WSF_GetRequestStatus(ReqId nvarchar(36))
    returns xml
    url 'http://...'
    type 'SOAP:DOC'
    set 'SOAP(OP=GetRequestStatusRequest)'
    header 'SOAPAction:"urn:GetRequestStatus"'
    namespace 'http://...'
    SET 'SOAP(VERSION=1.1)';

This results in a request with the desired header

Content-Type: text/xml; charset=utf-8

instead of the default

Content-Type: text/xml; charset=windows-1252

I think this is explained in the docs, although that topic seems to refer to web service procedures:

When one or more parameters are of type NCHAR, NVARCHAR, LONG NVARCHAR, or NTEXT then the response output is in UTF8.
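Why the header matters can be sketched outside SQL Anywhere: a receiver that trusts the `Content-Type` header decodes the body with whatever charset that header declares. The following Python sketch uses the standard library's `email.message.Message` only to parse the header parameter; the UTF-8 fallback for a header without a charset is an assumption made for illustration, not something the service in the question is known to do.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the (lowercased) charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(content_type: str, body: bytes, fallback: str = "utf-8") -> str:
    """Decode a request body the way a receiver that trusts the header would.

    The fallback used when no charset is declared is an assumption for this sketch.
    """
    return body.decode(declared_charset(content_type) or fallback)


payload = "<name>Müller</name>".encode("utf-8")
print(decode_body("text/xml; charset=utf-8", payload))         # <name>Müller</name>
print(decode_body("text/xml; charset=windows-1252", payload))  # <name>MÃ¼ller</name>
```

The same UTF-8 bytes turn into mojibake as soon as the header claims windows-1252, which is why the declared charset has to match the actual encoding.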

However, other SOAP calls require XML parameters. These XML documents are already available as UTF-8 encoded files, so I load them via xp_read_file and store them in variables of data type XML.

Therefore I tried to use a SOAP function with a parameter of data type XML, such as

create function WSF_StartRequest(ReqData xml)
    returns xml
    url 'http://...'
    type 'SOAP:DOC'

However, the request then declares the unsuitable "windows-1252" charset, so the SOAP request is rejected.

When testing with a different tool (SoapUI), the same request is described with charset utf-8 and succeeds.
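To make the failure mode concrete, here is a minimal Python model of a strict endpoint that only accepts UTF-8. The service, the `RejectedRequest` exception and the `accepts` helper are hypothetical; the real third-party service may reject for other reasons or with a different error. It shows that both the declared charset and the actual bytes must be UTF-8.

```python
from email.message import Message


class RejectedRequest(Exception):
    """Raised when the hypothetical service refuses a request."""


def strict_utf8_service(content_type: str, body: bytes) -> str:
    """Hypothetical endpoint: require charset=utf-8 and strictly valid UTF-8 bytes."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset != "utf-8":
        raise RejectedRequest(f"unsupported charset: {charset}")
    try:
        return body.decode("utf-8")  # strict mode: invalid byte sequences raise
    except UnicodeDecodeError as exc:
        raise RejectedRequest("body is not valid UTF-8") from exc


def accepts(content_type: str, body: bytes) -> bool:
    try:
        strict_utf8_service(content_type, body)
        return True
    except RejectedRequest:
        return False


xml_text = "<ReqData><Name>Müller</Name></ReqData>"
# windows-1252 declared and windows-1252 bytes: rejected
print(accepts("text/xml; charset=windows-1252", xml_text.encode("cp1252")))  # False
# UTF-8 declared and UTF-8 bytes (what SoapUI sends): accepted
print(accepts("text/xml; charset=utf-8", xml_text.encode("utf-8")))  # True
# header "fixed" but bytes still windows-1252: still rejected
print(accepts("text/xml; charset=utf-8", xml_text.encode("cp1252")))  # False
```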

Question: How can I force SA to declare the request with the desired charset?

asked 29 Jan '16, 03:31

Volker Barth

edited 29 Jan '16, 03:42

The Accept-Charset header obviously changes when you change the type from SOAP to HTTP. I believe you will find that our default for Accept-Charset tracks the common practice for the request 'protocol'. (A Wireshark trace will confirm that.)

From my understanding, SOAP is biased towards Unicode (and thus UTF-8), in contrast to HTTP, which is biased towards 8-bit ISO/ANSI-type charsets (ISO-8859-1 would be typical); in your case that would be your CHAR charset, i.e. CP1252.

You can probably force that header to change to suit, but that is just a preference: the charset that gets returned is negotiated with each request. It is decided by a best-first match between what the client accepts and what the content server is willing/able to supply.

(29 Jan '16, 09:59) Nick Elson S...
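The best-first match described in the comment above can be sketched as generic HTTP `Accept-Charset` negotiation. This Python sketch is a simplified illustration (it ignores the `*` wildcard and is not SQL Anywhere's implementation). It picks the server-supported charset with the highest q-value, keeping header order for ties.

```python
from typing import List, Optional, Tuple


def parse_accept_charset(header: str) -> List[Tuple[str, float]]:
    """Parse 'utf-8, iso-8859-1;q=0.5' into [(name, q), ...]; missing q means 1.0."""
    prefs = []
    for item in header.split(","):
        name, *params = [part.strip() for part in item.split(";")]
        if not name:
            continue
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        prefs.append((name.lower(), q))
    return prefs


def negotiate_charset(accept_charset: str, supported: List[str]) -> Optional[str]:
    """Return the supported charset the client prefers most, or None if none match.

    Simplification: the '*' wildcard is not handled.
    """
    by_name = {cs.lower(): cs for cs in supported}
    acceptable = [pair for pair in parse_accept_charset(accept_charset) if pair[1] > 0]
    # sorted() is stable, so entries with equal q keep their header order
    for name, _q in sorted(acceptable, key=lambda pair: -pair[1]):
        if name in by_name:
            return by_name[name]
    return None


print(negotiate_charset("iso-8859-1;q=0.5, utf-8", ["windows-1252", "UTF-8"]))  # UTF-8
```

Note that this only governs the charset of the *response*; it does not change how the request body is labeled, which is the point raised in the reply below.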

Note that it's not the Accept-Charset header that is the problem, but the Content-Type header. And even if I add the charset attribute to the Content-Type header when using an HTTP request, such as

create function WSF_StartRequest(XmlPayload long nvarchar)
    returns xml
    url 'http://...'
    type 'HTTP:POST:text/xml; charset=utf-8'
    header 'SOAPAction:"urn:StartRequest"'

that attribute does get added to the Content-Type header, but SA automatically adds another charset attribute, which makes the header invalid:

Content-Type: text/xml; charset=utf-8; charset=windows-1252

So modifying that header is not really an option.

As stated, the third-party web service requires a UTF-8 encoded request, so there's no chance of "negotiating charsets" for me.

(29 Jan '16, 10:22) Volker Barth
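Why a header like `text/xml; charset=utf-8; charset=windows-1252` is unusable can be shown with a small parser. The following Python sketch is simplified (it does not handle quoted strings containing `;`) and the helper names are hypothetical. It extracts all `charset` parameters and flags the header as ambiguous when they disagree, since a receiver cannot tell which one applies.

```python
from typing import List, Tuple


def content_type_params(header: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a Content-Type value into (media type, [(param, value), ...])."""
    media_type, *raw_params = [part.strip() for part in header.split(";")]
    params = []
    for raw in raw_params:
        if not raw:
            continue
        key, _, value = raw.partition("=")
        params.append((key.strip().lower(), value.strip().strip('"')))
    return media_type.lower(), params


def charset_values(header: str) -> List[str]:
    """All charset parameters, in order of appearance (duplicates included)."""
    _, params = content_type_params(header)
    return [value for key, value in params if key == "charset"]


def is_ambiguous(header: str) -> bool:
    """True if the header declares more than one distinct charset."""
    return len({value.lower() for value in charset_values(header)}) > 1


header = "text/xml; charset=utf-8; charset=windows-1252"
print(charset_values(header))  # ['utf-8', 'windows-1252']
print(is_ambiguous(header))    # True
```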

FWIW, as a workaround, I took a different approach (comparable to the sample from the doc topic "Variables supplied in SOAP envelopes"): I used an XML variable to build the whole SOAP envelope, concatenating the values of the existing XML documents into it. Then I passed the whole XML as the parameter of a function with the following declaration:

create function WSF_StartRequest(XmlPayload long nvarchar)
    returns xml
    url 'http://...'
    type 'HTTP:POST:text/xml'
    header 'SOAPAction:"urn:StartRequest"'
    set 'HTTP(VERSION=1.1)';

This seems to work.
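For readers who want to see what such a hand-built request amounts to on the wire, here is a Python sketch that wraps an existing XML document in a SOAP 1.1 envelope and encodes it as UTF-8 with matching headers. It is only an illustration of the idea, not the SQL Anywhere code; the payload element `StartRequest` and the namespace `urn:example` are hypothetical, and only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP11_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def strip_xml_declaration(doc: str) -> str:
    """Drop a leading BOM and <?xml ...?> declaration, which may not appear mid-document."""
    doc = doc.lstrip("\ufeff").lstrip()
    if doc.startswith("<?xml"):
        doc = doc[doc.index("?>") + 2:]
    return doc.strip()


def build_envelope(body_xml: str) -> str:
    """Concatenate an existing XML document into a SOAP 1.1 envelope body."""
    return (
        f'<soapenv:Envelope xmlns:soapenv="{SOAP11_ENV_NS}">'
        "<soapenv:Header/>"
        f"<soapenv:Body>{strip_xml_declaration(body_xml)}</soapenv:Body>"
        "</soapenv:Envelope>"
    )


def build_http_request(body_xml: str, soap_action: str):
    """Return (headers, body bytes) with a Content-Type charset matching the encoding."""
    body = build_envelope(body_xml).encode("utf-8")
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{soap_action}"',
        "Content-Length": str(len(body)),
    }
    return headers, body


doc = ('<?xml version="1.0" encoding="UTF-8"?>\n'
       '<StartRequest xmlns="urn:example"><Name>Müller</Name></StartRequest>')
headers, body = build_http_request(doc, "urn:StartRequest")
root = ET.fromstring(body)  # parses as UTF-8, so the envelope is well-formed
```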

UPDATE: However, with this approach, if the actual parameter uses one of the CHAR/VARCHAR/LONG VARCHAR/XML data types and the database charset is not UTF-8 but a typical single-byte charset (windows-1252 in my case), then the actual parameter MUST be in that charset, too, because it will be converted to UTF-8 when it is passed to the web client function.

I.e., if the original XML content is already UTF-8, you need to convert it back to the database charset first, as with

set xmlData = csconvert(xmlData, 'char_charset', 'nchar_charset');

before you use it to construct the "XmlPayload" value and pass that to the web client function.

This is explained in the docs:

When one or more parameters are of type NCHAR, NVARCHAR, LONG NVARCHAR, or NTEXT then the response output is in UTF8. If the client database uses the UTF-8 character encoding, there is no change in behavior (since NCHAR and CHAR data types are the same). However, if the database does not use the UTF-8 character encoding, then all parameters that are not an NCHAR data type are converted to UTF8. The value of the XML declaration encoding and Content-Type HTTP header will correspond to the character encoding used.
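The double conversion described in the UPDATE and the doc excerpt can be modeled in Python, assuming a windows-1252 database charset. `pass_as_char_parameter` is a hypothetical stand-in for what the docs say happens to a non-NCHAR parameter (treated as database charset, converted to UTF-8). `csconvert_utf8_to_db` is a rough analogue of `csconvert(xmlData, 'char_charset', 'nchar_charset')`, not SQL Anywhere's implementation.

```python
DB_CHARSET = "cp1252"  # assumption: database CHAR charset is windows-1252


def pass_as_char_parameter(value_bytes: bytes) -> bytes:
    """Model: a non-NCHAR parameter is read as DB_CHARSET and converted to UTF-8."""
    return value_bytes.decode(DB_CHARSET).encode("utf-8")


def csconvert_utf8_to_db(value_bytes: bytes) -> bytes:
    """Rough analogue of csconvert(x, 'char_charset', 'nchar_charset')."""
    return value_bytes.decode("utf-8").encode(DB_CHARSET)


file_bytes = "<Name>Grüße</Name>".encode("utf-8")  # content as read from a UTF-8 file

sent_directly = pass_as_char_parameter(file_bytes)
sent_after_convert = pass_as_char_parameter(csconvert_utf8_to_db(file_bytes))

print(sent_directly.decode("utf-8"))       # <Name>GrÃ¼ÃŸe</Name>
print(sent_after_convert.decode("utf-8"))  # <Name>Grüße</Name>
```

Without the conversion back, the UTF-8 bytes are encoded a second time. In this model, characters that windows-1252 cannot represent would make `csconvert_utf8_to_db` raise `UnicodeEncodeError`, which is one reason a UTF-8 database charset avoids the problem altogether.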

answered 29 Jan '16, 06:19

Volker Barth

edited 03 Feb '16, 04:28

FWIW, I guess it might be easier to use UTF-8 as the default charset for the database (i.e. at "dbinit-time"), as then data type XML will use UTF-8 by default, too, and no character conversion will be required.

(03 Feb '16, 04:31) Volker Barth
question asked: 29 Jan '16, 03:31

question was seen: 379 times

last updated: 03 Feb '16, 04:31