Please be aware that the content in SAP SQL Anywhere Forum will be migrated to the SAP Community in June and this forum will be retired.

Hello!

When using "openxml" on results of the Google Geocoding API, I receive an XML parser error with the message "Invalid control character" for certain address details. Here's an example:

with "CallGoogleapi" defined as

FUNCTION "CallGoogleapi"( in "sLocation" long varchar ) 
returns xml
url '!sLocation!'
type 'HTTP:GET'

What can I do, so that openxml does not geerate this error?

Thanks in advance!

Dirk

asked 15 Dec '17, 04:01

Bitwerk's gravatar image

Bitwerk
56117
accept rate: 0%

edited 20 Dec '17, 06:47

Volker%20Barth's gravatar image

Volker Barth
40.2k362550822

1

There is the ENCODING option in the help article: http://dcx.sap.com/index.html#sqla170/en/html/817205cf6ce21014a949c790fd6b0608.html Can you please try to use UTF-8 in this operator?

(16 Dec '17, 08:23) Vlad
Replies hidden
1

Try it with one exclamation: url '!sLocation'

(17 Dec '17, 08:54) Breck Carter
Replies hidden

Thanks for the hint. Unfortunately, I continue to receive the same error message.

select * from openxml("CallGoogleapi"('https://maps.googleapis.com/maps/api/geocode/xml?address=pasica+75000+tuzla&language=de@region=de'),'/GeocodeResponse/result/geometry/location/lat') with("lat" long nvarchar 'text()') OPTION (encoding="UTF-8")

returns still the same error.

(18 Dec '17, 01:54) Bitwerk

Thanks, but unfortunately this did not help either.

(18 Dec '17, 02:01) Bitwerk

I suggest to have a look at the XML document returned from the API, ideally in an XML aware editor / viewer which visualizes the document structure.

HTH
Volker
DB-TecKnowledgy

permanent link

answered 15 Dec '17, 20:25

Volker%20DB-TecKy's gravatar image

Volker DB-TecKy
5453715
accept rate: 25%

I forgot to mention that this call normally works. For "berlin" I can calculate lat/log without oproblems:

"https://maps.googleapis.com/maps/api/geocode/xml?address=berlin&language=en"

"......<lat>52.5200066</lat><lng>13.4049540</lng>....."

In the query "pasica+75000+tuzla" sa-17 reports the parsing error at character 150, line 6, column 41:

1    < ?xml version="1.0" encoding="UTF-8"?>
2    -< GeocodeResponse>
3     < status>OK< /status>
4     < result>
5        < type>route< /type>
6        < formatted_address>Hadži Hasanage Pašića, Tuzla 75000, Bosnia and Herzegovina< /formatted_address>
7        < address_component>
8          < long_name>Hadži Hasanage Pašića< /long_name>
          .....

Saving the result to a file, I see the following result with an unicode character:

  <formatted_address>Hadži Hasanage Paši**/u001a**a, Tuzla 75000, Bosnia and Herzegovina</formatted_address>

In line 6, column 41 there is a hex "1A00" or U+001A. Apparently, openxml has a problem with this character. Would I have to do a conversion here?

Kind regards, Dirk

permanent link

answered 16 Dec '17, 05:11

Bitwerk's gravatar image

Bitwerk
56117
accept rate: 0%

1

I'm not sufficiently expert in XML to decide whether this is correct but my guess is that 'encoding="UTF-8"' makes it valid. Unless your database's char collation is UTF-8, it might be worth trying to use nvarchar instead of varchar in your definitions.
At least, this information could help to locate the problem.
Cheers
Volker

(16 Dec '17, 07:30) Volker DB-TecKy
Replies hidden

I tried both, but I still get the XML parsing error at the same position.

(18 Dec '17, 03:23) Bitwerk

Does it work when you replace the mask "/u" with "\u", as that is the expected encoding when using UNISTR to generate NCHAR data based on their code point?

(18 Dec '17, 04:43) Volker Barth

What encoding does the file use?

(18 Dec '17, 04:44) Volker Barth

I called "SELECT "CallMapsGoogleapi"('https://maps.googleapis.com/maps/api/geocode/xml?address=pasica+75000+tuzla&language=de@region=de')" and saved the rsult directly in Interactive SQL to a file?

Interactive SQL used UTF-8.

(18 Dec '17, 04:53) Bitwerk

There is no /u or \u in the output. I can't copy the character to the editor. It is the 20th character in this string: "Hadži Hasanage Pašia" (between i and a at the end).

As soon as I save the comment, the UTF-8 character disappears.

(18 Dec '17, 04:57) Bitwerk
showing 3 of 6 show all flat view

I think you should use (long) nvarchar data types for the OPEN XML output columns (i.e. in your case the "lat" column). According to the docs, the XML data will then be interpreted as NCHAR data, as well - to cite:

The xml-data is parsed directly in the NCHAR encoding if there are any NCHAR columns in the output. The xpath and namespaces arguments are also converted and parsed in the NCHAR encoding.

permanent link

answered 17 Dec '17, 11:56

Volker%20Barth's gravatar image

Volker Barth
40.2k362550822
accept rate: 34%

edited 17 Dec '17, 12:19

I tried this, but I still get the error message.

(18 Dec '17, 03:39) Bitwerk

I was finally able to solve this... I don't even know the correct word... 'thing'. I don't know where to start, because the error is unclear, and the documentation looks incomplete for Unicode-related questions.

There are important links that explain that SqlAnywhere (when it cannot convert a string from one encoding to another) replaces the multi-byte character with '0x1a'. There is an option (on_charset_conversion_failure) that you can try to switch to Error to see if SA faces some difficulties with an implicit conversion. And I forgot to mention that you cannot use this option, because the default collation is LATIN1251 (at least on my machine) - 'single byte'.

If you execute the SELECT statement in SCJVIEW, you will see that SA returns a different string than you submitted:

SELECT  'Pašića Hadži' AS chr1,   'Жпа' AS chr2,
       N'Pašića Hadži' AS nchr1, N'Жпа' AS nchr2;

Here is what I see on my screen:
:(

But after I have created the DB file with the following command line options:

dbinit -dba dba,sql -mpl 3 -ze utf8 -zn UTF8BIN -z UTF8BIN anotherDB

I was able to get what I really wanted:

:)

Even you code works on a fresh DB file:

CREATE OR REPLACE FUNCTION "dba"."CallGoogleApi"( IN "sLocation" LONG NVARCHAR)
RETURNS LONG BINARY
URL '!sLocation'
TYPE 'HTTP:GET'

SELECT * FROM OPENXML("CallGoogleapi"('https://maps.googleapis.com/maps/api/geocode/xml?address=pasica+75000+tuzla&language=de@region=de'), '/GeocodeResponse/result/geometry/location/lat')
              WITH("lat" LONG NVARCHAR'text()')

The Grand Budapest Hotel

p.s. don't repeat my faults... don't work with Unicode in SQL. This is painful.

permanent link

answered 19 Dec '17, 09:13

Vlad's gravatar image

Vlad
2.5k91127
accept rate: 16%

1

Note that all SQL statements are converted and transmitted to the server in Database CHAR charset and characters that cannot be mapped to the Database CHAR charset get replaced by the '0x1a' character. This is why you are getting the results that you are seeing.

In a similar way, HTTP results are expected to be in Database CHAR charset as Volker pointed out in his response above.

(19 Dec '17, 09:27) Mark Culp
Replies hidden
1

So my conclusion is correct that a SOAP type request is required for NCHAR data when the database does use a single-byte CHAR collation?

I agree with Vlad that Unicode handling with web client procedures could be explained better in the docs... :)

(19 Dec '17, 10:06) Volker Barth
1

I would like to add that I do not see any conversion to the DB CHAR charset. I will explain why:
1. When the CallGoogleApi is executed, I get a temporary value (the server response) as LONG BINARY (one-to-one, no conversion)
2. I pass this BINARY to OPENXML that should accept it as well (indeed I can also add ENCODING if I want, but the documentation says that USING VALUE can read data from LONG BINARY)
3. Then I select all fields/records from the OPENXML result set.

In other words, I do not store values in the DB. I do not use single-byte functions. The code should work with temporary variables/values as in any other programming language. I just submit my query to the server and expect that it will process it well.

I agree that only after I plan to insert the obtained UTF-8 string into a table, the DB server should warn me that implicit character conversion is used and either decline the INSERT or replace characters.

(19 Dec '17, 10:33) Vlad

Thanks a lot for your help!

With a new database with UTF8 as CHAR charset there are no problems.

With a new database with default chasets (LATIN1251/UCA) I still habe the same error. Although theoretically no CHAR conversion should take place, it seems to take place somewhere.

I also tried "TYPE 'SOAP'" in CallGoogleApi and get the same results.

Unfortunately, the Google Geocoding API does not seem to have an option to influence the encoding of the result.

Too bad that there seems to be no solution. In any case, changing the CHAR charset to UTF8 is not possible.

(20 Dec '17, 03:08) Bitwerk
Replies hidden

I agree that no conversion should happen when no persistence is involved, but it seems that the entire DB server (including even temporary results, function calls) uses the DB charset.

I'd better avoid using OPENXML, and try to work with JSON instead (as a long string). E.g.

https://maps.googleapis.com/maps/api/geocode/json?address=pasica+75000+tuzla

or the most reliable way is to use strongly typed languages such as Java.

I like the idea to use DB as a universal platform for everything, but not today.

(20 Dec '17, 05:07) Vlad

Hm, I do use SOAP web client functions with UTF-8 result sets in a database with default CHAR collation (1252Latin1), so I think that should work here, too.

Oops, I have to correct myself: These are SOAP calls but I use type "HTTP:POST:text/xml" with long nvarchar parameters, and I compose the SOAP envelope internally. It was the contrary restriction: type SOAP:DOC with XML parameters does not work with NCHAR data as the XML datatype is based on CHAR, not NCHAR...

See that FAQ for further information.

(20 Dec '17, 06:07) Volker Barth

I have noticed that simply not all characters from Bitwerk's payload are available in 1252Latin1. E.g. there is an article somewhere in help, where the word 'Straße' is used as an example how to work with Unicode in SA, but this example is wrong, because the letter 'ß' exists in a single-byte codepage Windows-1252.

If they took the real Unicode word (e.g. 'Pašića Hadži' or something from Hebrew/Chinese/something_else), their example will look completely different. I'd better see how to manipulate with Unicode strings in SA on real examples, such as: if we do this, we get that.

(20 Dec '17, 10:13) Vlad

Oh, I forgot what I wanted to tell :)

Yes, you can work with the SOAP web service, but who knows, maybe you use characters from your current charset, and everything works fine. If your web service returns you something that doesn't exist in your charset, your code might fail, or you see a blank square.

(20 Dec '17, 10:16) Vlad

Hm, currently I have no time for further tests but I hope to do some in the next weeks...

(20 Dec '17, 11:51) Volker Barth
showing 2 of 9 show all flat view

I tried it again with newly created databases. When using the default collation ("COLLATION '1252LATIN1' NCHAR COLLATION 'UCA'") I still get the error.

When using "COLLATION 'UCA' NCHAR COLLATION 'UCA'" everything works fine!

I changed everything to "NVARCHAR" and tried it again with default collation: I still get the error.

select * from openxml("CallGoogleapi"('https://maps.googleapis.com/maps/api/geocode/xml?address=pasica+75000+tuzla&language=de@region=de'),'/GeocodeResponse/result/geometry/location/lat') with("lat" long nvarchar 'text()') OPTION (ENCODING='UTF-8')

FUNCTION "DBA"."CallGoogleapi"( in "sLocation" long nvarchar ) 
returns long nvarchar
url '!sLocation'
type 'HTTP:GET'

Shouldn't the call also be possible with the default collation?

permanent link

answered 18 Dec '17, 04:38

Bitwerk's gravatar image

Bitwerk
56117
accept rate: 0%

I think the problem is not the OPENXML call but the web client function. It seems to convert the returned value to the CHAR datatype (so it works when you use UTF-8 as CHAR encoding). That restriction is documented here under "data-type", if my assumption is correct:

Only SOAP requests support the transmission of typed data such as FLOAT, INT, and so on. HTTP requests support the transmission of strings only, so you are limited to CHAR types.

So you might have to use a web client function of type SOAP...

permanent link

answered 18 Dec '17, 07:40

Volker%20Barth's gravatar image

Volker Barth
40.2k362550822
accept rate: 34%

converted 18 Dec '17, 07:41

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×247
×48
×22

question asked: 15 Dec '17, 04:01

question was seen: 2,891 times

last updated: 20 Dec '17, 11:51