The forum will experience an outage sometime between February 10 at 7:00pm EST and February 12 at 11:59 EST for installation of security updates. The actual time and duration of the outage are unknown but attempts will be made to minimize the downtime. We apologize for any inconvenience.

Hello, I have a very big XML file, that I would like to chop a bit and then process every chunk separately, but I do not know how to handle the xsi:noNamespaceSchemaLocation attribute because now if I try to ignore it, I get an error.

First how I divide the big XML into smaller ones (query No. 1):

SELECT *
FROM OPENXML('<CONSOLIDATED_LIST xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../resources/xsd/consolodated_list.xsd" dateGenerated="2015-02-19T16:30:27.001-05:00">
    <INDIVIDUALS>
        <INDIVIDUAL xsi:noNamespaceSchemaLocation="file:/C:/Program%20Files/Oxygen%20XML%20Editor%2015/frameworks/sc-sanctions/xsd/sc-sanctions.xsd">
            <FIRST_NAME>MOHAMMAD BAQER</FIRST_NAME>
            <SECOND_NAME>ZOLQADR</SECOND_NAME>
            <INDIVIDUAL_ALIAS>
                <ALIAS_NAME>Mohammad Bakr Zolqadr</ALIAS_NAME>
            </INDIVIDUAL_ALIAS>
        </INDIVIDUAL>
        <INDIVIDUAL xsi:noNamespaceSchemaLocation="file:/C:/Program%20Files/Oxygen%20XML%20Editor%2015/frameworks/sc-sanctions/xsd/sc-sanctions.xsd">
            <FIRST_NAME>ALI AKBAR</FIRST_NAME>
            <SECOND_NAME>TABATABAEI</SECOND_NAME>
            <INDIVIDUAL_ALIAS>
                <ALIAS_NAME>Sayed Akbar Tahmaesebi</ALIAS_NAME>
            </INDIVIDUAL_ALIAS>
            <INDIVIDUAL_ALIAS>
                <ALIAS_NAME>Ali Akber Tabatabaei</ALIAS_NAME>
            </INDIVIDUAL_ALIAS>
        </INDIVIDUAL>
    </INDIVIDUALS>
</CONSOLIDATED_LIST>', '/CONSOLIDATED_LIST/INDIVIDUALS/INDIVIDUAL')
WITH(first_name varchar(200) 'FIRST_NAME',
     second_name varchar(200) 'SECOND_NAME',
     individual_xml long varchar '@mp:xmltext')

XML is much bigger in real life, it's just an example.

Then I try to get some stuff from one of the smaller ones (query No. 2):

SELECT *
FROM OPENXML('<INDIVIDUAL xsi:noNamespaceSchemaLocation="file:/C:/Program%20Files/Oxygen%20XML%20Editor%2015/frameworks/sc-sanctions/xsd/sc-sanctions.xsd">
        <FIRST_NAME>ALI AKBAR</FIRST_NAME>
        <SECOND_NAME>TABATABAEI</SECOND_NAME>
        <INDIVIDUAL_ALIAS>
            <ALIAS_NAME>Sayed Akbar Tahmaesebi</ALIAS_NAME>
        </INDIVIDUAL_ALIAS>
        <INDIVIDUAL_ALIAS>
            <ALIAS_NAME>Ali Akber Tabatabaei</ALIAS_NAME>
        </INDIVIDUAL_ALIAS>
    </INDIVIDUAL>', '/INDIVIDUAL/INDIVIDUAL_ALIAS')
WITH(alias_name varchar(200) 'ALIAS_NAME',
     individual_alias_xml long varchar '@mp:xmltext')

And I get an error Undeclared namespace prefix: xsi.

I tried declaring namespace like written in documentation, no luck, my guess of course that I am doing it wrong, so if someone could show me how to do that, I would be very grateful.

Also bonus question, maybe there is a way to easily remove said attribute from XML using SA?

I am using SA12 if that matters.

asked 24 Feb '15, 09:31

Valdas's gravatar image

Valdas
28781221
accept rate: 80%

edited 25 Feb '15, 08:30


I have a very big XML file, that I would like to chop a bit and then process every chunk separately...

What exactly are you trying to do? With openxml() you should be able to filter a big XML file directly (via XPath and/or a WHERE clause), so you should not need to split up the XML document beforehand by yourself. This is somewhat comparable to a SELECT statement on a big table where you can restrict the result set by using an appropriate WHERE clause instead of removing rows from the table...

How I divide the big XML into smaller ones?

So I would suggest to not modify the XML document but to use a WHERE clause to then filter the generated result set.


In case you want to shorten the XML document (primarily for testing/development) and it does contain very many "<INDIVIDUAL\>" nodes, you might be more successful if you remove most of them but preserve the underlying document structure including the root. That should prevent the namespace error, too.

As to the "xsi:noNamespaceSchemaLocation" attribute: AFAIK openxml() does not validate the schema at all, so that attribute should not matter here. To cite from the docs:

The XML parser used by the openxml system procedure is non-validating, and does not read the external DTD subset or external parameter entities.

permanent link

answered 24 Feb '15, 10:46

Volker%20Barth's gravatar image

Volker Barth
29.3k287438645
accept rate: 32%

edited 24 Feb '15, 10:51

I have a database full of individuals, I need to find a match between them and ones included in big XML and save particular individuals XML for later use. I do not think that it would be a good idea to save whole XML for every user and filter it every time I need data for specific individual.

As to the "openxml() does not validate the schema at all" I am not sure if I understand this sentence, why then I get the error?

(25 Feb '15, 01:33) Valdas
Replies hidden

I guess I still do not really understand your task. Do you have to check whether the particular database individuals are contained in the XML document or vice versa?

However, if you want to search the big XML for particular elements and you can use an equality test for that, that should do with a XPath expression - cf. this FAQ. Something like "openxml(..., '/CONSOLIDATED_LIST/INDIVIDUALS/INDIVIDUAL[SECOND_NAME='Valdas']) with"...


If your task is to extract just one or a few nodes from a big XML document and store the resulting XML, I'm not sure whether openxml() is the tool of choice here. That may be more appropriate for basic XML tools using XPath, XQuery or the like (note: That's not my domain...). Apparently the situation would be different if you can generate the "single-indidivual XML document" from the database contents itself.


W.r.t. validation: In my understanding the XML parser (AFAIK Apache Xerces) will check for syntax errors but will not use the specified DTD or XML schema to verify whether the elements comply with that specified schema, so it will not check if elements and attributes are named and typed and organized as specified.

(25 Feb '15, 04:00) Volker Barth

I need to save XML node <individual/> for later use, and I need to do that using SA12, maybe I can use XPath, XQuery or some other tool for that, but I am pretty sure result will be the same, <individual> with xsi:noNamespaceSchemaLocation attribute.

Later the saved XML needs to be opened with openxml() again, and I can not do that because of attribute that I do not know how to handle.

So the question is, is it possible to use openxml() with XML shown in query No. 2 and if possible, then how? First query is provided just for reference, maybe there is vital information for namespace declaration?..

(25 Feb '15, 08:30) Valdas
1

Well, just test this with a browser: When using Firefox, the 2nd document raises the same "undeclard namespace prefix" error while the first opens up fine.

So you'll have to add the "xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" to the 2nd document, as well, just before " xsi:noNamespaceSchemaLocation". Then it should open up, too.

(25 Feb '15, 08:39) Volker Barth

Thank You, this is a valid solution, but not the best because I get the XML directly from United Nations and would love to keep it unmodified, can't I use namespace-declaration for that?

If modifying the XML is the only solution, then I need a way to do that, simple REPLACE() doesn't sound like a professional move :)

(25 Feb '15, 09:02) Valdas

Hm, it looks like your question is not really related to SQL Anywhere but to XML in general, so other forums may be more appropriate...

That being said, in case you need to refer to namespaces then you will need to have an namespace declaration within your XML document, "unmodified" or not. If you do not want to modify the single INDIVIDUAL nodes then you could simply leave the original "CONSOLIDATED_LIST" and "INDIVIDUALS" nodes as they are in the extracted document, so the latter would only consist of one single "INDIVIDUAL" node. In other words: A "single individual document" would look exactly like the big one but would not contain many but only one INDIVIDUAL node.

But as stated, I still do not really know what you are trying to achieve so my hints might not be too helpful.

(25 Feb '15, 09:20) Volker Barth

And how can I achieve single "INDIVIDUAL" node with the original "CONSOLIDATED_LIST" and "INDIVIDUALS" nodes using SQL Anywhere? I tried mentioned XPatch filter but metarpoperty @mp:xmltext gives back the whole XML anyway...

(25 Feb '15, 09:38) Valdas
showing 1 of 7 show all flat view
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×404
×10

question asked: 24 Feb '15, 09:31

question was seen: 902 times

last updated: 25 Feb '15, 09:38