AFAIK, in older versions, the XML data type was just a "built-in user-defined" data type based on LONG VARCHAR, i.e. it was listed in SYS.SYSUSERTYPE. In current versions, it seems to be a distinct data type listed in SYS.SYSDOMAIN. Questions:
|
Some good questions and a few possible challenges too Volker. . . . The rest of this is just a few observations that may help. . . . While there is a distinct XML Data Type, such columns ... storage-wise, described-as, and encoding-wise ... are very much the same as a Long Varchar; even being stored (and if required converted) to the charset encoding of the database. This much you seem very aware of. And there-in lies one challenge, which charset encoding you should use for a database when the the standard itself indicates such a wide variety of valid, possible encodings My first instinct would be similar to yours (with your suggestion of LONG NVARCHAR) of storing XML as UTF-8 ... but the fact that a full XML document can specify it's own encoding such conversion could be problematic if you do not control that encoding at the source. Parsing the content and normalizing the Encoding attribute to match the stored encoding may be required to make that truly robust. So yes you could store you xml content as LONG NVARCHAR instead though the easier solution may be to initialize the database to be UTF-8 for both CHAR-type and NCHAR-types. {assuming all your content is UTF-8). There may even be a reason to go with long binary in some situations as well, but the good new may be that the OpenXML( ) operator can deal with any of those (if correctly managed). -- Note none of this should be considered a recommendation without first adding in some -- practical considerations and application requirements into a proper analysis. So that may only leave the question of "What you give up when you do not use the XML Data Type, explicitly? ... Again I'm not the expert. There may be some benefit to certain ADO.Net functions but I suspect the biggest advantage would be when using the various XML*( ) functions and not needing to cast other datatypes to XML. The need to cast and possibly needing to convert data encoding can be a bit challenging. Do share your thoughts and experiences though. HTH Many thanks, Nick, that answers my questions - at least for the moment:). And particular thanks to include an answer for the inherent question "What you give up when you do not use the XML Data Type, explicitly?". As I'm using the XML_% functions, I'll prefer to stick with the XML data type, too. So the most important point seems to be aware of the endoding issue at all, methinks...
(11 Feb '16, 04:01)
Volker Barth
FWIW, is there a builtin function that would help with this task? (AFAIK, DBISQL itself tries to determine a script's encoding based on several rules.) I'm asking for the general case - in my current task, the encodings are known beforehand.
(11 Feb '16, 05:41)
Volker Barth
|
I hadn't found a single function, builtin or otherwise for the most critical part of this. The 'encoding="..."' attribute (if explicitly specified) exists only in the "<?xml ...?>" header and is not an xml parseable <tag>. So that only leaves string functions such as this query
Of course where that becomes important is how, when and where you are accessing and how much charset conversion has already happened. Good luck OK, yes, that's helpful for XML data. (I guess I had hoped that SA would possibly have a general "attempt to determine a text file's encoding" function but that would certainly be very difficult to implement...)
(12 Feb '16, 02:42)
Volker Barth
1
I agree it is a bit awkward but then again I suspect much of the XML manipulation that goes on probably happens in applications and applicaton servers and not as much in the back-end store technology. And I suspect we are not very much different from other RDBMS in that regard. But I do suspect much of the XML/SOAP/DISH/WDSL-type support happens inside of a CLR (external environment in our case) or WEB/Content-serving type environment so you might want to see what benefits that approach brings; 'outside of the SQL DB server context' pèr sé. Maybe a little too suspicious (or skeptical) but maybe that opens up your investigations a bit wider; and hopefully a more complete answer can be had that way. I'm pretty sure you will find lots of C# and VB.Net code examples (as well as Java+DOM/SAX examples, and C++ XML code samplets too) that can shed some light on many requirements.
(12 Feb '16, 08:56)
Nick Elson S...
|
By current versions, you mean anything after version 9.
I'm not sure I have meant that but now I know:)
FWIW, I have not found anything regarding that change in the v10 "What's New" doc sections...
I did the same search. I did find a .NET change where we noted a new SADbType.Xml data type. But that's about the only hint concerning this change.