Tree/FAQ
From Code Synthesis Wiki
Revision as of 08:00, 5 August 2007 by Boris (General: add a Q&A about thread safety)
Revision as of 08:02, 5 August 2007 by Boris (How do I parse an XML instance to a Xerces-C++ DOM document?)
Revision as of 08:02, 5 August 2007
General
Is the generated code thread-safe?
XSD-generated code is thread-safe in the sense that you can use different instantiations of the object model in several threads concurrently. This is possible due to the generated code not relying on any writable global variables. If you need to share the same object between several threads then you will need to provide some form of synchronization. If you also would like to call parsing and/or serialization functions from several threads potentially concurrently, then you will need to make sure the Xerces-C++ runtime is initialized and terminated only once. For more information on this topic refer to Section 3.4, "Thread Safety" in the C++/Tree Mapping Getting Started Guide.
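As an illustration of the synchronization mentioned above, here is a minimal sketch of a wrapper that guards one shared object with a mutex. It is not part of XSD: Person is a hypothetical stand-in for a generated object model class, and synchronized and with_lock are names made up for this example. It assumes a compiler with std::mutex (C++11); with an older compiler a platform mutex would take its place.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a generated object model class.
//
struct Person
{
  std::string name;
};

// Holds one shared object and serializes all access to it through a mutex.
//
template <typename T>
class synchronized
{
public:
  explicit
  synchronized (T value) : value_ (std::move (value)) {}

  // Call f with exclusive access to the wrapped object and return
  // whatever f returns.
  //
  template <typename F>
  auto
  with_lock (F f) -> decltype (f (std::declval<T&> ()))
  {
    std::lock_guard<std::mutex> guard (mutex_);
    return f (value_);
  }

private:
  std::mutex mutex_;
  T value_;
};
```

Each thread would then touch the shared object only from inside a function passed to with_lock. Objects that are not shared between threads need no such wrapper.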
What character encoding does the generated code use?
XSD has built-in support for two character types: char and wchar_t. You can select the character type with the --char-type command line option. The default character type is char. The character encoding depends on the character type used.
For the char character type the encoding is UTF-8. In XSD prior to version 2.3.1, the so-called "local code page" encoding was used via the Xerces-C++ XMLString::transcode functions. On some platforms (e.g., UNIX-like) you could set the local code page with a call to setlocale. On other platforms (e.g., Windows), the local code page is preset and cannot be changed. For backwards compatibility XSD allows you to use the local code page encoding by defining the XSD_USE_LCP preprocessor macro when compiling your source code.
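One practical consequence of UTF-8 is that a single XML character may occupy more than one char, so std::string::size() reports bytes rather than characters. The following sketch counts characters (code points) in such a string. The function name utf8_length is made up for this example, and the code assumes its input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 encoded string. Assumes the input
// is well-formed UTF-8: every byte that is not a continuation byte
// (10xxxxxx) starts a new code point.
//
std::size_t
utf8_length (const std::string& s)
{
  std::size_t n (0);

  for (std::string::size_type i (0); i < s.size (); ++i)
  {
    unsigned char c (static_cast<unsigned char> (s[i]));

    if ((c & 0xC0) != 0x80) // Not a continuation byte.
      ++n;
  }

  return n;
}
```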
For the wchar_t character type the encoding is automatically selected between UTF-16 and UTF-32/UCS-4 depending on the size of the wchar_t type. On some platforms (e.g., Windows with Visual C++ and AIX with IBM XL C++) wchar_t is 2 bytes long. For these platforms the encoding is UTF-16. On other platforms wchar_t is 4 bytes long and UTF-32/UCS-4 is used.
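The selection rule above can be sketched in a few lines. This is only an illustration of the rule as stated, not the code XSD itself uses; wide_encoding and wchar_encoding are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Name the encoding implied by a wide character of the given size in
// bytes, following the rule described above. An empty string is returned
// for sizes the rule does not cover.
//
std::string
wide_encoding (std::size_t size)
{
  if (size == 2)
    return "UTF-16";

  if (size == 4)
    return "UTF-32/UCS-4";

  return "";
}

// Encoding implied by wchar_t on the platform this code is compiled for.
//
std::string
wchar_encoding ()
{
  return wide_encoding (sizeof (wchar_t));
}
```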
Is it possible to change names assigned to anonymous types?
By default XSD uses names of enclosing elements and attributes to derive names for anonymous types. You can alter this behavior with the --anonymous-regex option as described in the XSD Compiler Command Line Manual. For example, the following option appends Type to names of all anonymous types:
--anonymous-regex '%.* .* (.+/)*(.+)%$2Type%'
Alternatively, you can make names of anonymous types start with capital letters by using the following option:
--anonymous-regex '%.* .* (.+/)*(.+)%\u$2%'
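To get a feel for what the first expression does, the sketch below applies an equivalent pattern with std::regex. It is an approximation under two assumptions: that the string being matched has the form "filename namespace xpath" (check the XSD Compiler Command Line Manual for the exact form), and that std::regex behaves like the XSD regex engine for this pattern. The \u case conversion in the second expression has no std::regex counterpart, so only the first expression is emulated. The function name anonymous_name is made up for this example.

```cpp
#include <regex>
#include <string>

// Emulate the expression %.* .* (.+/)*(.+)%$2Type% with std::regex.
// The input is assumed to be in the form "filename namespace xpath".
// The whole input is replaced with the last xpath component followed
// by Type.
//
std::string
anonymous_name (const std::string& input)
{
  static const std::regex pattern (".* .* (.+/)*(.+)");
  return std::regex_replace (input, pattern, "$2Type");
}
```

Under these assumptions, an anonymous type of the element named element nested in the type named type, that is, the xpath type/element, would be named elementType.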
Parsing
Why do I get "error: Unknown element 'root-element'" when I try to parse a valid XML document?
This usually means that the parser could not find a schema to validate your XML document against. Since by default validation is turned on, this results in an error. For more information on various ways to resolve this see Section 5.1, "XML Schema Validation and Searching" in the C++/Tree Mapping Getting Started Guide.
How do I disable validation of XML documents?
To disable validation you will need to pass the xml_schema::flags::dont_validate flag to one of the parsing functions, as described in Section 5.1, "XML Schema Validation and Searching" in the C++/Tree Mapping Getting Started Guide.
How do I specify a schema location other than in an XML document?
There are several methods to programmatically specify schema locations to be used in validation. For example, you can use the xml_schema::properties argument to parsing functions to specify schema locations instead of those specified with the xsi::schemaLocation and xsi::noNamespaceSchemaLocation attributes in XML documents. For more information refer to Section 5.1, "XML Schema Validation and Searching" in the C++/Tree Mapping Getting Started Guide.
How do I parse an XML document to a Xerces-C++ DOM document?
While this question is not strictly about XSD or the C++/Tree mapping and is covered in the Xerces-C++ Programming Guide, this step is a prerequisite for some of the more advanced techniques covered in this FAQ. Furthermore, the XSD runtime provides some utilities that make the code a little more palatable.
  #include <istream>

  #include <xercesc/dom/DOM.hpp>
  #include <xercesc/util/XMLUniDefs.hpp>
  #include <xercesc/framework/Wrapper4InputSource.hpp>

  #include <xsd/cxx/xml/string.hxx>
  #include <xsd/cxx/xml/dom/elements.hxx>
  #include <xsd/cxx/xml/dom/bits/error-handler-proxy.hxx>
  #include <xsd/cxx/xml/sax/std-input-source.hxx>

  #include <xsd/cxx/tree/exceptions.hxx>
  #include <xsd/cxx/tree/error-handler.hxx>

  xsd::cxx::xml::dom::auto_ptr<xercesc::DOMDocument>
  parse (std::istream& is)
  {
    using namespace xercesc;
    namespace xml = xsd::cxx::xml;
    namespace tree = xsd::cxx::tree;

    const bool validate (true);

    const XMLCh ls_id [] = {chLatin_L, chLatin_S, chNull};

    // Get an implementation of the Load-Store (LS) interface.
    //
    DOMImplementation* impl (
      DOMImplementationRegistry::getDOMImplementation (ls_id));

    // Create a DOMBuilder.
    //
    xml::dom::auto_ptr<DOMBuilder> parser (
      impl->createDOMBuilder (DOMImplementationLS::MODE_SYNCHRONOUS, 0));

    // Discard comment nodes in the document.
    //
    parser->setFeature (XMLUni::fgDOMComments, false);

    // Enable datatype normalization.
    //
    parser->setFeature (XMLUni::fgDOMDatatypeNormalization, true);

    // Do not create EntityReference nodes in the DOM tree. No
    // EntityReference nodes will be created, only the nodes
    // corresponding to their fully expanded substitution text
    // will be created.
    //
    parser->setFeature (XMLUni::fgDOMEntities, false);

    // Perform Namespace processing.
    //
    parser->setFeature (XMLUni::fgDOMNamespaces, true);

    // Do not include ignorable whitespace in the DOM tree.
    //
    parser->setFeature (XMLUni::fgDOMWhitespaceInElementContent, false);

    // Enable/Disable validation.
    //
    parser->setFeature (XMLUni::fgDOMValidation, validate);
    parser->setFeature (XMLUni::fgXercesSchema, validate);
    parser->setFeature (XMLUni::fgXercesSchemaFullChecking, validate);

    // We will release the DOM document ourselves.
    //
    parser->setFeature (XMLUni::fgXercesUserAdoptsDOMDocument, true);

    // Set error handler.
    //
    tree::error_handler<char> eh;
    xml::dom::bits::error_handler_proxy<char> ehp (eh);
    parser->setErrorHandler (&ehp);

    // Prepare input stream.
    //
    xml::sax::std_input_source isrc (is);
    Wrapper4InputSource wrap (&isrc, false);

    xml::dom::auto_ptr<DOMDocument> doc (parser->parse (wrap));

    eh.throw_if_failed<tree::parsing<char> > ();

    return doc;
  }
Below is a simple program that uses the above code.
  #include <fstream>

  #include <xercesc/util/PlatformUtils.hpp>

  int
  main (int argc, char* argv[])
  {
    using namespace xercesc;
    namespace xml = xsd::cxx::xml;

    XMLPlatformUtils::Initialize ();

    {
      std::ifstream ifs (argv[1]);
      xml::dom::auto_ptr<DOMDocument> doc (parse (ifs));
    }

    XMLPlatformUtils::Terminate ();
  }
How do I handle XML data of an unknown type?
Here we assume that you need to handle XML instances that can be of several predefined types. There is no information that distinguishes one instance from another except the root element name.
Suppose we have two root elements defined in our schema: foo and bar with types Foo and Bar, respectively. There are two ways to handle this situation. The first is quite straightforward but slow. It boils down to calling each parsing function in a sequence expecting all except one to fail. The slow part comes from the need to re-parse XML to DOM for each invocation. The following code outlines this approach:
  while (true)
  {
    try
    {
      std::auto_ptr<Foo> f (foo ("instance.xml")); // Try to parse as Foo.

      // Do something useful with f.

      break;
    }
    catch (xml_schema::unexpected_element const&)
    {
      // Try the next function.
    }

    try
    {
      std::auto_ptr<Bar> b (bar ("instance.xml")); // Try to parse as Bar.

      // Do something useful with b.

      break;
    }
    catch (xml_schema::unexpected_element const&)
    {
      // Try the next function.
    }

    // This instance is of some other type. Handle it (for example,
    // report an error) and leave the loop so that it does not repeat.

    break;
  }
The second approach involves splitting the parsing process into two stages: XML to DOM and DOM to Tree. After the XML to DOM stage we peek at the root element and decide which parsing function to call:
  #include <xercesc/dom/DOM.hpp>

  #include <xsd/cxx/xml/string.hxx>

  using namespace xercesc;

  DOMDocument* dom = ... // Parse XML into DOM.

  DOMElement* root = dom->getDocumentElement ();

  std::string name (xsd::cxx::xml::transcode<char> (root->getLocalName ()));

  if (name == "foo")
  {
    std::auto_ptr<Foo> f (foo (*dom)); // Parse dom to Foo.

    // Do something useful with f.
  }
  else if (name == "bar")
  {
    std::auto_ptr<Bar> b (bar (*dom)); // Parse dom to Bar.

    // Do something useful with b.
  }
For more information on parsing XML to DOM see How do I parse an XML document to a Xerces-C++ DOM document?
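When the number of possible root elements grows, the chain of name comparisons in the second approach can be replaced with a lookup table. The sketch below shows only the dispatch part and uses nothing but the standard library. The names handler, handler_map and dispatch are made up for this example, and the handler signature is a simplification: in real code a handler would receive the DOM document and call the corresponding generated parsing function on it.

```cpp
#include <map>
#include <string>

// Hypothetical handler signature. In real code a handler would take the
// DOM document and call a generated parsing function such as foo or bar.
//
typedef void (*handler) ();

// Table that maps root element names to handlers.
//
typedef std::map<std::string, handler> handler_map;

// Call the handler registered for the root element name. Return false
// if the name is not in the table, that is, the instance is of some
// other type.
//
bool
dispatch (const handler_map& m, const std::string& root_name)
{
  handler_map::const_iterator i (m.find (root_name));

  if (i == m.end ())
    return false;

  i->second ();
  return true;
}
```

The table would be filled once, with one entry per root element, and dispatch called with the name obtained from the root element's local name.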
Serialization
How do I create an empty Xerces-C++ DOM document?
While this question is not strictly about XSD or the C++/Tree mapping and is covered in the Xerces-C++ Programming Guide, this step is a prerequisite for some of the more advanced techniques covered in this FAQ. Furthermore, the XSD runtime provides some utilities that make the code a little more palatable.
  #include <string>

  #include <xercesc/dom/DOM.hpp>
  #include <xercesc/util/XMLUniDefs.hpp> // chLatin_L, chLatin_S, chNull

  #include <xsd/cxx/xml/string.hxx>
  #include <xsd/cxx/xml/dom/elements.hxx>

  xsd::cxx::xml::dom::auto_ptr<xercesc::DOMDocument>
  create (const std::string& root_element_name,
          const std::string& root_element_namespace = "",
          const std::string& root_element_namespace_prefix = "");

  xsd::cxx::xml::dom::auto_ptr<xercesc::DOMDocument>
  create (const std::string& name,
          const std::string& ns,
          const std::string& prefix)
  {
    using namespace xercesc;
    namespace xml = xsd::cxx::xml;

    const XMLCh ls_id [] = {chLatin_L, chLatin_S, chNull};

    // Get an implementation of the Load-Store (LS) interface.
    //
    DOMImplementation* impl (
      DOMImplementationRegistry::getDOMImplementation (ls_id));

    // Create the document. The root element name is qualified with the
    // prefix if one was specified.
    //
    xml::dom::auto_ptr<DOMDocument> doc (
      impl->createDocument (
        (ns.empty () ? 0 : xml::string (ns).c_str ()),
        xml::string ((prefix.empty () ? name : prefix + ':' + name)).c_str (),
        0));

    return doc;
  }
The following code fragment shows how to use this function. It also shows how to establish additional namespace-prefix mappings and set the schemaLocation attribute:
  #include <xercesc/util/PlatformUtils.hpp>

  int
  main (int argc, char* argv[])
  {
    using namespace xercesc;
    namespace xml = xsd::cxx::xml;

    XMLPlatformUtils::Initialize ();

    {
      xml::dom::auto_ptr<DOMDocument> doc (
        create ("example", "http://www.example.com/xmlns/example", "e"));

      DOMElement* root (doc->getDocumentElement ());

      root->setAttributeNS (
        xml::string ("http://www.w3.org/2000/xmlns/").c_str (),
        xml::string ("xmlns:xsi").c_str (),
        xml::string ("http://www.w3.org/2001/XMLSchema-instance").c_str ());

      root->setAttributeNS (
        xml::string ("http://www.w3.org/2001/XMLSchema-instance").c_str (),
        xml::string ("xsi:schemaLocation").c_str (),
        xml::string ("http://www.example.com/xmlns/example example.xsd").c_str ());
    }

    XMLPlatformUtils::Terminate ();
  }
The call to create above creates a DOM document with the example element as its root. The example element is in the http://www.example.com/xmlns/example namespace to which we assigned the e namespace prefix.
How do I serialize a Xerces-C++ DOM document to XML?
While this question is not strictly about XSD or the C++/Tree mapping and is covered in the Xerces-C++ Programming Guide, this step is a prerequisite for some of the more advanced techniques covered in this FAQ. Furthermore, the XSD runtime provides some utilities that make the code a little more palatable.
  #include <ostream>

  #include <xercesc/dom/DOM.hpp>
  #include <xercesc/util/XMLUniDefs.hpp>
  #include <xercesc/framework/Wrapper4InputSource.hpp>

  #include <xsd/cxx/xml/string.hxx>
  #include <xsd/cxx/xml/dom/elements.hxx>
  #include <xsd/cxx/xml/dom/serialization.hxx>
  #include <xsd/cxx/xml/dom/bits/error-handler-proxy.hxx>

  #include <xsd/cxx/tree/exceptions.hxx>
  #include <xsd/cxx/tree/error-handler.hxx>

  void
  serialize (std::ostream& os,
             const xercesc::DOMDocument& doc,
             const std::string& encoding = "UTF-8")
  {
    using namespace xercesc;
    namespace xml = xsd::cxx::xml;
    namespace tree = xsd::cxx::tree;

    const XMLCh ls_id [] = {chLatin_L, chLatin_S, chNull};

    // Get an implementation of the Load-Store (LS) interface.
    //
    DOMImplementation* impl (
      DOMImplementationRegistry::getDOMImplementation (ls_id));

    // Create a DOMWriter.
    //
    xml::dom::auto_ptr<DOMWriter> writer (impl->createDOMWriter ());

    // Set error handler.
    //
    tree::error_handler<char> eh;
    xml::dom::bits::error_handler_proxy<char> ehp (eh);
    writer->setErrorHandler (&ehp);

    // Set encoding.
    //
    writer->setEncoding (xml::string (encoding).c_str ());

    // Set some generally nice features.
    //
    writer->setFeature (XMLUni::fgDOMWRTDiscardDefaultContent, true);
    writer->setFeature (XMLUni::fgDOMWRTFormatPrettyPrint, true);

    // Adapt ostream to format target and serialize.
    //
    xml::dom::ostream_format_target oft (os);

    writer->writeNode (&oft, doc);

    // Report failures as a serialization (not parsing) exception.
    //
    eh.throw_if_failed<tree::serialization<char> > ();
  }
This function can be used like this:
  #include <fstream>

  #include <xercesc/util/PlatformUtils.hpp>

  int
  main (int argc, char* argv[])
  {
    using namespace xercesc;
    namespace xml = xsd::cxx::xml;

    XMLPlatformUtils::Initialize ();

    {
      DOMDocument& doc = ...

      std::ofstream ofs (argv[1]);
      serialize (ofs, doc); // doc is a reference, so pass it as is.
    }

    XMLPlatformUtils::Terminate ();
  }