Tree/FAQ

From Code Synthesis Wiki

Revision as of 07:42, 9 February 2009 by Boris (Serialization - Add a Q&A about speeding up serialization); current revision by Boris (How can I speed up serialization?).
Current revision


General

Is the generated code thread-safe?

XSD-generated code is thread-safe in the sense that you can use different instances of the object model in several threads concurrently. This is possible because the generated code does not rely on any writable global variables. If you need to share the same object between several threads, then you will need to provide some form of synchronization. If you would also like to call parsing and/or serialization functions from several threads, potentially concurrently, then you will need to make sure the Xerces-C++ runtime is initialized and terminated only once. For more information on this topic refer to Section 3.4, "Thread Safety" in the C++/Tree Mapping Getting Started Guide.
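The following is a minimal sketch of the "initialize once, terminate once" arrangement, assuming C++11 threads. The functions runtime_initialize() and runtime_terminate() are hypothetical stand-ins for xercesc::XMLPlatformUtils::Initialize() and Terminate() that only count calls, and worker() is a placeholder for code that would call a generated parsing function with xml_schema::flags::dont_initialize on an object model instance owned by that thread.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-ins for xercesc::XMLPlatformUtils::Initialize ()
// and Terminate (). They only count calls so the pattern can be checked.
int init_calls = 0;
int term_calls = 0;
void runtime_initialize () { ++init_calls; }
void runtime_terminate () { ++term_calls; }

// Initializes the runtime on construction and terminates it on
// destruction. Exactly one instance is created, in the main thread.
struct runtime_guard
{
  runtime_guard () { runtime_initialize (); }
  ~runtime_guard () { runtime_terminate (); }
  runtime_guard (const runtime_guard&) = delete;
  runtime_guard& operator= (const runtime_guard&) = delete;
};

// Placeholder for per-thread work. Real code would parse into (and
// then use) an object model instance that only this thread touches.
std::atomic<int> documents_processed (0);
void worker () { ++documents_processed; }

// Runs 'count' worker threads between a single initialization and a
// single termination; returns how many documents were processed.
int process_in_parallel (int count)
{
  runtime_guard guard; // one Initialize () for all threads
  std::vector<std::thread> threads;
  for (int i = 0; i < count; ++i)
    threads.emplace_back (worker);
  for (std::thread& t : threads)
    t.join (); // every worker finishes before 'guard' is destroyed
  return documents_processed.load ();
} // one Terminate (), after all threads have been joined
```

The guard lives in the thread that starts and joins the workers, so no worker can run before initialization or after termination.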

What character encoding does the generated code use?

XSD has built-in support for two character types: char and wchar_t. You can select the character type with the --char-type command line option. The default character type is char. The character encoding depends on the character type used.

For the char character type the default application encoding is UTF-8. However, you can configure the character encoding that should be used by the object model using the --char-encoding option. As an argument to this option you can specify iso8859-1, lcp (Xerces-C++ local code page), and custom.

The custom option allows you to support a custom encoding. For this to work you will need to implement the transcoder interface for your encoding (see the libxsd/xsd/cxx/xml/char-* files for examples) and include this implementation’s header at the beginning of the generated header files (see the --hxx-prologue option).
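To illustrate the kind of conversion a transcoder performs, here is a standalone sketch that converts between UTF-8 and ISO-8859-1. It is not the XSD transcoder interface (see the libxsd/xsd/cxx/xml/char-* files for that); the function names and the unrepresentable exception are hypothetical, the latter playing the role that xsd::cxx::xml::iso8859_1_unrepresentable plays in the XSD runtime.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception for text that has no ISO-8859-1 equivalent.
struct unrepresentable : std::runtime_error
{
  unrepresentable ()
    : std::runtime_error ("not representable in ISO-8859-1") {}
};

// ISO-8859-1 -> UTF-8. Every Latin-1 byte is a code point in
// U+0000..U+00FF, which takes one or two UTF-8 bytes.
std::string latin1_to_utf8 (const std::string& in)
{
  std::string out;
  for (unsigned char c : in)
  {
    if (c < 0x80)
      out += static_cast<char> (c);
    else
    {
      out += static_cast<char> (0xC0 | (c >> 6));
      out += static_cast<char> (0x80 | (c & 0x3F));
    }
  }
  return out;
}

// UTF-8 -> ISO-8859-1. Only U+0000..U+00FF fit: ASCII bytes and
// two-byte sequences whose lead byte is 0xC2 or 0xC3. For simplicity,
// everything else (including malformed UTF-8) is rejected.
std::string utf8_to_latin1 (const std::string& in)
{
  std::string out;
  for (std::string::size_type i = 0; i < in.size (); ++i)
  {
    unsigned char c = static_cast<unsigned char> (in[i]);
    if (c < 0x80)
      out += static_cast<char> (c);
    else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size () &&
             (static_cast<unsigned char> (in[i + 1]) & 0xC0) == 0x80)
    {
      unsigned char c2 = static_cast<unsigned char> (in[++i]);
      out += static_cast<char> (((c & 0x1F) << 6) | (c2 & 0x3F));
    }
    else
      throw unrepresentable ();
  }
  return out;
}
```

For example, "café" round-trips between the two encodings, while a euro sign (U+20AC) cannot be converted to ISO-8859-1.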

When the local code page encoding is specified, the Xerces-C++ XMLString::transcode functions are used for character conversion. On some platforms (e.g., UNIX-like) you can set the local code page with a call to setlocale. On other platforms (e.g., Windows), the local code page is preset and cannot be changed.

For the wchar_t character type the encoding is automatically selected between UTF-16 and UTF-32/UCS-4 depending on the size of the wchar_t type. On some platforms (e.g., Windows with Visual C++ and AIX with IBM XL C++) wchar_t is 2 bytes long. For these platforms the encoding is UTF-16. On other platforms wchar_t is 4 bytes long and UTF-32 is used.
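The selection rule above can be expressed with sizeof. The sketch below is illustrative only (the function names are hypothetical, not part of XSD) and also shows the practical consequence: with a 2-byte wchar_t, code points above U+FFFF need a UTF-16 surrogate pair, i.e., two wchar_t units.

```cpp
#include <string>

// Encoding a wchar_t-based object model would use on this platform:
// 2-byte wchar_t means UTF-16, otherwise (4-byte wchar_t) UTF-32.
std::string wchar_encoding ()
{
  return sizeof (wchar_t) == 2 ? "UTF-16" : "UTF-32";
}

// Number of wchar_t units needed to store one code point. Outside the
// Basic Multilingual Plane (above U+FFFF) UTF-16 uses a surrogate pair.
int wchar_units (char32_t code_point)
{
  if (sizeof (wchar_t) == 2 && code_point > 0xFFFF)
    return 2;
  return 1;
}
```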

How can I make sure my application catches every possible exception?

Exceptions that can be thrown during the "normal" execution of the XSD-generated code are derived from xml_schema::exception (which, in turn, is derived from std::exception) and catching this exception is generally sufficient. By "normal" we mean that such exceptions are caused by things that are outside of the application's control (such as invalid XML documents) rather than bugs in the application or the generated code itself. Generally, every application should be prepared to encounter and handle such "normal" exceptions.

There is, however, a handful of exceptions that the generated code may throw which would indicate a bug in the application or the generated code. Such exceptions are currently not derived from xml_schema::exception (we are planning to change this in the next release of XSD and make all exceptions, without exception, derive from xml_schema::exception). If your application must handle absolutely every exception thrown by the generated code, then you will need to add a few specific exceptions to your catch set. The following code example shows how to achieve this for XSD 3.3.0:

try
{
  //
  // Parse, work with object model, serialize.
  //
}
catch (const xml_schema::exception& e)
{
  cerr << e << endl;
  return 1;
}
catch (const xml_schema::properties::argument&)
{
  cerr << "invalid property argument (empty namespace or location)" << endl;
  return 1;
}
catch (const xsd::cxx::xml::invalid_utf16_string&)
{
  cerr << "invalid UTF-16 text in DOM model" << endl;
  return 1;
}
catch (const xsd::cxx::xml::invalid_utf8_string&)
{
  cerr << "invalid UTF-8 text in object model" << endl;
  return 1;
}

If your object model uses the ISO-8859-1 encoding instead of UTF-8, you will need to replace the last catch block with this one:

catch (const xsd::cxx::xml::iso8859_1_unrepresentable&)
{
  cerr << "XML data is not representable in ISO-8859-1" << endl;
  return 1;
}

It is also theoretically possible for the underlying XML parser (Xerces-C++) to throw one of its own exceptions. This would normally indicate a bug in XSD or Xerces-C++, or an unrecoverable error such as an out-of-memory condition (in the next release of XSD we are also planning to catch all such exceptions and translate them to an XSD exception derived from xml_schema::exception). If you would like to catch these exceptions as well, you will need to initialize the Xerces-C++ runtime yourself and add catch blocks for the following exceptions:

xercesc::XMLException (util/XMLException.hpp)

Most Xerces-C++ exceptions are derived from this base exception (except for the following three). It provides member functions that allow you to query the error message.

xercesc::OutOfMemoryException (util/OutOfMemoryException.hpp)

This one is special: constructing an XMLException may itself trigger an out-of-memory condition, so OutOfMemoryException cannot derive from XMLException.

xercesc::DOMException (dom/DOMException.hpp)

This is the base for DOM exceptions. If it is thrown from XSD-generated code, then that means that either the Xerces-C++ parser/serializer tried to perform an illegal DOM operation (bug in Xerces-C++) or the XSD-generated code did so (bug in XSD). It also has some member functions which should allow you to query a text message.

xercesc::SAXException (sax/SAXException.hpp)

Similar to DOMException. While SAX is not used by the C++/Tree mapping directly, you may want to handle this exception if you decide to use the streaming approach (see the streaming example for details).

The following code example shows how we can achieve this:

int r (0);

// Initialize the Xerces-C++ runtime since the exceptions that
// may be thrown by Xerces-C++ should be caught before the
// runtime is terminated.
//
xercesc::XMLPlatformUtils::Initialize ();

try
{
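  //
  // Pass the xml_schema::flags::dont_initialize flag to the parsing
  // and serialization functions.
  //
}
//
// Catch blocks from the previous example go here, with each
// "return 1;" replaced by "r = 1;" so that Terminate () below is
// still called.
//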
catch (const xercesc::XMLException&)
{
  cerr << "unexpected Xerces-C++ exception" << endl;
  r = 1;
}
catch (const xercesc::OutOfMemoryException&)
{
  cerr << "Xerces-C++ out of memory exception" << endl;
  r = 1;
}
catch (const xercesc::DOMException&)
{
  cerr << "unexpected Xerces-C++ DOM exception" << endl;
  r = 1;
}
catch (const xercesc::SAXException&) // optional
{
  cerr << "unexpected Xerces-C++ SAX exception" << endl;
  r = 1;
}

xercesc::XMLPlatformUtils::Terminate ();
return r;

Finally, the generated code uses the C++ standard library, so standard exceptions such as std::bad_alloc can also be thrown. To make your application completely exception-tight, you will probably also want to catch std::exception.
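When all of these handlers are combined, their order matters: since xml_schema::exception derives from std::exception, a handler for std::exception placed before it would hide it. The sketch below uses hypothetical stand-in types (schema_exception for xml_schema::exception, invalid_utf8 for xsd::cxx::xml::invalid_utf8_string, parser_exception for xercesc::XMLException, the last two modeled as not deriving from std::exception) to show a most-specific-first arrangement.

```cpp
#include <exception>
#include <new>
#include <string>

// Hypothetical stand-ins for the exception families discussed above.
struct schema_exception : std::exception   // like xml_schema::exception
{
  const char* what () const noexcept override { return "schema error"; }
};
struct invalid_utf8 {};       // modeled as not derived from std::exception
struct parser_exception {};   // like xercesc::XMLException

// Runs 'step' and maps whatever it throws to a short category name.
// Clauses go from most specific to most general; std::exception comes
// last so it catches only what the earlier clauses did not.
template <typename F>
std::string run_guarded (F step)
{
  try
  {
    step ();
    return "ok";
  }
  catch (const schema_exception&) { return "schema"; }
  catch (const invalid_utf8&) { return "utf8"; }
  catch (const parser_exception&) { return "parser"; }
  catch (const std::exception&) { return "std"; } // e.g., std::bad_alloc
}
```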

Is it possible to change names assigned to anonymous types?

By default XSD uses names of enclosing elements and attributes to derive names for anonymous types. You can alter this behavior with the --anonymous-regex option as described in the XSD Compiler Command Line Manual. For example, the following option appends Type to names of all anonymous types:

--anonymous-regex '%.* .* (.+/)*(.+)%$2Type%'

Alternatively, you can make names of anonymous types start with capital letters by using the following option:

--anonymous-regex '%.* .* (.+/)*(.+)%\u$2%'
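To see what these expressions do, the sketch below applies the same pattern with std::regex. It assumes that XSD matches the expression against a string of the form "filename namespace xpath" (for example, "hello.xsd http://example.com/hello type/element"); the sample inputs and function names are illustrative. std::regex uses ECMAScript syntax, where $2 in the format string also refers to the second capture group, but it has no \u escape, so upper-casing is done by hand.

```cpp
#include <cctype>
#include <regex>
#include <string>

// The pattern part of both --anonymous-regex examples above.
const char* const anonymous_pattern = ".* .* (.+/)*(.+)";

// Emulates '%.* .* (.+/)*(.+)%$2Type%': keep the last path component
// and append "Type".
std::string append_type (const std::string& input)
{
  static const std::regex re (anonymous_pattern);
  return std::regex_replace (input, re, "$2Type");
}

// Emulates '%.* .* (.+/)*(.+)%\u$2%': keep the last path component
// and upper-case its first character.
std::string capitalize_name (const std::string& input)
{
  static const std::regex re (anonymous_pattern);
  std::smatch m;
  if (!std::regex_match (input, m, re))
    return input; // no match: leave the string alone
  std::string name (m[2].str ());
  name[0] = static_cast<char> (
    std::toupper (static_cast<unsigned char> (name[0])));
  return name;
}
```

With the assumed input "hello.xsd http://example.com/hello type/element", the first expression yields elementType and the second yields Element.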

Parsing

Why do I get "error: no declaration found for element 'root-element'" when I try to parse a valid XML document?

This usually means that the parser could not find a schema to validate your XML document against. Since by default validation is turned on, this results in an error. For more information on various ways to resolve this see Section 5.1, "XML Schema Validation and Searching" in the C++/Tree Mapping Getting Started Guide.

How do I disable validation of XML documents?

To disable validation you will need to pass the xml_schema::flags::dont_validate flag to one of the parsing functions, as described in Section 5.1, "XML Schema Validation and Searching" in the C++/Tree Mapping Getting Started Guide.

How do I specify a schema location other than in an XML document?

There are several methods to programmatically specify schema locations to be used in validation. For example, you can use the xml_schema::properties argument to parsing functions to specify schema locations instead of those specified with the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes in XML documents. For more information refer to Section 5.1, "XML Schema Validation and Searching" in the C++/Tree Mapping Getting Started Guide.

How do I parse an XML document to a Xerces-C++ DOM document?

While this question is not exactly about XSD or the C++/Tree mapping and it is covered in the Xerces-C++ Programming Guide, this step is a prerequisite to some more advanced techniques covered in this FAQ. Furthermore, the XSD runtime provides some utilities that make the code a little bit more palatable. The code presented in this entry can also be found in the multiroot example in the examples/cxx/tree/ directory of the XSD distribution.

#include <string>
#include <istream>

#include <xercesc/dom/DOM.hpp>
#include <xercesc/util/XMLUniDefs.hpp> // chLatin_*
#include <xercesc/framework/Wrapper4InputSource.hpp>

#include <xsd/cxx/xml/dom/auto-ptr.hxx>
#include <xsd/cxx/xml/sax/std-input-source.hxx>
#include <xsd/cxx/xml/dom/bits/error-handler-proxy.hxx>

#include <xsd/cxx/tree/exceptions.hxx>
#include <xsd/cxx/tree/error-handler.hxx>

// Parse an XML document from the standard input stream with an
// optional resource id. Resource id is used in diagnostics as
// well as to locate schemas referenced from inside the document.
//
xsd::cxx::xml::dom::auto_ptr<xercesc::DOMDocument>
parse (std::istream& is, const std::string& id, bool validate)
{
  using namespace xercesc;
  namespace xml = xsd::cxx::xml;
  namespace tree = xsd::cxx::tree;

  const XMLCh ls_id [] = {chLatin_L, chLatin_S, chNull};

  // Get an implementation of the Load-Store (LS) interface.
  //
  DOMImplementation* impl (
    DOMImplementationRegistry::getDOMImplementation (ls_id));

  xml::dom::auto_ptr<DOMLSParser> parser (
    impl->createLSParser (DOMImplementationLS::MODE_SYNCHRONOUS, 0));

  DOMConfiguration* conf (parser->getDomConfig ());

  // Discard comment nodes in the document.
  //
  conf->setParameter (XMLUni::fgDOMComments, false);

  // Enable datatype normalization.
  //
  conf->setParameter (XMLUni::fgDOMDatatypeNormalization, true);

  // Do not create EntityReference nodes in the DOM tree. No
  // EntityReference nodes will be created, only the nodes
  // corresponding to their fully expanded substitution text
  // will be created.
  //
  conf->setParameter (XMLUni::fgDOMEntities, false);

  // Perform namespace processing.
  //
  conf->setParameter (XMLUni::fgDOMNamespaces, true);

  // Do not include ignorable whitespace in the DOM tree.
  //
  conf->setParameter (XMLUni::fgDOMElementContentWhitespace, false);

  // Enable/Disable validation.
  //
  conf->setParameter (XMLUni::fgDOMValidate, validate);
  conf->setParameter (XMLUni::fgXercesSchema, validate);
  conf->setParameter (XMLUni::fgXercesSchemaFullChecking, false);

  // Xerces-C++ 3.1.0 is the first version with working multi import
  // support.
  //
#if _XERCES_VERSION >= 30100
  conf->setParameter (XMLUni::fgXercesHandleMultipleImports, true);
#endif

  // We will release the DOM document ourselves.
  //
  conf->setParameter (XMLUni::fgXercesUserAdoptsDOMDocument, true);

  // Set error handler.
  //
  tree::error_handler<char> eh;
  xml::dom::bits::error_handler_proxy<char> ehp (eh);
  conf->setParameter (XMLUni::fgDOMErrorHandler, &ehp);
  
  // Prepare input stream.
  //
  xml::sax::std_input_source isrc (is, id);
  Wrapper4InputSource wrap (&isrc, false);

  xml::dom::auto_ptr<DOMDocument> doc (parser->parse (&wrap));

  eh.throw_if_failed<tree::parsing<char> > ();

  return doc;
}

Below is a simple program that uses the above code.

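#include <fstream>
#include <iostream>
#include <xercesc/util/PlatformUtils.hpp>

int
main (int argc, char* argv[])
{
  using namespace xercesc;
  namespace xml = xsd::cxx::xml;
  namespace tree = xsd::cxx::tree;

  if (argc != 2)
  {
    std::cerr << "usage: " << argv[0] << " file.xml" << std::endl;
    return 1;
  }

  int r (0);
  XMLPlatformUtils::Initialize ();

  try
  {
    std::ifstream ifs (argv[1]);
    xml::dom::auto_ptr<DOMDocument> doc (parse (ifs, argv[1], true));
  }
  catch (const tree::parsing<char>& e)
  {
    std::cerr << e << std::endl;
    r = 1;
  }

  XMLPlatformUtils::Terminate ();
  return r;
}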

How do I handle XML data of an unknown type?

Here we assume that you need to handle XML documents that can be of several predefined types. The only information that distinguishes one document from another is the root element name.

Suppose we have two root elements defined in our schema: foo and bar, with types Foo and Bar, respectively. There are two ways to handle this situation. The first is straightforward but slow: call each parsing function in turn, expecting all but one to fail. It is slow because each invocation has to re-parse the XML document into DOM. The following code outlines this approach:

while (true)
{
  try
  {
    std::auto_ptr<Foo> f (foo ("document.xml")); // Try to parse as Foo.

    // Do something useful with f.

    break;
  }
  catch (xml_schema::unexpected_element const&)
  {
    // Try the next function.
  }

  try
  {
    std::auto_ptr<Bar> b (bar ("document.xml")); // Try to parse as Bar.

    // Do something useful with b.

    break;
  }
  catch (xml_schema::unexpected_element const&)
  {
    // Try the next function.
  }

  // This document is of some other type. Leave the loop, since
  // otherwise it would never terminate.
  //
  break;
}

The second approach involves splitting the parsing process into two stages: XML to DOM and DOM to Tree. After the XML to DOM stage we peek at the root element and decide which parsing function to call:

#include <xercesc/dom/DOM.hpp>
#include <xsd/cxx/xml/string.hxx>

using namespace xercesc;

DOMDocument* dom = ... // Parse XML into DOM.
DOMElement* root = dom->getDocumentElement ();
std::string name (xsd::cxx::xml::transcode<char> (root->getLocalName ()));

if (name == "foo")
{
  std::auto_ptr<Foo> f (foo (*dom)); // Parse dom to Foo.

  // Do something useful with f.
}
else if (name == "bar")
{
  std::auto_ptr<Bar> b (bar (*dom)); // Parse dom to Bar.

  // Do something useful with b.
}

For a complete program that uses this technique, refer to the multiroot example in the examples/cxx/tree/ directory of the XSD distribution.

If your XML vocabulary has a large number of root elements, then writing and maintaining such code manually quickly becomes burdensome. To overcome this, you can instruct the XSD compiler to generate wrapper types instead of parsing/serialization functions for the root elements in your vocabulary (the --generate-element-type option). You can also request the generation of an element map for uniform parsing/serialization of the element types (the --generate-element-map option). For more information on this approach, see the messaging example in the examples/cxx/tree/ directory of the XSD distribution as well as Section 2.9.1, “Element Types” and Section 2.9.2, “Element Map” in the C++/Tree Mapping User Manual.
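The following is a minimal sketch of the table-driven dispatch that a hand-written if/else chain on the root element name tends to grow into. It is not XSD's generated element map (use --generate-element-map for that); root_dispatcher, root_handler, handle_foo, and handle_bar are hypothetical names, and a std::string stands in for the DOM document so that the sketch compiles without Xerces-C++. Unlike the name comparison above, which looks only at the local name, this table is keyed on both the namespace URI and the local name, so two vocabularies that happen to share a root element name cannot collide.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical handler signature. In real code a handler would take the
// parsed xercesc::DOMDocument and call the matching generated parsing
// function (for example, foo (*dom)); a string stands in for it here.
typedef std::string (*root_handler) (const std::string& document);

// Root elements are identified by namespace URI and local name.
typedef std::pair<std::string, std::string> root_key;

class root_dispatcher
{
public:
  // Registers (or replaces) the handler for a root element.
  void
  add (const std::string& ns, const std::string& name, root_handler h)
  {
    handlers_[root_key (ns, name)] = h;
  }

  // Calls the handler registered for the root element and stores its
  // result. Returns false if the root element is unknown.
  bool
  dispatch (const std::string& ns,
            const std::string& name,
            const std::string& document,
            std::string& result) const
  {
    std::map<root_key, root_handler>::const_iterator i (
      handlers_.find (root_key (ns, name)));

    if (i == handlers_.end ())
      return false;

    result = i->second (document);
    return true;
  }

private:
  std::map<root_key, root_handler> handlers_;
};

// Hypothetical handlers standing in for code that calls foo () and bar ().
std::string
handle_foo (const std::string&)
{
  return "Foo";
}

std::string
handle_bar (const std::string&)
{
  return "Bar";
}
```

In real code, the two lookup strings would come from the root element's getNamespaceURI () and getLocalName (), and adding a new root element becomes a single add () call instead of another branch.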

For more information on parsing XML to DOM see How do I parse an XML document to a Xerces-C++ DOM document?

How do I parse an XML document that is missing namespace information?

Strictly speaking, such a document is invalid per the schema, and the best solution is to either fix the XML document to conform to the schema or change the schema to allow such documents. If neither option is acceptable, then you can use the following work-around. First, you will need to perform the XML-to-DOM parsing with XML Schema validation disabled (since your document is invalid, validation would fail). For more information on how to do this, see the C++/Tree mapping examples in the XSD distribution as well as the rest of this FAQ.

Once you obtain the DOM representation of your XML document, you will need to add the missing namespace information to the elements. For example, the following function adds the namespace information to all the elements in the document fragment recursively:

#include <string>

#include <xercesc/dom/DOMDocument.hpp>
#include <xercesc/dom/DOMElement.hpp>

#include <xsd/cxx/xml/string.hxx>

xercesc::DOMElement*
add_namespace (xercesc::DOMDocument* doc,
               xercesc::DOMElement* e,
               const XMLCh* ns)
{
  using namespace xercesc;

  DOMElement* ne =
    static_cast<DOMElement*> (
      doc->renameNode (e, ns, e->getLocalName ()));

  for (DOMNode* n = ne->getFirstChild (); n != 0; n = n->getNextSibling ())
  {
    if (n->getNodeType () == DOMNode::ELEMENT_NODE)
    {
      n = add_namespace (doc, static_cast<DOMElement*> (n), ns);
    }
  }

  return ne;
}

xercesc::DOMElement*
add_namespace (xercesc::DOMDocument* doc,
               xercesc::DOMElement* e,
               const std::string& ns)
{
  return add_namespace (doc, e, xsd::cxx::xml::string (ns).c_str ());
}

After this, you can call one of the parsing functions to convert this fixed-up document to the object model. The following code fragment illustrates these steps:

DOMDocument* doc = ... // Parse XML to DOM.
add_namespace (doc, doc->getDocumentElement (), "http://www.example.com/foo");
std::auto_ptr<Foo> f (foo (*doc)); // Parse DOM to object model.

How can I speed up parsing?

Parsing functions perform the following steps for each document being parsed:

  1. Initialization and termination of the Xerces-C++ runtime.
  2. Construction and configuration of the parser object.
  3. If XML Schema validation is enabled, loading and parsing of the schema(s).

These steps can be moved out and done only once during application startup. The performance example in the examples/cxx/tree/ directory of the XSD distribution shows how to do this. For more information on how to parse from std::istream instead of a memory buffer, see Q2.4 in this section.
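The effect of moving these steps out of the per-document path can be sketched without Xerces-C++. In the hypothetical reusable_parser below, the constructor stands in for steps 2 and 3 (constructing and configuring the parser and loading the schemas), and a static counter records how often that setup runs. Step 1 follows the same rule: call XMLPlatformUtils::Initialize () and XMLPlatformUtils::Terminate () once around all of the parsing rather than once per document. This illustrates the cost model only; it is not the code from the performance example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parser whose construction is expensive
// (think of a configured DOMLSParser with its schemas already loaded).
// The static counter only exists to make the setup cost observable.
class reusable_parser
{
public:
  static std::size_t setups;

  reusable_parser ()
  {
    ++setups; // Construction, configuration, and schema loading go here.
  }

  // Placeholder for the real per-document parsing work.
  std::size_t
  parse (const std::string& document) const
  {
    return document.size ();
  }
};

std::size_t reusable_parser::setups = 0;

// What a plain parsing function effectively does: set up per document.
std::size_t
parse_with_setup_per_document (const std::vector<std::string>& docs)
{
  std::size_t total (0);
  for (std::size_t i (0); i < docs.size (); ++i)
  {
    reusable_parser p;
    total += p.parse (docs[i]);
  }
  return total;
}

// The faster variant: set up once, then reuse for every document.
std::size_t
parse_with_setup_once (const std::vector<std::string>& docs)
{
  reusable_parser p;
  std::size_t total (0);
  for (std::size_t i (0); i < docs.size (); ++i)
    total += p.parse (docs[i]);
  return total;
}
```

Both functions do the same per-document work; the only difference is that the second pays the setup cost once instead of once per document, which is where the savings come from when the schemas are large.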

Serialization

How do I create an empty Xerces-C++ DOM document?

While this question is not strictly about XSD or the C++/Tree mapping, and it is covered in the Xerces-C++ Programming Guide, this step is a prerequisite for some of the more advanced techniques covered in this FAQ. Furthermore, the XSD runtime provides some utilities that make the code a little more palatable.

#include <string>

#include <xercesc/dom/DOM.hpp>
#include <xercesc/util/XMLUniDefs.hpp>

#include <xsd/cxx/xml/string.hxx>
#include <xsd/cxx/xml/dom/auto-ptr.hxx>

xsd::cxx::xml::dom::auto_ptr<xercesc::DOMDocument>
create (const std::string& root_element_name,
        const std::string& root_element_namespace = "",
        const std::string& root_element_namespace_prefix = "");

xsd::cxx::xml::dom::auto_ptr<xercesc::DOMDocument>
create (const std::string& name,
        const std::string& ns,
        const std::string& prefix)
{
  using namespace xercesc;
  namespace xml = xsd::cxx::xml;

  const XMLCh ls_id [] = {chLatin_L, chLatin_S, chNull};

  // Get an implementation of the Load-Store (LS) interface.
  //
  DOMImplementation* impl (
    DOMImplementationRegistry::getDOMImplementation (ls_id));

  xml::dom::auto_ptr<DOMDocument> doc (
    impl->createDocument (
      (ns.empty () ? 0 : xml::string (ns).c_str ()),
      xml::string ((prefix.empty () ? name : prefix + ':' + name)).c_str (),
      0));

  return doc;
}

The following code fragment shows how to use this function. It also shows how to establish additional namespace-prefix mappings and set the schemaLocation attribute:

#include <xercesc/util/PlatformUtils.hpp>

int
main (int argc, char* argv[])
{
  using namespace xercesc;
  namespace xml = xsd::cxx::xml;

  XMLPlatformUtils::Initialize ();

  {
    xml::dom::auto_ptr<DOMDocument> doc (
      create ("example",
              "http://www.example.com/xmlns/example",
              "e"));

    DOMElement* root (doc->getDocumentElement ());

    root->setAttributeNS (
      xml::string ("http://www.w3.org/2000/xmlns/").c_str (),
      xml::string ("xmlns:xsi").c_str (),
      xml::string ("http://www.w3.org/2001/XMLSchema-instance").c_str ());

    root->setAttributeNS (
      xml::string ("http://www.w3.org/2001/XMLSchema-instance").c_str (),
      xml::string ("xsi:schemaLocation").c_str (),
      xml::string ("http://www.example.com/xmlns/example example.xsd").c_str ());
  }

  XMLPlatformUtils::Terminate ();
}

The call to create above creates a DOM document with the example element as its root. The example element is in the http://www.example.com/xmlns/example namespace to which we assigned the e namespace prefix.
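The two strings built in this example follow simple rules: the qualified name passed to createDocument () is the prefix and the local name joined by a colon (just the local name when there is no prefix), and the value of xsi:schemaLocation is a whitespace-separated list of namespace URI and schema location pairs. The hypothetical helpers below sketch both rules using only the standard library; they are not part of the XSD runtime.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Builds a qualified element name such as "e:example" from a local
// name and an optional namespace prefix (an empty prefix gives just
// the local name), mirroring the expression used in create ().
std::string
qualified_name (const std::string& name, const std::string& prefix)
{
  return prefix.empty () ? name : prefix + ':' + name;
}

// Builds the value of the xsi:schemaLocation attribute: a
// whitespace-separated list of namespace URI / schema location pairs.
std::string
schema_location (
  const std::vector<std::pair<std::string, std::string> >& pairs)
{
  std::string r;
  for (std::size_t i (0); i < pairs.size (); ++i)
  {
    if (!r.empty ())
      r += ' ';

    r += pairs[i].first;  // Namespace URI.
    r += ' ';
    r += pairs[i].second; // Schema location for that namespace.
  }
  return r;
}
```

For example, qualified_name ("example", "e") yields "e:example", and a single pair of the example namespace and example.xsd produces the same value that is passed to setAttributeNS () above.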

How do I serialize a Xerces-C++ DOM document to XML?

While this question is not strictly about XSD or the C++/Tree mapping, and it is covered in the Xerces-C++ Programming Guide, this step is a prerequisite for some of the more advanced techniques covered in this FAQ. Furthermore, the XSD runtime provides some utilities that make the code a little more palatable.

#include <ostream>
#include <string>

#include <xercesc/dom/DOM.hpp>
#include <xercesc/util/XMLUniDefs.hpp>

#include <xsd/cxx/xml/string.hxx>
#include <xsd/cxx/xml/dom/auto-ptr.hxx>
#include <xsd/cxx/xml/dom/serialization-source.hxx>
#include <xsd/cxx/xml/dom/bits/error-handler-proxy.hxx>

#include <xsd/cxx/tree/exceptions.hxx>
#include <xsd/cxx/tree/error-handler.hxx>

void
serialize (std::ostream& os,
           const xercesc::DOMDocument& doc,
           const std::string& encoding = "UTF-8")
{
  using namespace xercesc;
  namespace xml = xsd::cxx::xml;
  namespace tree = xsd::cxx::tree;

  const XMLCh ls_id [] = {chLatin_L, chLatin_S, chNull};

  // Get an implementation of the Load-Store (LS) interface.
  //
  DOMImplementation* impl (
    DOMImplementationRegistry::getDOMImplementation (ls_id));
 
  tree::error_handler<char> eh;
  xml::dom::bits::error_handler_proxy<char> ehp (eh);

  xml::dom::ostream_format_target oft (os);

  // Create a DOMLSSerializer.
  //
  xml::dom::auto_ptr<DOMLSSerializer> writer (
    impl->createLSSerializer ());

  DOMConfiguration* conf (writer->getDomConfig ());

  // Set error handler.
  //
  conf->setParameter (XMLUni::fgDOMErrorHandler, &ehp);

  // Set some generally nice features.
  //
  conf->setParameter (XMLUni::fgDOMWRTDiscardDefaultContent, true);
  conf->setParameter (XMLUni::fgDOMWRTFormatPrettyPrint, true);

  xml::dom::auto_ptr<DOMLSOutput> out (impl->createLSOutput ());
  out->setEncoding (xml::string (encoding).c_str ());
  out->setByteStream (&oft);

  writer->write (&doc, out.get ());

  eh.throw_if_failed<tree::serialization<char> > ();
}

This function can be used like this:

#include <fstream>
#include <xercesc/util/PlatformUtils.hpp>

int
main (int argc, char* argv[])
{
  using namespace xercesc;
  namespace xml = xsd::cxx::xml;

  XMLPlatformUtils::Initialize ();

  {
    DOMDocument* doc = ... // Create or parse a DOM document.

    std::ofstream ofs (argv[1]);

    serialize (ofs, *doc);
  }

  XMLPlatformUtils::Terminate ();
}

How do I serialize to XML without any namespace information?

Strictly speaking, such an XML document would be invalid per the schema, and the best solution is to either fix the consumer of such documents to be able to parse the version with namespaces or change the schema to allow such documents. If neither option is acceptable, then you can use the following work-around. First, call one of the serialization functions to serialize the object model to a DOM document. Once you obtain the DOM representation of your XML document, you will need to remove the namespace information from the elements. For example, the following function recursively removes the namespace information from all the elements in the document fragment:

#include <xercesc/dom/DOMDocument.hpp>
#include <xercesc/dom/DOMElement.hpp>

xercesc::DOMElement*
remove_namespace (xercesc::DOMDocument* doc, xercesc::DOMElement* e)
{
  using namespace xercesc;

  DOMElement* ne =
    static_cast<DOMElement*> (
      doc->renameNode (e, 0, e->getLocalName ()));

  for (DOMNode* n = ne->getFirstChild (); n != 0; n = n->getNextSibling ())
  {
    if (n->getNodeType () == DOMNode::ELEMENT_NODE)
    {
      n = remove_namespace (doc, static_cast<DOMElement*> (n));
    }
  }

  return ne;
}

Once this is done, you will need to perform the DOM-to-XML serialization. For more information on how to do this see the C++/Tree mapping examples in the XSD distribution as well as the rest of this FAQ. The following code fragment illustrates these steps:

Foo& f = ... // Object model.
xml_schema::dom::auto_ptr<DOMDocument> doc = foo (f); // Serialize the object model to DOM.
remove_namespace (doc.get (), doc->getDocumentElement ());
serialize (std::cout, *doc); // Serialize DOM to XML.

How can I speed up serialization?

Serialization functions perform the following steps for each document being serialized:

  1. Initialization and termination of the Xerces-C++ runtime.
  2. Construction and configuration of the serializer object.
  3. Creation of the DOM document and root element.

These steps can be moved out and done only once during application startup. The performance example in the examples/cxx/tree/ directory of the XSD distribution shows how to do this. For more information on how to serialize to std::ostream instead of a memory buffer, see Q3.2 in this section.
