Talk:Schemas/NIST-ITL

From Code Synthesis Wiki

Version(s) of XercesC++ for the sample to run under Cygwin?

I am trying to build it under Cygwin, and I'm not sure which version of XercesC++ needs to be installed. You mention that 3.0.x behaves differently than previous versions, but it's not clear which one I need to have to make the sample files work. Can you clarify?

Fuhrmanator, 29 March 2009 (EDT)

With the proposed fix to the schema, both Xerces-C++ 2.8.0 and 3.0.x should work.

Boris 16:04, 31 March 2009 (EDT)

OK, compiling under Cygwin (using Xerces-C++ 2.8.x) gave us two problems. We found a workaround for the first; the second is tougher.

  • The generated code (DocumentIntelligenceCategoryCodeSimpleType.[ch]xx) has an enumeration with a value named SIGINT, which I think collides with the standard Unix signal name. The compiler error is something about an invalid constant defined in an enumeration. We renamed that enum value in the generated code and the problem went away. Is there a better way to fix this in an XSD config file?
  • The ld (linker) command eventually crashes with an "internal error" and asks us to report it to the developers.

For the time being, we've decided to run Linux under VMWare so we can make some progress with this. We'll skin the Cygwin cat later...

Fuhrmanator 19:08, 1 April 2009 (EDT)

For the first problem there is the --reserved-name option which allows you to tell the XSD compiler that some names cannot be used in the generated code. I am not sure about the second problem. If you are not using a fairly recent version of Cygwin, you can try to upgrade. One thing that is "unusual" about the linking process is the number of object files. You may want to try to create a static library (e.g., 'ar rc libitl.a out/*.o') and then link that to the executable.

Boris 02:08, 2 April 2009 (EDT)
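
To illustrate the first problem: the C and C++ standards require SIGINT to be a macro defined by the signal header, so if any header pulled in by the generated code includes it, the preprocessor replaces the enumerator name before the compiler sees it. Below is a minimal sketch of that collision and of what a renamed enumerator might look like. The type category_code, the enumerator HUMINT, the replacement name SIGINT_, and both helper functions are hypothetical stand-ins, not the actual output of the XSD compiler.

```cpp
#include <csignal>  // defines SIGINT as a preprocessor macro
#include <string>

// With <csignal> visible, an enumeration written as
//
//   enum value { HUMINT, SIGINT };
//
// reaches the compiler with SIGINT already replaced by an integer
// constant expression, so the enumerator list is no longer valid.

// Hypothetical stand-in for the generated type. SIGINT_ is an assumed
// replacement name chosen for illustration only.
struct category_code {
  enum value { HUMINT, SIGINT_ };
};

// Reports whether SIGINT is a macro in this translation unit.
inline bool sigint_is_macro() {
#ifdef SIGINT
  return true;
#else
  return false;
#endif
}

// Maps an enumerator back to its XML text. Renaming the C++ identifier
// does not change the string that appears in the XML instance.
inline std::string to_xml_text(category_code::value v) {
  return v == category_code::SIGINT_ ? "SIGINT" : "HUMINT";
}
```

Only the C++ identifier changes in this sketch; the text in the XML instance stays SIGINT. Whether the collision shows up on a given platform would depend on whether the signal header ends up included in the translation units of the generated code.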

The static library approach with 'ar' described above was unsuccessful: there is not enough memory for 'ar' to finish (it gives an accurate error message about memory being exhausted). So Cygwin is proving problematic with the ANSI/NIST ITL object files because of memory constraints. How to increase the memory available to Cygwin applications isn't obvious, as documented in this discussion. Any suggestions?

Fuhrmanator 16:48, 3 April 2009 (EDT)

I managed to link everything under Cygwin running on WinXP with 2GB RAM (though it never used more than 1GB) when I compiled everything with optimization (-O3). Here is what I did:
  1. I built Xerces-C++ as a static optimized library (./configure CFLAGS=-O3 CXXFLAGS=-O3 --disable-shared).
  2. I compiled the generated code with the -O3 option. I used this makefile, though a simple 'g++ -O3 -c *.cxx' in the out/ directory should also work.
  3. I could then link all the object files, either directly or by first creating a static library.
Linking takes quite some time, so if you are planning to develop your application under Cygwin, you will probably want to build Xerces-C++ and ITL as shared libraries, which should speed up linking considerably.

Boris 07:12, 4 April 2009 (EDT)

I built Xerces-C++ 3.0.0 as described above, and your makefile was useful. The -O3 optimization made the code small enough to pack into an ar archive. I managed to get the whole thing to link, although with a dozen or so messages like
/usr/lib/gcc/i686-pc-cygwin/3.4.4/../../../../i686-pc-cygwin/bin/ld: warning: auto-importing has been activated without --enable-auto-import specified on the command line.
This should work unless it involves constant data structures referencing symbols from auto-imported DLLs.
Info: resolving xercesc_3_0::XMLUni::fgXMLNSURIName by linking to __imp___ZN11xercesc_3_06XMLUni14fgXMLNSURINameE (auto-import)
Finally, the driver.exe file doesn't appear to do anything, but the return code (echo $?) is 53. FYI, libitl.a is large (400+ MB), and the executable is nearly 8 MB. I used the following set of commands to do the build:
xsd cxx-tree --reserved-name SIGINT --generate-xml-schema --output-dir out \
    --options-file xsd.options xml-schema.xsd
xsd cxx-tree --reserved-name SIGINT --file-per-type --output-dir out \
    --options-file itl.options ITL-2007f-Package.xsd 
make -C out -f ~/makefile -j 2 CXXFLAGS="-O3 -I/usr/lib/include"
ar rc libitl.a out/*.o
g++ -L/usr/local/lib -I/usr/local/include -I/usr/lib/xsd/libxsd driver.cxx \
    xml-schema-custom.cxx -o driver.exe -Wl,-Bstatic libitl.a -Wl,-Bstatic -lxerces-c

Fuhrmanator 10:38, 8 April 2009 (EDT)

Sorry, I made a mistake in the Xerces-C++ build instructions. The --disable-static option should actually be --disable-shared (I've fixed it above). Can you try to rebuild Xerces-C++ and see if that helps? The size of the executable is about the same as what I've got. As expected, the static library is large. The file-per-type mode is quite messy in this regard, which is the price one has to pay for handling poorly designed schemas. Also, Cygwin is not a very good choice for heavy-weight development: it is slow and its toolchain support is quite poor.

Boris 10:54, 8 April 2009 (EDT)

I agree that Cygwin is less than ideal for this and that the schema is challenging. Perhaps our customer won't require us to support a Cygwin version. Anyway, the good news is that we have had some success now, at least in terms of building. I uninstalled the Xerces-C++ 3 I had built (with the incorrect configure option) and re-installed libxerces-c (2.8.0) via Cygwin setup.exe. I was able to link to it without using static options or running out of memory. I think the -O3 option on the generated code is enough, without having to build an optimized Xerces-C++.
However, when I run the driver.exe file on the Instance_2007f.xml file, I get an error "expected element 'http://niem.gov/niem/ansi-nist/2.0#RecordImage'". I will check with Zack to see if the Ubuntu executable gets the same error.

Fuhrmanator 11:59, 8 April 2009 (EDT)

Just to follow up on the last point: we are OK with Ubuntu and the sample instance unmarshal/re-marshal. Cygwin is definitely a problem (but we're not worried about trying to solve it for now). Thanks again, Boris, for all your help.

Fuhrmanator 10:53, 23 April 2009 (EDT)

Thanks again for the pointers. We tried compiling under Ubuntu 8.10 and got a "collect2: ld terminated with signal 9 [Killed]" error. Googling that error turns up information suggesting that we are running out of memory during the link. So we're increasing the memory allocated to the VMware virtual machine and trying again. I'm keeping this info here in case it helps others.

Fuhrmanator 12:02, 2 April 2009 (EDT)

Thanks, this can definitely be helpful to others. FYI, my machine has 3GB of RAM and everything compiles and links without a problem.

Boris 12:11, 2 April 2009 (EDT)

The SIGINT enum value seems to be a non-issue when compiling the code in Linux. I successfully generated the code and compiled it without adding the --reserved-name option.

Zhutzell 12:52, 2 April 2009 (EDT)
