Sunday, October 10, 2010

XML Schema Validation using Xerces-C++

Recently I've been writing a fair amount of code with Xerces-C++, Xml-Security-C++ and OpenSSL, and I plan to share my experience in a series of posts. In this post I am going to show how to validate an XML document against its schema using Xerces-C++. To demonstrate the schema validation I am going to use the example schema and XML from W3Schools: the schema file is shiporder.xsd and the XML file is shiporder.xml. Let's jump into the code.
#include <stdio.h>

#include <xercesc/util/XMLString.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/framework/LocalFileInputSource.hpp>
#include <xercesc/sax/ErrorHandler.hpp>
#include <xercesc/sax/SAXParseException.hpp>
#include <xercesc/validators/common/Grammar.hpp>

XERCES_CPP_NAMESPACE_USE

class WStr
{
    private:
        XMLCh*  wStr;

    public:
        WStr(const char* str)
        {
            wStr = XMLString::transcode(str);
        }

        ~WStr()
        {
            XMLString::release(&wStr);
        }

        operator const XMLCh*() const
        {
            return wStr;
        }
};

class ParserErrorHandler : public ErrorHandler
{
    private:
        void reportParseException(const SAXParseException& ex)
        {
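            char* msg = XMLString::transcode(ex.getMessage());
            // Line first, then column, to match the order in the format string.
            fprintf(stderr, "at line %llu column %llu, %s\n",
                    (unsigned long long) ex.getLineNumber(),
                    (unsigned long long) ex.getColumnNumber(), msg);
            XMLString::release(&msg);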
        }

    public:
        void warning(const SAXParseException& ex)
        {
            reportParseException(ex);
        }

        void error(const SAXParseException& ex)
        {
            reportParseException(ex);
        }

        void fatalError(const SAXParseException& ex)
        {
            reportParseException(ex);
        }

        void resetErrors()
        {
        }
};

void ValidateSchema(const char* schemaFilePath, const char* xmlFilePath)
{
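    XercesDOMParser domParser;
    ParserErrorHandler parserErrorHandler;

    // Install the handler first so that errors in the schema itself are reported.
    domParser.setErrorHandler(&parserErrorHandler);

    // Pass toCache = true: by default loadGrammar() does not keep the grammar,
    // and the parse below would rely on whatever schema the XML file references.
    if (domParser.loadGrammar(schemaFilePath, Grammar::SchemaGrammarType, true) == NULL)
    {
        fprintf(stderr, "couldn't load schema\n");
        return;
    }

    domParser.useCachedGrammarInParse(true);
    domParser.setValidationScheme(XercesDOMParser::Val_Auto);
    domParser.setDoNamespaces(true);
    domParser.setDoSchema(true);
    domParser.setValidationConstraintFatal(true);

    domParser.parse(xmlFilePath);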
    if (domParser.getErrorCount() == 0)
        printf("XML file validated against the schema successfully\n");
    else
        printf("XML file doesn't conform to the schema\n");
}

int main(int argc, const char *argv[])
{
    if (argc != 3)
    {
        // Wrong usage is an error: report it on stderr with a non-zero exit code.
        fprintf(stderr, "SchemaValidator <schema file> <xml file>\n");
        return 1;
    }

    XMLPlatformUtils::Initialize();

    ValidateSchema(argv[1], argv[2]);

    XMLPlatformUtils::Terminate();

    return 0;
}
The classes WStr and ParserErrorHandler are helper classes. WStr converts a native-encoded char string (for example UTF-8) to a UTF-16 XMLCh string and releases it automatically, in the style of an auto pointer, although this particular listing does not end up calling it. ParserErrorHandler prints the parser errors to the console. The function ValidateSchema does the actual validation. The important lines are
domParser.setValidationScheme(XercesDOMParser::Val_Auto);
domParser.setDoNamespaces(true);
domParser.setDoSchema(true);
domParser.setValidationConstraintFatal(true);
which enable namespace processing and schema validation. They also tell the parser to treat any validation error as fatal. The command line to compile the SchemaValidator code is
$ g++ SchemaValidator.cpp -o SchemaValidator -lxerces-c
Let's try our SchemaValidator with valid and invalid XML. First, let's try it with a valid XML file
$ ./SchemaValidator shiporder.xsd shiporder.xml
XML file validated against the schema successfully
and with an invalid shiporder file, in which I've added an element "oldprice" to the second "item"
$ ./SchemaValidator shiporder.xsd shiporder-invalid.xml
at line 15 column 23, no declaration found for element 'oldprice'
XML file doesn't conform to the schema
You can see that the schema validation fails, reporting that "oldprice" is an unknown element according to the given schema (xsd). I hope this showed you how to validate an XML file against its schema using Xerces-C++. We'll see what else we can do with Xerces-C++ in future posts.
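
A note on the WStr helper from the listing: it acquires the buffer in the constructor and releases it in the destructor, but as written it can also be copied, and a copy would release the same buffer a second time. Below is a minimal sketch of a non-copyable variant of the same idea. It is not Xerces code: fakeTranscode, fakeRelease, g_liveBuffers, ScopedWStr and wlen are hypothetical names introduced only for illustration, the two fake functions stand in for XMLString::transcode and XMLString::release so the snippet builds without Xerces, and it assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <cstring>

// Counts buffers that have been handed out and not yet released.
// Exists only so the sketch can show that cleanup really happens.
static int g_liveBuffers = 0;

// Hypothetical stand-in for XMLString::transcode. It simply widens each
// char to char16_t, which is only correct for ASCII input.
static char16_t* fakeTranscode(const char* str)
{
    const std::size_t len = std::strlen(str);
    char16_t* out = new char16_t[len + 1];
    for (std::size_t i = 0; i < len; ++i)
        out[i] = static_cast<unsigned char>(str[i]);
    out[len] = 0;
    ++g_liveBuffers;
    return out;
}

// Hypothetical stand-in for XMLString::release: frees and nulls the pointer.
static void fakeRelease(char16_t** buf)
{
    delete[] *buf;
    *buf = nullptr;
    --g_liveBuffers;
}

// Same shape as WStr, but copying is disabled so the buffer has one owner.
class ScopedWStr
{
    private:
        char16_t* wStr;

    public:
        explicit ScopedWStr(const char* str) : wStr(fakeTranscode(str)) {}
        ~ScopedWStr() { fakeRelease(&wStr); }

        ScopedWStr(const ScopedWStr&) = delete;
        ScopedWStr& operator=(const ScopedWStr&) = delete;

        operator const char16_t*() const { return wStr; }
};

// Length of a zero-terminated UTF-16 string. Used to show that a
// ScopedWStr converts implicitly when passed to a function.
static std::size_t wlen(const char16_t* s)
{
    std::size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}
```

With Xerces the stand-ins would be replaced by the real XMLString::transcode and XMLString::release calls and char16_t by XMLCh. A call such as wlen(ScopedWStr("shiporder")) then creates the temporary, converts it, and frees the buffer at the end of the statement.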

3 comments:

Anonymous said...

This pgm helped me a lot

R K Verma said...

It was helpful. Keep up good work.

Ed said...

Thank you for sharing. I will give Xerces a try.

When I use Qt's QXmlSchemaValidator it does not seem to work as expected when using order xsd:all and sometimes hangs. My schema may not be valid for xsd:all but I do not think it should hang.

http://harmattan-dev.nokia.com/docs/library/html/qt4/xmlpatterns-schema.html