Top Three Rules for XML

Aside from standard things like using well-formed XML, I have three rules which I think will smooth your use of XML in your applications.

1. Use an XML Schema

Always define a schema for your XML document using XSD, and make sure your code validates against it. Most (half)decent XML libraries can validate against a schema if it is properly referenced, although some need validation switched on explicitly. You can reference your schema as follows:

<?xml version="1.0" encoding="utf-8" ?>
<rootElement xmlns="http://tempuri.org/myschema.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://tempuri.org/myschema.xsd myschema.xsd">
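
As a rough sketch of what "validating against it" can look like in .NET, here is one way to do it with XmlReaderSettings. The schema below is a made-up one-element schema purely for illustration, and the class and method names are my own. I register the schema explicitly here; as far as I know, .NET only follows xsi:schemaLocation hints if you also set the XmlSchemaValidationFlags.ProcessSchemaLocation flag.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class SchemaValidationSketch
{
    // Hypothetical schema for illustration: a rootElement holding plain text.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
        targetNamespace='http://tempuri.org/myschema.xsd'
        elementFormDefault='qualified'>
      <xs:element name='rootElement' type='xs:string' />
    </xs:schema>";

    // Returns true when the reader reports no schema validation errors.
    public static bool IsValid(string xml)
    {
        int errors = 0;
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema; // validation is off by default
        settings.Schemas.Add("http://tempuri.org/myschema.xsd",
                             XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationEventHandler += (sender, e) =>
        {
            errors++;
            Console.WriteLine(e.Message);
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens as the reader advances
        }
        return errors == 0;
    }

    public static void Main()
    {
        // Text content matches xs:string, so this one should validate.
        Console.WriteLine(IsValid(
            "<rootElement xmlns='http://tempuri.org/myschema.xsd'>hello</rootElement>"));
        // A child element is not allowed inside xs:string, so this one should not.
        Console.WriteLine(IsValid(
            "<rootElement xmlns='http://tempuri.org/myschema.xsd'><oops /></rootElement>"));
    }
}
```

Without a ValidationEventHandler the reader throws an XmlSchemaException on the first error instead, which may be all you need for a configuration file.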

For more info about XSD see the w3schools XSD Tutorial, and also see my post about default namespaces in C# to get you up and running.

2. Use the XML DOM (Document Object Model)

Don’t bother parsing XML yourself. Leave that to the geniuses who write the XML library you use, and load your XML into an XML document object, which is available in all (half)decent XML libraries. I can hear faint whimpers of, “What if my connection is not going to be wide enough to get the whole document fast enough?” Then don’t use XML. If you are drip-feeding messages over a really slow or small connection, use some sort of binary encoding; XML is too verbose for your little pipe/bus/link/(generic inter-device connection). The DOM makes accessing your data (let’s face it, that’s all we really care about) a lot easier, especially when combined with the next rule.
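
To give a feel for it, here is a minimal sketch of loading a small document into the .NET XmlDocument class and reading data back out of the tree. The XML string, class name and method name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public class DomSketch
{
    // Loads the XML into a DOM and returns "name: text" for each child element of the root.
    public static List<string> DescribeChildren(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // the library does the parsing for us

        var lines = new List<string>();
        foreach (XmlNode child in doc.DocumentElement.ChildNodes)
        {
            // Skip comments, whitespace and other non-element nodes.
            if (child is XmlElement element)
            {
                lines.Add(element.GetAttribute("name") + ": " + element.InnerText);
            }
        }
        return lines;
    }

    public static void Main()
    {
        string xml =
            "<rootNode>" +
            "<someOtherNode name=\"foo\"><box>Matches the Query</box></someOtherNode>" +
            "</rootNode>";

        foreach (string line in DescribeChildren(xml))
        {
            Console.WriteLine(line); // prints "foo: Matches the Query"
        }
    }
}
```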

To read more about the XML DOM, I recommend the w3schools XML DOM Tutorial.

3. Use XPath

XPath allows you to query right into your XML document. No more iterating through the tree, traversing depth-first or breadth-first to reach a particular node. From what I gather, most (half)decent XML libraries keep a table of the element and attribute names they see as they build the DOM, called a name table, which their XPath code can then exploit to compare names quickly. An XPath query looks something like:

/rootNode/someOtherNode[@name="foo"]/box

Run against the following XML, the query above returns one result:

<rootNode>
	<someOtherNode name="foo">
		<box>Matches the Query</box>
	</someOtherNode>
	<someOtherNode name="notfoo">
		<box>Doesn't match the query.</box>
	</someOtherNode>
</rootNode>
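
In .NET, running that query is a one-liner once the document is loaded. The sketch below is only an illustration with class and method names of my own; it uses single quotes inside the XPath string, which is equivalent to the double-quoted form and saves escaping in C#.

```csharp
using System;
using System.Xml;

public class XPathSketch
{
    // Loads the XML into a DOM and returns every node matching the XPath query.
    public static XmlNodeList Query(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(xpath);
    }

    public static void Main()
    {
        string xml = @"<rootNode>
            <someOtherNode name='foo'>
                <box>Matches the Query</box>
            </someOtherNode>
            <someOtherNode name='notfoo'>
                <box>Doesn't match the query.</box>
            </someOtherNode>
        </rootNode>";

        XmlNodeList matches = Query(xml, "/rootNode/someOtherNode[@name='foo']/box");

        Console.WriteLine(matches.Count);        // prints 1
        Console.WriteLine(matches[0].InnerText); // prints "Matches the Query"
    }
}
```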

To read more on XPath, I suggest the w3schools XPath Tutorial.

Using XPath with a default namespace in .NET 2.0

I have recently been writing an assembly which facilitates automatic deployment of K2.Net 2003 workflows. The assembly reads in a configuration file and deploys the workflows as specified in the configuration file. Also, rather than writing any custom code to validate the XML, I decided to use an XML Schema Definition, mainly because it saves me time and typing, but also because it’s good practice.

Anyhow, in order to use the XSD, I had to give my schema a namespace, and because I didn’t want to have to add a whole bunch of prefixes to my configuration file, I decided to use the default namespace:

<configuration xmlns="http://tempuri.org/configuration.xsd"></configuration>

This broke all my XPath queries, because in XPath 1.0 an unprefixed name only matches elements in no namespace, so the .NET XPath code never finds elements that sit in the default namespace. What I ended up doing to fix it was using an XmlNamespaceManager to register a made-up prefix with the same URI as the default namespace, and then changing all of my XPath queries to use that prefix. Something like:

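using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load("configuration.xml");
// To get the URI for the default namespace
string xmlns = doc.DocumentElement.GetAttribute("xmlns");
// The namespace manager has to be built on the document's name table
XmlNamespaceManager namespaceManager = new XmlNamespaceManager(doc.NameTable);
// Bind the made-up prefix "a" to the default namespace URI
namespaceManager.AddNamespace("a", xmlns);

// Now query with XPath, passing the manager so the "a:" prefix can be resolved
XmlNodeList nodes = doc.SelectNodes("/a:configuration/a:solution", namespaceManager);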

With a configuration file which looks something like:

<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns="http://tempuri.org/configuration.xsd" ... >
	<solution path="">
		<projects>
			<project name="Job Setup">
				<references>
					<reference name="" gac="false" fullname="" />
				</references>
				<processes>
					<process name="RequestLeave" />
					<process name="RequestSomethingElse" />
				</processes>
			</project>
		</projects>
	</solution>
</configuration>
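
To see the difference for yourself, here is a small self-contained sketch that runs the same query with and without the prefix against a cut-down configuration document. The class and method names are my own, the namespace URI is just a placeholder (any URI works as long as the prefix is bound to the same one), and I read the URI from NamespaceURI rather than the xmlns attribute, which should give the same value here.

```csharp
using System;
using System.Xml;

public class DefaultNamespaceSketch
{
    // Counts <solution> elements under <configuration>, with or without the made-up prefix.
    public static int CountSolutions(string xml, bool usePrefix)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        if (!usePrefix)
        {
            // Unprefixed names in XPath 1.0 mean "no namespace".
            return doc.SelectNodes("/configuration/solution").Count;
        }

        var namespaceManager = new XmlNamespaceManager(doc.NameTable);
        // Bind the prefix "a" to whatever namespace the root element is in.
        namespaceManager.AddNamespace("a", doc.DocumentElement.NamespaceURI);
        return doc.SelectNodes("/a:configuration/a:solution", namespaceManager).Count;
    }

    public static void Main()
    {
        string xml =
            "<configuration xmlns=\"http://tempuri.org/configuration.xsd\">" +
            "<solution path=\"\"><projects /></solution>" +
            "</configuration>";

        Console.WriteLine(CountSolutions(xml, false)); // prints 0
        Console.WriteLine(CountSolutions(xml, true));  // prints 1
    }
}
```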

The way I see it, this involves less typing than actually using a prefix in the configuration file. That said, it feels like a lazy way out and probably isn’t best practice, because there is now a difference between my queries and my XML.