Aside from the standard advice, like using well-formed XML, I have three rules that I think will smooth your use of XML in your applications.
1. Use an XML Schema
Always define a schema for your XML document using XSD, and make sure your code validates against it. Most (half)decent XML libraries can validate against a schema if it is properly referenced, though some need validation switched on explicitly. You can reference your schema as follows:
<?xml version="1.0" encoding="utf-8" ?>
<rootElement xmlns="http://tempuri.org/myschema.xsd"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://tempuri.org/myschema.xsd myschema.xsd">
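The xsi:schemaLocation value in that header is a whitespace-separated list of pairs: a namespace, then where to find the schema for it. As a rough sketch of how a library might read it, here is a small Python example using only the standard library. The function name schema_locations is made up for illustration, and I have closed the root element so the sample parses. Note that Python's standard library only reads the reference; it does not validate against XSD, so for actual validation you would need a third-party library such as lxml.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Same header as in the post, with the root element closed so it is well-formed.
DOCUMENT = """<?xml version="1.0" encoding="utf-8" ?>
<rootElement xmlns="http://tempuri.org/myschema.xsd"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://tempuri.org/myschema.xsd myschema.xsd">
</rootElement>"""


def schema_locations(xml_text):
    """Return {namespace: schema location} read from the root's xsi:schemaLocation.

    Hypothetical helper for illustration; it does not validate anything.
    """
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced attribute names as "{namespace}localname".
    tokens = root.get("{%s}schemaLocation" % XSI_NS, "").split()
    # Even positions hold namespaces, odd positions hold schema locations.
    return dict(zip(tokens[0::2], tokens[1::2]))


print(schema_locations(DOCUMENT))
```

If the namespace in xmlns and the namespace in xsi:schemaLocation do not match, a validating parser has no schema to apply to your elements, so it is worth checking that pair first when validation silently does nothing.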
For more info about XSD see the w3schools XSD Tutorial, and also see my post about default namespaces in C# to get you up and running.
2. Use the XML DOM (Document Object Model)
Don’t bother parsing XML yourself. Leave that to the geniuses who write the XML library you use, and load your XML into an XML document object, which is available in all (half)decent XML libraries. I can hear faint whimpers of, “What if my connection isn’t wide enough to get the whole document fast enough?” Then don’t use XML. If you are drip-feeding messages over a really slow or narrow connection, use some sort of binary encoding; XML is too verbose for your little pipe/bus/link/(generic inter-device connection). The DOM makes accessing your data (let’s face it, that’s all we really care about) a lot easier, especially when combined with the next rule.
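To make that concrete, here is a minimal sketch of loading a document into a DOM and reading data out of it, using Python's built-in xml.dom.minidom. The inventory document and the titles_by_sku function are invented for this example; your library's DOM calls will have similar names but may differ in detail.

```python
from xml.dom.minidom import parseString

# Hypothetical document, invented for this example.
INVENTORY = """<inventory>
  <item sku="a1"><title>Widget</title></item>
  <item sku="b2"><title>Gadget</title></item>
</inventory>"""


def titles_by_sku(xml_text):
    """Load the whole document into a DOM, then read values out of the tree."""
    document = parseString(xml_text)
    result = {}
    for item in document.getElementsByTagName("item"):
        title = item.getElementsByTagName("title")[0]
        # An element's text lives in its child text node.
        result[item.getAttribute("sku")] = title.firstChild.data
    return result


print(titles_by_sku(INVENTORY))
```

There is no hand-written parsing anywhere in that code: the library builds the tree, and you just ask it for elements and attributes by name.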
To read more about the XML DOM, I recommend the w3schools XML DOM Tutorial.
3. Use XPath
XPath allows you to query right into your XML document. No more iterating through the tree, traversing depth-first or breadth-first to reach a particular node. From what I gather, most (half)decent XML libraries build up a table of the element and attribute names they encounter as they build the DOM, called a Name Table, which can then be exploited by their XPath code to do fast queries of the document. An XPath query looks something like:
/rootNode/someOtherNode[@name="foo"]/box
Running the query above against the following XML returns one result:
<rootNode>
  <someOtherNode name="foo">
    <box>Matches the Query</box>
  </someOtherNode>
  <someOtherNode name="notfoo">
    <box>Doesn't match the query.</box>
  </someOtherNode>
</rootNode>
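Here is a small sketch of running that query from code, using Python's built-in xml.etree.ElementTree. Two caveats: ElementTree only supports a subset of XPath, and its paths are relative to the element you search from, so the leading /rootNode of the full query becomes "." when searching from the root element. The function name matching_boxes is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from the post.
DOCUMENT = """<rootNode>
  <someOtherNode name="foo">
    <box>Matches the Query</box>
  </someOtherNode>
  <someOtherNode name="notfoo">
    <box>Doesn't match the query.</box>
  </someOtherNode>
</rootNode>"""


def matching_boxes(xml_text):
    """Return the text of every <box> under a someOtherNode named "foo"."""
    root = ET.fromstring(xml_text)  # root is the <rootNode> element
    # Equivalent to /rootNode/someOtherNode[@name="foo"]/box in full XPath,
    # written relative to the root element as ElementTree requires.
    query = "./someOtherNode[@name='foo']/box"
    return [box.text for box in root.findall(query)]


print(matching_boxes(DOCUMENT))
```

Only the first box comes back; the one under the "notfoo" node is filtered out by the [@name="foo"] predicate, with no manual tree walking required.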
To read more on XPath, I suggest the w3schools XPath Tutorial.