Introduction to DOM – Part II

Usage of DOM in applications

When working with DOM trees in an application, the process starts by

creating a new, empty Document instance, or by having a parser parse an

XML tree, creating a Document instance. This document can then be

traversed and modified using the methods defined in the DOM API.
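
As a minimal sketch of both starting points, assuming the standard JAXP API bundled with the JDK rather than any vendor-specific package, the code below creates an empty Document, parses another one from a string, and then modifies the parsed tree. The class name DomStart, its helper methods and the sample XML are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomStart {
    // Hypothetical helper: wraps the checked JAXP exceptions for brevity.
    static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Starting point 1: a new, empty Document.
    static Document createEmpty() {
        return builder().newDocument();
    }

    // Starting point 2: let the parser build the Document from XML text.
    static Document parse(String xml) {
        try {
            return builder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document empty = createEmpty();
        System.out.println(empty.getDocumentElement()); // no root element yet

        Document doc = parse("<fortune>Hello</fortune>");
        Element root = doc.getDocumentElement();
        root.setAttribute("lang", "en");                // modify the tree
        root.appendChild(doc.createElement("author"));
        System.out.println(root.getNodeName() + " has "
                + root.getChildNodes().getLength() + " child nodes");
    }
}
```

Note that attributes are not counted among an element's child nodes; they are reached through getAttributes() instead.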


Figure 2. The class/interface hierarchy of DOM Level 1.

DOMException is a class, all other entities are interfaces.

Figure 2 shows the class/interface hierarchy of DOM Level 1. As can be seen

from the chart, nearly everything in DOM is derived from Node. Node

is the primary data type representing a single node in the document tree.

Other important interfaces are Document, which represents the entire XML

document; Element, which represents an element together with any nodes it

contains; Attr, an attribute of an element; and Text, the textual contents of an

element or attribute.
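
To illustrate how the interfaces relate, here is a small sketch, again assuming the JDK's bundled org.w3c.dom implementation. It walks a parsed tree and treats every item purely as a Node, using getNodeType() to tell Document, Element, Attr and Text apart. The class NodeKinds and the sample XML are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeKinds {
    // Every argument is handled as a plain Node, whatever its concrete interface.
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:  return "Document";
            case Node.ELEMENT_NODE:   return "Element " + node.getNodeName();
            case Node.ATTRIBUTE_NODE: return "Attr " + node.getNodeName() + "=" + node.getNodeValue();
            case Node.TEXT_NODE:      return "Text \"" + node.getNodeValue() + "\"";
            default:                  return "Other (" + node.getNodeType() + ")";
        }
    }

    // Depth-first walk that collects one line per node, attributes included.
    static void walk(Node node, String indent, StringBuilder out) {
        out.append(indent).append(describe(node)).append('\n');
        NamedNodeMap attrs = node.getAttributes(); // null unless node is an Element
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                out.append(indent).append("  ").append(describe(attrs.item(i))).append('\n');
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, indent + "  ", out);
        }
    }

    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc, "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline("<fortune lang=\"en\">Hello</fortune>"));
    }
}
```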


Because a DOM implementation doesn’t necessarily have to implement all

features of the higher-level (Level 2 and above) interfaces, the DOMImplementation

interface has the method hasFeature(feature, version) that an

application can use to determine the presence of a higher-level feature.
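
A feature query might look like the following sketch. It assumes the DOM implementation that ships with the JDK, obtained through JAXP; the feature names are taken from the DOM specifications, and the answers will differ between implementations. The class name FeatureCheck is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;

class FeatureCheck {
    // Obtains the DOMImplementation behind the default JAXP parser.
    static DOMImplementation implementation() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DOMImplementation impl = implementation();
        // Feature/version pairs to ask about; results depend on the implementation.
        String[][] queries = {
            {"XML", "1.0"}, {"Core", "2.0"}, {"Traversal", "2.0"}, {"Events", "2.0"}
        };
        for (String[] q : queries) {
            System.out.println(q[0] + " " + q[1] + ": " + impl.hasFeature(q[0], q[1]));
        }
    }
}
```

An application would typically make such a check once at start-up and fall back to Level 1 methods when a feature is reported as missing.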



Example Java code

Program listing 1 displays a simple Java code example. The code creates a

new Document instance, adds a root Element called “fortune” and finally

adds some text to the root Element. The “TX” prefix comes from the DOM

implementation, in this case IBM’s XML4Java package.

try {

String quote = readString("fortunes.txt", random() %


TXDocument xmlDoc = new TXDocument();

// Set the necessary parameters explicitly here.






xmlDoc.appendChild(new DTD("fortune", new

ExternalID(PUBLIC_ID, null)));

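// Create and add document root element.

TXElement rootElement = new TXElement("fortune");

xmlDoc.appendChild(rootElement);

// Create and add quote as the body text in root element.

TXText quoteElement = new TXText(quote);

rootElement.appendChild(quoteElement);

return xmlDoc;

} catch (Exception e) {

// Creation failed; signal this to the caller by returning null.

return null;

}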

Program listing 1. Creating a simple Document in Java

The created Document is shown in Figure 3.

Figure 3. The created Document
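
For readers without the XML4Java package, roughly the same Document can be built with the vendor-neutral org.w3c.dom factory methods. The sketch below assumes the JDK's JAXP implementation, leaves out the DTD node, and uses a placeholder quote instead of reading fortunes.txt. It also serializes the result so the structure can be inspected as text; FortuneDocument is a made-up class name.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FortuneDocument {
    // Builds <fortune>quote</fortune> using only standard DOM factory methods.
    static Document create(String quote) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("fortune");
            doc.appendChild(root);
            root.appendChild(doc.createTextNode(quote));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the tree with an identity transform, without the XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder quote; the original listing reads one from fortunes.txt.
        System.out.println(serialize(create("Fortune favours the bold.")));
    }
}
```

Nodes are created through the Document here (createElement, createTextNode) rather than with constructors, which keeps the code independent of the concrete implementation classes.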

ECMAScript example

This example loads a DOM tree from a file called “tree.xml”, traverses it and

displays some of its node names and values.




<BODY ONLOAD="run()">

<SCRIPT>

function run()

{

var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");

// Load synchronously so the tree is complete before it is traversed.

xmlDoc.async = false;

xmlDoc.load("tree.xml");

var rootElem = xmlDoc.documentElement;

var name = rootElem.nodeName;

var value = rootElem.nodeValue;

var child1Name = rootElem.childNodes.item(0).nodeName;

var child1Value = rootElem.childNodes.item(0).text;

var child2Name = rootElem.childNodes.item(1).nodeName;

var child2Value = rootElem.childNodes.item(1).text;

alert("Tree name: " + name +

"\nRoot value: " + value +

"\nChild1 (" + child1Name + "): " + child1Value +

"\nChild2 (" + child2Name + "): " + child2Value);

}

</SCRIPT>

</BODY>





Program listing 2. Traversing a DOM tree in ECMAScript
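
For comparison, a rough Java counterpart of the same traversal is sketched below, assuming the JDK's JAXP parser. The text property used in the ECMAScript version is a Microsoft extension; the closest standard call is getTextContent() from DOM Level 3. Since the contents of tree.xml are not shown, the sample XML here is an assumption, and TreeReport is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TreeReport {
    // Builds the same four-line report that the ECMAScript example shows in alert().
    static String report(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.getDocumentElement();
        NodeList children = root.getChildNodes();
        Node child1 = children.item(0);
        Node child2 = children.item(1);
        return "Root name: " + root.getNodeName()
                + "\nRoot value: " + root.getNodeValue() // null for Element nodes
                + "\nChild1 (" + child1.getNodeName() + "): " + child1.getTextContent()
                + "\nChild2 (" + child2.getNodeName() + "): " + child2.getTextContent();
    }

    public static void main(String[] args) {
        // Assumed sample input; written without whitespace between the tags.
        System.out.println(report("<tree><a>one</a><b>two</b></tree>"));
    }
}
```

The root value comes out as null because nodeValue is defined to be null for Element nodes; the text lives in child Text nodes.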


First Hop Escio Portal & HTXML

Functionality overview

First Hop Oy’s Escio Portal is a Java- and DOM-based multichannel

publishing system that uses templates to display XML content formatted as

WML, HTML etc. The system architecture is shown in Figure 4.

Figure 4. The Escio Portal publishing engine architecture

Based on the request and the end user’s device, the engine selects the

corresponding XMLSource and template. The engine is usually a

Java servlet.

The XMLSources act as front ends to the XML-formatted content. They can

connect e.g. to a local database or to XML-formatted files over an HTTP

connection. The XMLSources provide the content in the form of DOM trees to

the engine, which in turn forwards it to the appropriate template.

The templates contain the structure and formatting of the page the end user

will receive. The template contains HTXML code that traverses the DOM

tree it received from the engine and publishes the content within. The

contents of an example template are shown in program listing 3.
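
The division of labour between engine, XMLSources and templates can be sketched in a few lines of Java. Everything in this sketch is hypothetical: the interface names, the lookup by request path and device name, and the lambda-based templates are assumptions made for illustration and are not the Escio Portal API.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class PublishingSketch {
    // Hypothetical: a front end that supplies content as a DOM tree.
    interface XmlSource { Document fetch(); }

    // Hypothetical: renders a DOM tree for one output format.
    interface Template { String render(Document content); }

    private final Map<String, XmlSource> sourcesByPath = new HashMap<>();
    private final Map<String, Template> templatesByDevice = new HashMap<>();

    void addSource(String path, XmlSource source) { sourcesByPath.put(path, source); }
    void addTemplate(String device, Template template) { templatesByDevice.put(device, template); }

    // The engine's job: pick the source by request and the template by device.
    String handle(String path, String device) {
        XmlSource source = sourcesByPath.get(path);
        Template template = templatesByDevice.get(device);
        if (source == null || template == null) {
            throw new IllegalArgumentException("Nothing registered for " + path + " / " + device);
        }
        return template.render(source.fetch());
    }

    // Stand-in content: a one-element fortune document.
    static Document fortune(String quote) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("fortune")).appendChild(doc.createTextNode(quote));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static PublishingSketch demo() {
        PublishingSketch engine = new PublishingSketch();
        engine.addSource("/fortune", () -> fortune("Fortune favours the bold."));
        engine.addTemplate("html", d -> "<H2>" + d.getDocumentElement().getTextContent() + "</H2>");
        engine.addTemplate("wml", d -> "<p>" + d.getDocumentElement().getTextContent() + "</p>");
        return engine;
    }

    public static void main(String[] args) {
        System.out.println(demo().handle("/fortune", "html"));
        System.out.println(demo().handle("/fortune", "wml"));
    }
}
```

The point of the design is that the same DOM tree can be handed to different templates, so supporting a new device only means adding a template.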

The HTXML template language

HTXML is a Java- and JavaScript-based language that is used to bind

together the template and the contents of an XML tree. The language is

embedded inside HTML/WML pages which are then compiled into Java




<H2>Here is your random fortune:</H2>










Program listing 3. A template that shows a fortune.


Some personal experiences with DOM quirks

Whitespace issues with different parsers

In XML, the parser has to pass all whitespace characters through to the

application. This means that if your source file has whitespace, such as

line breaks, between tags, it has to be passed

through, even if it is just there for layout purposes. The DOM

implementation has to put this whitespace somewhere, and the only

possibility is a text node. In practice this means that in some cases when

you traverse a newly-created Document, half of the nodes seem to be empty.

The result depends on the parser; some parsers discard the excess

whitespace characters. If your parser of choice creates whitespace nodes and

your application doesn’t have any need for them, you’ll have to write a method

that trims the whitespace before the Document is used.
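
Such a trimming method could look like the sketch below, which assumes the JDK's JAXP parser (it keeps whitespace-only text nodes by default when no DTD is involved). WhitespaceTrimmer is a made-up name. Be aware that this removes every whitespace-only text node, which is not appropriate for documents where such whitespace is meaningful.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceTrimmer {
    // Recursively removes text nodes that contain nothing but whitespace.
    static void trim(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember the sibling before removing
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                trim(child);
            }
            child = next;
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<tree>\n  <a>one</a>\n  <b>two</b>\n</tree>");
        Element root = doc.getDocumentElement();
        System.out.println("Before: " + root.getChildNodes().getLength() + " child nodes");
        trim(doc);
        System.out.println("After: " + root.getChildNodes().getLength() + " child nodes");
    }
}
```

With the sample input, the root element starts with five children (three whitespace text nodes around the two elements) and ends with two.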


Node owning & importing

When you are working with several Documents and would like to copy Nodes

from one Document to another, things are not as simple as you might initially expect.
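
The rest of this section is not available here, so the following is only a sketch of the standard behaviour, assuming the JDK's DOM implementation: every Node belongs to the Document that created it, appending it directly to another Document is rejected with a WRONG_DOCUMENT_ERR DOMException, and DOM Level 2 offers importNode() to make a copy owned by the target. ImportDemo and its method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class ImportDemo {
    // Creates a Document with a single root element of the given name.
    static Document newDocument(String rootName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if appending a node owned by another Document is rejected.
    static boolean appendAcrossDocumentsFails(Document source, Document target) {
        try {
            target.getDocumentElement().appendChild(source.getDocumentElement());
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.WRONG_DOCUMENT_ERR;
        }
    }

    // Copies a foreign node (and its subtree) into the target Document's root.
    static Node copyInto(Document target, Node foreign) {
        Node copy = target.importNode(foreign, true); // deep copy owned by target
        return target.getDocumentElement().appendChild(copy);
    }

    public static void main(String[] args) {
        Document source = newDocument("source");
        Document target = newDocument("target");
        System.out.println("Plain append rejected: " + appendAcrossDocumentsFails(source, target));
        Node copy = copyInto(target, source.getDocumentElement());
        System.out.println("Copy owned by target: " + (copy.getOwnerDocument() == target));
    }
}
```

Because importNode() copies rather than moves, the original node stays in its source Document.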

