API Overview, Javasoft XML APIs

3. An Overview of the APIs

This page gives you a map so you can find your way around JAXP and the associated XML APIs.

The JAXP APIs

The JAXP APIs, contained in the jaxp.jar file, consist of the javax.xml.parsers package. That package contains two vendor-neutral factory classes, SAXParserFactory and DocumentBuilderFactory, which give you a SAX parser and a DocumentBuilder, respectively. The DocumentBuilder, in turn, creates a DOM-compliant Document object.

The factory APIs give you the ability to plug in an XML implementation offered by another vendor without changing your source code. The implementation you get depends on the setting of the javax.xml.parsers.SAXParserFactory and javax.xml.parsers.DocumentBuilderFactory system properties. The default values (unless overridden at runtime) point to Sun's reference implementations at com.sun.xml.
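As a minimal sketch of the factory mechanism just described, the following program asks both factories for an implementation and prints the class names it receives. The class name FactoryLookup and its helper method are made up for this illustration, and the names printed depend entirely on which implementation is installed on your system.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo class; not part of JAXP.
class FactoryLookup {
    /** Returns the class names of the SAX parser and DOM builder the factories hand back. */
    static String[] implementationNames() throws Exception {
        // newInstance() consults the javax.xml.parsers.SAXParserFactory system property,
        // falling back to the platform default when it is not set.
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        SAXParser saxParser = saxFactory.newSAXParser();

        // Same idea for DOM, keyed on javax.xml.parsers.DocumentBuilderFactory.
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = domFactory.newDocumentBuilder();

        return new String[] { saxParser.getClass().getName(), builder.getClass().getName() };
    }

    public static void main(String[] args) throws Exception {
        for (String name : implementationNames()) {
            System.out.println(name);
        }
    }
}
```

To try a different vendor's implementation without touching the source, you would set the property when launching the program, for example with -Djavax.xml.parsers.SAXParserFactory= followed by the name of that vendor's factory class.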

The remainder of this section shows how those APIs relate to each other in an application. As you read, watch for the logo that identifies the parts of the discussion that deal exclusively with Sun's reference implementation.

An Overview of SAX and DOM

As discussed in the previous section, the SAX and DOM APIs are defined by the XML-DEV group and by the W3C, respectively. The libraries that define those APIs are included in the parser.jar file, which also contains Sun's reference implementation, Project X.

The "Simple API" for XML (SAX) is the event-driven, serial-access mechanism that does element-by-element processing. The API for this level reads and writes XML to a data repository or the Web. For server-side and high-performance apps, you will want to fully understand this level. But for many applications, a minimal understanding will suffice.

The DOM API is generally an easier API to use. It provides a relatively familiar tree structure of objects. You can use the DOM API to manipulate the hierarchy of application objects it encapsulates. The DOM API is ideal for interactive applications because the entire object model is present in memory, where it can be accessed and manipulated by the user.

On the other hand, constructing the DOM requires reading the entire XML structure and holding the object tree in memory, so it is much more CPU- and memory-intensive. For that reason, the SAX API tends to be preferred for server-side applications and data filters that do not require an in-memory representation of the data.

The SAX APIs

The basic outline of a SAX parser is shown at right. First, the SAXParserFactory shown at the top generates an instance of the parser.

The XML text is shown coming in to the parser from the left. As the data is parsed, the parser invokes one of several callback methods defined by the interfaces DocumentHandler, ErrorHandler, DTDHandler, and EntityResolver.

Here is a summary of the key SAX APIs:

SAXParserFactory: A SAXParserFactory object creates an instance of the parser determined by the system property, javax.xml.parsers.SAXParserFactory.

Parser: The org.xml.sax.Parser interface defines methods like setDocumentHandler to set up event handlers and parse(URL) to actually do the parsing. This interface is implemented by the Parser and ValidatingParser classes in the com.sun.xml.parser package.

DocumentHandler: Methods like startDocument, endDocument, startElement, and endElement are invoked when an XML tag is recognized. This interface also defines methods characters and processingInstruction, which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.

ErrorHandler: Methods error, fatalError, and warning are invoked in response to various parsing errors. The default error handler throws an exception for fatal errors and ignores other errors (including validation errors). That's one reason you need to know something about the SAX parser, even if you are using the DOM. Sometimes, the application may be able to recover from a validation error. Other times, it may need to generate an exception. To ensure the correct handling, you'll need to supply your own error handler to the parser.

DTDHandler: Methods defined in this interface are invoked when processing definitions in a DTD. These methods are discussed in Using the DTDHandler and EntityResolver. This interface is extended by the com.sun.java.xml interface DtdEventListener, which adds methods like startDtd and endDtd.

EntityResolver: The resolveEntity method is invoked when the parser must identify data identified by a URI. In most cases, a URI is simply a URL, which specifies the location of a document, but in some cases the document may be identified by a URN -- a public identifier, or name, that is unique in the web space. The public identifier may be specified in addition to the URL. The EntityResolver can then use the public identifier instead of the URL to find the document, for example to access a local copy of the document if one exists.
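The EntityResolver behavior described in the last item can be sketched as follows. This is an illustration only: the class LocalCopyResolver, the public identifier, and the example.invalid URL are all invented for the example, and the "local copy" is simply a string held in memory. The SAX 1 classes used here (Parser, HandlerBase) match this section; later Java releases mark them as deprecated, though they are still present.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

// Hypothetical resolver that prefers a local copy keyed by public identifier.
@SuppressWarnings("deprecation")
class LocalCopyResolver implements EntityResolver {
    private final Map<String, String> localCopies = new HashMap<>();
    final List<String> served = new ArrayList<>();   // public IDs answered locally

    void register(String publicId, String content) {
        localCopies.put(publicId, content);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String content = (publicId == null) ? null : localCopies.get(publicId);
        if (content == null) {
            return null;   // no local copy: let the parser open the URL itself
        }
        served.add(publicId);
        InputSource source = new InputSource(new StringReader(content));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    /** Parses a document whose DTD is named by both a public ID and a URL. */
    static List<String> demo() throws Exception {
        LocalCopyResolver resolver = new LocalCopyResolver();
        resolver.register("-//Example//DTD Note//EN", "<!ELEMENT note EMPTY>");

        String xml = "<!DOCTYPE note PUBLIC \"-//Example//DTD Note//EN\" "
                   + "\"http://example.invalid/note.dtd\"><note/>";

        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
        parser.setDocumentHandler(new HandlerBase());   // ignore document events
        parser.setEntityResolver(resolver);
        parser.parse(new InputSource(new StringReader(xml)));
        return resolver.served;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Served from local copies: " + demo());
    }
}
```

Returning null from resolveEntity is the conventional way to tell the parser to fall back to the system identifier, so the resolver only has to know about the documents it actually keeps locally.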

A typical application provides a DocumentHandler, at a minimum. Since the default implementations of the interfaces ignore all inputs except for fatal errors, a robust implementation may want to provide an ErrorHandler to report more errors or report them differently.
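One possible shape for such an ErrorHandler is sketched below: it records warnings and validation errors so the application can decide what to do with them, and it rethrows fatal errors. The class name CollectingErrorHandler and the sample documents are assumptions made for this example, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXParseException;

// Hypothetical handler: collect recoverable problems, abort on fatal ones.
@SuppressWarnings("deprecation")
class CollectingErrorHandler implements ErrorHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add("warning, line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        // Validation errors arrive here; parsing continues after we return.
        problems.add("error, line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        throw e;   // the document is not well formed; give up
    }

    /** Runs a validating parse and returns every problem that was reported. */
    static List<String> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        Parser parser = factory.newSAXParser().getParser();

        CollectingErrorHandler handler = new CollectingErrorHandler();
        parser.setDocumentHandler(new HandlerBase());
        parser.setErrorHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        // The DTD says <a> must be empty, so the nested <b/> is a validation error.
        String invalid = "<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>";
        for (String problem : validate(invalid)) {
            System.out.println(problem);
        }
    }
}
```

An application that cannot recover from validation errors could throw from error() as well, which turns every validation problem into an exception raised by parse().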

Note: The class org.xml.sax.HandlerBase implements all of these interfaces with null methods, so you can override the methods for events you need to process and ignore the methods for other events.
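To make that concrete, here is a small sketch that extends HandlerBase and overrides only the callbacks it cares about, producing one line of text per event. The class name EventEcho and the sample document are invented for the illustration. HandlerBase is the SAX 1 class discussed in this section; later Java releases deprecate it but still ship it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical handler that records the sequence of SAX events it receives.
@SuppressWarnings("deprecation")
class EventEcho extends HandlerBase {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startDocument() {
        out.append("startDocument\n");
    }

    @Override
    public void startElement(String name, AttributeList attributes) {
        out.append("start ").append(name).append('\n');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Skip whitespace-only text so the trace stays readable.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            out.append("text ").append(text).append('\n');
        }
    }

    @Override
    public void endElement(String name) {
        out.append("end ").append(name).append('\n');
    }

    @Override
    public void endDocument() {
        out.append("endDocument\n");
    }

    /** Parses the given XML text and returns the recorded event trace. */
    static String echo(String xml) throws Exception {
        EventEcho handler = new EventEcho();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(echo("<greeting><to>World</to></greeting>"));
    }
}
```

Every method that is not overridden (the DTD, entity, and error callbacks, for instance) keeps the do-nothing behavior inherited from HandlerBase.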

Packages

The SAX parser is defined in the following packages.

*Package*	*Description*
org.xml.sax	Defines the SAX interfaces. The name "`org.xml`" is the package prefix that was settled on by the group that defined the SAX API. This package also defines `HandlerBase` -- a default implementation of a base class for the various "handlers" defined by the interfaces, as well as an `InputSource` class, which encapsulates information that tells where the XML data is coming from.
org.xml.sax.helpers	This package is part of SAX. It defines the `ParserFactory` class, which lets you acquire an instance of a parser either by specifying a name string or by using the value defined by the `org.xml.sax.parser` system property. This package also provides implementations for two other interfaces defined in `org.xml.sax`, but these classes are not needed when using Sun's Java XML SAX parsers.
javax.xml.parsers	Defines the `SAXParserFactory` class which returns the SAXParser. Also defines the `ParserConfigurationException` class for reporting errors.
com.sun.xml.parser	Contains the Java XML parser (`com.sun.xml.parser.Parser`), validating parser (`com.sun.xml.parser.ValidatingParser`), and entity resolver. The fully qualified name of either parser can be sent to the parser factory to obtain an instance of that parser. The nonvalidating parser generates errors if a document is not well formed, and does some processing of the DTD (if present) but does not check to make sure that the document obeys all of the constraints defined by the DTD. The validating parser, on the other hand, checks to make sure that the document obeys all such constraints.

Technical Note:
Not all nonvalidating parsers are created equal! Although a validating parser is required to process all external entities referenced from within the document, some of that processing is optional for a nonvalidating parser. With such a parser, an externally stored section of the DTD that is "included" in the current document using an entity reference might not be processed. In addition, a nonvalidating parser is not required to identify ignorable whitespace (although a validating parser must). In that case, whitespace which can legitimately be ignored would be returned as part of the normal character stream. The nonvalidating parser in Sun's Java XML library implements both of these optional behaviors -- it processes all external entities and it identifies ignorable whitespace.

Other SAX Interfaces

In addition to the APIs described here, the SAX APIs define a few other interfaces that you are likely to use when you write a SAX application, as well as a utility package with a number of classes that are helpful for building real-world applications. When you are ready for a bit more information pertaining to SAX, go here: 3b. Other SAX APIs.

The Document Object Model (DOM) APIs

The diagram below shows the JAXP APIs in action:

You use the javax.xml.parsers.DocumentBuilderFactory class to get a DocumentBuilder instance (upper left), and use that to produce a Document (a DOM) that conforms to the DOM specification (lower right). The builder you get is determined by the system property javax.xml.parsers.DocumentBuilderFactory, which selects the factory implementation that is used to produce the builder. (The platform's default value can be overridden from the command line.)

You can use the builder's newDocument() method to create an empty Document that implements the org.w3c.dom.Document interface. Alternatively, you can use one of the builder's parse methods to create a Document from existing XML data. The result is a DOM tree like that shown in the lower right corner of the diagram.
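Both routes to a Document can be sketched in a few lines. The class BuilderDemo, its method names, and the element names are made up for this example; only the javax.xml.parsers and org.w3c.dom calls come from the APIs described above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class showing the two ways to obtain a Document.
class BuilderDemo {
    /** Starts from an empty Document and adds a root element by hand. */
    static Document createFromScratch() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.newDocument();

        Element root = document.createElement("inventory");
        root.setAttribute("count", "0");
        document.appendChild(root);
        return document;
    }

    /** Builds a Document by parsing existing XML text. */
    static Document createFromXml(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createFromScratch().getDocumentElement().getTagName());
        System.out.println(
                createFromXml("<order><item/></order>").getDocumentElement().getTagName());
    }
}
```

The parse method also has overloads that accept a File, an InputStream, or a URI string, so the InputSource wrapper is only needed here because the XML is held in a string.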

Packages

The Document Object Model implementation is defined in the following packages:

*Package*	*Description*
org.w3c.dom	Defines the DOM programming interfaces for XML (and, optionally, HTML) documents, as specified by the W3C.
javax.xml.parsers	Defines the `DocumentBuilderFactory` class and the `DocumentBuilder` class, which returns an object that implements the W3C `Document` interface. The factory that is used to create the builder is determined by the `javax.xml.parsers.DocumentBuilderFactory` system property, which can be set from the command line or overridden when invoking the `newInstance` method. This package also defines the `ParserConfigurationException` class for reporting errors.
com.sun.xml.tree	Sun's Java XML implementation of the DOM libraries, including the `XmlDocument`, `XmlDocumentBuilder`, and `TreeWalker` classes.

The Project X Reference Implementation

This section shows how the reference implementation combines the SAX and DOM APIs.

Note:
The material in the remainder of this section is specific to Project X, Sun's reference implementation for the JAXP standard, and is not part of the standard itself. The functionality described here may therefore be implemented differently in other parsers. In addition, as standards evolve, future versions of the JAXP reference implementation could employ different mechanisms to achieve the same goals.

Overview

In Sun's reference implementation, the DOM API builds on the SAX API as shown in the diagram below:

Sun's implementation of the Document Object Model (DOM) API uses the SAX libraries to read in XML data and construct the tree of data objects that constitutes the DOM. Sun's implementation also provides a framework to help output the object tree as XML data.

Implementation

The diagram below shows how Sun's DocumentBuilder operates "under the hood":

The section of the diagram inside the wavy orange lines shows what Sun's reference implementation does when you parse existing XML data.

The default DocumentBuilder creates an object which implements the SAX DocumentHandler interface. It then hands that object to one of the SAX parsers (Parser or ValidatingParser, depending on how the builder factory was configured). When the input source is parsed, the DocumentHandler creates a Document object.
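The general idea of a document handler that assembles a DOM tree from SAX events can be sketched as follows. This is not Sun's actual code: TreeBuildingHandler is a hypothetical class written for this illustration, it handles only elements, attributes, and text, and it borrows an empty Document from the standard DocumentBuilder rather than from the Project X classes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical handler: turns SAX callbacks into DOM nodes.
@SuppressWarnings("deprecation")
class TreeBuildingHandler extends HandlerBase {
    private final Document document;
    private Node current;   // node that receives the next child

    TreeBuildingHandler(Document emptyDocument) {
        this.document = emptyDocument;
        this.current = emptyDocument;
    }

    @Override
    public void startElement(String name, AttributeList attributes) {
        Element element = document.createElement(name);
        for (int i = 0; i < attributes.getLength(); i++) {
            element.setAttribute(attributes.getName(i), attributes.getValue(i));
        }
        current.appendChild(element);
        current = element;   // descend into the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        current.appendChild(document.createTextNode(new String(ch, start, length)));
    }

    @Override
    public void endElement(String name) {
        current = current.getParentNode();   // climb back up
    }

    /** Parses XML text with a SAX parser and returns the tree the handler built. */
    static Document build(String xml) throws Exception {
        Document document =
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new TreeBuildingHandler(document));
        return document;
    }

    public static void main(String[] args) throws Exception {
        Document document = build("<a x=\"1\"><b>hi</b></a>");
        System.out.println(document.getDocumentElement().getTagName());
    }
}
```

A production-quality builder would also have to deal with processing instructions, ignorable whitespace, entity references, and the DTD, which this sketch leaves out.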

Note:
To control other aspects of the parser's behavior, you use the DocumentBuilder methods setErrorHandler and setEntityResolver. DocumentBuilder does not implement a setDTDHandler method, though. It is shown here only because it is part of the SAX parser. (The DTDHandler is an older, SGML-derived mechanism for handling embedded binary data. More appropriate mechanisms that accomplish the same task are discussed in the section, Referencing Binary Entities.)
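A minimal sketch of those two DocumentBuilder methods in use is shown below. The class StrictBuilder and its policy choices (treat validation errors as exceptions, answer every external entity request with empty content) are assumptions made for this example; a real application would pick whatever policy suits it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: a DocumentBuilder configured with its own handlers.
class StrictBuilder {
    /** Parses with validation on; any validation or well-formedness error is thrown. */
    static Document parseStrictly(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        builder.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // Warnings are ignored in this sketch.
            }

            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;   // treat validation errors as fatal
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });

        // Never fetch external entities: hand back empty content instead.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Convenience wrapper that reports validity as a boolean. */
    static boolean isValid(String xml) throws Exception {
        try {
            parseStrictly(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a/>"));
        System.out.println(isValid("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>"));
    }
}
```

Because the handlers are set on the builder rather than on the underlying parser, the same code works whichever DocumentBuilder implementation the factory returns.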

Where Do You Go from Here?

At this point, you have enough information to begin picking your own way through the XML libraries. Your next step from here depends on what you want to accomplish. You might want to go to:

The XML Thread: If you want to learn more about XML while spending as little time as possible on Sun's Java XML APIs. (You will see all of the XML sections in the normal course of the tutorial. Follow this thread if you want to bypass the API programming steps.)

Designing an XML Data Structure: If you are creating XML data structures for an application and want some tips on how to proceed. (This is the next step in the XML overview.)


Serial Access with the Simple API for XML (SAX): If the data structures have already been determined, and you are writing a server application or an XML filter that needs to do the fastest possible processing.

Manipulating Document Contents with the Document Object Model (DOM): If you need to build an object tree from XML data so you can manipulate it in an application, or convert an in-memory tree of objects to XML.

Browse the Examples: To see some real code. Sun's Java XML libraries come with a large number of examples (even though many of them may not make much sense just yet). You can find them in the JAXP examples directory, or you can browse to the XML Examples page. The table below divides them into categories depending on whether they are primarily SAX-related, are primarily DOM-related, or serve some special purpose.

*Example*	*Description*
samples	Sample XML files
simple	A very short example that creates a DOM using `XmlDocument`'s static `createXmlDocument` method and echoes it to `System.out`. Illustrates the least amount of coding necessary to read in XML data, assuming you can live with all the defaults -- for example, the default error handler, which ignores errors.
dom	A program that creates a Document Object Model in memory and uses it to output an XML structure.
gui	An example that reads XML data into a DOM and populates a JTree.
sax	An application that uses the SAX API to echo the content and structure of an XML document using either the validating or non-validating parser, on a well-formed, valid, or invalid document, so you can see the difference in the errors that the parsers report. Lets you set the `org.xml.sax.parser` system variable on the command line to determine the parser returned by `org.xml.sax.helpers.ParserFactory`.
namespaces	An application that reads an XML document into a DOM and echoes its namespaces.
rpc	A client/servlet pair that illustrates XML transmitted over a Remote Procedure Call (RPC) connection and converted into a DOM.
transcode	A character set translation example. A document written with one character set is converted to another.
jhtml	A servlet for use with the Java Web Server™ that validates an XML document. Precursor to a future JSP version.
