This page gives you a map so you can find your way around JAXP and the associated XML APIs.
The JAXP APIs, contained in the jaxp.jar file, consist of the javax.xml.parsers package. That package contains two vendor-neutral factory classes, SAXParserFactory and DocumentBuilderFactory, that give you a SAX parser and a DocumentBuilder, respectively. The DocumentBuilder, in turn, creates a DOM-compliant Document object.
The factory APIs give you the ability to plug in an XML implementation offered by another vendor without changing your source code. The implementation you get depends on the setting of the javax.xml.parsers.SAXParserFactory and javax.xml.parsers.DocumentBuilderFactory system properties. The default values (unless overridden at runtime) point to Sun's reference implementations at com.sun.xml.
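As a minimal sketch of the factory lookup described above, the following program prints the two system properties and the factory classes that JAXP actually selects. The class name FactoryLookup and its helper methods are hypothetical, chosen only for illustration; the class names printed depend on the JAXP implementation installed on your platform.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class, named here for illustration only.
class FactoryLookup {
    /** Class name of the SAXParserFactory implementation JAXP selected. */
    static String saxFactoryImpl() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Class name of the DocumentBuilderFactory implementation JAXP selected. */
    static String domFactoryImpl() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // A null value means the property is unset and the default is used.
        System.out.println("SAX property: "
                + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("DOM property: "
                + System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
        System.out.println("SAX factory:  " + saxFactoryImpl());
        System.out.println("DOM factory:  " + domFactoryImpl());
    }
}
```

To try a different implementation, you would set the property on the command line, for example with -Djavax.xml.parsers.SAXParserFactory=some.vendor.FactoryImpl, where some.vendor.FactoryImpl is a placeholder for a real factory class on your class path.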
The remainder of this section shows how those APIs relate to each other in
an application. As you read, watch for this logo:
It identifies the parts of the discussion that deal exclusively with Sun's reference
implementation.
As discussed in the previous section, the SAX and DOM APIs are defined by the XML-DEV group and by the W3C, respectively. The libraries that define those APIs are included in the parser.jar file, which also contains Sun's reference implementation, Project X.
The "Simple API" for XML (SAX) is the event-driven, serial-access mechanism that does element-by-element processing. The API for this level reads and writes XML to a data repository or the Web. For server-side and high-performance apps, you will want to fully understand this level. But for many applications, a minimal understanding will suffice.
The DOM API is generally an easier API to use. It provides a relatively familiar tree structure of objects. You can use the DOM API to manipulate the hierarchy of application objects it encapsulates. The DOM API is ideal for interactive applications because the entire object model is present in memory, where it can be accessed and manipulated by the user.
On the other hand, constructing the DOM requires reading the entire XML structure and holding the object tree in memory, so it is much more CPU- and memory-intensive. For that reason, the SAX API tends to be preferred for server-side applications and data filters that do not require an in-memory representation of the data.
The SAXParserFactory shown at the top of the diagram generates an instance of the parser. The XML text is shown coming in to the parser from the left. As the data is parsed, the parser invokes one of several callback methods defined by the interfaces DocumentHandler, ErrorHandler, DTDHandler, and EntityResolver.
Here is a summary of the key SAX APIs:
SAXParserFactory
A SAXParserFactory object creates an instance of the parser determined by the system property javax.xml.parsers.SAXParserFactory.
Parser
The org.xml.sax.Parser interface defines methods like setDocumentHandler to set up event handlers and parse(URL) to actually do the parsing. This interface is implemented by the Parser and ValidatingParser classes in the com.sun.xml.parser package.
DocumentHandler
The methods startDocument and endDocument are invoked at the start and end of the document, and startElement and endElement are invoked when an XML tag is recognized. This interface also defines the methods characters and processingInstruction, which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.
ErrorHandler
The methods error, fatalError, and warning are invoked in response to various parsing errors. The default error handler throws an exception for fatal errors and ignores other errors (including validation errors). That's one reason you need to know something about the SAX parser, even if you are using the DOM. Sometimes, the application may be able to recover from a validation error. Other times, it may need to generate an exception. To ensure the correct handling, you'll need to supply your own error handler to the parser.
DTDHandler
This interface is extended by DtdEventListener, which adds methods like startDtd and endDtd.
EntityResolver
The resolveEntity method is invoked when the parser must locate data identified by a URI. In most cases, a URI is simply a URL, which specifies the location of a document, but in some cases the document may be identified by a URN -- a public identifier, or name, that is unique in the web space. The public identifier may be specified in addition to the URL. The EntityResolver can then use the public identifier instead of the URL to find the document, for example to access a local copy of the document if one exists.
A typical application provides a DocumentHandler, at a minimum.
Since the default implementations of the interfaces ignore all inputs except
for fatal errors, a robust implementation may want to provide an ErrorHandler
to report more errors or report them differently.
Note: The class org.xml.sax.HandlerBase implements all of these interfaces with null methods, so you can override the methods for events you need to process and ignore the methods for other events.
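The callback style described above can be sketched with a small HandlerBase subclass that overrides only the events it cares about and echoes them into a string. The class name SaxEcho and the sample document are made up for illustration. HandlerBase belongs to the SAX 1.0 API covered on this page; later SAX versions deprecate it in favor of org.xml.sax.helpers.DefaultHandler, so expect deprecation warnings on a current JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical class name; HandlerBase and its callbacks are SAX 1.0 APIs.
@SuppressWarnings("deprecation")
class SaxEcho extends HandlerBase {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String name, AttributeList atts) {
        out.append("<").append(name).append(">");
    }

    @Override
    public void endElement(String name) {
        out.append("</").append(name).append(">");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks; appending keeps them in order.
        out.append(ch, start, length);
    }

    /** Parses the XML text and returns the events echoed back as a string. */
    static String echo(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SaxEcho handler = new SaxEcho();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echo("<greeting><to>World</to></greeting>"));
    }
}
```

Because every other method is inherited from HandlerBase as a do-nothing default, events such as processing instructions are simply dropped by this handler.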
The SAX parser is defined in the following packages.
Package | Description |
org.xml.sax | Defines the SAX interfaces. The name "org.xml" is the package prefix that was settled on by the group that defined the SAX API. This package also defines HandlerBase -- a default implementation of a base class for the various "handlers" defined by the interfaces, as well as an InputSource class, which encapsulates information that tells where the XML data is coming from. |
org.xml.sax.helpers | This package is part of SAX. It defines the |
javax.xml.parsers | Defines the SAXParserFactory class which returns the SAXParser. Also defines the ParserConfigurationException class for reporting errors. |
com.sun.xml.parser | Contains the Java XML parser (com.sun.xml.parser.Parser), validating parser (com.sun.xml.parser.ValidatingParser), and entity resolver. The fully qualified name of either parser can be sent to the parser factory to obtain an instance of that parser. The nonvalidating parser generates errors if a document is not well formed, and does some processing of the DTD (if present) but does not check to make sure that the document obeys all of the constraints defined by the DTD. The validating parser, on the other hand, checks to make sure that the document obeys all such constraints. |
Technical Note:
Not all nonvalidating parsers are created equal! Although a validating parser is required to process all external entities referenced from within the document, some of that processing is optional for a nonvalidating parser. With such a parser, an externally stored section of the DTD that is "included" in the current document using an entity reference might not be processed. In addition, a nonvalidating parser is not required to identify ignorable whitespace (although a validating parser must). In that case, whitespace which can legitimately be ignored would be returned as part of the normal character stream. The nonvalidating parser in Sun's Java XML library implements both of these optional behaviors -- it processes all external entities and it identifies ignorable whitespace.
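One way to see how a given parser treats ignorable whitespace is to count the characters delivered through each callback. The sketch below assumes a made-up class name (WhitespaceProbe) and a made-up sample document whose DTD allows only a child element inside the root, so the line breaks and indentation there carry no data. A validating parser must report them through ignorableWhitespace; what a nonvalidating parser does is implementation-specific, which is why the program prints both results rather than assuming them.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
@SuppressWarnings("deprecation")
class WhitespaceProbe extends HandlerBase {
    // Made-up document: <a> may contain only <b>, so whitespace in <a> is ignorable.
    static final String SAMPLE =
            "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>\n<a>\n  <b/>\n</a>";

    int ignorable; // characters reported through ignorableWhitespace
    int text;      // characters reported through characters

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorable += length;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text += length;
    }

    /** Parses the XML text and returns the probe holding both counters. */
    static WhitespaceProbe parse(String xml, boolean validating) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating);
            WhitespaceProbe probe = new WhitespaceProbe();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), probe);
            return probe;
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (boolean validating : new boolean[] {true, false}) {
            WhitespaceProbe p = parse(SAMPLE, validating);
            System.out.println("validating=" + validating
                    + " ignorable=" + p.ignorable + " text=" + p.text);
        }
    }
}
```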
In addition to the APIs described here, the SAX APIs define a few other interfaces that you are likely to use when you write a SAX application, as well as a utility package with a number of classes that are helpful for building real-world applications. When you are ready for a bit more information pertaining to SAX, go here: 3b. Other SAX APIs.
The diagram below shows the JAXP APIs in action:
You use the javax.xml.parsers.DocumentBuilderFactory class to get a DocumentBuilder instance (upper left), and use that to produce a Document (a DOM) that conforms to the DOM specification (lower right). The builder you get, in fact, is determined by the system property javax.xml.parsers.DocumentBuilderFactory, which selects the factory implementation that is used to produce the builder. (The platform's default value can be overridden from the command line.)
You can use the builder's newDocument() method to create an empty Document that implements the org.w3c.dom.Document interface. Alternatively, you can use one of the builder's parse methods to create a Document from existing XML data. The result is a DOM tree like that shown in the lower right corner of the diagram.
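Both ways of obtaining a Document can be sketched in a few lines. The class name DomDemo, its helper methods, and the element names are hypothetical; the JAXP and DOM calls themselves (newDocumentBuilder, newDocument, parse, createElement) are standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class DomDemo {
    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Starts from an empty Document and adds a root element by hand. */
    static Document createFromScratch() {
        Document doc = newBuilder().newDocument();
        doc.appendChild(doc.createElement("greeting"));
        return doc;
    }

    /** Builds a Document from existing XML text with one of the parse methods. */
    static Document parseText(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // SAXException and IOException are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(createFromScratch().getDocumentElement().getTagName());
        System.out.println(parseText("<a><b/></a>").getDocumentElement().getTagName());
    }
}
```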
The Document Object Model implementation is defined in the following packages:
Package | Description |
org.w3c.dom | Defines the DOM programming interfaces for XML (and, optionally, HTML) documents, as specified by the W3C. |
javax.xml.parsers | Defines the DocumentBuilderFactory class and the DocumentBuilder class, which returns an object that implements the W3C Document interface. The factory that is used to create the builder is determined by the javax.xml.parsers.DocumentBuilderFactory system property, which can be set from the command line or overridden when invoking the newInstance method. This package also defines the ParserConfigurationException class for reporting errors. |
Sun's Java XML implementation of the DOM libraries, including the |
This section shows how the reference implementation combines the SAX and DOM APIs.
Note:
The material in the remainder of this section is specific to Project X, Sun's reference implementation for the JAXP standard, and is not part of the standard itself. The functionality described here may therefore be implemented differently in other parsers. In addition, as standards evolve, future versions of the JAXP reference implementation could employ different mechanisms to achieve the same goals.
In Sun's reference implementation, the DOM API builds on the SAX API as shown in the diagram below:
Sun's implementation of the Document Object Model (DOM) API uses the SAX libraries to read in XML data and construct the tree of data objects that constitutes the DOM. Sun's implementation also provides a framework to help output the object tree as XML data.
The diagram below shows how Sun's DocumentBuilder operates "under the hood":
The section of the diagram inside the wavy orange lines shows what Sun's reference implementation does when you parse existing XML data.
The default DocumentBuilder creates an object which implements the SAX DocumentHandler interface. It then hands that object to one of the SAX parsers (Parser or ValidatingParser, depending on how the builder factory was configured). When the input source is parsed, the DocumentHandler creates a Document object.
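To make the idea concrete, here is a minimal sketch of a SAX handler that assembles a DOM tree from parser events. It is not Sun's implementation: the class name TreeBuildingHandler is hypothetical, the empty Document is borrowed from the standard DocumentBuilder.newDocument() call, and details a real builder must handle (comments, processing instructions, namespaces, DTD information) are left out.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical class: illustrates the technique, not Project X's actual code.
@SuppressWarnings("deprecation")
class TreeBuildingHandler extends HandlerBase {
    private final Document doc;
    // Stack of currently open nodes; the innermost open node is on top.
    private final Deque<Node> open = new ArrayDeque<>();

    TreeBuildingHandler(Document doc) {
        this.doc = doc;
        open.push(doc); // the document itself is the parent of the root element
    }

    @Override
    public void startElement(String name, AttributeList atts) {
        Element element = doc.createElement(name);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getName(i), atts.getValue(i));
        }
        open.peek().appendChild(element);
        open.push(element);
    }

    @Override
    public void endElement(String name) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
    }

    /** Runs a SAX parse and returns the tree the handler assembled. */
    static Document build(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new TreeBuildingHandler(doc));
            return doc;
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build("<a id=\"1\"><b>hi</b></a>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The stack mirrors the nesting of the document: each start tag pushes a new element, each end tag pops it, and text is attached to whatever element is currently on top.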
Note:
To control other aspects of the parser's behavior, you use the DocumentBuilder methods setErrorHandler and setEntityResolver. DocumentBuilder does not implement a setDTDHandler method, though. It is shown here only because it is part of the SAX parser. (The DTDHandler is an older, SGML-derived mechanism for handling embedded binary data. More appropriate mechanisms that accomplish the same task are discussed in the section, Referencing Binary Entities.)
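The two DocumentBuilder hooks can be sketched together as follows. The error handler records validation errors instead of ignoring them, and the entity resolver serves a local copy of a DTD when it recognizes the public identifier. The class name BuilderHooks, the public identifier, the system identifier, and the DTD text are all made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical class; identifiers and DTD text below are invented for the sketch.
class BuilderHooks {
    static final String PUBLIC_ID = "-//EXAMPLE//DTD Note//EN";
    static final String LOCAL_DTD = "<!ELEMENT note (#PCDATA)>";
    static final String DOCTYPE = "<!DOCTYPE note PUBLIC \"" + PUBLIC_ID
            + "\" \"http://example.invalid/note.dtd\">";
    static final String VALID = DOCTYPE + "<note>hi</note>";
    static final String INVALID = DOCTYPE + "<note><extra/></note>";

    /** Records warnings and errors; fatal errors still stop the parse. */
    static final class CollectingErrorHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();

        @Override public void warning(SAXParseException e) {
            problems.add("warning: " + e.getMessage());
        }

        @Override public void error(SAXParseException e) {
            problems.add("error: " + e.getMessage());
        }

        @Override public void fatalError(SAXParseException e) throws SAXParseException {
            throw e;
        }
    }

    /** Parses with validation on and returns the problems that were reported. */
    static List<String> validate(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            CollectingErrorHandler errors = new CollectingErrorHandler();
            builder.setErrorHandler(errors);
            // Serve the local copy when the public identifier is recognized;
            // returning null lets the parser fall back to the system identifier.
            EntityResolver resolver = (publicId, systemId) ->
                    PUBLIC_ID.equals(publicId)
                            ? new InputSource(new StringReader(LOCAL_DTD))
                            : null;
            builder.setEntityResolver(resolver);
            builder.parse(new InputSource(new StringReader(xml)));
            return errors.problems;
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid document:   " + validate(VALID));
        System.out.println("invalid document: " + validate(INVALID));
    }
}
```

Whether the application should continue after a validation error is a design choice: this sketch collects the messages and lets the parse finish, while an error method that throws the exception would stop at the first violation.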
At this point, you have enough information to begin picking your own way through the XML libraries. Your next step from here depends on what you want to accomplish. You might want to go to:
Example | Description |
samples | Sample XML files |
simple | A very short example that creates a DOM using XmlDocument's static createXmlDocument method and echoes it to System.out. Illustrates the least amount of coding necessary to read in XML data, assuming you can live with all the defaults -- for example, the default error handler, which ignores errors. |
dom | A program that creates a Document Object Model in memory and uses it to output an XML structure. |
gui | An example that reads XML data into a DOM and populates a JTree. |
sax | An application that uses the SAX API to echo the content and structure of an XML document using either the validating or non-validating parser, on a well-formed, valid, or invalid document so you can see the difference in errors that the parsers report. Lets you set the org.xml.sax.parser system property on the command line to determine the parser returned by org.xml.sax.helpers.ParserFactory. |
namespaces | An application that reads an XML document into a DOM and echoes its namespaces. |
rpc | A client/servlet pair that illustrates XML transmitted over a Remote Procedure Call (RPC) connection and converted into a DOM. |
transcode | A character set translation example. A document written with one character set is converted to another. |
jhtml | A servlet for use with the Java Web Server(TM) that validates an XML document. Precursor to a future JSP version. |