Chapter 25

Working with Databases


CONTENTS


It can be argued that, due to the widespread use of computing technology throughout corporate America, no new tool or language can be considered "industrial strength" until support for relational databases is provided. In fact, during the first year of Java's growth, very few large-scale applications have been attempted using the language because of this shortcoming. Fortunately, Sun Microsystems and database tool makers such as Oracle, Sybase, and Microsoft realize the power of the Java environment, and they are currently rushing to fill the Java/database void.

This chapter will introduce the use of relational databases with Java to the reader. Following a preliminary section covering the basics of SQL (Structured Query Language) and relational databases, the remainder of the chapter will focus on three possible ways of building database-aware Java applications. Here are these three methods:

A Brief Tutorial on SQL and Relational Databases

Before discussing how Java can be used to actually connect to relational databases, retrieve information, and return it to a Web browser, this section will briefly introduce the reader to the Structured Query Language and the world of relational databases.

What Is a Database Server?

Programmers who have never worked with relational databases may be somewhat confused by the term "database server." Fear not, however, because as a Web developer, you already have a great deal of experience with the client/server model of application development. A database server is simply a powerful piece of software running on a machine that is located on a network. This software monitors a port (just as a Web server typically monitors port 80 on a machine) and handles all incoming requests for the underlying database data. In the case of relational databases, this incoming request is presented to the server using SQL (Structured Query Language). The database server processes the SQL statement(s) and takes some action based on the contents of the SQL statement(s). This action could be a data retrieval operation, a data insertion or deletion, or even a security modification request sent by the database administrator (DBA).

Please take note here of what is meant by the word database. Many programmers accustomed to writing single-user file-based applications call these files databases, and this is fine. However, this chapter will focus on relational databases, which separate all data into related tables of information and are managed by a software product known as an RDBMS (relational database management system). Here are some of the most popular RDBMSs:

All of these database servers can run on a wide variety of platforms and can handle a great number of users simultaneously. Several features set these rather expensive systems apart from simple file-based databases such as Microsoft Access or dBASE. Among the valuable features are the following:

As you can see, a huge amount of effort already has gone into the creation of extremely powerful database management systems. Many computing professionals make their living by administering large databases. Most professional developers within business environments today probably have worked on at least one relational database-aware application. No matter what language their application was written in (C, C++, Object Pascal, Visual Basic, and so on), at some point SQL commands had to be written and passed to the database in order to retrieve data. Because of its ubiquity, SQL will be the topic of the next section.

The Relational Database

Relational databases store all of their information in groups known as tables. Each database can contain one or more of these tables. (Keep in mind that an RDBMS can manage many individual underlying databases, with each one of these databases containing many tables.) These tables are related to each other using some type of common element. A table can be thought of as containing a number of rows and columns. Each individual element stored in the table is known as a column. Each set of data within the table is known as a row. The following example illustrates a simple database containing address information.

Using the Sybase SQL Server RDBMS, assume that we have created a database named ADDRESSES. Because this is a simple example, this database will contain only one table: ADDRESS. Here are the individual columns (and their datatypes) contained in the ADDRESS table:

STRUCTURE OF TABLE ADDRESS
LAST_NAME String(30)
FIRST_NAME String(30)
STREET STRING(50)
CITY STRING(50)
STATE STRING(2)
ZIP INTEGER

Therefore, it can be said that the ADDRESS table has six columns: LAST_NAME, FIRST_NAME, STREET, CITY, STATE, and ZIP. Note that these columns specify only the format of the table, not the actual information stored in the table. Some individual rows, or records as they are often called, could be the following:

LAST_NAME
FIRST_NAME
STREET
CITY
STATE
ZIP
BERRY
KATIE
123 Forest St.
Huntsville
AL
36507
LITTLE
PARKER
456 Wetwood Dr.
Pensacola
FL
32537
HOFFMAN
LINDA
709 Avenida
San Diego
CA
93426

This database works very well if this is all the designer intended it to do. However, a primary goal of the relational database architecture is to reduce redundant information. Looking at the previous example, you can quickly see how information could be repeated if one person has several addresses. Let's assume in the previous example that Linda Hoffman moves every few months to an entirely new part of the country. Because we want to keep track of her migrant lifestyle, we will design this table to store her address each time she moves without deleting the old one. As she continues to move, our table looks like the following:

LAST_NAME
FIRST_NAME
STREET
CITY
STATE
ZIP
BERRY
KATIE
123 Forest St. Huntsville
AL
36507
LITTLE
PARKER
456 Wetwood Dr. Pensacola
FL
32537
HOFFMAN
LINDA
709 Avenida San Diego
CA
93426
HOFFMAN
LINDA
347 Rivers St. Frankfort
KY
49682
HOFFMAN
LINDA
902 Woods Ave. Alexandria
VA
23734
HOFFMAN
LINDA
920 Tiger Ln. Clemson
SC
29632

Although the data is stored correctly, Linda Hoffman's name is stored repeatedly, resulting in the duplication of data stored in the table. Designers of relational databases often design using a process known as normalization. Although it is beyond the scope of this book, the normalization process essentially attempts to break data down into its smallest segments so that no data is duplicated throughout the database. Using the previous example, to reduce data duplication, the ADDRESS table would be broken up into two separate tables: NAME and ADDRESS.

STRUCTURE OF TABLE NAME
LAST_NAME String(30)
FIRST_NAME String(30)
ID INTEGER
STRUCTURE OF TABLE ADDRESS
ID INTEGER
STREET STRING(50)
CITY STRING(50)
STATE STRING(2)
ZIP INTEGER

Using this structure, or database schema as it is often called, the individuals' names will only be entered into the database one time. The ID field will be used to link, or relate, the two tables together. This linkage between tables is known as referential integrity. It is the responsibility of the database developer to strive to maintain the database's referential integrity so that data does not become corrupt. Corrupt data could occur in this example if a record was deleted from the NAME table without ever deleting the corresponding address information in the ADDRESS table. Because of this oversight, this information would remain unused in the ADDRESS table and could potentially result in problems if another user was assigned the deleted record's ID value.

This covers the basics of how relational databases are used to store information. The following topic examines SQL in more detail and explains how SQL is used to query relational databases to retrieve or modify the database contents.

Structured Query Language (SQL)

Structured Query Language (SQL) is an ANSI standard computer programming language used to query relational databases.

Note
SQL and the relational database were invented and introduced in 1970 by Dr. E. F. Codd of IBM. Dr. Codd's original paper titled "A Relational Model of Data for Large Shared Data Banks" was a revolutionary paper that has truly changed the face of modern computing.

The ANSI standard for SQL (the latest standard is commonly referred to as SQL-92 because it was voted on in 1992) specifies a core syntax for the language itself. The reader should be aware that many relational database management systems advertise themselves to be SQL-92 compliant, but that all of these RDBMSs have added vendor-specific extensions to the language to support advanced features such as security, stored procedures, and triggers. There is nothing wrong with these extensions as long as the vendor has at least standardized on the extensions across its own product line. Simply be aware that these extensions exist, and try to note the differences when porting from one RDBMS to another.

Basic SQL Syntax

SQL is most commonly used to retrieve or modify data in a relational database. The four most commonly used statements are SELECT, INSERT, DELETE, and UPDATE. This section will briefly discuss these statements, and then explain database programming constructs such as triggers and stored procedures.

The SELECT Statement
The SELECT statement is used to select data from a database. At its simplest, the SELECT statement retrieves data from a specified table using the FROM clause. To retrieve all of the addresses from the ADDRESSES database (see the previous section titled "The Relational Database"), the user would issue the following SQL statement to the RDBMS:
SELECT * FROM ADDRESS
Upon execution, the RDBMS would return all of the rows in the ADDRESS table to the user. This request also could have been worded using the following syntax in order to retrieve all of the data:
SELECT ID, STREET, CITY, STATE, ZIP FROM ADDRESS
This data retrieval also can be filtered using the WHERE clause. The WHERE clause is used to specify some condition that each row must meet in order to be returned to the user. The following statement returns all records to the user where the address is located in the state of Florida:
SELECT * FROM ADDRESS WHERE STATE = 'FL'
The ORDER BY clause can be used to order data in some default order (determined by the data type of the ordering field). To return all rows to the user ordered by the LAST_NAME field, exe-cute the following statement:
SELECT * FROM ADDRESS ORDER BY LAST_NAME
The SELECT statement also can be used to join multiple tables together in order to retrieve a combination of each table's data. For example, to retrieve all records from the NAME and ADDRESS table that are related, issue the following join command:
SELECT * FROM NAME, ADDRESS where NAME.ID = ADDRESS.ID

Note
Many different types of joins can be used to return different subsets of data. For more information on joins, consult a bona fide book about SQL such as Teach Yourself SQL In 14 Days, by Sams Publishing.

In short, here is the basic syntax for the SELECT statement:
SELECT <FIELDNAME1, FIELDNAME2, … FIELDNAMEn | *>
FROM <TABLE1, TABLE2, TABLE3,...>
WHERE <CONDITION>
The INSERT Statement
At this point, you may be wondering how data actually gets put into the tables in the first place. There are actually two methods:
This discussion will focus on the use of the SQL INSERT statement to insert data into a table.
The INSERT statement syntax, at its simplest, looks like this:
INSERT INTO <TABLE_NAME>
(COLUMN1, COLUMN2, COLUMN3, ...)
VALUES (VALUE1, VALUE2, VALUE3, ...)
When inserting data, you must follow three rules:
As a quick example, the following statement inserts a record into the final ADDRESS table mentioned in the section titled "The Relational Database":
INSERT INTO ADDRESS(ID, STREET, CITY, STATE, ZIP)
VALUES (1, '123 Forest St.', 'Huntsville', 'AL', 36507)
The UPDATE Statement
The UPDATE statement is used to update existing information within a database table. For instance, to modify the zip code value of an existing ADDRESS row, the UPDATE statement would be used. Here's the format of the UPDATE statement:
UPDATE <TABLE_NAME>
SET <COLUMNNAME1 = VALUE1, COLUMNNAME2 = VALUE2, ...>
WHERE <CONDITION>
As you can see, the UPDATE statement makes use of the WHERE clause originally introduced in the previous SELECT statement discussion. As an example, the following statement could be used to correct an entry error where someone entered "Pensacola" instead of "Miami" for the CITY field in the ADDRESS table:
UPDATE ADDRESS
SET CITY = 'Miami'
WHERE CITY = 'Pensacola'
Take note here that if the WHERE clause is omitted from the UPDATE statement, all records in the specified table will be updated! Therefore, make sure that the records being updated are always specified.
The DELETE Statement
The DELETE statement is used to remove rows from a relational table. The DELETE statement makes use of the WHERE clause introduced in the previous SELECT statement discussion. Depending on how the WHERE clause is used, the DELETE statement can do the following:
When using the DELETE statement, you should remember several things:
Here's the basic syntax of the DELETE statement:
DELETE FROM <TABLE_NAME> WHERE <CONDITION>
The following example statement shows the syntax used to delete all addresses located in the state of California:
DELETE FROM ADDRESS WHERE STATE = 'CA'

Using Java with Databases

The first portion of this chapter introduced Java programmers to the world of relational databases. By now, you should have a decent understanding of what a database server is, what SQL is, and some knowledge of how to query a set of tables within a database. Of course, all of the information presented up to this point has been focused on executing SQL statements directly. The remainder of this chapter will discuss different methods for passing queries to a database server from within a Java applet or application. Currently, there are three primary methodologies for Java database access:

The following sections examine each of these topics briefly so that the reader will have a basic understanding of what is required to do database operations using Java code. Each section will point the reader in the direction of much more detailed sources that can be used if more information is required.

Java and CGI Calls

Before the advent of powerful Java database connectivity tools, the Common Gateway Interface (CGI) was the database querying method used by the vast majority of Web developers. This interface defines a methodology for HTTP-based Web servers to talk to programs. CGI programs provide a way for clients to issue a procedure call, have that procedure execute on the server, and then return data to the client. CGI applications were the first "dynamic" elements on the Web, and they continue to be extremely popular because of the momentum behind the technology. A beginning Web developer may wonder, "Why use Java then if CGI is dynamic and platform-independent?" The answer is that a CGI application is simply a program residing on a server that understands how to "speak" HTTP. CGI applications can only return data to the client using a format that the client understands. At the current time (and in the foreseeable future), this data consists of HTML code. Therefore, unlike Java, CGI can only return basic form elements (some of which could be Java applets!). In other words, CGI programs are used to provide information; they are not the actual information themselves.

Where does CGI fit in for Java developers? CGI's primary strength (for Java developers) is its support for scripts written in any language. At the current time, there are no open Java packages available that allow applets to connect directly to SQL databases without the addition of special software drivers on each client machine. Therefore, CGI is an excellent candidate for simple database access. A CGI script could be written in C++ to query an Oracle database based on a number of input parameters. The result set returned from the query could then be formatted in HTML and returned (through the Common Gateway Interface) to the client that requested the information. Of course, this information could be displayed on the client using a Java applet.

Passing Data from the Client to the Gateway Program

There are two primary ways for a client (an HTML page, or in the Java programmer's case, perhaps a Java applet) to pass input parameters to a CGI program residing on a Web server:

CGI-compliant Web servers use the QUERY_STRING and PATH_INFO environment variables to pass data to CGI scripts.

To pass information to the QUERY_STRING environment variable, include the arguments to be passed after a question mark (?) character on the command line. For instance, to call the query.pl script with the "address" argument using the QUERY_STRING variable, use the following syntax:

http://www.someserver.com/query.pl?address

To pass information to the PATH_INFO environment variable, include the arguments to be passed after a slash (/) followed by an equals sign and the parameters value. For instance, to call the query.pl script with the "address" argument using the PATH_INFO variable, use the following syntax:

http://www.someserver.com/query.pl/address=Miami

The query.pl CGI script then can extract this string from the environment variable, parse it, and do whatever needs to be done. Using environment variables to pass information is a risky undertaking, however, due to the memory or string size limitations of some environments.

Although somewhat out of the scope of this book, HTML forms also can be used to "post" data to the standard input (stdin) of CGI scripts. A brief example of this type of input will be supplied so that readers will have some idea how to call CGI scripts directly from Java. There are many online and textual sources that describe CGI and HTML in great detail. Consult these sources if you need in-depth information on developing CGI applications.

The following snippet of code (see Listing 25.1) can be used to post information to a CGI script. Note that this code monitors an input stream continuously in a loop. Therefore, if you do nothing else, make sure that this code is broken off and running in its own thread.


Listing 25.1. Sample Java code used to call the query.pl CGI script.
Socket theSocket = new Socket("http://www.someserver.com", 80);
DataOutputStream theOut = new DataOutputStream(theSocket.getOutputStream());
DataInputStream theIn = new DataInputStream(theSocket.getInputStream());

String theScript = "query.pl";
String theParameter = "Address";

theOut.writeBytes("POST " + theScript + " HTTP/1.0\r\n" + "Content-type:
application/octet-stream\r\n"
  + "Content-length: " + theParameter.length() + "\r\n\r\n" + theParameter;

String theReturn;
String loopdeloop;
while ((theReturn = theIn.readLine()) != null)
 
 loopdeloop += theReturn + "\n";
theIn.close();
theOut.close;

The code in Listing 25.1 opens a connection to the Web server http://www.someserver.com on port 80. After this connection has been made, the code opens a new output and input stream for communication with the server. The script is called using the output stream and then the code reads the server's reply using the input stream. At this point, it is the Java code's responsibility to parse through the reply and do something with it (display to user, perform calculations, and so on).

CGI applications represented an important first step toward a truly dynamic World Wide Web, but CGI has definite shortcomings. Most notably, CGI programs return what looks like Romulan to unsuspecting clients, and therefore these programs can become difficult to maintain, particularly for anyone other than the original programmer. CGI programs also suffer from notoriously poor performance as servers become bogged down. Developers also are forced to segment their application into distinct pieces. One side of the application passes a query through a very thin pipe, and then the other side of the application retrieves the query information from the pipe and talks to the database. When the required information is retrieved, the information is parsed and sent back to the requesting side of the application through the thin pipe. The next section ("The JDBC API") discusses a way for developers to make database queries directly within Java code and retrieve the query results directly. This will result in much cleaner (and faster) application development, as well as provide a more generic database API for applications if database servers ever need to be switched.

The JDBC API

The Java Database Connectivity Application Programming Interface (API) is an API currently being designed by Sun Microsystems that provides a Java language interface to the X/Open SQL Call Level Interface standard. This standard provides a DBMS-independent interface to relational databases that defines a generic SQL database access framework. The most visible implementation of the X/Open SQL CLI is Microsoft's ODBC (Open Database Connectivity). This API defines a common SQL syntax and function calls that can be used by developers to send SQL commands to and retrieve data from SQL databases. ODBC-enabled applications make use of database drivers (similar in concept to other device drivers) installed on the system that allow applications to talk to a vendor's database. Using this methodology, all of the DBMS-specific code is placed inside the ODBC driver and the application developer is shielded from implementation-specific problems in theory. Practically speaking, it is sometimes difficult to completely remove vendor-specific syntax from all ODBC operations, but in most cases, it is a relatively simple task to port ODBC to run on a new database server.

For Java developers, ODBC's primary drawback is that it is written in C. Because of the limitations inherent in the use of native methods, the JDBC designers have designed the JDBC specification to most easily use ODBC in the short term, but they have provided the capability long-term for JDBC to be implemented in other ways.

Note
JDBC, like ODBC, is designed to be a call-level SQL interface. Because many of its low-level operations can be combined into a higher-level, object-oriented interface, expect to see Java class libraries released in the future that provide a mapping to the underlying JDBC calls. This happened with ODBC almost immediately, and the majority of ODBC developers currently use other interfaces rather than using the ODBC API directly.

The JDBC API is expressed as a series of abstract Java interfaces within the java.sql package that will be provided as part of the JDK 1.1 release. Here are the most commonly used interfaces:

JDBC-enabled applets and applications make use of database drivers to connect to remote databases. What sets JDBC apart from ODBC is that these drivers can actually be applets themselves that get uploaded to the client system at runtime. Therefore, the overall Java model of a "thin client" querying a powerful database remains.

There are several different configurations in which JDBC can be used to access a database. Among the possible configurations are the following:

Steps Required to Access a JDBC Database

For programmers familiar with ODBC or other call-level interfaces to databases, the steps required to access a JDBC database should be familiar. JDBC uses the concept of a "connection" handle to manage a program's connection to a database.

Acquiring a Connection
When a Java JDBC-enabled applet or application initially starts, it must first acquire a connection to the JDBC database. It does this by calling the java.sql.DriverManager.getConnection() method. The syntax for this method is
public static synchronized Connection getConnection(String url,
 java.util.Properties info) throws SQLException;
As you can see, the getConnection() method takes as arguments a URL (to determine the database server location) and a Properties list (usually containing at least a user name and password). The format for the JDBC URL is still being debated, but at the present time it appears that it will look something like this:
jdbc:<subprotocol>:<subname>
As an example, if the connection is to use an ODBC bridge to connect to the EMPLOYEES database, the URL would be
jdbc:odbc:EMPLOYEES.
If the Oracle SQLNet protocol was being used to connect to an Oracle listener on www.someserver.com at port 2025, the URL would be
jdbc:sqlnet://www.someserver.com:2025/EMPLOYEES
Like ODBC, JDBC needs to know which database drivers are available for use. It can load drivers using two methods:
Creating a Statement
Now that the connection has been required, a statement needs to be created so that it can be passed to the database for processing. This is done by calling the connection class's createStatement() method. Here's an example of this:
java.sql.Statement statement = connection.createStatement();
Executing a Query
Using the statement you created, an SQL query can be executed by using the statement's executeQuery() method. This method returns a ResultSet object so that the query's results can be examined. Here's a simple example:
ResultSet result = statement.executeQuery("select NAME, SALARY,
 EMPLOYEE_ID from EMPLOYEES");
Examining the Result Set
Use the ResultSet object returned from the executeQuery() method call to examine the returned data. The ResultSet class has member methods to scroll the ResultSet lists, such as first(), last(), next(), and prev(), as well as individual data member methods for examining returned data such as getInteger(), getVarChar(), getVarBinary(), and so on. To examine the returned data, methods such as getInteger() can be called in one of two ways:
JDBC also can be used to modify data using common insert, delete, and update method calls.

Other JDBC Operations

JDBC supports a wide range of common database functions as well. Here are some examples:

In short, JDBC promises to be a tremendous tool for Java programmers interested in connecting to remote SQL databases. At the current time, many major database manufacturers are preparing JDBC drivers so that application programmers will be able to access databases from Java applets. The biggest change for developers might be the performance change, particularly when using untrusted JDBC applets communicating with databases across the Internet. In the Intranet case, however, the appearance of native Java compilers should make high-performance, cross-platform JDBC application development a reality in the near future.

Server-Side Database Access

The final Java/database access method to be discussed has been termed Server-Side Database Access. This functionality is similar to that of Java applets/applications calling CGI scripts, except that in this case, the CGI scripts have been replaced by "intelligent" Web servers designed for database access such as the Oracle WebServer and the Microsoft Internet Information Server. Both of these products allow code to be executed on the server that connects directly to a database (possibly on yet another server). This code can be called directly or indirectly from a remote Java applet, but unlike CGI, it is loaded in the Web server's process.

Oracle WebServer 2.0

Currently, the Oracle WebServer 2.0 supports the creation of server code using a number of languages including Oracle PL/SQL, Java, and C/C++. The Oracle WebServer is built around the Oracle Web Request Broker. This process bypasses the CGI interface and uses a high- performance native connection to the Oracle7 server. Unfortunately (for non-Oracle developers) at the current time, only the Oracle7 database server can be connected to using this product. This server supports end-to-end encryption using SSL 2.0 (Secure Sockets Layer) and is available for a variety of operating platforms including Windows NT and SPARC Solaris. For situations in which the database to be used is definitely going to be an Oracle7 database, the Oracle WebServer allows developers to write more advanced, better performing server code that can be used by Java applets to query databases and return information.

Microsoft Internet Information Server

The Microsoft Internet Information Server (IIS) promises to be one of the more widely used Web servers in the near future due to Microsoft's decision to include it for free as part of its Windows NT 4.0 product. This product is currently freely downloadable from the Microsoft Web site (http://www.microsoft.com). IIS uses ODBC by default to connect to any database that has an available ODBC driver. (Although this makes IIS a more open product, there is a performance penalty to be paid for using ODBC in most cases.) For developers not wanting to use ODBC or who want a cleaner interface than the default HTTPODBC.DLL implementation, Microsoft's ISAPI (Internet Services API) can be used using the OLEISAPI.DLL. OLEISAPI allows the developer to start a server application, pass it some parameters, and retrieve its return values to return to the Web client. To do this, the server application must be an in-process OLE DLL, which can be created using Microsoft Visual C++ or Visual Basic. In the near future, IIS will be able to run code written in a number of other languages including VBScript, Microsoft's scripting language, which bears a syntactical resemblance to Visual Basic. Unlike Visual Basic, however, VBScript will be thread-scalable and should offer better performance.

Summary

This chapter was supplied to give Java developers some insight into Java's database access capabilities. During Java's first year of growth (1995-1996), much of the framework was laid to make the Java environment available on virtually every popular operating system. In addition to this, a great deal of effort has gone into producing new tools specifically tailored for Java developers. The second year of Java's growth should see the introduction of a host of database-access related products that will open the business enterprise up to Java applet and application developers. (Java will no longer be used only to scroll text across Web pages!) Following the widespread acceptance of Java as a full-fledged, industrial-strength programming language, the following year should see the introduction of component models such as Java Beans that will allow Java developers to access other object model objects (such as CORBA and COM).

The JDBC API appears to be a unifying API that many database tool manufacturers will standardize on, much as Microsoft's ODBC API helped standardize Windows-platform database access. As more and more products begin to emerge and the marketplace grows even more competitive, Java developers should make every effort possible to refrain from adding platform-specific extensions to their Java code, particularly in environments where users of other operating systems could access your code (such as the Internet).