9

A Guided Tour of HTML

One of the primary foundations of the World Wide Web is the HyperText Markup Language (HTML). HTML is the primary format in which documents are distributed and viewed on the Web. Many of its features, such as platform-independent formatting, structural design, and especially hypertext, make it a very good document format for the Internet and the WWW.

This chapter gives you a basic understanding of HTML and how you can create documents in this format. A brief description of the common tags and a style guide to creating good HTML documents help you on the road to getting your information onto the WWW. A few of the more advanced features, as well as a look to the future of HTML, are also covered.

Background

HTML is one of the foundation specifications that define the Web (along with HTTP and URLs). It was originally developed by Tim Berners-Lee at CERN, growing out of his 1989 proposal for the Web. HTML was envisioned as a format that would enable scientists using very different computers to share information seamlessly over the network, and several features were necessary to achieve this. Platform independence, in which a document can be displayed similarly on computers with different capabilities (that is, fonts, graphics, and color), was vital to the varied audience. Hypertext, meaning any word or phrase in one document could reference another document, would allow for easy navigation between and within the many large documents on the system. Rigorously structured documents would allow for advanced applications such as converting documents to and from other formats, and searching text databases.

SGML and HTML

Berners-Lee chose to use the Standard Generalized Markup Language (SGML) as a pattern. As an emerging international standard, SGML had the advantages of structure and platform independence. Its status also ensured its long life, meaning that documents formatted in SGML would not need to be rebuilt a few years later.

SGML is platform-independent because it focuses on encoding the semantic structure, or meaning, of a document—not necessarily its appearance. Thus, a chapter title would be labeled, "Chapter Title," instead of "Helvetica 18pt Centered." Although the latter style breaks down if the document is viewed on a computer that doesn't have the Helvetica typeface or support for lettering of different sizes, the former style can be displayed (intelligently) on any system. Each reader defines the appearance of chapter titles in a way that is useful on his or her computer, and any text with that style is formatted accordingly.

Another feature of this structure is that semantically encoded text can be automatically processed more intelligently by the computer. For example, if every chapter title is marked with the label "Chapter Title," perhaps with the chapter number as an attribute, a reader could request to see just Chapter 18; the SGML software would automatically look for the Chapter 18 title and the Chapter 19 title and extract everything between them. This could not be done with the text marked with meaningless (to the computer) fonts and formatting codes.

A great advantage of SGML is its flexibility. SGML is not a format in its own right, but a specification for defining other formats. Users can create new formats to encode all the structure of certain types of documents (for example, technical manuals, phone books, and legal documents), and any SGML-capable software can understand them simply by reading the definition first. A large number of Document Type Definitions (DTDs) have been created, both for common and very specialized documents. HTML is simply one DTD, or application, of SGML.

The Evolution of HTML

For several years, the use of HTML (and the WWW) grew slowly, despite these capabilities. This was primarily because it did not have enough features to do any kind of professional electronic publishing; it had some font control, but no graphics. Semantic encoding was not important to people when they couldn't make it look pretty.

Then everything changed. When NCSA first built Mosaic in early 1993, they added their own features to HTML, including inline graphics. This suddenly allowed people to attach logos, icons, photographs, and diagrams to their documents; the size and usage of the Web exploded. For the next year, the development of HTML happened on a very ad hoc basis. New pieces of HTML were introduced by one browser or another from time to time; some would catch on, and others would disappear. Some of the additions were poorly designed, and many were not even SGML-compliant.

By May 1994, it was apparent that HTML was growing out of control. At the first WWW conference in Geneva, Switzerland, an HTML Working Group was organized. Its primary task was to formalize HTML, as it was being used, into an SGML DTD known as HTML Level 2. (Level 1 was defined to be HTML as it was originally designed by Tim Berners-Lee.) Once standardized, it could then be safely extended to future levels, and still take advantage of the capabilities of true SGML and its formal structure. At the time of this writing, HTML Level 2 is nearing completion, having gone through several drafts, and is becoming the standard format that all WWW browsers can understand.

Even though it isn't standard, HTML 3.0 is already in wide use today and adds many needed features to the HTML 2.0 specification. Chapter 12, "Netscape Extensions and HTML 3.0," and Appendix H, "HTML Encyclopedia," give you the rundown on which features are in which versions. This chapter sticks to the basics of HTML 1.0 and 2.0 so you can ease into it.

HTML documents are in ASCII text format and can be created using most text editors. There are some Windows editors available specifically for HTML editing. We have included one on the CD that is very simple and intuitive: WebEdit.

A Basic Document

Let's first take a look at a simple HTML document to see how one normally appears. The easiest way to look at HTML is to let a Web browser interpret the file for us. Figure 9.1 shows a very simple HTML file as it would appear on the Web.


Figure 9.1. A simple HTML page showing text and graphics.

Listing 9.1 is the HTML code used to display Figure 9.1. As you look at the HTML code, you should notice that it isn't too difficult to match up the text with the appearance of that text in the Web browser. You can learn many things about the HyperText Markup Language from this basic document.

<HTML>
<HEAD>
<TITLE>Boston's Story</TITLE>
</HEAD>
<BODY>
        <H1>Welcome to Boston's Life</H1>
        Hi, my name is Boston. Here is a picture of me:<P>
        <IMG SRC="boston.jpg"><P>
        <H4>A Brief Autobiography</H4>
        <UL>
               <LI>Born in Bonsall, CA March 5, 1995.
               <LI>Got my shots and went to new home in San Diego, April 30, 1995.
               <LI>Now spend time catching Frisbees and looking out the window.
        </UL>
        <HR>
        <ADDRESS>Okay, so e-mail me: boston@xyz.com
        </ADDRESS>
</BODY>
</HTML>

It is always important to remember that HTML (as an application of SGML) encodes only the structure of the document. Much of the appearance of the document, such as type styles, color, and the window size, is under the ultimate control of the browser and the people using it. However, most browsers render things similarly; as different parts of HTML are described, their normal rendering is also given.

Basic HTML Syntax

An HTML document consists of two types of contents: normal document text and codes, or tags. Tags are text strings surrounded by a less-than and greater-than sign, such as <HTML> in the first line. Tags usually have the following structure:

<tagname attribute=value attribute=value . . . >

The tagname is the type of text being defined by the tag; the attributes (some tags have none, some have several, but most are optional) give additional information about how the element should behave.

For example, in the <HEAD> tag in the second line of the sample HTML file, HEAD is the tagname and has no associated attributes. Farther down in the file is a tag with the tagname IMG and a single attribute SRC that has the value "boston.jpg". It is important to remember that tag names and attribute names are not case-sensitive; you can use uppercase and lowercase letters as you want. The values assigned to the attributes may be case-sensitive, depending on the attribute.
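
As a small illustration of these case rules, the following sketch shows equivalent and possibly non-equivalent spellings of the image tag from the sample file. The file name boston.jpg comes from the listing; whether a differently cased file name still finds the picture is an assumption that depends on the server's file system.

```html
<!-- These two tags are equivalent: tag and attribute names ignore case -->
<IMG SRC="boston.jpg">
<img src="boston.jpg">

<!-- The attribute value here is a file name, so its case may matter;
     on many servers this would ask for a different file -->
<IMG SRC="Boston.JPG">
```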

The tags and text combine to form elements. Each element represents an object in the document, such as a heading, paragraph, or picture. An element consists of one or two tags and usually some associated text.

There are two types of elements: containers and empty elements. Container elements represent a section of text and consist of body text (or other elements) delimited by a tag at the beginning and the end. (The end tag is identified by a / before the tagname and never carries any attributes.) For example, in the third line of the sample file, the <TITLE> and </TITLE> tags define the text between them as a title.

On the other hand, an empty element consists of a single tag that does not alter any text; instead, it inserts something into the document. For example, the <IMG SRC=...> tag/element places the picture in the document.

Together, container elements and empty elements completely define how a document is to be formatted and displayed. Other things normally used to format text (such as tabs, extra spaces, and carriage returns) are treated as a single space in HTML. For example, the sample HTML files could have been typed with three blank lines after every tag and ten spaces between each word, but would appear exactly the same (just as it would if the entire file had been typed on a single line). Although this might make simple formatting more difficult, it enables writers to make the HTML document more readable by using programming style techniques such as extra blank spaces and tabs (as are used in the sample file), without affecting the display of the final document.
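
To make the whitespace rule concrete, here is a minimal sketch that types the same heading from the sample file in two ways. Under the rule described above, a browser would be expected to display both versions identically.

```html
<!-- Version 1: spread over several lines with extra spaces -->
<H4>A     Brief


      Autobiography</H4>

<!-- Version 2: typed on a single line -->
<H4>A Brief Autobiography</H4>
```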

Description of Elements in a Sample Document

This section looks at the elements used in the sample document. The sample file contains the common tags used in most documents. (More thorough definitions of each element are given later in the chapter.)

First, three container elements (<HTML>, <HEAD>, and <BODY>) should appear in every HTML file. You might imagine these container elements as sandwiches: like pieces of bread, each opening tag must be paired with the corresponding closing tag.

Together, these three elements create a template, which all HTML documents should follow:

<HTML>
<HEAD>
   Header Elements
</HEAD>
<BODY>
   Body of Document
</BODY>
</HTML>

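The <HEAD> element can contain several unique elements; however, most documents contain only the one shown in the example, <TITLE>, which gives the name of the document.

The <BODY> element in the sample file contains several common elements: headings (<H1> and <H4>), paragraph breaks (<P>), an inline image (<IMG>), an unordered list (<UL>) with its list items (<LI>), a horizontal rule (<HR>), and an address block (<ADDRESS>).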

These elements are described in more detail, along with many other valid elements, later in this chapter.

Writing Documents

Now that you have seen an HTML document in action, you're probably wondering, "How can I make one of these?" There are several options for creating HTML files, ranging from the powerful and difficult to the easy and simplistic. Most of the current HTML tools are not as useful as they could be, but the large demand for easy and powerful HTML tools ensures that they will become more robust in the near future.

Text Editors

Because HTML documents are really plain text files, the first (and currently most common) solution is to create them using a garden-variety text editor, such as Notepad. You create the HTML document by typing it exactly as it is to appear—including typing the tags by hand—and you finish with a file that looks just like the sample file shown earlier in this chapter.

The drawback of this approach is that because these editors are ignorant of the type of file you are entering, they cannot help you at all. They cannot correct poor syntax, offer any suggestions on element usage, or show how the finished product will appear in a WWW browser. You have to be careful to get the document right and often have to edit it many times to correct mistakes. If you decide to use a text editor to create HTML, you should also have a WWW browser available to check the document often and find any problems to be fixed.

HTML Editors

Between the two of us, we have tried over a dozen methods of HTML file creation. The one we agreed was the easiest is a simple but powerful program called WebEdit by Ken Nesbitt. WebEdit is included on the CD, and we discussed the installation process in Chapter 4, "Up and Running Fast." WebEdit is shareware. There are many other shareware and freeware HTML Editors available on the Internet, but after observing the difficulty of using some of the other packages, we welcomed WebEdit as a companion for most of our HTML editing tasks.

Word Processor Templates

Tools in this category are not programs in their own right, but exist as macros or accessories that operate within your favorite word processor or desktop publishing program. The advantage of these templates is that they enable you to create HTML documents using the same tools and interface you use for creating normal documents; they output files in HTML instead of the program's normal format. The disadvantages are that the templates are not currently available for most word processing software and that using a large word processor to create a small, one-page document can be slow and cumbersome. However, these templates are probably very good for working on large HTML documents. Here are some currently available:

HTML Converters

Many of the documents you want to contribute to the WWW likely already exist on your computer. Most people have a large number of documents previously created using a word processor or desktop publishing program; they do not want to have to re-create the documents or convert them to HTML by hand. To assist in this process, several tools can convert existing documents to HTML. They simply take the codes from the software's internal format and convert them into HTML elements.

For these converters to work cleanly, your original document should be constructed with the same philosophy used with HTML and SGML: using a clear, semantic structure. For example, if named styles (such as Chapter Title and List Item) are used in the original document, these styles can be converted directly into corresponding HTML elements (Chapter Title = <H1>, List Item = <LI>, and so on). On the other hand, nonsemantic markup (such as "Helvetica 14pt centered") is difficult or impossible for the converter to interpret. Almost every word processor and desktop publishing program has a styles feature.

Two types of HTML converter tools are discussed in the following sections.

Word Processor Macros

These operate within the word processor or desktop publishing program, going through the document line by line and converting each code to an HTML equivalent. In the end, the user sees a raw HTML file that can be saved as plain text. Here is one package that does this for Word:

Stand-Alone Conversion Programs

These tools are used outside the originating software. They read the original document from the disk, converting it and saving the result as an HTML document. Here are a few of them; if your software is not represented, you can probably convert the file into a format that can be used by one of these tools. (For example, you can convert the file into RTF format and then convert that into an HTML file.)

Document Style and Organization

As you begin to write HTML documents, it is important to keep the following tips in mind. Documents that obey these general style rules should look better, be more useful to and more frequently used by readers, and be easier for you to maintain:

Element Reference

The following sections provide a brief guide to almost all the elements used in HTML Level 2. For a more comprehensive reference, see the official HTML 2.0 specification at http://www.w3.org/hypertext/WWW/MarkUp/html-spec/index.html. Remember that the tag and attribute names are not case-sensitive and can be in uppercase or lowercase letters.

<HEAD> Elements

The following tags are allowed in the header part of the HTML document.

Document Title

This is the name of the document. The title is generally written with a broader scope than the current document in order to give the user a frame of reference. For example, if the document is a chapter of a book, the <TITLE> would probably contain the title of the book as well as the chapter title. Thus, if someone followed a hyperlink from somewhere else directly to this chapter, he or she would not be lost, but would know that this file is part of a certain book. The syntax is:

<TITLE>text</TITLE>
External Link

This establishes a relationship between the current document and another document. The name attribute gives the link a name, such as Mail to Author. The rel attribute describes the type of link, such as "made" (the author), "parent" (a larger document of which this is a part), "next" (the succeeding section of a multifile document), and "prev" (the previous section). The href attribute points to the related document. Currently, most browsers don't make use of this tag, but future browsers will likely add a new button to the screen for each <LINK> to allow users to easily jump to the related document. For example:

<LINK name="text" rel="text" href="URL">
Document Meta-Information

This allows for extra information about a document, such as its modification date, copyright, or abstract. This is done by setting a name and value, such as <META NAME="copyright" CONTENT="1995, Sams.Net Publishing">. Separate <META> tags are included for each item of information. Currently, this tag is seldom used by browsers. For example:

<META NAME="text" CONTENT="text">
Location of Current Document

This lets you specify the full URL of this document. Although it might seem redundant, this information is useful if you use relative URLs in the hyperlinks. Using this base, the hyperlinks are resolved correctly even if this document is requested with a different URL than you expect (for example, if users save it on their local disk and try to use it there). For example:

<BASE HREF="url">
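
As a sketch of how the base URL affects relative links, consider the following fragment. The address http://www.xyz.com/boston/story.html and the file photos.html are hypothetical names chosen for illustration.

```html
<HEAD>
<TITLE>Boston's Story</TITLE>
<!-- Hypothetical address where this document normally lives -->
<BASE HREF="http://www.xyz.com/boston/story.html">
</HEAD>
<BODY>
<!-- Relative link; with the BASE above it resolves to
     http://www.xyz.com/boston/photos.html even when the page
     is opened from a copy on a local disk -->
<A HREF="photos.html">More pictures of Boston</A>
</BODY>
```
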
Searchable Document

This places a search field either in the document or elsewhere on the screen, enabling users to enter keywords to search through this document. You can't just add this tag to any arbitrary document and expect it to work. Your server must be set up to process this query, using a back-end search engine such as WAIS. For a full discussion of this topic, see Chapter 19, "Databases and the Web." An example follows:

<ISINDEX>

Empty Elements

As stated earlier in this chapter, empty elements are elements that insert objects into the document by themselves, regardless of the surrounding text. They each consist of a single tag. For example:

<IMG SRC="graphic.gif">
Horizontal Rule

This places a horizontal line across the page, with a blank line above and below, and is normally used to separate major sections of a document (for example, before an <H1> or <H2>). Some graphical browsers give the rule a 3-D chiseled look. For example:

<HR>
Line Break

This forces subsequent text to the next line. Unlike the <P> tag, the text before and after the <BR> tag is still considered a single paragraph. The <BR> tag is normally used to create tight blocks of short-line information, such as mailing addresses. For example:

<BR>
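
The difference between <BR> and <P> can be seen in a short sketch such as the following. The street address is a made-up placeholder.

```html
<!-- One paragraph, shown as three tightly spaced lines -->
Sams.Net Publishing<BR>
123 Example Street<BR>
Anytown, USA
<P>
This text begins a new paragraph, so most browsers leave a blank line above it.
```
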
Inline Image

This places an image within the document, as found at the URL specified in the src attribute (which is mandatory). The most common format for these images is CompuServe's Graphics Interchange Format, or GIF. If the browser doesn't support inline images (for example, the Lynx browser does not), the text given in the optional alt attribute is displayed. If no alt attribute is given, a default placeholder such as [IMAGE] may be displayed in this situation. (To ensure that nothing is displayed if the graphic cannot be shown, use the alt="" attribute.) The optional align attribute specifies how the image is to be aligned vertically with the current line of text. (The default alignment is most often BOTTOM, but this varies by browser.)

The ISMAP attribute lets you create interactive graphics, or imagemaps. If the syntax <A href="http://URL1"><IMG src="URL2" ismap></A> is used and you point to a spot on the image, the x and y coordinates are passed to the hyperlink (for example, http://URL1?x,y). However, the HTTP server must be able to handle imagemap queries. Chapter 10 gives step-by-step details for doing this with Purveyor and/or FolkWeb. For more information on imagemaps, look for The World Wide Web Unleashed, Second Edition, published by Sams.Net. For example:

<IMG src="URL" alt="text" align=TOP/MIDDLE/BOTTOM ISMAP>
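
The following sketch shows the attributes described above in use. The file names bullet.gif and tour.gif and the script path /cgi-bin/imagemap/tour are hypothetical; the actual imagemap URL depends on how your HTTP server is set up.

```html
<!-- Text-only browsers show "Photo of Boston" instead of the picture -->
<IMG SRC="boston.jpg" ALT="Photo of Boston" ALIGN=MIDDLE>

<!-- Purely decorative graphic: empty ALT text displays nothing at all -->
<IMG SRC="bullet.gif" ALT="">

<!-- Imagemap: pointing at position x,y requests a URL of the form
     http://www.xyz.com/cgi-bin/imagemap/tour?x,y -->
<A HREF="http://www.xyz.com/cgi-bin/imagemap/tour"><IMG SRC="tour.gif" ISMAP></A>
```
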
Comment

Any text inside this element is ignored. This element is used to include notes that can be read by the writer but that are not part of the text of the document (which is especially useful if several writers work on the same document). The useful programming technique of temporarily commenting out sections of code cannot be done here; many older browsers use a single > as the closing character of the comment, so any tag included in the comment (such as <BR>) causes the comment to end early, and the remaining comment text is interpreted as body text. For example:

<!-- text -->
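
A brief sketch of a safe comment and a risky one follows. The wording of the notes is invented for illustration.

```html
<!-- Last revised by the author; check the dates in the list below -->
<UL>
<LI>Born in Bonsall, CA March 5, 1995.
</UL>

<!-- The next comment contains a tag, so an older browser may end it
     early and show the leftover words as body text -->
<!-- <BR> temporarily removed -->
```
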

Character Containers

Character containers enable you to format or describe words and phrases within paragraphs. Although they can be used inside non-body blocks as well as in normal text, all but the <A> tag can produce unattractive results on some browsers.

Hypertext Links

Hypertext links are the heart of HTML. These links let you, with a single mouse click, move from place to place within a document or even to an entirely different document anywhere on the Internet. This use of hyperlinks is how the World Wide Web gets its name—links form a spider's web of documents that covers the globe.

Hypertext Anchor

This is used to mark the reference or the target of a hypertext link. Either the href or name attribute must be included. (Both are allowed, but they don't appear together very often.) The href attribute specifies a URL to which the enclosed text is linked. (The text is highlighted; selecting it requests the new object.) href can reference another HTML document, an image, or anything else that can be addressed using a URL. The hypertext anchor can also enclose an <IMG> tag, allowing inline graphics (such as icons) to become links.

The name attribute gives a unique name to the enclosed text, allowing users and other HTML documents to point directly to this part of the document. For example, a URL such as http://.../thisdoc.html#part1 loads thisdoc.html and attempts to place the text marked with <a name="part1"> at the top of the screen. The syntax is:

<A href="URL" name="text">text</A>
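
The two roles of the anchor (reference and target) might be combined as in the following sketch. The URL http://www.xyz.com/boston/story.html and the files photos.html and camera.gif are hypothetical.

```html
<!-- Target: names this spot "part1" -->
<H4><A NAME="part1">A Brief Autobiography</A></H4>

<!-- Link to the named spot from inside the same document -->
<A HREF="#part1">Jump to the autobiography</A>

<!-- Link to the same spot from another document -->
<A HREF="http://www.xyz.com/boston/story.html#part1">Boston's autobiography</A>

<!-- An inline image used as the link instead of text -->
<A HREF="photos.html"><IMG SRC="camera.gif" ALT="Photo album"></A>
```
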
Logical Styles

Logical styles let you give a real meaning to sections of text. Currently, they are used only for formatting, but they can be used for more intelligent types of processing, such as automatic footnoting.

Emphasis. Used to highlight sections of text for miscellaneous reasons. Normally rendered in italics.

<em>text</em>

Strong emphasis. Another form of generic highlighting. Normally rendered in bold.

<strong>text</strong>

Citation. Used to mark a citation to another document, such as a printed book (for example, Great Expectations). Normally rendered in italics.

<cite>text</cite>

Computer code. Used to mark text from a computer (for example, hit any key). Normally rendered in a fixed-width font such as Courier.

<code>text</code>

Variable. Used to mark a variable used in a mathematical formula or computer program (for example, z = x + y). Normally rendered in italics.

<var>text</var>

Keyboard input. Used to mark text that is to be typed at the keyboard by a user (for example, hit the enter key). Normally rendered in a fixed-width font such as Courier.

<kbd>text</kbd>
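
Here is a short sketch that uses each of the logical styles in ordinary sentences. The sentences themselves are invented for illustration.

```html
Press <KBD>Enter</KBD> when the screen says <CODE>hit any key</CODE>.
In the formula, <VAR>z</VAR> is the sum of <VAR>x</VAR> and <VAR>y</VAR>.
<EM>Please</EM> read <CITE>Great Expectations</CITE> before class;
it is <STRONG>required</STRONG>.
```
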
Physical Styles

Originally considered cheater versions of the logical styles, physical style elements have become very popular because they are similar to the way people are used to highlighting text (that is, literally instead of semantically).

Bold:

<b>text</b>

Italics:

<i>text</i>

Typewriter text, rendered in a fixed-width font such as Courier:

<tt>text</tt>

Block Containers

In HTML, a block is defined as a piece of marked text that by itself occupies a certain amount of vertical space in a document, such as a paragraph or a heading. The following elements can be adjacent to each other, but cannot be nested (that is, you can't have a <P> inside an <H1>—because they represent different types of blocks).

Headings (1 Through 6)

This acts as a title for a section of the document. The lower-number headings represent more important headings and are generally rendered in larger text. Because of a mixup in the distributed default settings, some browsers erroneously display <H5> and <H6> smaller than the body text. Until these two elements are displayed more consistently, they should probably be avoided when possible. Following is an example:

<H#>text</H#>
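
As an illustration, heading levels can outline a document as in the following sketch, which stays within <H1> through <H3>. The section names are invented, loosely based on the sample file.

```html
<H1>Boston's Story</H1>
<H2>Early Life</H2>
Born in Bonsall, CA.
<H2>Hobbies</H2>
<H3>Outdoors</H3>
Catching Frisbees.
<H3>Indoors</H3>
Looking out the window.
```
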
Paragraph

In most current browsers, this tag is used in the first form as a paragraph separator. Thus, it marks the boundary between two paragraphs of normal body text. You should not use this tag between body text and another element. (For example, do not use ...<p><h1>....) Because the second element implies a line break, some browsers put too much space between the elements. The second form (a container for each paragraph) represents a more valid SGML structure and will soon be the standard. However, the end tag </P> will be optional, so most documents that have been created using the first form will still work.

text<P>text

<P>text</P>
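
The two forms can be compared side by side in this minimal sketch; the sentences are invented for illustration.

```html
<!-- First form: P used as a separator between paragraphs -->
Hi, my name is Boston.
<P>
I live in San Diego.

<!-- Second form: P used as a container around each paragraph -->
<P>Hi, my name is Boston.</P>
<P>I live in San Diego.</P>
```
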
Extended quotation

Used for long quotations that exist as separate paragraphs. This is normally rendered similarly to a normal paragraph, but with both margins indented.

<BLOCKQUOTE>text</BLOCKQUOTE>
Mailing address

Specifically targeted to postal addresses, this tag is commonly used to mark bylines (name of the author) and e-mail addresses. It is normally rendered in a smaller font or in italics, and usually uses the <BR> tag to separate the individual lines of the address.

<ADDRESS>text</ADDRESS>
Preformatted text

Because extra spaces and tabs are ignored in HTML, some kinds of text, such as poetry, tables, and computer program listings, are difficult to encode. The <PRE> element is used with those types of text by formatting everything it contains exactly as it appears, including spaces, tabs, and line feeds. This is also useful for getting fields to line up in forms.

<PRE>text</PRE>
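
For instance, a small table could be lined up with spaces as in the following sketch, which reuses the dates from the sample file. Spaces are used rather than tabs, on the assumption that tab widths may differ between browsers.

```html
<PRE>
Date            Event
--------------  ---------------------------
March 5, 1995   Born in Bonsall, CA
April 30, 1995  Moved to San Diego
</PRE>
```
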

Lists

Several HTML tags make it convenient to display lists of items. Lists can be ordered (numbered), unordered (graphically displayed as bullet items), or appear as columns of terms and definitions. Also, list items can be hyperlinks to other documents on the Web.

Itemized List

This creates a list containing several items, each beginning with <LI> and normally indented one tab position. There are four types: <UL> is an unordered list (each entry is normally preceded by a bullet); <OL> is an ordered list (each entry is numbered); <MENU> is a menu of choices (similar to <UL> but sometimes rendered more compactly); <DIR> is a directory (designed to be a list broken into 2 or 3 columns like a disk directory; in most current browsers, the <DIR> element is rendered the same as <UL>). These lists can be nested within each other, allowing for complex list hierarchies such as outlines. In the following syntax, TYPE stands for one of these four tag names.

<TYPE>
        <LI>text
        <LI>text
        <LI>text
        ...
</TYPE>
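
As a sketch of nesting, the following fragment places two unordered lists inside an ordered list to form a simple outline. The items are invented, loosely based on the sample file.

```html
<OL>
    <LI>Early life
    <UL>
        <LI>Born in Bonsall, CA
        <LI>Moved to San Diego
    </UL>
    <LI>Hobbies
    <UL>
        <LI>Catching Frisbees
        <LI>Looking out the window
    </UL>
</OL>
```
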
Definition List

This syntax builds a list in which each entry has two parts, as in a glossary: a term (which follows the <DT>) and a definition (which follows the <DD>). It is normally rendered exactly the same as this section of this chapter, with the definition indented below the term. The optional COMPACT attribute was designed to produce a more vertically compact list in which the terms and definitions are placed in side-by-side columns, but it is ignored by most current browsers.

<DL COMPACT>
        <DT>term text
               <DD>definition text
        <DT>term text
               <DD>definition text
        ...
</DL>
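
A small glossary might be written as in the following sketch, which uses terms introduced earlier in this chapter.

```html
<DL>
    <DT>HTML
        <DD>HyperText Markup Language, the document format of the Web.
    <DT>SGML
        <DD>Standard Generalized Markup Language, the standard on which HTML is patterned.
    <DT>DTD
        <DD>Document Type Definition, a format defined using SGML.
</DL>
```
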

Forms

The forms feature of HTML is one of the things that gives the Web real power for doing live, interactive applications. The HTML form, however, is only half of this feature. After the user fills out the form, it is submitted to a specialized program, or script, which takes the information and does something useful with it (for example, e-mail it to you). You must either write the script yourself (that means programming) or find a prewritten script that will suit your needs. This gets into the topic of the Common Gateway Interface (CGI), which is explored in detail in Chapter 11 and all through Part V. In this chapter, we stick to the HTML side of the process.

Form

The <FORM> element encloses the entire form and gives some basic definitions. The form might take up only part of the HTML document; in fact, a single document can contain several separate forms that perform different functions. The method attribute specifies the way in which information is sent to the HTTP server; the action attribute gives the URL of the script that is to process the submitted information (usually http://.../cgi-bin/scriptname).

<FORM method="[GET|POST]" action="URL">form body</FORM>
Form Input

This empty tag is used to place different fields in the form to enable users to enter information. The name attribute gives a unique name to the field; the optional value attribute gives a default value for this tag. When the form is submitted, the information is returned as a set of name-value pairs separated by ampersands, such as http://.../cgi-bin/script?name=me&address=here&time=now. The type attribute gives the style of object to be used. (See the following bulleted list.)

<INPUT name="text" type="" size=## value="text" CHECKED>

The CHECKED attribute is used with the CHECKBOX and RADIO types to signify whether the button is selected by default or not. The size attribute is used to set the window size of a text field (in characters).
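
The following sketch shows a few input fields as they might appear inside a <FORM> element. The field names and values are invented for illustration. The text, checkbox, and radio types are the ones mentioned in this section; the submit and reset types are assumed here from the HTML Level 2 specification.

```html
<!-- One-line text field, 30 characters wide, with a default value -->
Name: <INPUT NAME="name" TYPE="text" SIZE=30 VALUE="Boston">
<P>
<!-- Check box that starts out selected -->
<INPUT NAME="newsletter" TYPE="checkbox" VALUE="yes" CHECKED> Send me news
<P>
<!-- Radio buttons share one NAME so that only one can be chosen -->
<INPUT NAME="size" TYPE="radio" VALUE="small" CHECKED> Small
<INPUT NAME="size" TYPE="radio" VALUE="large"> Large
<P>
<!-- Buttons to send the form or restore the default values -->
<INPUT TYPE="submit" VALUE="Send">
<INPUT TYPE="reset" VALUE="Clear">
```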

Choice Selection

This presents a list of possible values for the field, itemized by the <OPTION> tag; normally it is displayed as a pull-down menu. The name and value attributes are the same as for <INPUT>. The text following each <OPTION> tag is displayed in the menu. If no value attribute is given, that text is returned if the option is selected. The multiple attribute allows more than one option to be selected, and the selected attribute identifies the default choice.

<SELECT name="text" multiple>
        <OPTION value="text" selected>text
        <OPTION value="text">text
        ...
</SELECT>

For example:

<SELECT name="text">
        <OPTION value="OPT1" selected>Option 1
        <OPTION value="OPT2">Option 2
</SELECT>
Multiline Text Input

This is similar to <INPUT TYPE="text">, but allows for many lines. The name attribute is the same as for <INPUT>, whereas the number values for the rows and cols attributes define the size. The text contained in the element is shown in the window by default.

<TEXTAREA name="text" rows=## cols=##>text</TEXTAREA>
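
Putting the form elements together, a complete form might look like the following sketch. The script URL http://www.xyz.com/cgi-bin/guestbook and all field names are hypothetical, and a script would have to exist at that address for the form to do anything.

```html
<FORM METHOD="GET" ACTION="http://www.xyz.com/cgi-bin/guestbook">
    Name: <INPUT NAME="name" TYPE="text" SIZE=20>
    <P>
    Favorite toy:
    <SELECT NAME="toy">
        <OPTION VALUE="frisbee" SELECTED>Frisbee
        <OPTION VALUE="ball">Ball
    </SELECT>
    <P>
    <TEXTAREA NAME="comments" ROWS=4 COLS=40>Type your comments here.</TEXTAREA>
    <P>
    <INPUT TYPE="submit" VALUE="Send">
</FORM>
```

If a user typed Rex as the name, left the menu on its default, and replaced the comment text with Hi, the browser would be expected to request a URL along the lines of http://www.xyz.com/cgi-bin/guestbook?name=Rex&toy=frisbee&comments=Hi.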

Entities

Many characters that appear in documents can be impossible to enter in an HTML file, including characters that have special meaning to HTML (for example, the < and > characters) and international and typographic characters not found on most keyboards.

These characters can be included in documents using entities, pieces of text that together signify a single character. The general syntax includes an ampersand, a unique name for the character, and a semicolon. For example, Gr&ouml;ning produces Gröning. There are two general types, as described in the following sections. For a complete list, please see the Appendix.

Reserved Characters

Reserved characters are normal characters used for other purposes in HTML that can cause confusion if entered by themselves.

Entity    Displayed As

&lt;      Less-than sign (<)
&gt;      Greater-than sign (>)
&amp;     Ampersand (&)
&quot;    Quotation mark (") (usually not necessary)
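
When HTML is generated by a program, the substitutions in this table can be applied automatically. The sketch below uses the escape function from Python's standard html module. The wrapper to_html_text and its in_attribute flag are hypothetical names, and the assumption that quotation marks need replacing only inside quoted attribute values is one reading of "usually not necessary."

```python
from html import escape


def to_html_text(text, in_attribute=False):
    """Replace reserved characters with entities.

    Quotation marks are replaced only when in_attribute is True.
    """
    return escape(text, quote=in_attribute)


print(to_html_text("if a < b & b > c"))
# if a &lt; b &amp; b &gt; c

print(to_html_text('say "hello"'))
# say "hello"

print(to_html_text('say "hello"', in_attribute=True))
# say &quot;hello&quot;
```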

International Characters

International characters are characters used in most European languages other than English, referenced by names from the ISO Latin 1 character set. A few examples follow:

Entity      Displayed As

&Aacute;    Capital A with acute accent (Á)
&ocirc;     Small o with circumflex accent (ô)
&AElig;     Capital AE ligature (Æ)
&ccedil;    Small c with a cedilla (ç)
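
The mapping from entity names to characters can be checked with a short Python sketch. It relies on the standard html and html.entities modules, whose tables include the ISO Latin 1 names; entity_to_char is a hypothetical helper added for illustration, and a browser does the equivalent lookup internally.

```python
from html import unescape
from html.entities import name2codepoint


def entity_to_char(name):
    """Look up a named entity (without the & and ;) and return its character."""
    return chr(name2codepoint[name])


# Resolve an entity embedded in running text.
print(unescape("Gr&ouml;ning"))  # Gröning

# Resolve the four entities from the table above.
for name in ("Aacute", "ocirc", "AElig", "ccedil"):
    print("&" + name + ";", "->", entity_to_char(name))
```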

The Future of HTML

By the time you read this, the specification for HTML Level 2 should be complete and most browsers should be using this specification as a standard. However, Level 2 does not represent the final form of HTML. This language will continue to evolve, adding new capabilities, for years to come.

Although the current version of HTML has many powerful features, it also has its disadvantages. Suggestions are constantly being submitted to the HTML working group, which considers them for inclusion in the standard. Enhancements will likely allow a larger variety of documents to be put on the Web, make documents look better, and make them easier to manage and use.

The Presentation Versus Structure Debate

The area of HTML evolving most actively is the formatting of documents. Debate is raging over how much control of the appearance of the document should rest in the hands of the user and how much should be decided by the publisher. Years of research have gone into graphic design and typography, and there are varied methods of using the appearance of text and graphics to communicate a particular message. To designers and publishers who have become experts in this art, it is important that the information contributor have a large degree of control over document appearance.

However, on the World Wide Web, the user can choose fonts, window sizes, colors, and many other presentation variables. Although this is a frustration to many publishers, it is an important part of the Web. Not all users have the same typefaces, colors, and screen area available, so each must be able to make a WWW page fit his or her constraints. In addition, physical differences among users place special demands on the appearance of pages: sight-impaired people might want to use very large type, whereas a blind user does not see anything at all and has the document read aloud by the computer.

A compromise must be reached. Information providers need the capability to dictate a large part of the appearance of the document when it is important. On the other hand, users need to be able to override or alter this appearance when necessary. The primary goal of the Web is the dissemination of information; the content of the documents should always be more important than their appearance. Whatever can be done to improve comprehension by users, including both dictated and alterable appearance, is important to that dissemination as well.

For more information about proposed solutions to some of these problems, see Chapter 12, "Netscape Extensions and HTML 3.0."

Alternatives to HTML

It is doubtful that HTML will ever be able to provide all the creative design and functionality that true electronic publishing demands; it was never intended to do so. New file types have emerged to address these weaknesses. These file types are generally geared toward more specialized applications, and very little software currently exists for using them. However, some of them will likely become major parts of the World Wide Web, perhaps equal or superior to HTML.

Although few of these alternatives are available today, they will soon be around, increasing the flexibility (and confusion) of the WWW. They will probably become most popular in niche markets that require very specialized information types (such as maps, diagrams, and technical illustrations) and with professional publishers who need detailed presentation control that can't, or shouldn't, be part of HTML.

What's Next

Now that we understand the basics of HTML, we are ready to move into some more advanced topics. The next chapter will cover graphics in much more detail and attempt to make sense of all the hype about multimedia on the Web. We'll have much more to say about HTML in Chapter 12, "Netscape Extensions and HTML 3.0" and in Chapter 13, "Putting HTML to Work Building a Sample Site."
