

1

Internet Technology Primer

For those who are new to all the jargon of the Internet, this chapter lays the foundation for the rest of the book. We introduce several terms that you will hear often as you go about choosing an Internet Service Provider (ISP) and building a Web site. If you come across any technical terms that we didn't define here, you might want to refer to the book's glossary.

Just about everywhere you look on the Internet, there are documents called Frequently Asked Questions (FAQs). The purpose of these text files is to answer common questions for newcomers. FAQs save the old hands from having to answer the same questions and retell their experiences over and over. Regardless of your level of experience, FAQs are always a good place to start whenever you are trying something new on the Internet.

Another advantage of the format of FAQs is that you can quickly scan through the questions to see what parts you are interested in, without necessarily having to read the entire document. For this reason, and because we have several miscellaneous topics to discuss in this chapter, we have borrowed the FAQ concept in our organization of this material. The questions we chose for this chapter are those that we wondered about when we first started using the Internet, and which we often get asked as we assist others coming to the Net.

In answering these questions, we have tried to go into somewhat more detail than the typical FAQ file on the Internet, but in no way do we cover each topic completely. Entire books are written about most of these subjects. What we have tried to do is provide only the fundamentals, because we know that your Web site project is waiting in the wings.

If you can't wait to get started or if you already have some experience with the Internet, feel free to skip or skim through this chapter. You can always come back to it later.

What Is TCP/IP?

If the Internet were alive, TCP/IP would be its bloodstream. This now-famous acronym stands for Transmission Control Protocol/Internet Protocol. TCP/IP provides a method for any computer on the Internet to send electronic packets of data reliably and efficiently to any other computer on the Internet. We are talking about millions of computers, with different CPUs and operating systems, being able to identify each other even though they aren't directly connected. (In fact, some Internet computers aren't on a wire at all, but that's another story.) All packets must be able to cross company and international borders and still find their destination.

The only way to accomplish all of this is for each computer to have an address. On the Internet, this is a 32-bit IP address. The phrase 32-bit means that 4 bytes are used to hold the address, and IP refers to Internet Protocol, as in TCP/IP. The addresses usually appear in dotted-decimal form, like this: 123.64.12.88. Each of the four decimal numbers represents one byte, so it ranges from 0 to 255.
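To see how the four dotted-decimal numbers fit into 32 bits, here is a minimal Python sketch. The function names ip_to_int and int_to_ip are our own, chosen for illustration; in real programs, Python's standard ipaddress module does this job.

```python
def ip_to_int(dotted: str) -> int:
    """Pack a dotted-decimal IP address into one 32-bit number."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid dotted-decimal address: {dotted!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # each byte fills the next 8 bits
    return value


def int_to_ip(value: int) -> str:
    """Unpack a 32-bit number back into dotted-decimal form."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


number = ip_to_int("123.64.12.88")
print(number)             # 2067795032
print(int_to_ip(number))  # 123.64.12.88
```

The computers on the Internet work with the single 32-bit number; the dotted form exists only for the benefit of humans.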

Most humans have better things to do than decode numbers such as those and determine who or what belongs to such an address. Therefore, the engineers and researchers who created the Internet devised several clever schemes that allow us to refer to computers by name, rather than IP address. This way, we can let the machines do the work of translating the numbers, and we can instead refer to computer resources in a somewhat more meaningful fashion, such as www.ibm.com and president@whitehouse.gov.

The first example (www.ibm.com) is called a fully qualified domain name, or FQDN. By now, you probably know that the second example is called an e-mail address. Even if you don't understand all the syntax, you can probably guess what these addresses refer to. www.ibm.com is the IBM home page on the Web, and the second (president@whitehouse.gov) is your high-speed direct link to the White House (handy when you've got something of national importance on your mind).

Both of these sample addresses include a domain name. The top-level domain is the last part on the right, such as com or gov. The Internet is divided into several domains, or hierarchies. This is part of the solution to the problem of how to accurately deliver electronic packets among the millions of computers on the Internet. Think of it this way: A top-level domain is similar to the name of a state or country on the envelope of a piece of regular mail. In fact, some top-level domains are exactly that. Here are just a few of the common top-level domains you will come across:

See Figure 1.1 for an illustration of the domain name hierarchy.


Figure 1.1. The domain name system.
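Because a name is read from right to left, from the top-level domain down to an individual computer, finding the domains a name belongs to is just a matter of splitting it at the dots. Here is a minimal Python sketch; the function name domain_path is hypothetical.

```python
def domain_path(fqdn: str) -> list[str]:
    """List the domains containing an FQDN, from the top-level domain down."""
    # Domain names are not case-sensitive, and a trailing dot marks the root.
    labels = fqdn.rstrip(".").lower().split(".")
    # Start with the rightmost label and add one label at a time.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(domain_path("www.ibm.com"))   # ['com', 'ibm.com', 'www.ibm.com']
print(domain_path("WWW.IBM.COM."))  # ['com', 'ibm.com', 'www.ibm.com']
```

Each entry in the result corresponds to one level of the domain name hierarchy.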

When we address an electronic packet of data, perhaps to a location on the other side of the world, the computers on the Internet will take turns passing the message along until it reaches its destination. Actually, not every computer between here and there gets involved—just the gateway computers. A gateway (also called a router) is a special kind of computer that is given the job of looking at an IP packet and determining whether to keep the packet for a computer on the local network or pass it to the next network in the chain and let it figure it out. Passing it along is called making a hop.
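The decision a gateway makes can be sketched in a few lines of Python using the standard ipaddress module. The routing table below is hypothetical. Like real routers, the sketch picks the most specific matching entry (the longest prefix) and falls back to a default route for everything else; real routers build and update their tables automatically, which this sketch does not attempt.

```python
import ipaddress

# Hypothetical routing table for one gateway: (destination network, next hop).
# "direct" means the destination is on this gateway's own local network.
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "direct"),
    (ipaddress.ip_network("123.64.0.0/16"), "192.168.1.254"),
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1"),  # default route
]


def next_hop(destination: str) -> str:
    """Pick the most specific (longest-prefix) route that matches the address."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    _, hop = max(matches, key=lambda route: route[0].prefixlen)
    return hop


print(next_hop("192.168.1.20"))  # direct: keep it on the local network
print(next_hop("123.64.12.88"))  # 192.168.1.254: hand it to that router
print(next_hop("200.1.2.3"))     # 192.168.1.1: make a hop via the default route
```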

What if someone turns off one of the routers between point A and point B after our message has already begun its journey? What if a voltage spike flips a bit, which causes the address to become corrupted somewhere along the line? Or what if our data is so big that it has to be split up and sent in several partial packets? How will the recipient know when it has all arrived? These are just some of the reasons that TCP is used in conjunction with IP. IP carries the packets, and TCP is the accountant that makes sure they are all delivered. If a packet is lost or corrupted along the way, TCP will see to it that the packet is sent again.
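TCP can't repair a flipped bit by itself; it detects the damage with a checksum, the receiver discards the bad segment, and the sender retransmits when no acknowledgment comes back. The detection half can be sketched in Python with the 16-bit one's-complement checksum described in RFC 1071. This is a simplification: the real TCP checksum also covers the TCP header and a pseudo-header containing the IP addresses.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # wrap any carry around
    return ~total & 0xFFFF


def checksum_ok(data: bytes, checksum: int) -> bool:
    """Receiver's test: data plus its checksum must sum to all one bits."""
    if len(data) % 2:
        data += b"\x00"
    return internet_checksum(data + checksum.to_bytes(2, "big")) == 0


segment = b"Hello, TCP/IP!"
check = internet_checksum(segment)
damaged = bytes([segment[0] ^ 0x01]) + segment[1:]  # one bit flipped in transit
print(checksum_ok(segment, check))  # True
print(checksum_ok(damaged, check))  # False
```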

What Is the OSI Networking Model?

TCP/IP delivers packets in a way that fits a standard model of how networks are supposed to work. The ISO (International Organization for Standardization) put together the OSI (Open Systems Interconnection) reference model so a convention would exist for the interfacing of network products. (Perhaps another reason they wanted to do this was to form an acronym palindrome.)

The model consists of seven layers, but not all of the layers are used in all cases. Nonetheless, as a reference model, it still provides an invaluable theoretical basis for all discussions of networking.

Table 1.1 provides a brief description of each of the seven layers of the OSI model. Note that layer 1 is implemented in hardware, and the upper layers are largely implemented in software. The table may seem to be in reverse order, but network engineers generally view the flow of data as originating at the application layer and moving downward through the layers toward the hardware. The process is then reversed when the packet arrives at the destination.

Layer     Name           Description
Layer 7   Application    The programs users interact with to initiate network data transfer.
Layer 6   Presentation   Encrypts or decrypts data, packs or unpacks data, and converts data between formats.
Layer 5   Session        Determines when data transmission will start and stop.
Layer 4   Transport      Concerned with the quality of data on the circuit; includes error-checking protocols.
Layer 3   Network        Establishes the network route from the sender to the recipient.
Layer 2   Data Link      Provides for the bundling of several bits into a data frame.
Layer 1   Physical       Includes the specifications for the electrical signals and the transmission of bits.

Fortunately, there is a simple analogy that might help explain all of this. Imagine that you want to send a letter by regular mail, not e-mail. In this case, the piece of paper is the application. The envelope plays the role of layer 6. The mailbox is the session (layer 5), and the postal carrier is the transport (layer 4). The mail bag emulates layer 3, the network. The mail truck is the data link (layer 2), and the road serves as the physical medium (layer 1). Figure 1.2 illustrates this analogy. If these concepts still seem fuzzy, don't worry. It usually takes lots of experience working with different network products before the layer concept becomes clear.


Figure 1.2. The OSI reference model.

The work performed by each layer on the sending side is done in reverse by the corresponding layer on the receiving side. For example, if the presentation layer performs encryption on the sending side, the decryption will be done by the presentation layer on the receiving side.
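A toy Python model can make the sending and receiving sides concrete. Each layer on the way down wraps the data in a header of its own, and each layer on the way up removes the header its peer added, in the reverse order. The text labels used as headers here are purely illustrative; real headers are compact binary fields holding addresses, lengths, and checksums.

```python
# The six software layers from Table 1.1, top to bottom. The physical layer
# (layer 1) just carries the finished bits, so it adds no header here.
LAYERS = ["application", "presentation", "session", "transport", "network", "data link"]


def send(message: str) -> str:
    """Pass data down the stack; each layer wraps it in its own header."""
    data = message
    for layer in LAYERS:
        data = f"<{layer}>" + data
    return data


def receive(bits: str) -> str:
    """Pass data up the stack; each layer removes the header its peer added."""
    data = bits
    for layer in reversed(LAYERS):
        header = f"<{layer}>"
        if not data.startswith(header):
            raise ValueError(f"{layer} layer expected header {header!r}")
        data = data[len(header):]
    return data


wire = send("Hello")
print(wire)           # <data link><network><transport><session><presentation><application>Hello
print(receive(wire))  # Hello
```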

The manner in which these layers are stacked on top of each other is why you often hear TCP/IP software vendors speak of the protocol stack. At each level, there are several alternative or complementary protocols from which to choose. In many cases, protocols at one level can interface interchangeably with protocols at the next level.

The seven-layer structure isn't always strictly observed. In fact, TCP/IP is a very prominent example of divergence from the standard. Many experts consider these five layers to more closely reflect how TCP/IP actually works:

Layer 5   The Application layer, same as OSI (for example, FTP). If encryption or compression is needed, it is done in the application layer, because there is no separate Presentation layer.
Layer 4   The Transport layer; TCP builds or reads a packet (called a segment).
Layer 3   The Internet layer; IP builds or reads a packet (called a datagram).
Layer 2   The Data Link (network access) layer, similar to OSI layer 2.
Layer 1   The Physical layer, same as OSI.

What Is a Listserver?

Listservers (also called listservs and mailing list servers) support a group of people (called subscribers) who like to share e-mail with each other on a given topic. For example, the San Diego Windows NT User's Group runs a listserv that is intended for posting items of interest to NT users. Basically, when you send e-mail to the listserver, it will send it to all the other subscribers of the listserv. See Figure 1.3.


Figure 1.3. How a listserver works.


It is important to understand that a listserv has two e-mail addresses. The first address is used to start or cancel subscriptions. The second address is used for posting messages to all subscribers. Be sure to check on the details of how to subscribe to a listserv before you accidentally send your message to the wrong address. Sending inappropriate e-mail to an Internet group, even accidentally, can result in your own inbox filling up with angry protests. This is called being flamed. If it should ever happen to you, it is generally best to let the issue wither rather than reply again. Some useful Windows 95 and Webmaster list servers are mentioned in Appendix C.
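The two-address arrangement explains why a misdirected message causes trouble. In the following Python sketch, all addresses are hypothetical and the one-word subscribe and unsubscribe commands are simplified; each listserver package defines its own command syntax, so always check the list's instructions.

```python
class MailingList:
    """Toy listserver with a command address and a posting address."""

    def __init__(self, command_address: str, posting_address: str) -> None:
        self.command_address = command_address
        self.posting_address = posting_address
        self.subscribers: set[str] = set()

    def handle(self, sender: str, to: str, body: str) -> list[str]:
        """Process one incoming message; return the addresses it is relayed to."""
        if to == self.command_address:
            command = body.strip().lower()
            if command == "subscribe":
                self.subscribers.add(sender)
            elif command == "unsubscribe":
                self.subscribers.discard(sender)
            return []  # commands are handled by the server, never relayed
        if to == self.posting_address:
            # Everything sent here goes to every other subscriber, verbatim.
            return sorted(s for s in self.subscribers if s != sender)
        raise ValueError(f"this list does not use the address {to!r}")


nt_list = MailingList("listserv@example.org", "ntusers@example.org")
nt_list.handle("ann@example.com", "listserv@example.org", "subscribe")
nt_list.handle("bob@example.com", "listserv@example.org", "subscribe")
print(nt_list.handle("ann@example.com", "ntusers@example.org", "Tips for NT?"))
# ['bob@example.com']
# The classic mistake: sending a command to the posting address.
print(nt_list.handle("bob@example.com", "ntusers@example.org", "unsubscribe"))
# ['ann@example.com']  (everyone reads it, and bob is still subscribed)
```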

What Is a Newsgroup?

Newsgroups are carried by a part of the Internet called Usenet. Like a listserver, each newsgroup is dedicated to a particular subject matter. By last count, there were more than 16,000 newsgroups on the Internet. The topics cover just about everything, ranging from discussions of computers to politics to sports to the very, very strange. Most newsgroups fall under one of the following top-level classifications:

The classifications tell you a little bit about what kind of newsgroups you might expect to find in a certain area. Newsgroup names are organized in a hierarchy (computer scientists just love tree structures), so the nature of the topics covered gets clearer as you read the name from left to right. For example, there are hundreds of comp newsgroups that discuss computers. Underneath comp is comp.protocols, a branch of newsgroups that discuss protocols. One of them is called comp.protocols.tcp-ip, which, as you might expect, carries conversations (also called articles) about TCP/IP.
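Because the names are dot-separated paths, a newsreader can arrange them into a tree simply by splitting each name at the dots. Here is a minimal Python sketch; apart from comp.protocols.tcp-ip, the group names are sample names chosen for illustration, and the function names are our own.

```python
def build_tree(names: list[str]) -> dict:
    """Nest dot-separated newsgroup names into a tree of dictionaries."""
    tree: dict = {}
    for name in names:
        node = tree
        for part in name.split("."):
            node = node.setdefault(part, {})  # create each level on first use
    return tree


def outline(node: dict, depth: int = 0) -> list[str]:
    """Render the tree as indented lines, two spaces per level."""
    lines = []
    for part in sorted(node):
        lines.append("  " * depth + part)
        lines.extend(outline(node[part], depth + 1))
    return lines


groups = ["comp.protocols.tcp-ip", "comp.protocols.ppp", "comp.infosystems.www", "rec.humor"]
print("\n".join(outline(build_tree(groups))))
```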


The first time you are exposed to the newsgroups, you will likely be in awe. There are two reasons for this, and you may experience both in the same day. You'll realize the incredible potential of the newsgroups as a business resource and research tool, and you'll be shocked by the profanity or pornographic content of some of the alt newsgroups. Just remember that the Internet is used by all kinds of people.

We'll have more to say about how you can tap the potential of the business and technical newsgroups in Chapter 7, "Webmaster's Guide to FTP and the Newsgroups."

What Are FTP and Anonymous FTP?

The File Transfer Protocol (FTP) was invented by the UNIX community for the simple purpose of bidirectional file transfers between computers. Like most Internet software, FTP operates in a client/server fashion. Until a couple of years ago, you had to know a little bit about UNIX to run an FTP client program. Today, there are excellent GUI versions—such as the shareware Windows FTP client included with this book—that make it as easy as drag and drop. Windows NT also includes a command-line FTP client and, of course, an FTP server.

A server that runs FTP will usually set up a user account for each person it intends to grant access. Each user is required to enter a user name and password when logging into the FTP server. This protects the server from having sensitive files taken by unknown users. After you have an account with an FTP server, you can copy files to and from the server whenever you want.

The problem with this is that some sites maintain libraries of public domain software and information. Such sites like to make these files available to anyone who is interested, but they don't know beforehand who their clients will be. And even if they did know, it would be a nightmare to try to maintain such a user database, because some sites are visited by tens of thousands of users.

The answer to this dilemma is anonymous FTP. It works like regular FTP, but with one simple twist. You, the client, sign in with the word anonymous for the FTP user name and, by convention, enter your e-mail address as the FTP password. (This convention was developed to permit FTP servers to track who is visiting a site, but we have never heard of a server that actually did any processing of the e-mail addresses.) After you log in anonymously, you will usually have restricted access to the file system of the server machine. It should, however, permit you to navigate to the directories you need.

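FTP file transfers are always initiated by the client and can be executed in ASCII or binary mode. The difference is that ASCII mode translates end-of-line markers between the conventions of the two computers (for example, a carriage return plus a line feed on DOS and Windows, but a line feed alone on UNIX), whereas binary mode copies every byte exactly as it is. If you know the file you are going to upload (send to the server) or download (copy from the server) is an ASCII text file, you can actually use either mode, but ASCII mode ensures that the lines end correctly on the receiving computer. On the other hand, if the file isn't ASCII, you must transfer it using binary mode, because the line-ending translation would corrupt it.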
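The difference between the two modes is easy to model. In ASCII mode, FTP sends text with carriage return/line feed (CR LF) line endings, and a UNIX receiver turns each CR LF into a single line feed. The following Python sketch models only that receiving step (the function names are our own) and shows what happens when binary data that happens to contain the same two bytes goes through it.

```python
def ascii_mode_to_unix(data: bytes) -> bytes:
    """Model an ASCII-mode download onto a UNIX host: each CR LF becomes LF."""
    return data.replace(b"\r\n", b"\n")


def binary_mode(data: bytes) -> bytes:
    """Model a binary-mode transfer: every byte arrives exactly as sent."""
    return data


text_file = b"line one\r\nline two\r\n"
print(ascii_mode_to_unix(text_file))  # b'line one\nline two\n'

# The first 8 bytes of every PNG image happen to include a CR LF pair.
png_header = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
print(len(ascii_mode_to_unix(png_header)))    # 7: one byte lost, file corrupted
print(binary_mode(png_header) == png_header)  # True
```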

How Can I Find Things on the Internet?

Long before the Web, there was Gopher. Although it is still widely used, many folks consider Gopher to be the aging predecessor of the World Wide Web. Gopher servers provide menus for selecting text documents that are available online.

As with a Web browser, a Gopher client enables you to navigate deeper and deeper into the Internet until you find what you are looking for. Gopher differs from the Web in that you must follow layers of menus until you finally reach a document; whereas with the Web, the documents themselves can provide you with links to other documents.

With all the Web and Gopher servers, the Internet has gotten so enormous that no one can possibly keep track of everything it has to offer. When you want to find some information, any kind of information, it's probably a safe bet to assume that it exists on the Internet, somewhere. The problem is finding it. Over the years, the Internet community has developed several solutions to this problem. Each solution is a tool, and it's best to know the right tool for the job. Here are three tools for searching the Internet:

Archie

Archie searches for filenames or directory names at anonymous FTP sites that contain the word you specify. The Windows Archie client program included with this book has a nice GUI and is preloaded with a list of Archie servers. Archie can also be invoked from a Web browser, a Gopher menu, or even through e-mail if you know the e-mail address of an Archie server.

Veronica

This stands for Very Easy Rodent-Oriented Net-wide Index to Computerized Archives. Whew! Veronica is usually run from a Gopher menu. You give it a word to search for, and it will come back with a list of Gopher menu items that contain that word. You can also search more specifically for directory names or filenames.

WAIS

This name stands for Wide Area Information Server. WAIS clients conduct searches of databases that are indexes of files contained on the server. Webmasters use WAIS to make their Web sites searchable by keyword. We will cover this topic more fully in Chapter 19, "Databases and the Web."
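The idea behind a keyword index like the ones WAIS searches can be sketched in Python: build the index once, mapping each word to the documents that contain it, so that a search only has to look words up instead of reading every file. The document names and contents below are hypothetical, and the real WAIS protocol and its relevance ranking are considerably more involved.

```python
import re
from collections import defaultdict


def words(text: str) -> list[str]:
    """Split text into lowercase words so searches ignore case and punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in documents.items():
        for word in words(text):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the documents that contain every word of the query."""
    terms = words(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


site = {
    "faq.html": "Frequently asked questions about our ISDN service",
    "rates.html": "ISDN and Frame Relay rates",
    "contact.html": "How to reach our sales staff",
}
index = build_index(site)
print(search(index, "ISDN"))        # ['faq.html', 'rates.html']
print(search(index, "isdn rates"))  # ['rates.html']
```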

As you are probably aware, the Web is a great way to find things on the Internet. This is becoming even more true as the number of sites using WAIS and the number of dedicated search sites increase. For more information about Web search pages, see the section later in this chapter titled "How Can I Learn More About the Internet?"

What Is the Difference Between SLIP and PPP?

SLIP, which stands for Serial Line Internet Protocol, was invented in the early 1980s to transmit packets over a serial interface, such as a modem. It was designed for simplicity and efficiency. SLIP's lack of error-checking and flow control led to the recent development of PPP, which is a more robust protocol.
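SLIP's simplicity is easy to appreciate in code. Following the framing described in RFC 1055, each packet is followed by a special END byte, and any END or ESC bytes inside the packet are replaced by two-byte escape sequences. This Python sketch also sends a leading END, an option RFC 1055 suggests for flushing line noise. Notice that nothing in the frame lets the receiver detect a corrupted byte, which is the weakness PPP addresses.

```python
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD  # special bytes from RFC 1055


def slip_encode(packet: bytes) -> bytes:
    """Frame one packet for a serial line the way SLIP does."""
    frame = bytearray([END])  # optional leading END flushes any line noise
    for byte in packet:
        if byte == END:
            frame += bytes([ESC, ESC_END])  # a data byte that looks like END
        elif byte == ESC:
            frame += bytes([ESC, ESC_ESC])  # a data byte that looks like ESC
        else:
            frame.append(byte)
    frame.append(END)  # END marks the end of the packet
    return bytes(frame)


def slip_decode(frame: bytes) -> bytes:
    """Recover the packet from one SLIP frame; there is no checksum to verify."""
    packet = bytearray()
    escaped = False
    for byte in frame:
        if escaped:
            packet.append({ESC_END: END, ESC_ESC: ESC}.get(byte, byte))
            escaped = False
        elif byte == ESC:
            escaped = True
        elif byte != END:
            packet.append(byte)
    return bytes(packet)


packet = bytes([0x45, 0xC0, 0x01, 0xDB, 0x02])
frame = slip_encode(packet)
print(frame.hex(" "))                # c0 45 db dc 01 db dd 02 c0
print(slip_decode(frame) == packet)  # True
```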

PPP stands for Point-to-Point Protocol. Among other things, PPP includes authentication, error-checking, and flow control. These features enable PPP to deliver link-layer functionality similar to that found in an Ethernet LAN. When you have a choice, PPP is the preferred way to go. Windows 95 Dial-Up Networking supports both SLIP and PPP.

By the way, SLIP and PPP are referred to as line protocols because they are concerned with the reliability of the network circuit, whereas TCP and IP are referred to as data protocols because they are designed for the purpose of application data transfer. The line protocols operate at layer 2 in the OSI model, and the data protocols operate at layers 3 and 4.

Which Is Right for Me: Switched-56, X.25, Frame Relay, ISDN, T1, T3, or ATM?

Before explaining each of these services separately, we'll start by laying the foundation for all of them. Other than the analog phone lines used by modems and dedicated digital lines, such as T1 (see below), there are three kinds of switching services offered by the phone companies for computer networks:

The term switching means that you don't own or lease a dedicated line, although you do pay for the availability of a certain minimum bandwidth. Switching technology is available thanks to high-speed computers and very sophisticated software developed by the phone companies. Because the bandwidth or resource unit is constantly switched from one customer to another based on demand, the cost to each individual is significantly reduced. Switching is similar to time-sharing.

Note that all of these technologies transmit digital data directly, without conversion to analog. Some of them carry voice traffic as well, but that is also digital over fiber-optic media.

Switched-56

Switched-56 is a circuit-switched technology that cannot carry voice. For several years, it has been an intermediate cost and performance point between analog lines and dedicated lines (T1). Many ISPs offer a connection option using this service.

X.25

X.25 is a set of packet-switching protocols with extensive error-checking, designed at a time when networks were less reliable than they are today. It is not usually offered by ISPs.

Frame Relay

Frame relay is a fairly new packet-switching technology. It is much more efficient than X.25 because it skips the link-by-link acknowledgment and error recovery that X.25 performs, saving the overhead those features incur. Frame relay is becoming very popular for WANs in and around a metropolitan area, because it is much less expensive than T1.

ISDN

With a modem and phone line, you can send computer data to anyone in the world. The purpose of the modem is to convert digital data to analog so it can be carried on a standard phone line before being converted back to digital at the receiving end. If you think that process sounds somewhat convoluted, you're right.

Integrated Services Digital Network (ISDN) promises to change all that for business users and home users alike. ISDN is a set of protocols that enables the phone companies to carry computer data directly in its native digital form, without converting it to and from analog. Transmission of digital data provides better performance and a significantly reduced likelihood of errors. Furthermore, ISDN can carry voice as well as data, and some ISDN hardware for PCs will even let you make two or three calls at once!

When purchasing ISDN, there are two price points: Basic Rate Interface (BRI) and Primary Rate Interface (PRI). BRI consists of two 64-Kbps data channels and a third channel used by the phone company for call management. The two data channels are called B channels, and the call control channel is called a D channel. The D channel rate is 16 Kbps.

PRI consists of 23 B channels and one 64 Kbps D channel.
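The channel arithmetic is simple enough to check in a few lines of Python. This sketch (the function name is our own) counts only the B and D channels; framing overhead on the physical line is not included. Two B channels together give the 128 Kbps top speed listed for ISDN BRI in Table 1.2, later in this chapter.

```python
B_CHANNEL_KBPS = 64  # every B channel carries 64 Kbps


def isdn_rates(b_channels: int, d_channel_kbps: int) -> tuple[int, int]:
    """Return (bandwidth available for data, total of all channels) in Kbps."""
    data = b_channels * B_CHANNEL_KBPS
    return data, data + d_channel_kbps


print("BRI:", isdn_rates(2, 16))   # BRI: (128, 144)
print("PRI:", isdn_rates(23, 64))  # PRI: (1472, 1536)
```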

T1 and T3

These are leased-line services that provide for very high-speed dedicated connections between two points. The Internet backbone relies on T3, which has a throughput rate of 45 Mbps. T1, which runs at a rate of 1.544 Mbps, is used in regional backbones and is also offered by many ISPs for customers with the need—and the money—for a wide-bandwidth connection.

ATM

Like Frame Relay and X.25, Asynchronous Transfer Mode (ATM) enables multiple logical connections to be multiplexed over a single physical interface. The information flow on each logical connection is organized into fixed-size packets, called cells. As with Frame Relay, there is no link-by-link error control or flow control. ATM takes full advantage of the high data rate of fiber-optics, and it allows for the dynamic selection of data rates. ATM delivers the greatest performance today and has the greatest potential performance, but it is still considered too futuristic by some. To be quite honest, we aren't ATM experts, so we can't tell you why some people consider ATM impractical. We can tell you that there are several books dedicated to the subject of ATM, and this isn't one of them.
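The fixed cell size is easy to illustrate. An ATM cell is 53 bytes: a 5-byte header followed by 48 bytes of payload. This Python sketch chops data into cells and pads the last one. The header bytes are left as zeros, whereas a real header carries the virtual path and channel identifiers of the logical connection, and real ATM adaptation layers also add information so the receiver can remove the padding.

```python
CELL_SIZE = 53
HEADER_SIZE = 5
PAYLOAD_SIZE = CELL_SIZE - HEADER_SIZE  # 48 bytes of user data per cell


def to_cells(data: bytes) -> list[bytes]:
    """Split data into 53-byte cells, zero-padding the final payload."""
    cells = []
    for offset in range(0, len(data), PAYLOAD_SIZE):
        payload = data[offset:offset + PAYLOAD_SIZE].ljust(PAYLOAD_SIZE, b"\x00")
        header = bytes(HEADER_SIZE)  # placeholder: no real connection identifiers
        cells.append(header + payload)
    return cells


page = bytes(30 * 1024)  # a 30 KB Web page, like the one in Table 1.2
cells = to_cells(page)
print(len(cells))              # 640
print(len(cells) * CELL_SIZE)  # 33920 bytes on the wire
print(f"{HEADER_SIZE / CELL_SIZE:.1%} header overhead")  # 9.4% header overhead
```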

How Fast Is the Internet?

This is actually an open-ended question because there are many measures of speed. But as it turns out, there is one answer which is quite fascinating.

A program called ping shows that an average Internet packet travels the distance from California to Japan and back in about one second! Knowing this distance is about 6,000 miles, you can say that this happens at roughly the speed of 12,000 miles per second, give or take a smidgen. This speed is slightly slower if the packet must pass through an orbiting satellite.

Even more amazing is the fact that the packet could be picked up by perhaps 20 gateway computers (or routers) along the way (as reported by the program tracert), each one trying to figure out where to send it next. Then another 20 gateways, not necessarily the same as those on the first leg of the journey, go to work sending the packet back to the origin—where your ping program can calculate the round-trip time.

And if that isn't mind-boggling enough, consider that each of those 40 computers may have had to wait for the outgoing line to be clear of other traffic before it could put the packet on the line and aim it at the next computer! Not only is this stunning speed, but the fact that it works at all is remarkable.
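The ping arithmetic is worth a second look. Dividing the round-trip distance by the round-trip time gives the average speed. For comparison, the sketch below also estimates how long the same trip would take on the wire alone, assuming that signals in optical fiber travel at roughly two-thirds of the speed of light, or about 124,000 miles per second. The gap suggests that most of that second is spent inside the gateways along the route rather than in transit between them.

```python
FIBER_MILES_PER_SECOND = 124_000  # assumption: about 2/3 the speed of light


def average_speed(one_way_miles: float, round_trip_seconds: float) -> float:
    """Average speed implied by a ping: the packet covers the distance twice."""
    return 2 * one_way_miles / round_trip_seconds


def wire_time(one_way_miles: float) -> float:
    """Round-trip time if the packet never waited anywhere along the route."""
    return 2 * one_way_miles / FIBER_MILES_PER_SECOND


print(average_speed(6000, 1.0))   # 12000.0 (miles per second)
print(round(wire_time(6000), 3))  # 0.097 (seconds)
```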


Ping and tracert are mentioned in Chapter 16, "Maintaining and Tracking Your Web Site" and in Appendix A of the Windows 95 Resource Kit published by Microsoft Press.

To answer the question of Internet speed another way, we have prepared Table 1.2 to show the speeds of many commonly found network technologies.

Network Technology    Bandwidth, or connection speed  Time (in seconds) to transfer a 30 KB Web page
V.32bis modem         14.4 Kbps                       17.067
V.34 or V.FC modem    28.8 Kbps                       8.533
Switched-56           56 Kbps                         4.285
ISDN BRI              56 Kbps–128 Kbps                1.875 to 4.285
Frame Relay           56 Kbps–128 Kbps                1.875 to 4.285
ISDN PRI              56 Kbps–1.5 Mbps                0.156 to 4.285
T1                    1.544 Mbps                      0.152
Ethernet LAN          10 Mbps                         0.0234
T3                    45 Mbps                         0.0052
FDDI                  100 Mbps                        0.0023
Fast Ethernet         100 Mbps                        0.0023
ATM                   50 Mbps–622 Mbps                0.0004 to 0.0047

Because a typical Web page might consist of about 8 KB of text and 22 KB of image data, we have arbitrarily chosen a document size of 30 KB to estimate the time it would take to travel from the server to the client. We caution the reader that this is for illustration purposes only! The example is not realistic, because many other factors contribute to the total time needed to deliver a document on the Internet. In fact, the total time will always be longer than what is shown in the third column of Table 1.2, because other network traffic will prevent a single file transfer from owning the whole bandwidth. Also, it is usually safe to assume that there are several intermediate hosts between the server and the client, and you might not know the link speed in use between each of those hosts.
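The third column of Table 1.2 is simply the page size in bits divided by the link speed. The Python sketch below assumes 1 KB = 1,024 bytes and 1 Kbps = 1,000 bits per second, which reproduces the modem rows. Several of the faster rows appear to have been computed with 1 Kbps taken as 1,024 bits per second, so their figures come out a few percent lower than this sketch's. Either way, these are best-case numbers that ignore all the delays just described.

```python
PAGE_BYTES = 30 * 1024  # the 30 KB sample page: 8 KB of text plus 22 KB of images


def transfer_seconds(size_bytes: int, bits_per_second: float) -> float:
    """Best-case transfer time, assuming the file gets the whole bandwidth."""
    return size_bytes * 8 / bits_per_second


links = [("14.4 Kbps modem", 14_400), ("28.8 Kbps modem", 28_800), ("T1", 1_544_000)]
for name, rate in links:
    print(f"{name:<16}{transfer_seconds(PAGE_BYTES, rate):8.3f} s")
```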

Is a Windows 95 Web Server More at Risk from Internet Hackers than an NT-Based Server?

The Department of Defense defines seven levels of security for computer systems: D1 (least secure), C1, C2, B1, B2, B3, A1 (most secure). Windows NT has been rated C2. DOS and Macintosh System 7.x weigh in at the D1 level. Most UNIX systems are rated C1 (below Windows NT).

The C1 level requires users to log into the system with protected passwords. Once logged in, users are not given unlimited access to the file system unless the system administrator has chosen to configure their accounts with that level of permission.

C2 extends the C1 level through auditing. This is an important concept because it permits the system administrator to track key events in the system and analyze them for security holes. Windows NT has excellent auditing capabilities; but alas, auditing alone will not protect a Web site from hackers.


It is important to point out that not every machine running Windows NT would be automatically classified as meeting the C2 standard. NT is always capable of hosting a C2 certified computer site, but many factors go into the rating—some of which are beyond the realm of the software. For example, a secure server isn't C2 if it is located in a public area.

Windows NT is a much more secure operating system than Windows 95, but many of those features aren't actually vital to protecting your site from hackers. Hackers use many techniques to break into computers—and they are inventing new techniques all the time. We suggest that you build your security policy around the server software you run on the Internet, rather than the operating system itself. For example, the Web servers, FTP servers, and SMTP server included with this book provide several security configuration options to protect your site.

Still, there is one kind of security risk that neither these servers nor Windows 95 can help you with—the risk of an attack by a software virus. For this, we recommend that you run an anti-virus program in the Windows 95 StartUp group and take care when you install new software from the Internet onto your hard drive.

See Chapter 20 for more information about Windows 95 security issues. For a detailed discussion of Internet security, see the excellent reference "Internet Firewalls and Network Security," published by New Riders.

What Is so "Hyper" About HTML?

HyperText Markup Language (HTML) is an application of Standard Generalized Markup Language (SGML). HTML embeds codes into a document to mark its features and structure for subsequent display. HTML was invented in 1990 at CERN, the European Particle Physics Laboratory in Switzerland. The invention of Mosaic, the famous Web browser that runs on several different computer platforms, fueled the growth of the Web because it finally made the Internet seem more graphical than cryptic to the average user.

The purpose behind HTML is to permit documents to contain electronic links to other relevant documents. A document can link to another text, image, audio, or video file. HTML may have acquired the "hyper" in its name from the Asteroids video game, popular during the 1980s, which included a hyper-space button for vaulting a player in danger to some random location. It was a fascinating game because the player was never sure there would be any less danger upon his or her arrival at the surprise destination.

An electronic link from one document to another is similar to pressing the hyper-space button in Asteroids because you can be vaulted away from the file you are currently reading. Of course, you hope that you aren't going to be thrown randomly. Unfortunately, Web browsing can sometimes seem random to the newcomer. Some pages on the Web give you little information to go on when you are considering a leap to another document. This is usually only a matter of unorganized Web page design. At its best, the Web can quickly carry you to the exact information you are looking for. This situation will continue to improve as more sites adopt HTML style guidelines and incorporate search tools such as WAIS. For those who still doubt that the Web is a fast way to look up information, consider the search time of a trip to the public library to find a book.

We should mention two other Web browsers: Lynx is a text-only browser for computers that lack graphical displays, and Netscape Navigator is generally considered king of the hill. Netscape followed in the footsteps of Mosaic and blazed new trails, such as incorporation of FTP download, newsreader capabilities, and now Java, just to name a few. If any of these terms are new to you, please be patient as we will get to them in due time.

Let's get back to the point of how the Web really works with hypertext documents. The interaction between HTML and HTTP can be explained most easily with an illustration. (See Figure 1.4.)

  1. The user enters the URL (Uniform Resource Locator, an address that names both the server and the document on it) of a neat new Web page into the Web browser (client). Or the user clicks an underlined word that serves as a link to another Web page.

  2. The request is carried through the Internet to the Web server named by the fully qualified domain name in the URL.

  3. The Web server looks up the requested HTML page using the pathname in the URL and sends the file back through the Internet to the client.

  4. The client Web browser stores the file on the local machine temporarily, interprets the HTML contents of the file, and displays it on the screen.


Figure 1.4. How HTML works.
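Steps 1 through 3 can be sketched in Python with the standard urllib.parse module. The browser splits the URL into the server's fully qualified domain name, which tells it where to send the request, and the pathname, which tells the server which file to return. The sketch only builds the request text; looking up the server's IP address and opening the TCP connection are left out, and the simple HTTP/1.0 request format shown is an assumption chosen for illustration.

```python
from urllib.parse import urlsplit


def build_request(url: str) -> tuple[str, int, str]:
    """Return (server name, port, request text) for an http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("this sketch handles only http:// URLs with a host name")
    host = parts.hostname  # the FQDN: where to send the request (step 2)
    port = parts.port or 80  # Web servers listen on port 80 by default
    path = parts.path or "/"  # the pathname: which file to send back (step 3)
    if parts.query:
        path += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    request = f"GET {path} HTTP/1.0\r\nHost: {host_header}\r\n\r\n"
    return host, port, request


print(build_request("http://www.ibm.com/"))
# ('www.ibm.com', 80, 'GET / HTTP/1.0\r\nHost: www.ibm.com\r\n\r\n')
```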

How Can I Learn More About the Internet?

Actually, the Internet itself is a great way to get more information. Throughout this book we give the URLs of Web pages and FTP sites where you can find additional information and files to help you build a Web site.


When entering a URL into your Web browser, remember that some parts of it might be case-sensitive. Also, don't type a period after a URL, even when the URL ends a sentence in this book. Finally, some URLs require a trailing slash.

For the technical-minded reader, the FQDN portion of a URL is not case-sensitive, but the pathname portion is case-sensitive on UNIX systems.
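That rule is easy to express in Python with the standard urllib.parse module. The function below is a hypothetical helper, not part of any browser; it assumes http:// URLs and a server that, like a UNIX server, treats pathnames as case-sensitive. The example.com pathnames are placeholders.

```python
from urllib.parse import urlsplit


def same_page(url_a: str, url_b: str) -> bool:
    """Decide whether two http:// URLs name the same page on a UNIX server."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (
        a.hostname == b.hostname  # .hostname is lowercased: case never matters
        and (a.port or 80) == (b.port or 80)
        and (a.path or "/") == (b.path or "/")  # exact match: case matters
        and a.query == b.query
    )


print(same_page("http://WWW.IBM.COM/", "http://www.ibm.com/"))                # True
print(same_page("http://www.example.com/Docs/", "http://www.example.com/docs/"))  # False
```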

Sometimes sites will move information during reorganization, and the URL will no longer work. This is often called an expired link. We try to list only links that are known to be stable; but if you have trouble with any of the links in this book, try using one of the online search pages to look for alternatives.

Search pages are special Web sites dedicated to helping you find anything that the Web has to offer. Here are a few good search pages:

What's Next

Now that you are armed with this basic information about the Internet, we are ready to talk about the hardware and software you'll need. We'll finish up Part I with a discussion about getting connected to the Internet.
