Perl is an interpreted language designed for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It is also a good language for many Web site system management tasks. This chapter
shows you how to use it for a few handy CGI scripts. WWWusage is an excellent example of a powerful CGI script written in Perl.
Perl was developed to cover the gap between low-level programming languages, such as C, and high-level languages such as AWK, SED, and the UNIX shell. Although C is a very powerful language, it has a steep learning curve. Perl does
not offer the speed of a compiled language, but it does offer very good string-handling capabilities and a gentler learning curve. Most people with experience in any of the languages mentioned will find the move to Perl easy.
Perl currently comes in two versions: Perl 4 and Perl 5. Version 5 is the new kid on the block, and it comes with object-oriented extensions. Perl was invented on UNIX by Larry Wall. Keep in mind that Perl has always had strong roots on UNIX platforms,
and most of the Webmasters who use it and post public-domain Perl source code work on UNIX. Fortunately, some nice folks have ported Perl to Win32 and made it available by anonymous FTP.
Dick Hardt of Hip Communications led the port of Perl 5 to Windows NT/95. We have run Perl 5 on two Windows 95 test platforms with no difficulties, and we have included it on the CD. You can also obtain the latest version from
ftp://ntperl.hip.com/ntperl/. This site contains the Visual C++ 2.0 source code for Perl, binary files, and documentation.
You can retrieve Perl 4 for Windows NT at the FTP site of Intergraph. Point your Web browser or CuteFTP to this URL: ftp://ftp.intergraph.com/pub/win32/perl/. You may want to download the file ntperlb.zip (if you only want the compiled version) or
ntperls.zip (if you want the Visual C++ 2.0 source code). Using Perl 4 on Windows 95 requires a patch (which you can find at Yahoo) developed by Bob Denny.
This book is already covering a great deal of information, and the authors don't intend to deluge you with a complete course on the Perl language. What this chapter does do is give you a quick introduction to the Perl syntax and show you how to put some
Perl scripts to work so you can jump right in.
There's so much to cover and only one chapter in which to do it. We caution you that this material is not intended for those who are new to programming. In fact, we are going to take somewhat of a hit-and-run approach and present the material largely in
reference format. The second half of the chapter contains a very useful sample application.
Table 21.1 presents the most common symbols unique to Perl and their meaning.
Symbol | Purpose
$ | For scalar values.
@ | For indexed arrays.
% | For hashed arrays (associative arrays).
* | For all types of that symbol name. These are sometimes used like pointers in perl4, but perl5 uses references.
<> | Used for inputting a record from a filehandle.
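As a quick illustration of the prefix symbols in the table, here is a minimal sketch. The host name, page names, and hit counts are made up for the example, and the script assumes a reasonably recent Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scalar: holds a single string or number.
my $site = "example.com";          # hypothetical host name

# Indexed array: an ordered list; subscripts start at 0.
my @pages = ("index.htm", "about.htm", "contact.htm");

# Hashed (associative) array: values are looked up by string key.
my %hits = ("index.htm" => 120, "about.htm" => 45);

# A single element of an array or hash is a scalar, so it takes the $ prefix.
print "Site: $site\n";
print "First page: $pages[0]\n";
print "Hits on index.htm: $hits{'index.htm'}\n";
print "Number of pages: ", scalar(@pages), "\n";
```

Note that the prefix describes what you are asking for, not what the variable is: $pages[0] is one scalar taken from the array @pages.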
This section lists the basic components of a Perl script. The first line of every Perl program is a required special comment to identify the file location of the Perl interpreter itself. For example:
#!/usr/local/bin/perl
This list shows the predefined data types:
Scalar ($): Used for characters, strings, and numbers.
Table 21.2 lists several predefined variables and reserved characters in Perl.
Variable | Purpose
$0 | Contains the name of the script being executed.
$_ | Default input and pattern search variable.
$/ | Input record separator, newline by default.
@ARGV | Contains command line arguments. $ARGV[0] is the first argument.
@INC | Contains the list of places to look for scripts to be evaluated by the do or require commands.
%INC | Contains entries for each file included by the do or require command.
%ENV | Contains your environment settings. Changes made affect child processes.
STDIN | Default input stream.
STDOUT | Default output stream.
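The following short sketch exercises a few of these predefined variables. The file names in the loop are invented for the example, and PATH is only assumed to be present in the environment, so the script checks before using it.

```perl
use strict;
use warnings;

# $0 holds the script name as it was invoked.
print "Running $0\n";

# @ARGV holds the command-line arguments; it is empty if none were given.
my $count = scalar(@ARGV);
print "Got $count argument(s)\n";
print "First argument: $ARGV[0]\n" if $count > 0;

# %ENV holds the environment settings.
my $path = defined $ENV{'PATH'} ? $ENV{'PATH'} : '(not set)';
print "PATH is $path\n";

# $_ is the default variable for foreach loops and pattern matches.
my @found;
foreach ("access.log", "readme.txt", "error.log") {
    push @found, $_ if /\.log$/;   # the match is tested against $_
}
print "Log files: @found\n";
```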
Table 21.3 lists the common mathematical operators.
Operator | Example | Meaning
+ | $a + $b | Sum of $a and $b
- | $a - $b | Difference of $a and $b
* | $a * $b | Product of $a times $b
/ | $a / $b | Quotient of $a divided by $b
% | $a % $b | Remainder of $a divided by $b
Perl supports a rich array of assignment operators for many purposes. If the list in Table 21.4 seems overwhelming, try to stick to the easy ones and learn about the others after you have more experience with Perl programming.
Operator | Example | Meaning
= | $var = 5 | Assign 5 to $var
++ | $var++ or ++$var | Increment $var by 1 and assign to $var
-- | $var-- or --$var | Decrement $var by 1 and assign to $var
+= | $var += 3 | Increase $var by 3 and assign to $var
-= | $var -= 2 | Decrease $var by 2 and assign to $var
.= | $str .= "ing" | Concatenate "ing" to $str and assign to $str
*= | $var *= 4 | Multiply $var by 4 and assign to $var
/= | $var /= 2 | Divide $var by 2 and assign to $var
**= | $var **= 2 | Raise $var to the second power and assign to $var
%= | $var %= 2 | Divide $var by 2 and assign remainder to $var
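To see how the assignment operators chain together, here is a small worked sketch. The starting values are arbitrary, and the comments track the value of $var after each step.

```perl
use strict;
use warnings;

my $var = 5;      # plain assignment
$var += 3;        # 5 + 3   = 8
$var -= 2;        # 8 - 2   = 6
$var *= 4;        # 6 * 4   = 24
$var /= 2;        # 24 / 2  = 12
$var **= 2;       # 12 squared = 144
$var %= 5;        # remainder of 144 / 5 = 4
$var++;           # 4 + 1   = 5

my $str = "test";
$str .= "ing";    # "testing"

print "var = $var, str = $str\n";   # prints: var = 5, str = testing
```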
The logical operators in Perl (shown in Table 21.5) are useful in If statements typical of nearly all programming languages.
Operator | Example | Meaning
&& | $a && $b | True if $a is true and $b is true
|| | $a || $b | True if $a is true or if $b is true
Pattern matching is one of the areas in which Perl shows its strength. These operators, shown in Table 21.6, are very useful for string operations.
Operator | Example | Meaning
=~ // | $a =~ /pat/ | True if $a contains pattern "pat"
=~ s/// | $a =~ s/p/r/ | Replace the first occurrence of "p" with "r" in $a (add the g modifier, as in s/p/r/g, to replace every occurrence)
=~ tr/// | $a =~ tr/a-z/A-Z/ | Translate each character to its corresponding character (here, lowercase to uppercase)
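A minimal sketch of the three pattern operators follows. The word "pepper" is just a convenient test string; each operation works on a copy so you can compare the results.

```perl
use strict;
use warnings;

my $word = "pepper";

# Match: true if $word contains the pattern "pp".
my $has_pp = ($word =~ /pp/) ? 1 : 0;     # 1

# Substitute on copies so that $word itself stays unchanged.
(my $first = $word) =~ s/p/r/;            # first "p" only: "repper"
(my $every = $word) =~ s/p/r/g;           # every "p":      "rerrer"

# Translate: map each lowercase letter to its uppercase counterpart.
(my $upper = $word) =~ tr/a-z/A-Z/;       # "PEPPER"

print "$word $first $every $upper\n";
```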
String operators in Perl, as shown in Table 21.7, are the mainstay of the language.
Operator | Example | Meaning
. | $a . $b | Concatenate $b to the end of $a
x | $a x $b | Value of $a strung together $b times
substr() | substr($a, $o, $l) | Substring of $a at offset $o of length $l
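The string operators can be tried out with a few lines like these. The strings are arbitrary examples; remember that substr() offsets start at 0.

```perl
use strict;
use warnings;

my $prefix = "Web";

# Concatenation with the dot operator.
my $joined = $prefix . "master";        # "Webmaster"

# Repetition: the left operand repeated 10 times.
my $rule = "-" x 10;                    # "----------"

# Substring: 6 characters starting at offset 3.
my $tail = substr($joined, 3, 6);       # "master"

print "$joined\n$rule\n$tail\n";
```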
The relational operators shown in Table 21.8 are essential to If and While statements.
Numeric Operator | String Operator | Example | Meaning
== | eq | $str eq "Word" | Equal to
!= | ne | $str ne "Word" | Not equal to
> | gt | $var > 10 | Greater than
>= | ge | $var >= 10 | Greater than or equal to
< | lt | $var < 10 | Less than
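Choosing between the numeric and string columns matters, because the same two values can compare differently. The sketch below uses made-up values to show the difference.

```perl
use strict;
use warnings;

# Numerically 10 is greater than 9, but as strings "10" sorts before "9"
# because the comparison goes character by character ("1" comes before "9").
my $num_gt = (10 > 9)       ? 1 : 0;    # 1
my $str_gt = ("10" gt "9")  ? 1 : 0;    # 0

# "1.0" and "1" are the same number but different strings.
my $num_eq = ("1.0" == 1)   ? 1 : 0;    # 1
my $str_eq = ("1.0" eq "1") ? 1 : 0;    # 0

print "num_gt=$num_gt str_gt=$str_gt num_eq=$num_eq str_eq=$str_eq\n";
```

As a rule of thumb, use the symbolic operators for numbers and the two-letter operators for text.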
Here are several predefined Perl commands that you will come across repeatedly.
Formatting is as follows:
Conversion Character | Definition
%s | string
%c | character
%d | decimal number
%ld | long decimal number
%u | unsigned decimal number
%lu | unsigned long decimal number
%x | hexadecimal number
%lx | long hexadecimal number
%o | octal number
%lo | long octal number
%e | floating-point number in scientific notation
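These conversion characters are used by Perl's printf and sprintf functions. Here is a minimal sketch with invented report values; the field widths are arbitrary choices for the example.

```perl
use strict;
use warnings;

# Made-up report values for illustration.
my ($page, $hits, $share) = ("index.htm", 120, 37.5);

# %-12s left-justifies the string in 12 columns, %5d right-justifies
# the integer in 5 columns, %8.2f prints two decimals in 8 columns,
# and %% produces a literal percent sign.
my $row = sprintf("%-12s %5d %8.2f%%", $page, $hits, $share);
print "$row\n";

# The same number in decimal, hexadecimal, and octal.
printf("%d %x %o\n", 255, 255, 255);   # prints: 255 ff 377
```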
A Perl module is a set of functions grouped into a package that deal with a similar problem. You can use module functions in a Perl script by telling your script the name of the module with the use command (for example, use CGI;).
One example of a Perl module is the CGI.pm module. This file includes functions that provide an easy interface to CGI programming, enabling you to write HTML forms and easily deal with the results. For more information about CGI.pm, visit its home page
at http://www-genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html. This page has information about the functions available and examples of how they are used.
To run a Perl program, you can type the script name at the command prompt. Here are several example commands that you can use for debugging scripts:
Two of the most common uses of Perl by Webmasters are statistical analysis and forms processing. This section and the next present two Perl CGI scripts that prove very useful for these purposes.
As a Webmaster, you want to know who's coming to your site, how often, and what they are doing there. To accomplish this, the examples use the Perl programming language interpreter and the WWWusage CGI application.
Actually, before getting into Perl, let's mention a very interesting tool that can help you chart your Web site statistics without requiring any custom programming. It will analyze your Web page usage based on your server log files. A company called
Logical Design Solutions has invented a cool program called WebTrac. If you try the program and you like it, you can get it for free. However, they are accepting donations for Save The Children, which they claim is a top-ranked charitable
organization. This is a very innovative way to distribute software, and it's for a good cause. Visit their home page at http://www.lds.com.
WWWusage is a Perl script written by Richard Graessler (rickg@pobox.com) to analyze and calculate monthly usage statistics from log files generated by World Wide Web servers. This application is designed for use with
Windows NT, but we made a few modifications to allow it to run on Windows 95. Once the script is customized for your Web server (which we are going to show you how to do), WWWusage should work on any Windows 95 system with NT Perl 5.001 installed.
WWWusage will process HTTPS access log files in the Common Log Format and output monthly statistics in HTML format ready for publishing on the Web. It creates reports on any or all of the following:
WWWusage does not make any changes to the access log files or write any files in the server directories (with the exception of two output HTML files per month).
Gone are the days when every Web server used its own proprietary log file format, which made it difficult to write general statistics collectors. The Web community therefore designed the Common Log File Format, which will soon become the default, if it hasn't
already.
Here is the format of each line in the logfile: remotehost rfc931 authuser [date] "request" status bytes
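To make the format concrete, here is a minimal sketch that splits one log entry into its seven fields with a regular expression. The sample entry and the sub name parse_clf are made up for illustration; this is not how WWWusage itself parses the log, and real log files may contain lines this simple pattern does not handle.

```perl
use strict;
use warnings;

# Sample entry (made-up values) in the Common Log File Format.
my $entry = 'ppp12.example.com - - [03/Nov/1995:14:05:31 +0100] '
          . '"GET /index.htm HTTP/1.0" 200 2048';

# Split one log line into its seven fields. Returns an empty list
# if the line does not match the expected layout.
sub parse_clf {
    my ($line) = @_;
    if ($line =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)\s*$/) {
        return ($1, $2, $3, $4, $5, $6, $7);
    }
    return ();
}

my ($host, $ident, $user, $date, $request, $status, $bytes) = parse_clf($entry);
print "$host requested '$request' on $date\n";
print "status $status, $bytes bytes\n";
```

A hyphen in the rfc931, authuser, or bytes position means that the value was not available.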
At present, many Windows 95 Web servers follow the Common Log File Format. Here are just a few that do:
One point of difference among these servers is how they handle the log file. Some of the servers use a single log file, which can be automatically or manually closed (sometimes called cycled). Others have a single log file for each day, so that
there is no need to cycle the log file.
There are some other great Perl analysis scripts for HTTPS log files on the Net. Just check Appendix D, "Internet Resources for the Win95 Webmaster," or search Yahoo for "CGI" or "PERL".
WWWusage will generate a new statistics page each month. Figure 21.1 shows you a sample of the HTML page generated by WWWusage.
Figure 21.1. The output of WWWusage is easy to read.
Listing 21.1 shows the configuration section of the file wwwusage.pl. You can find the file on the CD. All you need to do is read the comments in the source code to determine the modifications you need to make to customize the program for your site.
# WWWusage - Perl script: script to calculate monthly usage statistics
# from log files generated by the Windows NT World Wide Web servers
# (https).
# Copyright (c) 1995 Richard Graessler (rickg@pobox.com)
# For the latest version, DOCUMENTATION and LICENSE AGREEMENT see
# <URL: http://pobox.com/~rickg/rickg/wwwusage/wwwusage.html>
# This program is provided "AS IS", WITHOUT ANY WARRANTY (see License
# Agreement)
# Bug reports, comments, questions and suggestions are welcome. Please
# mailto rickg@pobox.com with the "subject: WWWusage" but please check
# first that you have the latest version.
# CREDITS:
# There are some other Perl log file analysis scripts on the net:
#   Roy Fielding's wwwstat
#   <URL: http://www.ics.uci.edu/WebSoft/wwwstat/>
#   Nick Phillips's musage
#   <URL: http://www.blpes.lse.ac.uk/misc/musage.htm>
#   Steven Nemetz's iisstat
#   <URL: ftp://ftp.ccmail.com/pub/utils/InternetServices/iisstat/iisstat.html>
# Looking into these scripts helped me to write this script and there
# might be still some parts based on them.
# Requires timelocal.pl which is included in the Perl distribution
# package.
# Thanks to the authors!
#
######################################################################
# Program internal variables (please do not change!)
######################################################################
$VERNAME = 'WWWusage';          # Program name
$VERSION = '0.97';              # Program version
$VERDATE = '3 November 1995';   # Program version date
######################################################################
# Present setting
######################################################################
# In Perl for Windows NT you can use forward slash (/) or double
# backslash (\\) in pathnames (e.g. C:/LOGS/ or c:\\LOGS\\). File and
# path names could be absolute (e.g. C:/LOGS/) or relative to current
# directory (e.g. ./LOGS/).

# hostname of www server (HTTPS)
$ServerName = 'Your Web Servers Domain Name';

# flag - specifies the log file format
# 1 : common log file format, 0 : EMWAC HTTPS
# Both Purveyor and FolkWeb use Common log format
$LogFormat = 1;

# file containing the country-codes to allow expansion from domain to
# country name.
$CountryCodeFile = 'c:/purveyor/country-codes.txt';

# Pattern used to recognise log files translated into a Perl regular
# expression, e.g. ('.+\.log' = *.log), ('ac.+\.log' = ac*.log). if
# your https have only one log file simply set "access.log"
# Note: If you have more than one log file the script assumes that the
# alphabetical order of the filenames is the same as the chronological
# order.
$LogfilePattern = '.+\.log';

# Directory containing log files (without ending slash!) This is a
# change from the original version for Windows NT. Using WWWUsage on
# Windows 95 requires that you not end with a trailing slash.
$LogfileDir = 'c:/purveyor/log';

# This var was added for Windows 95 so we could add the trailing slash
# later on.
$slash = '/';

# filename (incl. path and arguments if necessary) of shell for
# unpacking archives.
# Note: If you use this feature please note that the archive contains
# only the log files for a single month and that you didn't analyse
# archives and normal log file at the same time.
$Gzip = 'gunzip -c';      # Gzip Format: *.gz, *.Z
$Zip  = 'unzip -p';       # Zip Format: *.zip
$Tar  = 'tar -x -O -f';   # Tar Format: *.tar

# WWWusage directory to write statistics reports (without ending
# slash!)
$OutPutDir = 'c:/purveyor/wwwusage';

# WWWusage Error file name including path
$ErrorFile = 'c:/purveyor/wwwusage/WWWusage.log';

# Filename without extension for HTML main output file (e.g.
# "WWWusage" or "index")
$MenuFile = "wwwusage";

# Extension for HTML output files
$HTMLextension = "htm";

# show top nn statistics in main output, the detail output contains
# all (e.g. 20)
$Top = 20;

# format of the output HTML page (0 = <PRE></PRE>, 1 = <TABLE></TABLE>)
$HTMLOutput = 1;

# flags - disable if you don't want that output
$DoDomain      = 1;   # Transfers by Client Domain (top level)
$DoDomain2     = 1;   # Transfers by Client Domain (second level)
$DoSubdomain   = 1;   # Transfers by Client Sub-domain
$DoHost        = 1;   # Transfers by Client Host
$DoFileType    = 1;   # Transfers by File Type
$DoFileName    = 1;   # Transfers by File Name (URL)
$DoHTTPSMethod = 1;   # Transmission Statistics HTTPS Method
$DoStatusCode  = 1;   # Transmission Statistics Status Code
$DoDaily       = 1;   # Transmission Statistics Day
$DoWeekdaily   = 1;   # Transmission Statistics Weekday
$DoHourly      = 1;   # Transmission Statistics Hour
$DoIdent       = 2;   # Transfers by Remote Identifier
# NOTE for $DoIdent: For security reasons, you should not publish to
# the web any report that lists the Remote Identifiers (rfc931 or
# authuser):
# 0 : no display, 1 : real user name, 2 : cookie name

# flag - disable if you don't want to create the detail statistics to
# save time
$DoDetail = 1;

# flag - disable if you don't want to create links to your accessed
# pages
$FileNameHREF = 0;

# User specific parameters for the TABLE tag
$HTMLTable = 'Border=2 CELLPADDING=8 CELLSPACING=5';

# User specific backgrounds for all returns. Here you can set all
# elements of the body tag which can appear between "<BODY ... >" in
# HTML format.
$HTMLBackground = 'BACKGROUND="/gif/bg0.gif" BGCOLOR="#63637b" TEXT="#ffffff" '
                . 'LINK="#00ffff" ALINK="#ff0000" VLINK="#ffff00" HRCOLOR="#ff0000" ';

# User specific header for all returned HTML pages in HTML format
@HTMLHeader = (
    '<P><CENTER><A HREF="/image/ntrick.map"><IMG BORDER=0 HSPACE=10 ALIGN=MIDDLE ',
    'SRC="/gif/ntrick.gif" ALT="Rick\'s Windows NT Info Center" ISMAP WIDTH=550 ',
    'HEIGHT=44></A></CENTER></P>',
    '<H1><CENTER>World Wide Web Server Usage Statistic</CENTER></H1><HR>'
);

# User specific footer for all returned HTML pages in HTML formats
@HTMLAddress = (
    '<HR><HR><A NAME="Bottom"></A><A HREF="/image/address.map" >',
    '<IMG BORDER=0 HSPACE=10 ALIGN=MIDDLE SRC="/gif/address.gif" ',
    ' ALT="Addressbar" ISMAP WIDTH=293 HEIGHT=31></A>'
);

# flag - disable if you don't want a detailed output on the console
$VerboseMode = 1;

# flag - disable if you don't want to see the skipped lines of the
# log files on the console.
$ShowSkippedLines = 1;

# flag - disable if you don't want to show unresolved addresses.
$ShowUnresolved = 1;

# flag - 0: disable if you don't want to look up dns name if ip address
#           is given.
#        1: if you don't want to look up new dns name but used the
#           saved dns names.
#        2: if you want to look up new and old unresolved dns name
#        3: if you want only to look up new dns name
$LookupDnsNames = 3;

# file containing DNS names (will be created and updated by the
# script).
$DnsNamesFile = 'c:/purveyor/wwwusage/dns-names.txt';

# flag - disable if you don't want to sort the host list to save time.
$SortHostList = 0;

# flag - disable if you don't want to encode filenames.
$UrlEncode = 0;

# flag - disable if you don't want to detect on disk if filename is a
# directory or file. If flag is set, you should run the script on your
# HTTPS machine.
$FileCheck = 0;

# flag - enable it if you https automatically add a "/" to slashless
# dirs.
# (1 for EMWAC HTTPS, Netscape, Purveyor, and FolkWeb // 0 for Alibaba)
$DirWorksWithSlash = 1;

# Real directory name of document root of the www server (without
# ending slash!).
$DocumentRoot = 'c:/purveyor';

# list of configured "default/index" filename(s) for your HTTPS
@DefaultHTML = ('index.html','index.sht','default.htm');

# flag - enable to convert all filenames (URLs) to lower case.
$FileNamesToLowerCase = 1;

# Time zone information. Only necessary for EMWAC log file format.
# If not set it will be computed. Format: "+0100" or "-1100"
$TimeZone = "+0800";

# Exclude filter: optional list of IP addresses to ignore, please
# include ip number as well as dns name(s) in the list! IP number will
# be checked forward, DNS names will be checked backward. Perl
# expressions are possible.
# (e.g. "137.226" = "137.226.*.*", "rwth-aachen.de" = "*.wzl.rwth-aachen.de")
@IgnoreHost = ('Your IP Address Here','Your Domain Name Here');

# Include filter: optional list of IP addresses to focus on, please
# include IP number as well as dns name(s) in the list! IP number will
# be checked forward, DNS names will be checked backward. Perl
# expressions are possible.
# (e.g. "137.226" = "137.226.*.*", "rwth-aachen.de" = "*.rwth-aachen.de")
# @FocusOnHost = ('137.226.92.4', 'wzl-ps4.wzl.rwth-aachen.de','rick.wzl.rwth-aachen.de');

# Exclude filter: optional list of paths/files to ignore. Paths will
# be checked forward from the beginning of the url filename. Perl
# expressions are possible.
@IgnorePath = ('/gif/','/images/');

# Include filter: optional list of paths/files to focus on. Paths will
# be checked forward from the beginning of the url filename. Perl
# expressions are possible.
# @FocusOnPath = ('/rick/');

# Exclude filter: optional list of file extensions to ignore.
# Extension will be checked backward from the beginning of the url
# filename. Perl expressions are possible.
@IgnoreExt = ('gif','jpeg','jpg');

# Include filter: optional list of file extensions to focus on.
# Extension will be checked backward from the beginning of the url
# filename. Perl expressions are possible.
@FocusOnExt = ('.htm','html');

# Alias list for virtual paths. Key: path names relative to disk
# root. Value: path names relative to HTTPS document root.
%WWWAlias = (
    'c:/purveyor/',         '/',
    'c:/purveyor/ICONS/',   '/ICONS/',
    'c:/purveyor/CGI-BIN/', '/CGI-BIN/',
);

# List of used file types and its extensions. The extensions must be
# written in regular Perl expression. If @FileTypesSort is given it
# determines the search order.
%FileTypes = (
    'CGI Scripts',          '(\/cgi32\/|\/cgi-32\/|\/cgi-shl\/)',
    'DOS CGI Scripts',      '(\/cgi-bin\/|\/cgidos\/)',
    'WinCGI Scripts',       '(\/wincgi\/|\/winbin\/)',
    'DllCGI Scripts',       '\/dllalias\/',
    'Images',               '\.(bmp|gif|xbm|jpg|jpeg)$',
    'Movies',               '\.(mpg|mov|scm)$',
    'Archive files',        '\.(gz|z|zip|tar)$',
    'HTML files',           '\.(htm|html)$',
    'Imagemaps',            '($\/image\/|\.map$)',
    'Server side includes', '\.(sht|shtm|shtml)$',
    'Text files',           '\.txt$',
    'Binary Executables',   '\.(com|exe)$',
    'Script Executables',   '\.(pl|sh|cmd|bat)$',
    'Readme files',         '\/README.*$',
);
@FileTypesSort = (
    'HTML files',
    'Images',
    'CGI Scripts',
    'Server side includes',
    'Text files',
    'DOS CGI Scripts',
    'WinCGI Scripts',
    'DllCGI Scripts',
    'Movies',
    'Archive files',
    'Imagemaps',
    'Binary Executables',
    'Script Executables',
    'Readme files',
);

# Response Codes taken from <draft-ietf-http-v10-spec-01.ps>, August
# 3, 1995. Normally you don't need to change!
%StatusCode = (
    '200', '200 OK',
    '201', '201 Created',
    '202', '202 Accepted',
    '203', '203 Non-Authoritative Information',
    '204', '204 No Content',
    '300', '300 Multiple Choices',
    '301', '301 Moved Permanently',
    '302', '302 Moved Temporarily',
    '303', '303 See Other',
    '304', '304 Not Modified',
    '400', '400 Bad Request',
    '401', '401 Unauthorized',
    '402', '402 Payment Required',
    '403', '403 Forbidden',
    '404', '404 Not found',
    '405', '405 Method Not Allowed',
    '406', '406 None Acceptable',
    '407', '407 Proxy Authorization Required',
    '408', '408 Request Timeout',
    '409', '409 Conflict',
    '410', '410 Gone',
    '411', '411 Authorization Refused',
    '500', '500 Internal Server Errors',
    '501', '501 Not implemented',
    '502', '502 Bad Gateway',
    '503', '503 Service Unavailable',
    '504', '504 Gateway Timeout',
);
######################################################################
# END CONFIG
######################################################################
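The configuration comments say that @FileTypesSort determines the order in which the %FileTypes patterns are searched. The sketch below shows one way such a lookup could work. It is not taken from the WWWusage source: the sub name file_type, the 'Other' fallback label, and the case-insensitive match are all assumptions, and only a small subset of the table is used.

```perl
use strict;
use warnings;

# A small subset of the %FileTypes table from the listing.
my %FileTypes = (
    'Images',     '\.(bmp|gif|xbm|jpg|jpeg)$',
    'HTML files', '\.(htm|html)$',
    'Text files', '\.txt$',
);

# Search order, playing the role of @FileTypesSort in the listing.
my @FileTypesSort = ('HTML files', 'Images', 'Text files');

# Return the first type whose pattern matches the URL, or 'Other'
# (a hypothetical fallback label) if none matches.
sub file_type {
    my ($url) = @_;
    foreach my $type (@FileTypesSort) {
        return $type if $url =~ /$FileTypes{$type}/i;
    }
    return 'Other';
}

foreach my $url ('/index.html', '/gif/logo.GIF', '/cgi32/mailto.pl') {
    print "$url -> ", file_type($url), "\n";
}
```

Because the first match wins, the order of the list matters whenever a URL could match more than one pattern.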
The WWW MailTo & CommentTo gateway is a Windows NT HTTP CGI Perl script. (Whew!) It enables you to send a message by SMTP and/or to log the message to a local file. Once again, this application is designed for use with Windows NT, but we made a few
modifications to get it to run on Windows 95.
Using the GET method, the script creates a predefined or user-supplied fill-out form with a self-reference by the action tag. After the form is submitted, the script will be executed a second time by the POST method to create the mail and send it by
SMTP if mail is enabled, and/or save it in the comment file if comment is enabled.
The features depend on the configuration. The script can do any of the following:
You need to put mailto.pl into your cgi-bin directory. Some HTTPS servers use a different CGI directory for DOS CGI, Win32/NT CGI, or WinCGI binaries. If so, put the scripts in your Win32/NT CGI binaries directory, for example, the CGI32
directory. If your HTTP server does not support ALIAS, it must be in your WWW data directory or its subdirectories.
Now would be a good time to install BLAT from the CD, if you have not done so already.
To install the WWW Mailto&Commentto Gateway, you only need to modify the configuration as described in the following section titled "Configuring the Script." Beyond the simple configuration, the main issue is how to call it properly. This
depends on how your HTTP server executes scripts.
If your HTTP server can execute scripts directly (for example, Alibaba), you can use HTML such as this:
<A HREF="http://rick.wzl.rwth-aachen.de/cgi32/mailto.pl">
If your HTTP server must execute a program binary (for example, the EMWAC HTTPS), you can use HTML such as this:
<A HREF="http://rick.wzl.rwth-aachen.de:8001/cgi32/perl.exe?cgi32/mailto.pl">
Alternatively, you can use Rick's CGI2Shell Gateway. In this case, you could do the following:
<A HREF="http://rick.wzl.rwth-aachen.de:8001/cgi32/cgi2perl.exe/cgi32/mailto.pl?">
The last way is much easier if you want to specify parameters. See the following "Usage" section for more information about parameters.
First of all, you must create an HTML tag for WWW Mailto&Commentto Gateway in your HTML document, which calls the script by the GET method. When called by the GET method, the script displays a standard e-mail form. Here is one example of the HTML
code:
<A HREF="http://rick.wzl.rwth-aachen.de/cgi32/mailto.pl">Mailto</A>
<A HREF="/cgi32/mailto.pl">Mailto</A>
You can also include command-line parameters in the HTML tag, where the parameter is either source or one or more variable/value pairs separated from one another by an ampersand. Each variable is separated from its value by "=". Note that all parameters
must be URL-encoded. That means that all spaces are replaced with plus signs ("+"), so a literal plus sign must be specified in hexadecimal as %2B. Other reserved characters must be encoded in hexadecimal in the same way.
The source parameter returns the script source code if source viewing is enabled and source is the only parameter. The pairs of variables and values could be all reserved variables except from and HTTPpage. These variables can be supplied in the GET
request when linking to the mailto script. If you simply want your mail address to be given in the mail form as the default value, make your HTML look something like this:
<A HREF="/cgi32/mailto.pl?to=rickG@pobox.com">
If you want your default subject to be "This is a subject!", give the subject variable separated by an ampersand. For example:
<A HREF="/cgi32/mailto.pl?to=rickG@pobox.com&subject=This+is+a+subject!">
Notice that it must be URL-encoded.
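If you generate such links from a script, the encoding rules described above can be sketched in a few lines of Perl. The sub name url_encode is our own invention, not part of mailto.pl, and this version is deliberately cautious: it hex-encodes every character other than letters, digits, and a few safe marks, so it also encodes characters such as "@" and "!" that the hand-written examples leave alone.

```perl
use strict;
use warnings;

# Encode a value for use in a query string: hex-escape anything that is
# not a letter, digit, underscore, period, tilde, hyphen, or space,
# then turn the remaining spaces into plus signs.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~ -])/sprintf("%%%02X", ord($1))/ge;
    $text =~ tr/ /+/;
    return $text;
}

print '/cgi32/mailto.pl?to=', url_encode('rickG@pobox.com'),
      '&subject=', url_encode('This is a subject!'), "\n";
```

The order of the two steps matters: a literal plus sign is converted to %2B first, so every "+" left in the result stands for a space.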
There are several reserved variables that the script will check for explicitly.
All of these variables (except from and HTTPpage) could be set to default values, which can protect against overwriting. All of these variables can also be set at the command line following the "?" (which will then be inserted into the CGI
environment variable QUERY_STRING).
These reserved variables have a special meaning for the script and must be set by either the Webmaster or the user. With the exception of the to and from variables, all variables are set to default values if they are undefined.
For easy questionnaires, all other CGI variables will be logged after the body portion, regardless of whether the values are hidden or part of the fill-out form. Remember that the GET method limits the number of characters that can be passed. The
variable and its value are separated by =, and different variable/value pairs by &. Spaces are replaced with +; plus signs and other reserved characters must then be specified in hexadecimal (%2B for a plus sign). Every non-reserved CGI variable will be logged after
the mail body in variable/value pairs. To use the user-defined variables, you need to first create a user-defined form.
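To show what the receiving side has to do with such a string, here is a minimal sketch that splits a query string into variable/value pairs and undoes the encoding. The sub name parse_query and the sample string are assumptions made for illustration; the mailto script has its own parsing code, which may differ.

```perl
use strict;
use warnings;

# Split a query string into a hash of decoded variable/value pairs.
sub parse_query {
    my ($query) = @_;
    my %field;
    foreach my $pair (split /&/, $query) {
        next if $pair eq '';
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        foreach ($name, $value) {
            tr/+/ /;                                # plus signs back to spaces
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX back to the character
        }
        $field{$name} = $value;
    }
    return %field;
}

# Use the CGI environment variable if present, else a made-up sample.
my $query = defined $ENV{'QUERY_STRING'}
          ? $ENV{'QUERY_STRING'}
          : 'to=rickG@pobox.com&subject=This+is+a+subject%21';
my %field = parse_query($query);
foreach my $name (sort keys %field) {
    print "$name = $field{$name}\n";
}
```

Plus signs are converted before the hexadecimal escapes, so that a %2B in the input ends up as a literal plus sign rather than a space.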
Before starting to use the script, you must configure it. All configurable variables are in the first section of the script, as follows:
You can set default values to all reserved variables (except from and HTTPpage) by configuring the default values with the $def{} variables in the script. All of these variables could also be found in the first section of the script. If the variable
$default is set, these variables are fixed. They cannot be overwritten by given parameters to the script tag in an HTML page or the user input when filling out the form. If $default is not set, these default variables are used only if the reserved
variables are not set by command-line parameters or user form input. For example:
You can restrict mail addresses to one address if you set the $def{'to'} variable to an e-mail address and prevent overwriting of this value by setting the $default variable.
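The precedence between defaults and supplied values can be pictured with a short sketch. This is only an illustration of the rule as described here, not code from mailto.pl: the sub name resolve, its argument order, and the sample addresses are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical defaults, standing in for the $def{} variables.
my %def = ('to' => 'webmaster@example.com', 'subject' => 'Feedback');

# Decide which value to use for a reserved variable.
#   $locked plays the role of $default: when true, a configured
#   default always wins over what the link or the form supplied.
sub resolve {
    my ($name, $given, $locked, %defaults) = @_;
    return $defaults{$name} if $locked && defined $defaults{$name};
    return $given           if defined $given && $given ne '';
    return $defaults{$name};
}

print resolve('to', 'someone@example.org', 1, %def), "\n";   # default is fixed
print resolve('to', 'someone@example.org', 0, %def), "\n";   # supplied value wins
print resolve('subject', undef, 0, %def), "\n";              # falls back to default
```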
You can also restrict the to mail addresses to certain addresses by setting the %defto variable array. This variable can be found in the first section of the script. For this feature, you must run a separate copy of the script because the standard form
always includes a selection list for the addresses.
You can create your own forms without modifying the script. You must define form files, which are also small Perl scripts. You can create two kinds of form files. The first will be executed when the main script is executed with the GET method. It must
create the form. If the second form exists, it will be executed when the main script is executed with the POST method (after the user submitted the mail). It is intended for preparing the mail. To use the form file feature, the first (GET) form must exist.
The second (POST) is optional.
You can specify the name of the form with the predefined variable $defto{form}=form name inside the script or with the parameter form=form name. Form name is the filename of the form without the path and the file extension.
Inside your form files, you can use all the variables and subroutines of the main Perl script. You can overwrite variables from the main script, for example $commentfile. You can even write your own mailto application.
As mentioned before, another excellent use for Perl is writing code to manage the Common Gateway Interface (CGI) forms, which have become the mainstay of the World Wide Web for interactive communication.
cgi-lib.pl is a simple Perl library designed to make writing CGI scripts in Perl easy. Many Perl CGI scripts that you find on the Web use cgi-lib.pl. You will find a copy of cgi-lib.pl on the CD; see Appendix I. See Listing 21.2 for an example.
#!/usr/local/bin/perl
# minimal.cgi
# Copyright (C) 1995 Steven E. Brenner
# $Header: /cys/people/brenner/http/docs/web/RCS/minimal.cgi,v 1.2 1995/04/07 21:36:29 brenner Exp $
# This is the minimalist script to demonstrate the use of
# the cgi-lib.pl library -- it needs only 7 lines
# --
# This is NOT intended to be a "typical" script
# Most importantly, the <form> key should normally have parameters like
# <form method=POST action="minimal.cgi">
require "cgi-lib.pl";
if (&ReadParse(*input)) {
    print &PrintHeader, &PrintVariables(%input);
} else {
    print &PrintHeader, '<form><input type="submit">Data: <input name="myfield">';
}
Perl 5 added many features to the language that we were unable to include in this short introduction. Some of the more noteworthy enhancements are: references, object-oriented extensions, general cleanup, support for modules, and importing.
Like any programming language, Perl will take some time to master. Alas, this is not a subject we can completely cover in this book. However, we can give you some information about where to look. This information will also tell you how you can quickly
use existing Perl applications. The first thing you might want to do is check out these three text files that come with Perl.
To learn more about Perl, try the University of Florida's Perl Archive at http://www.cis.ufl.edu/perl/. Users in the UK might like to try something closer to home, such as the NEXOR Ltd Perl Page at http://pubweb.nexor.co.uk/public/perl/perl.html.
Here are a few other Perl resources on the Net; the last one consists of a few newsgroups dedicated to Perl topics.
http://www.metronet.com/perlinfo/perl5.html
http://www.perl.com/perl/faq/
http://www.ee.pdx.edu/~rseymour/perl/
comp.lang.perl
In the next chapter, you will continue empowering your Web site by exploring the possibilities of programming in C. As you recall from Chapter 11, the Common Gateway Interface is a method of building server applications for the processing of HTML form data. C and C++ are frequently the languages of choice for building those applications.