This appendix contains basic information about 36 Internet robots.
Maintained by Jonathan Fletcher, e-mail: j.fletcher@stirling.ac.uk.
JumpStation's goal is to generate a Resource Discovery database. The HTTP User-agent field is set to JumpStation-Robot, and the From field is also set. Usually run from *.stir.ac.uk. The Proposed Standard for Robot Exclusion is supported.
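As a concrete illustration of these two identification fields (the URL and addresses below are placeholders, not JumpStation's actual values), a robot written in Python might set them like this:

    import urllib.request

    # Identify the robot to the server: a User-agent naming the robot and
    # a From field giving a contact address. Both values are placeholders.
    request = urllib.request.Request(
        "http://www.example.com/index.html",
        headers={
            "User-agent": "ExampleRobot/1.0",
            "From": "operator@example.com",
        },
    )
    with urllib.request.urlopen(request) as response:
        page = response.read()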
Maintained by David Eichmann, e-mail: eichmann@rbse.jsc.nasa.gov.
RBSE Spider's purpose is to generate a Resource Discovery database and generate statistics. The HTTP User-agent field is set to RBSE Spider v. 1.0, and the From field is also set. Usually run from rbse.jsc.nasa.gov (192.88.42.10). The Proposed Standard for Robot Exclusion is supported.
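The Proposed Standard for Robot Exclusion, referenced throughout this appendix, works by convention: a server publishes a /robots.txt file listing paths that robots should not visit, and a compliant robot checks that file before fetching. A minimal sketch in Python, against a placeholder host:

    # A typical /robots.txt record, excluding all robots from one subtree:
    #
    #   User-agent: *
    #   Disallow: /cgi-bin/
    #
    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")  # placeholder host
    rp.read()
    if rp.can_fetch("ExampleRobot", "http://www.example.com/cgi-bin/foo"):
        print("allowed to fetch")
    else:
        print("excluded by robots.txt")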
Run by Brian Pinkerton, e-mail: bp@biotech.washington.edu.
Runs from webcrawler.cs.washington.edu and uses WebCrawler/0.00000001 in the HTTP User-agent field.
Run by Fred Barrie, e-mail: barrie@unr.edu and Billy Barron.
Recent runs will concentrate on textual analysis of the Web versus GopherSpace (from the Veronica data), as well as indexing. Run from frognot.utdallas.edu, possibly other sites in utdallas.edu, and from cnidir.org. It now sets the HTTP From field and sets the User-agent to NorthStar.
Run by Matthew Gray mkgray@mit.edu.
Run initially in June 1993, W4's aim is to measure the growth of the Web. The HTTP User-agent field is set to WWWWanderer v3.0 by Matthew Gray. The Proposed Standard for Robot Exclusion is not supported.
Run by James E. Pitkow pitkow@aries.colorado.edu.
html_analyzer-0.02's aim is to check the validity of Web servers.
Maintained by Roy T. Fielding, fielding@ics.uci.edu.
MOMspider's purpose is to validate links and generate statistics. The HTTP User-agent field is set to MOMspider/1.00 libwww-perl/0.40, and the From field is also set. It can be run from anywhere. The Proposed Standard for Robot Exclusion is supported.
Maintained by Andreas Ley, e-mail: ley@rz.uni-karlsruhe.de.
This is a mirroring robot, configured to stay within a directory and sleep between requests. The next version will use HEAD to check if the entire document needs to be retrieved. The HTTP User-agent is set to HTMLgobble v2.2, and it sets the From field.
This is usually run by the author, from tp70.rz.uni-karlsruhe.de.
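The HEAD technique mentioned above can be sketched as follows: ask the server for headers only, compare its Last-Modified date against the local copy, and issue a full GET only when the document has changed. The URL and filename are illustrative; this is not HTMLgobble's actual code.

    import email.utils
    import os
    import urllib.request

    URL = "http://www.example.com/doc.html"   # placeholder document
    LOCAL = "mirror/doc.html"                 # placeholder local copy

    # A HEAD request returns only the headers, so this check is cheap.
    head = urllib.request.Request(URL, method="HEAD")
    with urllib.request.urlopen(head) as response:
        last_modified = response.headers.get("Last-Modified")

    stale = True
    if last_modified and os.path.exists(LOCAL):
        remote = email.utils.parsedate_to_datetime(last_modified).timestamp()
        stale = remote > os.path.getmtime(LOCAL)

    if stale:
        os.makedirs("mirror", exist_ok=True)
        urllib.request.urlretrieve(URL, LOCAL)  # fetch the whole document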
Maintained by Oliver McBryan, e-mail: mcbryan@piper.cs.colorado.edu.
Run from piper.cs.colorado.edu.
Maintained by Christophe Tronche tronche@lri.fr.
W3M2's purpose is to generate a Resource Discovery database, validate links, validate HTML, and generate statistics. The HTTP User-agent field is set to W3M2/x.xxx, and the From field is also set. This is usually run from any host in the lri.fr domain. The Proposed Standard for Robot Exclusion is supported.
Maintained by Charlie Stross charless@sco.com.
Websnarf is a WWW mirror designed for off-line browsing of sections of the Web. It is run from ruddles.london.sco.com.
Run by Lee McLoughlin at L.McLoughlin@doc.ic.ac.uk.
It was first spotted in mid-February 1994 and is run from phoenix.doc.ic.ac.uk.
Owned by Dr. Michael L. Mauldin, fuzzy@cmu.edu at Carnegie Mellon University.
Lycos is a research program providing information retrieval and discovery in the WWW, using a finite memory model of the Web to guide intelligent, directed searches for specific information needs. The HTTP User-agent field is set to Lycos/x.x. This is run from fuzine.mt.cs.cmu.edu. The Proposed Standard for Robot Exclusion is supported.
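This appendix does not detail Lycos's algorithm, but the general shape of a directed search with finite memory can be sketched: keep a frontier of unvisited URLs ordered by an estimated relevance score, and cap the frontier's size so the model of the Web stays bounded. Here score and fetch_links are hypothetical callables supplied by the crawler, and FRONTIER_LIMIT is an assumed bound, not a Lycos parameter.

    import heapq

    FRONTIER_LIMIT = 1000  # assumed bound for illustration

    def directed_crawl(start, score, fetch_links):
        """Best-first crawl whose frontier never exceeds FRONTIER_LIMIT."""
        seen = {start}
        frontier = [(-score(start), start)]  # max-heap via negated scores
        while frontier:
            _, url = heapq.heappop(frontier)
            for link in fetch_links(url):
                if link not in seen:
                    seen.add(link)
                    heapq.heappush(frontier, (-score(link), link))
            if len(frontier) > FRONTIER_LIMIT:
                # Finite memory: keep only the most promising entries.
                # (The sorted result of nsmallest is still a valid heap.)
                frontier = heapq.nsmallest(FRONTIER_LIMIT, frontier)
        return seen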
Written and run by Fred Johansen fred@nvg.unit.no.
Currently under construction, this spider is a CGI script that searches the Web for keywords given by the user through a form. The HTTP User-agent is set to ASpider/0.09, with a From field. Contact: fredj@nova.pvv.unit.no.
Introduced by Peter Beebee: ptbb@ai.mit.edu, beebee@parc.xerox.com.
This has run since June 27, 1994, for an internal XEROX research project. The HTTP User-agent is set to SG-Scout, with a From field set to the operator. The Proposed Standard for Robot Exclusion is supported. This is run from beta.xerox.com.
Written by Jim McGuire mcguire@eit.com.
Announced on July 12, 1994, this is a combination of an HTML form and a CGI script that verifies links from a given starting point (with some controls to prevent it from running with no limits or going off-site). From version 0.2 up, the User-agent is
set to EIT-Link-Verifier-Robot/0.2. This can be run by anyone from anywhere.
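In outline, a verifier of this kind needs very little machinery: collect the anchors on the starting page, discard off-site links, and probe each remaining URL. The sketch below (placeholder starting URL, one level deep only) shows the idea; the real script adds recursion and limit controls beyond what is shown here.

    import urllib.error
    import urllib.parse
    import urllib.request
    from html.parser import HTMLParser

    START = "http://www.example.com/"  # placeholder starting point

    class AnchorCollector(HTMLParser):
        """Gather the href of every <a> tag on a page."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links.extend(v for k, v in attrs if k == "href" and v)

    with urllib.request.urlopen(START) as response:
        collector = AnchorCollector()
        collector.feed(response.read().decode("utf-8", "replace"))

    host = urllib.parse.urlparse(START).netloc
    for link in collector.links:
        url = urllib.parse.urljoin(START, link)
        if urllib.parse.urlparse(url).netloc != host:
            continue  # the "don't go off-site" control
        try:
            urllib.request.urlopen(urllib.request.Request(url, method="HEAD"))
            print("ok ", url)
        except urllib.error.URLError as err:
            print("BAD", url, err)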
Maintained by Robert Olson at olson@mcs.anl.gov.
Web Forager's purpose is to generate a Resource Discovery database. The HTTP User-agent field is set to NHSEWalker/3.0, and the From field is also set. This is usually run from *.mcs.anl.gov. The Proposed Standard for Robot Exclusion is supported.
Written and run by James Casey at jcasey@maths.tcd.ie.
WebLinker is a tool that traverses a section of the Web, doing URN to URL conversion. It will be used as a post-processing tool on documents created by automatic converters such as LaTeX2HTML or WebMaker. The HTTP User-agent is set to WebLinker/0.0 libwww-perl/0.1.
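As a rough sketch of that post-processing step (the URN namespace and lookup table here are invented for illustration; a real resolver would consult a name service rather than a hard-coded dict), URN-to-URL conversion amounts to rewriting each urn: reference in a document:

    import re

    # Hypothetical URN-to-URL table for illustration only.
    URN_TABLE = {
        "urn:example:report-42": "http://www.example.com/reports/42.html",
    }

    def resolve_urns(html):
        """Replace each known urn: reference in an href with its URL."""
        def swap(match):
            urn = match.group(1)
            return 'href="%s"' % URN_TABLE.get(urn, urn)
        return re.sub(r'href="(urn:[^"]+)"', swap, html)

    print(resolve_urns('<a href="urn:example:report-42">the report</a>'))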
Maintained by William M. Perry at wmperry@spry.com.
Emacs-w3 Search Engine's purpose is to generate a Resource Discovery database. The HTTP User-agent field is set to Emacs-w3/v[0-9\.]+, and the From field is also set. This is usually run from a variety of machines. The Proposed Standard for Robot
Exclusion is not supported.
Run by Vince Taluskie at taluskie@utpapa.ph.utexas.edu.
Arachnophilia's purpose is to collect approximately 10,000 HTML documents for testing automatic abstract generation. This program will honor the robot exclusion standard and wait one minute between requests to a given server. The HTTP User-agent field is set to Arachnophilia. This is run from halsoft.com.
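A per-server courtesy delay of this sort is simple to implement: remember when each host was last contacted and sleep off the remainder before the next request. A minimal sketch (the fetch loop around it is assumed):

    import time
    import urllib.parse

    DELAY = 60.0       # seconds between requests to any one server
    last_request = {}  # host -> time of the most recent request

    def polite_wait(url):
        """Block until at least DELAY seconds have passed for url's host."""
        host = urllib.parse.urlparse(url).netloc
        elapsed = time.time() - last_request.get(host, 0.0)
        if elapsed < DELAY:
            time.sleep(DELAY - elapsed)
        last_request[host] = time.time()

    # A crawler would call polite_wait(url) immediately before each fetch.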
Written by Sebastien Lemieux at lemieuse@ERE.UMontreal.CA.
Mac WWWWorm is a French keyword-searching robot for the Mac, written in HyperCard. No other information currently available.
Maintained by Justin Yunke at yunke@umich.edu.
This is a URL-checking robot that stays within one step of the local server.
Run by Olaf Schreck at chakl@fu-berlin.de.
This is a mirroring robot. It sets the User-agent field to tarspider along with its version, and the From field to chakl@fu-berlin.de.
This is run by Jim Richardson, jimr@maths.su.oz.au.
This robot, written in Perl V4, commenced operation in August 1994 and is being used to generate an index called MathSearch of documents on Web sites connected with mathematics and statistics. It ignores offsite links, so it does not stray from a list
of servers specified initially. The HTTP User-agent field is set to Peregrinator-Mathematics/0.7. Peregrinator also sets the From field. The Proposed Standard for Robot Exclusion is supported.
Maintained by Hans de Graaff at j.j.degraaff@twi.tudelft.nl.
Checkbot's purpose is to validate links. The HTTP User-agent field is set to checkbot.pl-x.xx, and the From field is also set. This is usually run from dutifp.twi.tudelft.nl. The Proposed Standard for Robot Exclusion is not supported.
Maintained by Rich Testardi at rpt@fc.hp.com.
Its purpose is to generate a Resource Discovery database, validate links, validate HTML, perform mirroring, copy document trees, and generate statistics. The HTTP User-agent field is set to webwalk, and the From field is also set. The Proposed Standard
for Robot Exclusion is supported.
Run by Darren Hardy at hardy@bruno.cs.colorado.edu.
Harvest is a Resource Discovery Robot, part of the Harvest Project. It runs from bruno.cs.colorado.edu and sets User-agent and From fields.
Maintained by Michael Newbery at Michael.Newbery@vuw.ac.nz.
The HTTP User-agent field is set to Katipo/1.0, and the From field is also set. The Proposed Standard for Robot Exclusion is not supported.
Maintained by Steve Kirsch at stk@infoseek.com.
InfoSeek's purpose is to generate a Resource Discovery database. The HTTP User-agent field is set to InfoSeek Robot 1.0, and the From field is also set. This is usually run from corp-gw.infoseek.com. The Proposed
Standard for Robot Exclusion is supported.
Maintained by James Burton at burton@cs.latrobe.edu.au.
GetURL's purpose is to validate links, perform mirroring, and copy document trees. The HTTP User-agent field is set to GetURL.rexx v1.05 by burton@cs.latrobe.edu.au, and the From field is not set. The Proposed Standard for Robot Exclusion is not
supported.
Run by Tim Bray at tbray@opentext.com.
This sets User-agent to OMW/0.1 libwww/217. The Proposed Standard for Robot Exclusion is supported.
Implemented by Scott Spetka at scott@cs.sunyit.edu.
TkWWW is designed to search Web neighborhoods to find pages that might be logically related. The robot returns a list of links that looks like a hot list. The search can be by keyword, or it can return all links within one or two hops.
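A neighborhood search of that shape is a breadth-first walk cut off at a fixed number of hops. The sketch below (placeholder start page, no keyword scoring) collects every link within two hops of the starting point:

    import urllib.parse
    import urllib.request
    from collections import deque
    from html.parser import HTMLParser

    class AnchorCollector(HTMLParser):
        """Gather the href of every <a> tag on a page."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links.extend(v for k, v in attrs if k == "href" and v)

    def neighborhood(start, max_hops=2):
        """Return every URL reachable from start in at most max_hops links."""
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            url, hops = queue.popleft()
            if hops == max_hops:
                continue  # within range, but don't expand further
            try:
                with urllib.request.urlopen(url) as response:
                    collector = AnchorCollector()
                    collector.feed(response.read().decode("utf-8", "replace"))
            except (OSError, ValueError):
                continue
            for link in collector.links:
                child = urllib.parse.urljoin(url, link)
                if child.startswith("http") and child not in seen:
                    seen.add(child)
                    queue.append((child, hops + 1))
        return seen

    # neighborhood("http://www.example.com/") -> the two-hop "hot list"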
Maintained by Laurent Demailly at dl@hplyot.obspm.fr.
Tcl W3's purpose is to validate links and generate statistics. The HTTP User-agent field is set to dlw3robot/x.y, and the From field is also set. This is usually run from hplyot.obspm.fr. The Proposed Standard for Robot Exclusion is supported.
Maintained by Yoshihiko Hayashi at hayashi@nttnly.isl.ntt.jp.
Titan's purpose is to generate a Resource Discovery database and copy document trees. The primary goal is to develop an advanced method for indexing WWW documents. The HTTP User-agent field is set to TITAN/0.1, and the From field is also set. This is usually run from nttnly.isl.ntt.jp. The Proposed Standard for Robot Exclusion is supported.
Maintained by Budi Yuwono at yuwono-b@cs.ust.hk.
CS-HKUST's purpose is to generate a Resource Discovery database and validate HTML. The HTTP User-agent field is set to CS-HKUST-IndexServer/1.0, and the From field is also set. This is usually run from dbx.cs.ust.hk. The Proposed Standard for Robot
Exclusion is supported.
Maintained by Spry at info@spry.com.
WizRobot's purpose is to generate a Resource Discovery database. Neither User-agent nor From HTTP fields are set. This is usually run from tiger.spry.com. The Proposed Standard for Robot Exclusion is not supported.