


- D -
Index Server Frequently Asked Questions (FAQ)


This appendix contains answers to some of the questions most frequently asked by Index Server administrators and users. This is not an all-encompassing list of questions, but you'll find answers to some of the most common basic questions and a few advanced issues as well.

The questions answered in this appendix include



Resource

Microsoft maintains a newsgroup dedicated to Index Server in which you can peruse threaded discussions of many issues relating to Index Server problems, workarounds, questions and answers, and unique implementation details. This can be an invaluable resource because it gives you the opportunity to learn from the successes and experiences of other administrators and developers. The newsgroup name is microsoft.public.inetserver.iis.tripoli.

Note that tripoli refers to the code name given to Index Server prior to its public release by Microsoft.



Q&A


Q: What querying capabilities are provided and performed by Index Server?

A: Index Server provides extensive querying capabilities and functionality. With Index Server, you can perform queries against the content of documents within your corpus as well as properties of those documents. Complex query restrictions can be developed that employ content and property queries simultaneously. To support these querying capabilities, Index Server provides:

Additionally, Index Server provides the capability to perform complex queries through the use of a very complex query language. Using the query language, you can:

Q: What is a virtual root?

A: A virtual root is simply an alias name for a physical path to a directory on disk. For example, the virtual root /e_books could point to the physical directory E:\e_books. Note that virtual roots always start with a /. Virtual roots are also known as virtual directories.
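
The alias relationship can be sketched in a few lines of Python. This is only an illustration of the idea: the VIRTUAL_ROOTS table, the resolve helper, and the file name ch01.htm are hypothetical and are not part of Index Server or the Web server. The mapping reuses the /e_books example above.

```python
from typing import Optional

# Hypothetical alias table: virtual root -> physical directory.
VIRTUAL_ROOTS = {"/e_books": "E:\\e_books"}


def resolve(virtual_path: str) -> Optional[str]:
    """Translate a virtual path into a physical path, or None if no root matches."""
    if not virtual_path.startswith("/"):
        return None  # virtual roots always start with a /
    # Try the longest root first so that nested roots take precedence.
    for root in sorted(VIRTUAL_ROOTS, key=len, reverse=True):
        if virtual_path == root or virtual_path.startswith(root + "/"):
            remainder = virtual_path[len(root):].replace("/", "\\")
            return VIRTUAL_ROOTS[root] + remainder
    return None


# ch01.htm is a made-up file name used only for this example.
print(resolve("/e_books/TeachHTML32/ch01.htm"))
```

A path that does not begin with a / or that falls under no defined root resolves to None in this sketch.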

Q: What is a catalog and how does it differ from an index?

A: A catalog is a directory (named Catalog.wci) of indexes and other files used internally by Index Server to locate documents that meet a query restriction. An index is a data structure used to store words and information extracted from files during filtering. Indexes can be non-persistent, in-memory, lightly compressed structures (wordlists), or they can be persistent, on-disk, highly compressed structures (shadow indexes and the master index). There are typically several indexes in a catalog.

Q: Is it possible to prevent documents from specific directories from being included with a result set returned from a query?

A: Because Index Server indexing and query scopes are based on virtual roots, it is not possible to explicitly exclude certain directories from indexing. You can, however, structure your .idq files so that documents from specific directories are excluded from the result set returned to the user.

Suppose the virtual root /e_books points to the physical directory E:\e_books. You want to exclude subdirectories E:\e_books\TeachHTML32 and E:\e_books\TeachVBScript from the result set. To do so, modify the query restriction passed to the .idq file as follows:


CiRestriction=%UserRestriction% AND NOT #path E:\e_books\TeachHTML32 AND NOT #path E:\e_books\TeachVBScript
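
When many directories must be excluded, the restriction follows a regular pattern: one AND NOT #path clause per directory. The following Python sketch generates such a line. The exclude_paths helper is a hypothetical name introduced for illustration, and the sketch assumes only the clause syntax shown above.

```python
from typing import Iterable


def exclude_paths(user_restriction: str, excluded_dirs: Iterable[str]) -> str:
    """Append one 'AND NOT #path <dir>' clause per excluded directory."""
    clauses = [user_restriction]
    for directory in excluded_dirs:
        clauses.append(f"AND NOT #path {directory}")
    return " ".join(clauses)


# Build the CiRestriction line for the two subdirectories in the example.
line = "CiRestriction=" + exclude_paths(
    "%UserRestriction%",
    ["E:\\e_books\\TeachHTML32", "E:\\e_books\\TeachVBScript"],
)
print(line)
```

The generated line can then be pasted into the .idq file. With an empty list of directories, the helper returns the user restriction unchanged.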

Q: How do I establish indexing of remote UNC shares?

A: Virtual roots that point to UNC shares are automatically indexed by Index Server. Index Server uses automated change notifications if the remote share supports them. Otherwise, the remote share is scanned periodically for changes, at an interval set by the registry parameter ForcedNetPathScanInterval. Also, be sure to specify the correct user ID and password for the remote share when you define the virtual root.

Q: What is a corpus? How does it differ from a scope?

A: Corpus refers to the entire set of documents that are indexed and represented in a catalog. A scope, on the other hand, refers to a set of documents that will be searched during a query. A scope is specified by a virtual root. The virtual root can be defined to include the entire document corpus if desired. Likewise, scopes can be defined to include only a portion of the corpus.

Q: What is the main difference between scanning, indexing, and filtering?

A: Scanning, filtering, and indexing are closely related steps in the process of building the index used by Index Server to satisfy query requests. Scanning is the process by which Index Server identifies files within indexed virtual roots that have been modified. Filtering is a two-stage process by which (1) the CiDaemon process determines which filters are appropriate for use on a changed document and (2) the filters are used to extract information (words) for use in the index. Indexing is simply the process by which information extracted from documents is stored in wordlists and shadow indexes and eventually merged into the master index.

Q: What are word breakers used for?

A: Word breakers are language-dependent modules that Index Server uses during the process of filtering to identify words in a document.

Q: Can the number of messages Index Server writes to the NT event log be limited somehow?

A: Specific events can have their messages enabled or disabled through the use of bit-field masking. See Appendix A, "Index Server Registry Parameters," for details.

Q: Some indexing engines can search for text that is not an exact match, but is similar to the text of the query restriction. These types of queries are sometimes referred to as fuzzy queries. Does Index Server support fuzzy queries?

A: Index Server supports fuzzy queries by searching for words and text similar to those in the query restriction. Rather than looking for only exact matches, the query engine modifies the words in the query and looks for these modified forms. Fuzzy query support is provided in one of the following ways:

Q: What are some steps I can take to improve performance?

A: Several things can be done to improve Index Server performance:

Q: How is it possible that the Files To Be Filtered counter shows a value greater than the Total # Documents counter?

A: The Files To Be Filtered counter value represents the number of documents that have been changed and need to be filtered. It is simply a list of changed documents. It is possible that some files were modified more than once and thus have multiple entries in the changed-documents queue.
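
A toy model makes the arithmetic concrete. The Python sketch below is not Index Server's actual internal structure. It assumes only what the answer states: the queue records one entry per modification and does not remove duplicates. The document names are made up.

```python
from collections import deque

# Hypothetical corpus of three documents.
documents = {"a.htm", "b.htm", "c.htm"}

# Each modification appends an entry; nothing removes duplicates.
changed_queue = deque()
for modified in ["a.htm", "b.htm", "a.htm", "c.htm", "a.htm"]:
    changed_queue.append(modified)

total_documents = len(documents)
files_to_be_filtered = len(changed_queue)

# Prints: 3 5
print(total_documents, files_to_be_filtered)
```

Because a.htm was modified three times, the queue holds five entries for a corpus of only three documents.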

Q: Why do unreadable files show up in query results? Can this be avoided?

A: Because Index Server indexes roots that do not have read permissions but are located under a root that does, you will need to employ a workaround to prevent files in the unreadable root from showing up in a user's result set. This is easily done within .idq files. Suppose you have an unreadable root named /_myroot. To prevent any documents or files in this directory from showing up in a user's result set, append a clause to the CiRestriction parameter in the .idq file as follows:

This tells Index Server to append to the user's query restriction a query language directive that excludes any result whose virtual path contains the string /_myroot.

Q: Why is the Files To Be Filtered counter non-zero even though my system is sitting idle?

A: This occurs when some files have failed to filter, which typically happens when files that are to be filtered are in use by some other process when the CiDaemon process attempts to filter the document's contents. When this occurs, the file is relegated to a lower priority queue to be filtered at a later time. The time interval between retries on these files is controlled by the registry parameter .

Q: Why don't documents that are known to exist show up in result sets as expected?

A: There are several circumstances where documents that are known to exist do not show up in a result set. The most obvious circumstance is when a query restriction is used that prevents the document(s) from appearing in the result set. Assuming this is not the case, however, the following list details some other instances where this problem may occur:
