12

Controlling the Proxy Server Cache

The Web Proxy server side of Microsoft Proxy Server can perform data caching of any and all objects that pass through it. The WinSock Proxy server side of Microsoft Proxy Server performs no caching. That is one of the main advantages the Web Proxy server has over the WinSock Proxy server. Rumor has it that Microsoft will develop a caching feature for the WinSock Proxy server in future releases of Microsoft Proxy Server. In the current release of Microsoft Proxy Server (1.0), only the Web Proxy server performs caching.

Caching is the process of storing objects, such as graphics, sound bites, and document text on a local hard drive. If a client requests information from the Internet that has already been cached by Microsoft Proxy Server, Microsoft Proxy Server will pass the cached information to the client rather than going out to the Internet site and retrieving the information. Caching affects Microsoft Proxy Server's performance in the following ways:

Without caching, Microsoft Proxy Server has to retrieve every requested object directly from the Internet. This creates a great deal of traffic even when clients are requesting the same information over and over again. Web browsers perform their own caching of data, but this is generally done on a small scale. Cache sizes for client browsers are usually set to around five megabytes. There is no default cache size for Microsoft Proxy Server; however, the suggested amount is 100 megs plus 1/2 meg for every proxy client that will be supported. If your network has 25 proxy clients, the minimum cache size that should be set is 113 megs (100 megs plus 12.5 megs, rounded up). Because Microsoft Proxy Server will be forwarding requests from many clients on a network, individual clients may benefit from the activity of other Microsoft Proxy Server users. Microsoft Proxy Server also has some advanced caching features that help it keep the objects in the cache current. Client browser caches are only able to check how current an object is when the object is directly requested.
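The sizing guideline is simple arithmetic, and a minimal Python sketch of it follows. The function and constant names are illustrative only and are not part of Microsoft Proxy Server; rounding up to a whole megabyte is an assumption that matches the 25-client example.

```python
import math

BASE_MB = 100        # suggested base cache size, in megabytes
PER_CLIENT_MB = 0.5  # suggested extra space per supported proxy client


def suggested_cache_mb(clients: int) -> int:
    """Return the suggested minimum cache size in megabytes, rounded up."""
    if clients < 0:
        raise ValueError("clients must be non-negative")
    return math.ceil(BASE_MB + PER_CLIENT_MB * clients)
```

For the 25-client network, suggested_cache_mb(25) gives 113, and a network of 200 clients would call for 200 megs.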

When Microsoft Proxy Server starts, it performs integrity checks on the data contained within the cache. If the cache is very large and spread out over many drives, it may take a while for the integrity check to complete. You will be unable to alter the cache settings in any way until the check is complete. If you jump into configuring the cache immediately after the NT machine starts, you may get a message stating that you cannot configure the cache. You should be allowed to reconfigure the cache settings after waiting a few minutes and retrying the action.

The Cache

In a default installation, Microsoft Proxy Server stores its cache in five subdirectories under the primary directory C:\URLCACHE. These subdirectories are DIR1 through DIR5. Microsoft Proxy Server breaks up the cache content between multiple smaller directories to increase search speed. Searching a single huge directory can take longer than searching multiple smaller directories.

Each subdirectory will be used equally by the Web Proxy server. All cached objects will be distributed evenly among these five subdirectories. For example, if the cache size for a drive is 50 megs, each cache subdirectory will contain about 10 megs of data when the cache is full.

The objects stored in the cache are not stored under their original names. They are renamed to a coded format so that objects with identical names do not overwrite each other. The cross-reference data that maps each coded name to its actual name is stored in data files also held in the cache directories. These data files are what Microsoft Proxy Server searches when clients request Internet objects that might be found in the cache. This increases performance, but it runs the risk of losing the entire contents of a cache directory should the data files themselves become corrupt. Another benefit of using data files to reference cache objects is the ability to set time to live (TTL) values for objects. Once an object's TTL has expired, Microsoft Proxy Server will no longer pass it out to clients. Microsoft Proxy Server will retrieve a more recent copy from the Internet and update the object in the cache.
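To make the idea of coded names and a cross-reference index concrete, here is a minimal Python sketch. It is not the actual on-disk format used by Microsoft Proxy Server, which is not described here. The SHA-1 based naming scheme, the .dat extension, and the CacheIndex class are all assumptions chosen for illustration.

```python
import hashlib


def coded_name(url: str) -> str:
    """Derive a file name from the full URL (hypothetical scheme).

    Two objects that share a file name, such as logo.gif on two
    different sites, get different coded names because the whole
    URL is hashed, not just the last path component.
    """
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:16] + ".dat"


class CacheIndex:
    """Cross-reference of original URL to coded name and expiry time."""

    def __init__(self):
        self._entries = {}  # url -> (coded name, expiry timestamp)

    def add(self, url, ttl_seconds, now):
        """Record an object and the moment its TTL runs out."""
        self._entries[url] = (coded_name(url), now + ttl_seconds)

    def lookup(self, url, now):
        """Return the coded name if the entry exists and is still fresh."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        name, expires_at = entry
        return name if now < expires_at else None
```

Passing the current time in as a parameter, rather than reading the clock inside the class, keeps the sketch easy to test. A lookup only touches the index, never the cached files themselves, which is why searching the index is fast and why losing it makes the cached files unusable.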

Microsoft Proxy Server cache directories can be on any local hard drive. They should be located on the quickest drives and if possible should be held on drives with the NTFS format. The NTFS format is a faster and more reliable disk architecture than FAT and also provides greater security features for a network environment. The primary drawback to NTFS is that it is inaccessible to non-NT systems such as DOS. If you dual boot your computer between NT Server and a DOS system, changing a disk to NTFS will render that disk unreadable to DOS.

Modifying the Cache Size

The cache location and size can be modified from the Caching tab of the Web Proxy properties dialog box. The Change Cache Size button allows you to manually alter or add to the cache structure and size. Figure 12.1 shows this dialog box.

Figure 12.1. The Web Proxy cache size and location.

Setting the cache size and location is as straightforward as highlighting the desired drive for a piece of the cache and entering the maximum size for the cache that drive should hold. Once a size has been indicated, clicking the Set button will set the indicated cache size for the selected drive. The total cache size across all drives should meet or exceed the suggested total size of 100 megs plus 1/2 meg for each Web Proxy client that will be supported.

Modifying Maximum Object Size and Filters

If clients are constantly requesting pages containing large graphics and sound bites, the performance of the cache will be poor because it will hold only a small number of large objects rather than many smaller ones. If you would like to set a maximum size for cache objects, the Advanced button on the Caching tab of Web Proxy properties will allow you to do this. Figure 12.2 shows this section of Web Proxy properties.

Figure 12.2. Microsoft Proxy Server's Advanced Cache Policy settings.

If you wish to limit the size of cached objects, check the Limit Size of Cache Objects check box and indicate an object size. I do suggest limiting the size to about 100 KB. If any single object exceeds 100 KB, it is unlikely that it will be requested time and time again by clients. It is more likely a large graphic that will only need to be seen once.

The Return expired objects when site is unavailable check box controls whether the Web Proxy will send an expired cache object to a client when the Web Proxy server cannot contact the target site. Many sites go up and down constantly. With this option enabled, Microsoft Proxy Server can stand in for a site that is down by answering from its cache.

The lower portion of the Advanced Cache Policy settings dialog box allows you to configure the cache filtering of specific Internet sites.

Some Internet sites have data that changes daily. The normal document header should contain page expiration information that will allow Microsoft Proxy Server to automatically expire these types of pages almost immediately and not cache them. However, not all sites include page expiration information, and such sites should be manually excluded from being cached. This dialog box allows you to indicate such sites and/or sub branches. An asterisk can be used here as a wild card to indicate sites (for example, www.windows95.com/newfiles/*).

Clicking the Add button (or the Edit button to alter an existing filter) will allow you to indicate a site for special filtering. Figure 12.3 shows this Add (or Edit) dialog box.

Figure 12.3. Adding a special filter condition in the Cache Filter Properties dialog box.

To complete a filter, simply indicate the site to filter and indicate whether the site is to be expressly filtered or not. The general cache policy for the Web Proxy server can be to cache or not cache any data. By default, the Web Proxy will cache all HTTP data it can. If the general cache policy is for no caching, the sites indicated here would be for special caching. As you can see in the figure, wild cards can be used to indicate partial branches.
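The interaction between the general cache policy and the per-site filters can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the Web Proxy server's implementation. The should_cache function is hypothetical, matching is done with Python's standard fnmatch module, and the rule that the first matching filter wins is an assumption.

```python
from fnmatch import fnmatchcase


def should_cache(url, default_cache, filters):
    """Decide whether a URL may be cached.

    default_cache: the general cache policy (True = cache everything).
    filters: list of (pattern, cache_flag) pairs, where pattern may use
             an asterisk as a wild card. Assumption: the first matching
             pattern decides the outcome.
    """
    target = url.lower()
    for pattern, cache_flag in filters:
        if fnmatchcase(target, pattern.lower()):
            return cache_flag
    return default_cache


# Example: cache everything except the frequently changing branch.
filters = [("www.windows95.com/newfiles/*", False)]
```

With this filter list, should_cache("www.windows95.com/newfiles/index.htm", True, filters) returns False, while any URL outside the newfiles branch falls through to the general policy. Reversing the roles, with a general policy of no caching and filters flagged True, yields the special caching case.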

Methods of Caching

When the Enable Caching check box is checked on the Caching tab, caching will take place. If it is unchecked, the only data that will be cached will be from those sites expressly indicated for caching in the cache filter.

There are two forms of caching that Microsoft Proxy Server uses. The first is passive caching and the second is active caching. These two caching methods are significantly different.

Passive Caching

Microsoft Proxy Server uses passive caching most often. This form of caching requires no extra activity on the part of Microsoft Proxy Server. When objects are originally requested from the Internet by client applications, Microsoft Proxy Server retrieves them and first places a copy of the object in the cache (and sets a TTL for the object) and then passes the object on to the client requesting the data. This is done only if the object is cacheable at all. The following criteria must be met before an object or HTML document can be passively cached:

The sequence of events when a client requests an Internet object through Microsoft Proxy Server proceeds like this:

  1. The client requests a page or object from the Internet.
  2. Microsoft Proxy Server intercepts the request.
  3. Microsoft Proxy Server checks the cache to see if the object is present.
  4. If the object is present, Microsoft Proxy Server checks to see if the object's TTL has expired. If it has and the object has changed on the Internet site, the object is retrieved again and the TTL is updated. If the object's TTL has expired but the object has not changed on the Internet site, Microsoft Proxy Server simply updates the TTL of the object. The object is then passed to the client.
  5. If the object is not in the cache, Microsoft Proxy Server retrieves it and if it can be cached, stores it and then forwards it to the client.
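
The steps above can be sketched as a small Python class. This is a simplified model, not Microsoft Proxy Server's code. The PassiveCache class, the fetch callable, and the cacheable check are hypothetical, and for brevity an expired object is always fetched again rather than first being checked for changes as in step 4.

```python
class PassiveCache:
    """Minimal model of the passive caching request flow."""

    def __init__(self, fetch, ttl_seconds, cacheable=lambda url: True):
        self._fetch = fetch          # callable: url -> object body
        self._ttl = ttl_seconds      # TTL assigned to stored objects
        self._cacheable = cacheable  # callable: url -> bool
        self._store = {}             # url -> (body, expiry timestamp)

    def get(self, url, now):
        """Serve a client request, using the cache when possible."""
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]          # present and TTL not expired
        body = self._fetch(url)      # absent or expired: go to the Internet
        if self._cacheable(url):
            self._store[url] = (body, now + self._ttl)
        return body
```

A second request for the same URL inside the TTL window is answered from the store without calling fetch at all, which is where the savings on the Internet connection come from.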

If the Internet server returns an error of 403 or 404, the error condition response is stored in the cache. This is referred to as negative caching because Microsoft Proxy Server is storing the negative result from the server. Error 403 is returned when a client is attempting to access a page that the user does not have authorization for. Many sites on the Internet are now requiring paid membership for access. When non-authorized users attempt to access information on such sites, the server will often respond with a result code of 403. Result code 404 is a standard result code indicating that the requested URL cannot be found.
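Negative caching can be illustrated with a short Python sketch. The NegativeAwareCache class and its fetch callable are hypothetical. Treating 200 as the only cacheable success code, and every other error as uncacheable, is a simplifying assumption, and TTL handling is left out to keep the focus on the stored error responses.

```python
NEGATIVE_STATUSES = {403, 404}  # error results that are stored in the cache


class NegativeAwareCache:
    """Cache that stores successful responses and 403/404 results."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: url -> (status code, body)
        self._store = {}     # url -> (status code, body)

    def get(self, url):
        """Return (status, body), answering from the cache when stored."""
        if url in self._store:
            return self._store[url]
        status, body = self._fetch(url)
        if status == 200 or status in NEGATIVE_STATUSES:
            self._store[url] = (status, body)
        return status, body
```

Repeated requests for a missing page are answered from the stored 404 result instead of going back out to the Internet site each time.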

Many sites use something known as cookies to personalize the data they send to clients. Cookies can vary web pages and make a site seem more dynamic. They can also be used as a form of simple authentication. Cookies will be ignored by Microsoft Proxy Server, and pages with cookies in the header will be cached as long as none of the non-cacheable situations apply.

Passive caching only caches objects when a client requests them. The opposite of passive caching is active caching, which is when Microsoft Proxy Server takes an active role to ensure that the objects within the cache are current.

Active Caching

The Caching tab of Web Proxy properties allows you to control how Microsoft Proxy Server handles object TTL and the active caching policy. By default, Microsoft Proxy Server will perform some active caching (when the Enable Active Caching check box is checked). The active caching policy can be increased so that Microsoft Proxy Server will work harder at ensuring the objects in the cache are current.

Active caching allows Microsoft Proxy Server to update cache objects on its own during certain times without having to rely on clients to request objects before they can be verified. Microsoft Proxy Server will perform active caching according to the following guidelines:

When Microsoft Proxy Server performs active caching, the objects in the cache are more likely to be current. This helps to increase performance for clients because Microsoft Proxy Server is less likely to have to go out to the Internet to retrieve data when current copies of objects are present in the cache. Non-peak times are used to update cache objects so active caching spreads the workload out somewhat by using time when the connection is less stressed.

Figure 12.4 shows the Caching tab of the Web Proxy properties dialog box.

Figure 12.4. The Caching tab of Web Proxy properties.

The Cache Expiration Policy slider controls the TTL value assigned to objects. The further to the right the slider is set, the longer objects can live in the cache before they will be updated. Setting the slider all the way to the left means that objects will always be retrieved from the Internet, which effectively nullifies the cache.

The Active Caching Policy slider controls how energetically Microsoft Proxy Server actively caches. When the slider is to the left, Microsoft Proxy Server will perform little active caching and only during off-peak periods. When the slider is to the right, Microsoft Proxy Server will actively cache a greater number of cache objects and will do so during periods of higher Internet accessing by clients.

Cascading Proxy Servers

In the 1.0 release of Microsoft Proxy Server, cascading Web Proxy servers are not supported. This doesn't mean that users can't access a chain of Web Proxy servers in succession, but this is done through a name server, such as a DNS or WINS server. It's not an ability that is internal to Microsoft Proxy Server. Future releases of Microsoft Proxy Server will have built-in Web Proxy cascading, which will allow multiple Microsoft Proxy Servers to share a common cache. Currently, when DNS or WINS is used to group a set of Microsoft Proxy Servers for daisy-chained access, each server maintains its own cache and operates completely independently. Chapter 9 contains detailed information on setting up a name server to perform chained access to a group of Web Proxy servers.

Summary

The Microsoft Proxy Server cache is neither complicated nor difficult to maintain, but it is a vital part of the proxy process. Without it, Internet client performance will be poor for many users through a small connection. Because the intent of Microsoft Proxy Server is to provide inexpensive Internet access to a LAN through a smaller connection, the cache is an important part of the equation. If you can spare the hard drive space, it would be a good idea to double or even triple the available cache space that Microsoft Proxy Server can use. If there are many Internet users on your LAN accessing many different sites on the Internet, increasing the cache size can greatly help your Microsoft Proxy Server performance.