This is a very complex problem with no perfect solution, mainly because not all web content is uniquely identifiable.
Research has shown that distributing requests by destination IP address is generally the most effective approach. It is usually implemented as a destination-IP hash.
Web content is generally associated with the actual server that provides it. Since a server is identified by its IP address, the affinity between the content and the server’s IP address – in other words the destination IP address – is strong.
This holds true even for large providers that use content delivery networks or similar distribution schemes, because they tend to structure their content in a manner that leaves this content-server affinity intact.
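The destination-IP hashing described above can be illustrated with a minimal sketch. It is not the algorithm of any particular switch: the function name `pick_cache`, the cache names, and the choice of CRC-32 with a modulo are assumptions made purely for illustration.

```python
import ipaddress
import zlib


def pick_cache(dst_ip: str, caches: list[str]) -> str:
    """Map a destination IP address to one cache by hashing the address.

    Hypothetical helper: real switches use their own hash functions.
    """
    # 4 bytes for IPv4, 16 bytes for IPv6
    packed = ipaddress.ip_address(dst_ip).packed
    # Same destination IP always yields the same index for a fixed cache list
    return caches[zlib.crc32(packed) % len(caches)]


# Illustrative cache names and documentation-range addresses
caches = ["cache-a", "cache-b", "cache-c"]
for dst in ("198.51.100.7", "198.51.100.7", "203.0.113.25"):
    print(dst, "->", pick_cache(dst, caches))
```

Because the result depends only on the destination address, all requests for content served from one origin IP land on the same cache, which is what preserves the content-server affinity. Note that with a plain modulo, as assumed here, adding or removing a cache changes the mapping for many addresses.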
It is sometimes claimed that CARP (the IETF draft is available at http://tools.ietf.org/id/draft-vinod-carp-v1-03.txt) is preferable to destination-IP hashing when load balancing traffic to caching devices. CARP was in fact designed for this task. However, it is an obsolete protocol developed in 1998, and Blue Coat has conclusively determined that it does not provide good performance when used with the contemporary advanced caching techniques employed by the CacheFlow 5000.
An additional advantage of a distribution algorithm based on IP addresses rather than URLs or similar application-layer data is that the information is available at layer 4. Load balancing at layer 4 is more efficient and requires fewer resources on the switch, so a given switch can typically load balance more traffic.
In some cases, using other algorithms in addition to destination-IP hash can enhance bandwidth savings or otherwise benefit the caching solution. Keep in mind, though, that this comes at the cost of a more complex solution and increased computational demands on the switch.
Layer 7 inspection can be useful in the following cases:
- Adding layer 7 logic for large and popular content providers that are transparent in how they route requests to their nodes (YouTube et al.) can potentially increase bandwidth savings.
- Adding layer 7 logic to avoid redirecting non-HTTP traffic to the caches can reduce load on the caches without decreasing bandwidth savings, provided that non-HTTP traffic is a significant percentage of the port 80 traffic.
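The second case, keeping non-HTTP traffic away from the caches, can be sketched as follows. This is only an illustration under stated assumptions: the names `looks_like_http` and `route` are hypothetical, the method list is a simple heuristic, and a real switch would apply its own inspection rules to the first data segment of a connection.

```python
import ipaddress
import zlib
from typing import Optional

# Request-line prefixes used as a simple heuristic for "this is HTTP"
HTTP_METHODS = (
    b"GET ", b"HEAD ", b"POST ", b"PUT ", b"DELETE ",
    b"OPTIONS ", b"TRACE ", b"CONNECT ", b"PATCH ",
)


def looks_like_http(first_payload: bytes) -> bool:
    """Return True if the first payload bytes start with an HTTP method."""
    return first_payload.startswith(HTTP_METHODS)


def route(dst_ip: str, first_payload: bytes, caches: list[str]) -> Optional[str]:
    """Return the cache for an HTTP flow, or None to bypass the caches."""
    if not looks_like_http(first_payload):
        return None  # non-HTTP traffic on port 80 is forwarded directly
    # Fall back to a destination-IP hash (CRC-32 modulo is an assumption)
    packed = ipaddress.ip_address(dst_ip).packed
    return caches[zlib.crc32(packed) % len(caches)]


caches = ["cache-a", "cache-b", "cache-c"]
print(route("198.51.100.7", b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n", caches))
print(route("198.51.100.7", b"\x13BitTorrent protocol", caches))
```

The decision here needs the start of the application payload, whereas the destination-IP hash needs only the packet header. That difference is one reason layer 7 logic places heavier demands on the switch.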
Additional information on how to configure L4/L7 switches is available in the Deployment Guide and in the Sample Switch Configurations available from https://bto.bluecoat.com/cacheflow.