With Reporter version 9.x, every line in the access logs we receive is counted as an HTTP request (see the note on version 8.x in this document for how this differs slightly). However, many log lines are merely referrals to other websites and do not correspond to a person actually reading and viewing the site. Blue Coat therefore introduced a statistic called Page Views. A page view is Blue Coat's attempt to condense these HTTP requests, or log lines, into a more accurate representation of people viewing websites. For example, a user who visits CNN.com generates about 100 log lines of HTTP referrals. Both versions of Reporter combine these referrals into a single page view.
NOTE: With all versions of Reporter, the access log must conform to the standard described in this KB article for page views to be accurate:
Reporter version 9.x:
In this version, the Page View Combiner (PVC) algorithm was revised to better reflect the content users actually view.
Requests are combined into page views if they meet all of the following criteria in the access log:
- sc-status = 200
- content type = text/html
- verdict does not start with "denied"
- category is not "Web Advertisements" or "Non-viewable"
- referrals fall inside the 30-second PVC cache window. Referrals outside the window are shown as if they were a normal website with its own page view(s).
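The criteria above can be expressed as a simple filter. The sketch below is a minimal illustration, not Reporter's implementation: the `LogLine` record and its field names, the case-insensitive comparisons, the stripping of content-type parameters such as `; charset=utf-8`, and the treatment of the 30-second window as inclusive and measured from the referring page view are all assumptions. It handles only a single category rating; multiple ratings are covered in the note on the Web Advertisements category.

```python
from dataclasses import dataclass

# Hypothetical record: fields mirror the access-log columns named in the
# criteria (sc-status, content type, verdict, category). Not Reporter's code.
@dataclass
class LogLine:
    sc_status: int
    content_type: str
    verdict: str
    category: str = ""  # "none" or "" are treated like any other rating

PVC_WINDOW_SECONDS = 30
EXCLUDED_CATEGORIES = {"web advertisements", "non-viewable"}

def is_page_view_candidate(line: LogLine) -> bool:
    """Check the four version 9 request criteria against one log line."""
    # Assumption: media-type parameters (e.g. "; charset=utf-8") are ignored
    # and all comparisons are case-insensitive.
    media_type = line.content_type.split(";")[0].strip().lower()
    return (
        line.sc_status == 200
        and media_type == "text/html"
        and not line.verdict.strip().lower().startswith("denied")
        and line.category.strip().lower() not in EXCLUDED_CATEGORIES
    )

def is_combined(page_view_time: float, referral_time: float) -> bool:
    """True if a referral falls inside the 30-second PVC cache window.

    Assumption: the window starts at the referring page view and is inclusive
    at 30 seconds; a referral outside it is reported as its own site.
    """
    delta = referral_time - page_view_time
    return 0 <= delta <= PVC_WINDOW_SECONDS

samples = [
    LogLine(200, "text/html", "Allowed", "none"),
    LogLine(200, "text/xml", "Allowed", "none"),
    LogLine(404, "text/html", "denied", "none"),
    LogLine(200, "text/html", "Allowed", ""),
]
print([is_page_view_candidate(s) for s in samples])  # [True, False, False, True]
```

The four sample lines reproduce the examples given for the "NONE" category later in this document.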
Turning off the page view combiner:
If there is a concern about the page view combiner and its effect on page view calculations, requests, or referrals, it can be turned off. See 000014656 for how to do this and what effect it may have.
Note on version 8.x:
In version 8.x, we in some cases replaced the origin content server's DNS name (the cs-host field) with the cs-referer field. Strictly speaking, this inflated page view counts, because HTTP referrals are rarely, if ever, viewed directly by the user. This practice was discontinued in version 9.x.
Version 8.x also dropped any request that was not deemed valid viewable data, such as responses with HTTP status 301 or 401, so request totals may be lower than in version 9.x. However, this only affects the total number of requests in the database; it does not mean that every website will show lower or higher request totals. Some websites will show higher totals in version 9.x and some will show lower ones, primarily because of the type of traffic each site attracts. Version 9.x counts all requests and categorizes them appropriately so that users can report on these network errors. In other words, on some customers' datasets, certain version 8.x reports, such as the top 20 user report, calculate requests against referrals for non-viewable sites. You won't see the same values in version 9.x for the same report, because these are no longer calculated.
In summary, both the page view algorithm and the way requests are counted changed between versions 8.x and 9.x with one intent in mind: to tally more accurately the viewing habits of users as they browse the websites of their choice.
Note on the Web Advertisements category:
If a site has more than one category rating and one of them is "Web Advertisements", the request is still counted as a page view as long as it meets the other page view criteria. If a site's only category rating is "Web Advertisements", the request should never be marked as a page view. This handling of web advertisements is specific to Blue Coat WebFilter and Reporter: if a third-party web rating tool is used with Blue Coat SG appliances and Reporter, web advertisements will be counted as page views. Reporter version 8 performed no such filtering; virtually all combinations of the four request attributes listed above (status, content type, verdict, and category) were written to the database and combined into the appropriate number of page views.
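The multi-rating rule can be sketched as a small helper. This is a hypothetical illustration: the function name `passes_ads_rule`, the case-insensitive matching, and the example category "News/Media" are assumptions. The helper deliberately covers only "Web Advertisements", because this note describes only how that rating interacts with other ratings; a request must still meet the other page view criteria.

```python
def passes_ads_rule(categories: list[str]) -> bool:
    """Apply the Web Advertisements rule to a request's category ratings.

    Returns False only when "Web Advertisements" is the sole rating.
    Passing this check does not by itself make the request a page view.
    """
    # Normalize: ignore case, surrounding whitespace, blanks, and duplicates.
    ratings = {c.strip().lower() for c in categories if c.strip()}
    return ratings != {"web advertisements"}


print(passes_ads_rule(["Web Advertisements"]))                # False
print(passes_ads_rule(["News/Media", "Web Advertisements"]))  # True
print(passes_ads_rule([]))                                    # True
```

An empty rating list passes, which matches the example below in which an empty category is still counted.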
Note on the spyware category: In version 8, websites in the spyware category were included in the PVC algorithm. This means that some spyware websites that were actually referrals, not pages viewed by people, may appear as page views in version 8 but not in version 9.
Note about categories: Each page view added to the database has its own category rating. If a page view refers requests that are not themselves page views, their category ratings, if any, are not added to the Reporter database. Referred requests that are themselves page views are added to the database as separate log lines, each with its own category rating.
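To make the category rule concrete, here is a minimal sketch that lists which category ratings would be stored for one page view. The `Request` record, the `category_rows` helper, and the sample URLs other than cnn.com are hypothetical. Only one level of referrals is examined, and the sketch models the category note only, not how referred requests are counted.

```python
from dataclasses import dataclass, field


# Hypothetical model of one request and the requests it referred.
@dataclass
class Request:
    url: str
    categories: list[str]
    is_page_view: bool
    referred: list["Request"] = field(default_factory=list)


def category_rows(page_view: Request) -> list[tuple[str, list[str]]]:
    """Return the (url, categories) rows stored for one page view.

    The page view keeps its own rating; referred requests that are page views
    get separate rows with their own ratings; other referrals add no rating.
    """
    rows = [(page_view.url, page_view.categories)]
    for req in page_view.referred:
        if req.is_page_view:
            rows.append((req.url, req.categories))
    return rows


home = Request(
    "http://www.cnn.com/", ["News/Media"], True,
    referred=[
        Request("http://ads.example.com/banner.gif", ["Web Advertisements"], False),
        Request("http://www.cnn.com/world/", ["News/Media"], True),
    ],
)
for url, cats in category_rows(home):
    print(url, cats)
```

The advertising referral contributes no row, while the referred page view is stored separately with its own rating.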
Once page views have been calculated, the next value that can be calculated is browse time, based on a user ID or client IP address. Browse time is calculated on the fly during log processing, once a page view has been determined to have occurred. For more information on this value, see 000012088.
Categories that appear as "NONE": A log line with a category rating of "NONE" is treated like any other log line, and it is counted as a page view whenever it meets the criteria above.
Here are some examples:
1. If the category is 'none', sc-status=200, content type=text/html, verdict=Allowed -- will be counted.
2. If the category is 'none', sc-status=200, content type=text/xml, verdict=Allowed -- will not be counted.
3. If the category is 'none', sc-status=404, content type=text/html, verdict=denied -- will not be counted.
4. If the category is empty, sc-status=200, content type=text/html, verdict=Allowed -- will be counted.
Note on full log detail reports: In a full log detail report, each log line represents a page view. However, popular sites such as cnn.com may generate hundreds of referrals. This follows from the discussion above: much of the activity on cnn.com is not the user reading the site but advertising sites being accessed through referrals.