eCommons@Cornell - Statistics
Explanation of Statistical Information
We gather information for eCommons from two sources. Our primary statistics come from our Apache web logs. A locally developed statistical system (created by Peter Hoyt and Adam Chandler of the Cornell University Library) gathers this information from our server daily. The system scans the web logs for activity generated by recognizable harvesters or robots such as Google, Yahoo, and MSN, and eliminates those entries from the stored statistics. Any information that could identify an individual (other than geographic location) is also discarded before the data is stored. The system then lets us query the data in a variety of ways.
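The robot-filtering step described above can be sketched as follows. This is a minimal illustration, not the actual eCommons system: the crawler keyword list and the simplified Apache log pattern are assumptions for the example.

```python
import re

# Illustrative crawler keywords only; the real system maintains its own
# list of recognizable harvesters and robots.
BOT_KEYWORDS = ("googlebot", "yahoo! slurp", "msnbot", "crawler", "spider")

# Simplified pattern for the Apache combined log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def is_bot(user_agent: str) -> bool:
    """Return True if the user-agent string matches a known crawler keyword."""
    ua = user_agent.lower()
    return any(keyword in ua for keyword in BOT_KEYWORDS)

def human_requests(log_lines):
    """Yield parsed log entries, skipping traffic from recognized robots."""
    for line in log_lines:
        match = LOG_RE.match(line)
        if match and not is_bot(match.group("agent")):
            yield match.groupdict()
```

In the same spirit as the system described, filtered entries would then be stripped of identifying fields before being stored for later querying.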
The secondary source is the statistical information gathered by the DSpace system, the platform on which eCommons runs. DSpace gathers statistics from its own logs and generates monthly reports. These reports include harvester and robot activity.
For our purposes, an item hit occurs when a person views the item page, which lists the metadata for an item. This is the primary page containing a link to the actual entity (document, video, software, etc.), which we call a bitstream. A download is recorded when someone views or downloads the bitstream itself.
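The hit/download distinction can be sketched as a simple classification of request paths. The URL patterns below follow common DSpace conventions (`/handle/...` for item metadata pages, `/bitstream/...` for file downloads); the exact paths used by eCommons are an assumption here.

```python
def classify_request(path: str) -> str:
    """Classify a request path as an item hit, a bitstream download, or other.

    Assumes DSpace-style URLs: /handle/... serves the item's metadata page,
    while /bitstream/... serves the actual file.
    """
    if path.startswith("/bitstream/"):
        return "download"  # viewing or downloading the actual bitstream
    if path.startswith("/handle/"):
        return "hit"       # viewing the item's metadata page
    return "other"         # searches, browse pages, etc.
```

For example, `classify_request("/handle/1813/62")` would count as a hit, while a request for a PDF under `/bitstream/` would count as a download.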
In the past, we recorded all hits and downloads, even those generated by robots rather than individuals. Our organization decided that eliminating these software-generated hits and downloads would give us a better picture of the real activity our repository generates.