Comparison with competitors is a fundamental element of business; even innovators need to know how far ahead they are in their market. The Internet seems to offer fertile terrain for capturing accurate marketing statistics on website usage and position relative to other players in a given market. Indeed, most of us have often heard web statistics from Nielsen//NetRatings, Alexa or comScore cited in the press and elsewhere. Practitioners of Search Engine Optimization and web marketing know that web analytics is not just silo analysis of a company's website: it also entails looking at how a website and its business performance metrics measure up in the overall web ecosystem.
So how valid are the web usage and site ranking statistics offered by web measurement companies? Despite the existence of organizations such as the Web Analytics Association (WAA) and the Interactive Advertising Bureau (IAB), the ugly answer is "We don't know". What we do have is a proliferation of the "mine's bigger than yours" syndrome as one website cites its high ranking using one set of statistics and a competitor responds, citing a different, incompatible, source.
Most of us don't have a background in statistics. We easily fall for the authority of pretty pictures and lines that go up and down. Darrell Huff covered the topic well in his 1954 classic, How to Lie with Statistics. That his book is still in print is a testament to the power of his message. More recently, Edward Tufte has tried to teach us the error of our ways in his renowned The Visual Display of Quantitative Information.
To evaluate the reliability of the Internet statistics published by the various measurement companies, we would need access to the complete methodology used to derive the final figures. Basic elements include the sample selection, sample size, and any corrections made to offset sample bias and skew. Yet even when the major collectors of Internet usage data do discuss methodology, they usually don't get beyond sample size and a very general discussion of sample data sources. Indeed, the IAB has recently challenged two of the companies, Nielsen//NetRatings and comScore, to accept a standardized audit and accreditation process. The risk in this challenge is that processes which are fundamentally flawed in their conception may end up being accredited simply because the method has been documented.
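To see why sample size alone says little, here is a minimal sketch in Python of the textbook margin of error for a proportion estimated from a simple random sample. The function name and the figures are illustrative assumptions, not any measurement company's method. The formula only holds if the sample is truly random, so a large self-selected panel can report a tiny margin of error and still be badly wrong.

```python
import math

def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample; it says nothing about
    selection bias, which no sample size can repair.
    """
    return z * math.sqrt(share * (1 - share) / sample_size)

# Hypothetical figures: a site reaching 10% of panelists.
for n in (1_000, 100_000):
    moe = margin_of_error(0.10, n)
    print(f"n={n}: 10% +/- {moe * 100:.2f} points")
```

Going from 1,000 to 100,000 panelists narrows the interval by a factor of ten, but only around whatever value the panel is biased toward.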
There are three principal ways to measure overall Internet usage. A panel of users can be measured at their computers with installed software (user-centric), marketers can monitor how visitors interact with a specific website (site-centric), or data can be collected directly from ISP networks (network-centric).
User-based measurement involves convincing users to install software which will track most, if not all, of their Internet usage. Browser toolbars are one of the most obvious approaches, but they have many limitations. They are limited to tracking standard website traffic, so they won't know about Skype and other non-browser-based Internet applications. Toolbar-based measurement is limited to a self-selected population which decides to install the toolbar, although there are some cases where a toolbar may be bundled with a new computer. Toolbar suppliers often exclude Firefox and other non-Internet Explorer browsers.
A second approach is to convince users to install an application which tracks both browser and other Internet application usage. Panel-based measurement takes this approach. In some cases the users are well aware of their participation in a panel (and may alter their behavior accordingly); in other cases, users have chosen to install software to receive a specific benefit and might not be aware of the software's true purpose.
Measurement at the website level involves cooperating with website owners to install a web analytics system, usually based on web server log files or on tracking code inserted in all of a site's web pages. In Italy, Audiweb publishes website-centric data of major media companies (requires registration). Due to the need for site owners to cooperate, use of this approach is limited. Some web analytics hosting companies make limited website-centric data available as an activity secondary to their primary business.
Intercepting data between users and websites at the network level potentially offers sample sizes which are much larger than traditional panels (but smaller than the data sample available through collection directly at a website).
The first consideration in evaluating the data's potential reliability is to ask about the sampling methodology. Which ISPs have been selected? Is their typical user demographic different from those not included? Users of Telco ISPs, such as Telecom Italia's Alice, are usually following the path of least resistance. Those who have chosen economical ISPs such as Tele2 will have a different marketing profile. At the opposite end of the spectrum are the ISPs which focus on premium high speed connections, such as Italy's Fastweb. Do the selected ISPs include corporate traffic, or just small business and residential users?
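The effect of ISP selection can be sketched with a small Python example. All names and numbers below are hypothetical: three ISP categories, invented panel counts, and invented market shares. It compares a naive estimate of a site's reach, which pools every measured user, with an estimate that weights each ISP by its share of the overall market (a simple form of post-stratification). Weighting is only one possible correction, and it assumes the market shares themselves are known.

```python
def site_reach(panel, market_share):
    """Return (naive, weighted) estimates of a site's reach.

    panel:        {isp: (users_measured, users_visiting_site)}
    market_share: {isp: fraction of all Internet users}, summing to 1
    """
    total_users = sum(n for n, _ in panel.values())
    total_visitors = sum(v for _, v in panel.values())
    naive = total_visitors / total_users

    # Weight each ISP's own visit rate by its real market share.
    weighted = sum(market_share[isp] * v / n
                   for isp, (n, v) in panel.items())
    return naive, weighted

# Hypothetical data: the premium ISP is heavily over-represented.
panel = {
    "incumbent": (1_000, 100),    # 10% of its users visit the site
    "budget":    (1_000, 200),    # 20%
    "premium":   (8_000, 4_000),  # 50%
}
market_share = {"incumbent": 0.6, "budget": 0.3, "premium": 0.1}

naive, weighted = site_reach(panel, market_share)
print(f"naive reach:    {naive:.0%}")
print(f"weighted reach: {weighted:.0%}")
```

With these invented figures the pooled estimate is 43% while the weighted one is 17%, which is the kind of gap that an undisclosed choice of ISPs could hide.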
Our look at web statistics continues in Part II.