Internet market research using web statistics: pick a number, any number, continued.

...continued from part I of Web Statistics.

The measurement companies

Alexa

Amazon.com's Alexa is one of the better-known suppliers of web statistics, and one of the few offering broad international data. Data is collected through the Alexa Toolbar, installed by those wishing to view ranking data for competitor sites, to rank the sites they visit and, more recently, perhaps to use the incorporated anti-phishing tools. It is not difficult to imagine that this self-selected community is composed largely of webmasters in the know looking to increase their own sites' rankings.

Alexa's methodology is fairly well documented, including significant disclaimers.

Although Alexa should be lauded for documenting their methodology, the disclaimers inaccurately minimize the IE/Windows-centric nature of Alexa data. While Alexa states they collect data from toolbars on Macintosh and Linux platforms, Alexa fails to note that they offer just one toolbar: for Internet Explorer on MS Windows. While there are some third party tools for Firefox, it is doubtful that they have the same uptake percentage as the official toolbar for IE. One, the "About this site" extension, would only ping Alexa's servers upon a manual rank verification request, certainly not the same as the automated data collection offered by the IE toolbar. One semi-official solution, the A9 toolbar, has been abandoned by Amazon. As Alexa's site and toolbar are only available in English, English speaking regions may be overrepresented.

Update: Alexa has released "Sparky", a toolbar for Firefox, 16 July 2007.

Notable has been Alexa's commitment to web developer APIs, allowing third-party services to provide alternative visual representations of Alexa's base data. One example is Statsaholic, although it has been accused of scraping Alexa's data.

Alexa does offer some pretty icons to illustrate a site's ranking in the Alexa community – particularly useful for those who have manipulated Alexa's rankings.

corriere.it vs repubblica.it: page views of the past 6 months.

Antezeta considerations: Alexa's data, based on a self-selected sample, lacks scientific rigor. We wouldn't make financial investments based on Alexa rankings. Admire the pretty graphs and move on.

Compete

Founded in 2000 by Bill Gross of Overture fame, Compete employs a browser toolbar (Internet Explorer and Firefox supported), panels and ISP data to capture two million plus community members in the US. Sites whose audience is primarily outside the US are not officially supported; limited non-US tracking appears to have begun in July 2006 as seen in a comparison of visits to Italy's two leading newspapers:

Corriere vs Repubblica

Compete's methodology is vaguely discussed but not publicly documented.

Antezeta considerations: Judicious leveraging of data from three different sources could potentially lead to more reliable data than that provided solely by small panels or self-selected toolbar audiences. With a US focus, Compete data is not reliable for sites whose main visitor population is outside the US.

comScore

comScore uses a panel of over 2 million participants who have been recruited by software offering other benefits such as an E-mail antivirus and free prizes. comScore's methodology has proved very controversial, with some labeling the now discontinued MarketScore antivirus software spyware. In theory this somewhat self-selected sample (it is not clear how many participants signing up for free software really understand the true motive behind the free software) is normalized to represent overall Internet demographics, but, as in the case of the other services, we'll have to take comScore at their word on this.
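To make the idea of "normalizing" a skewed sample concrete, here is a minimal sketch of post-stratification weighting, one common way to rebalance a panel against known population shares. It is not comScore's documented method, which is not public; the age groups, panel counts, population shares and visitor counts are all hypothetical values chosen for illustration.

```python
def poststratify(panel_counts, population_shares):
    """Return one weight per group so the weighted panel matches population shares."""
    total = sum(panel_counts.values())
    weights = {}
    for group, count in panel_counts.items():
        panel_share = count / total
        weights[group] = population_shares[group] / panel_share
    return weights


def weighted_reach(visitors_by_group, panel_counts, weights):
    """Estimate the share of the population visiting a site from weighted panel data."""
    weighted_visitors = sum(visitors_by_group[g] * weights[g] for g in panel_counts)
    weighted_total = sum(panel_counts[g] * weights[g] for g in panel_counts)
    return weighted_visitors / weighted_total


# Hypothetical panel, heavily skewed toward younger users.
panel_counts = {"15-34": 700, "35-64": 300}
# Hypothetical population shares for the same groups.
population_shares = {"15-34": 0.40, "35-64": 0.60}
# Hypothetical number of panelists in each group who visited a given site.
visitors_by_group = {"15-34": 70, "35-64": 15}

weights = poststratify(panel_counts, population_shares)
raw_reach = sum(visitors_by_group.values()) / sum(panel_counts.values())
adjusted_reach = weighted_reach(visitors_by_group, panel_counts, weights)
print(f"raw reach: {raw_reach:.3f}, weighted reach: {adjusted_reach:.3f}")
```

Weighting of this kind can only correct for characteristics that are actually measured. It does nothing for differences the groups do not capture, such as whether people willing to install free software behave like those who are not.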

Through a Canadian subsidiary SurveySite, comScore conducts quantitative & qualitative research online.

Antezeta considerations: From a marketing demographic perspective, is an educated, savvy Internet user going to install software of dubious provenance? Are companies with IT processes in place going to allow their employees to install this software?

Quantcast

Quantcast was launched in September 2006 by "a team of engineers and scientists from NASA, Stanford and AltaVista". Quantcast primarily collects data from ISPs and advertisers, with a US focus. Site owners can tag their pages with Quantcast code for more accurate site level reporting. As with most web statistics suppliers, Quantcast produces some impressive reports, but the exact methodology used is not publicly disclosed. Quantcast do describe their overall approach as a mix of panel data sampling and pixel tracking. Unfortunately, the devil is, as always, in the details: how is the panel recruited? How representative is it of the public as a whole? Are panel members aware that their navigation will affect Quantcast's data, and if so, does that influence their navigation? How is the data processed, and with what assumptions?

View Quantcast profiles for la repubblica and corriere della sera.

Antezeta considerations: Without full disclosure of Quantcast's methodology, it is unclear if their results could withstand peer scrutiny. Benchmarking of sites using Quantcast tracking code should be fairly reliable assuming (big assumption here) site pages have been properly tagged.

Nielsen//NetRatings

Nielsen//NetRatings, long established as Nielsen in the audience measurement field, is perhaps one of the most cited sources of web statistics in the general press.

Nielsen captures high level internet data using sample panels selected by the random digit dialing (RDD) method. In Italy, the sample size was just increased from 5,000 to 15,000 with a stated goal of 20,000 by the end of 2007. A press release also notes rather vaguely that the RDD selection is being augmented by on-line recruiting; it is not clear how or why. It is this panel data which is usually referred to in press statements on overall Internet usage trends.

Nielsen also captures web analytics data at the website level, which is much more accurate, for clients who install Nielsen//NetRatings' "SiteCensus" tracking code. SiteCensus is based on the former Red Sheriff product.

Nielsen//NetRatings offer a great sourcing Guidelines document for their clients and other consumers of their data but this document seems to be focused primarily on protecting Nielsen//NetRatings' image and revenue streams rather than clarifying the underlying methodology and statistical error. Try to find a comprehensive discussion of Nielsen//NetRatings' methodology used to justify their statements. Does Nielsen metering software run on Macintosh and Linux computers? Or does it track just Windows users? Does RDD call both cell and landline phone numbers? How many people refuse to participate? Are they more affluent, time pressed professionals who can't be bothered? The best you'll find is a marketing document touting "Nielsen//NetRatings' precise information".

So what does precise mean? Consider the Italian panel of 15,000 members. Italy's population is about 59,000,000 (Source: Istat). Website X is the leader in the home mortgage business, a very important sector both for home buyers and lending institutions. Website Y is a valid contender. Nielsen's sample size is 0.025% of the entire Italian population, or 0.038% of 15-64 year olds. Yes, you understood correctly: Nielsen//NetRatings' calculations are based on a whopping 0% of the population, after rounding. In our home mortgage example, site X has ~120,000 unique visitors every month. Site Y, the closest competitor, attracts ~75,000 unique visitors a month. So how many of these real visitors can Nielsen//NetRatings track? About 46 and 29 respectively, assuming that both the sites' visitors and the panel members are all between 15 and 64 years old.

| Population segment | Italy Population | Nielsen//NetRatings Sample Size | Sample as % of Population | Unique Monthly Visitors Site X | Site X visitors captured by Nielsen//NetRatings | Unique Monthly Visitors Site Y | Site Y visitors captured by Nielsen//NetRatings |
|---|---|---|---|---|---|---|---|
| Adult (15-64) | 39,058,000 | 15,000 | 0.038% | 120,000 | 46 | 75,000 | 29 |
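The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses the panel size, population and visitor counts quoted in this article. The margin of error is an added, rough illustration that assumes the captured count behaves like a Poisson variable and that the panel is perfectly random; neither assumption comes from Nielsen//NetRatings.

```python
import math

panel_size = 15_000            # Nielsen//NetRatings Italian panel (from the article)
population_15_64 = 39_058_000  # Italian population aged 15-64 (from the article)

sampling_fraction = panel_size / population_15_64


def expected_panel_visitors(unique_visitors):
    """Expected number of panel members among a site's monthly unique visitors."""
    return unique_visitors * sampling_fraction


def rough_relative_error(expected_count):
    """Approximate 95% relative margin of error for a small count.

    Assumption: the count is Poisson distributed, so its standard deviation
    is about sqrt(count). Recruitment bias is ignored entirely.
    """
    return 1.96 * math.sqrt(expected_count) / expected_count


for name, visitors in [("Site X", 120_000), ("Site Y", 75_000)]:
    captured = expected_panel_visitors(visitors)
    print(f"{name}: ~{captured:.0f} panel visitors, "
          f"+/- {rough_relative_error(captured):.0%} at 95% confidence")
```

Even under these generous assumptions, counts this small carry a relative uncertainty on the order of 30 to 40 percent, which is comparable to the gap between the two sites.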

Antezeta considerations: Nielsen//NetRatings does not publicly release data for specific sites other than what appears in their press releases. Consider whether a small panel is an appropriate measurement technique for the Internet.

Hitwise

Australian based Hitwise uses ISP data to report on Internet use in its primary markets: the United States, United Kingdom, Australia, New Zealand, Hong Kong and Singapore. While the main focus appears to be on data collected from ISPs, there is also a brief mention of "opt-in" data.

See the previous discussion of network-centric data collection to understand some of the advantages and limitations inherent in using ISP data.

A small number of sample reports, such as top search engines, are available in Hitwise's data center. Hitwise does not currently have partnerships with Italian ISPs, thus Hitwise does not report on the Italian market.

Antezeta considerations: Hitwise does not publicly release data for specific sites other than in press releases and blog entries. Evaluate Hitwise's sampling techniques for your markets before utilizing their data.

Netcraft

Netcraft has tracked web server statistics across the internet since 1995. In December 2004, Netcraft began to track website popularity through a user installed toolbar, promoted as an anti-phishing tool. According to Netcraft, the site ranking is based on the weekly hit rate. Netcraft estimates that the toolbar usage is in the hundreds of thousands and reflects the general population of the internet.

Antezeta considerations: Data suffers from all the limitations of a toolbar based self-selected sample.

Ranking.com

Ranking.com collects data via a browser toolbar which offers a "trust gauge", site ranking information and a "browser accelerator" (search engine search box) among its features. The toolbar is only available in English for Microsoft's Internet Explorer on Windows. The tech savvy Firefox crowd is not part of this demographic, nor are Macintosh aficionados. According to the methodology discussion, data is updated monthly.


Ranking.com Website

Antezeta considerations: Data limited by self-selection methodology and Internet Explorer only support.

Website ranking services at a glance

| Company | Since | Primary metrics (publicly available) | Sample Size (world-wide) | Sample Selection Methodology | Click data sources | Modifiable Site Profile? | Data API | Geographic Scope |
|---|---|---|---|---|---|---|---|---|
| Alexa | 1996 | Rank and Reach | "an installed based of millions of toolbars"; 180,000 (third party estimate) | Self selection | Browser Toolbar (IE / Windows only) | Yes | Yes. Fee based. | World-wide |
| Compete | 2000 | Visitors, Engagement | 2,000,000 | Self selection | Browser Toolbar | No | Yes. | US |
| comScore | 1999 | Visitors, Rank | 2,000,000 | Self selection | User installed software / spyware from opinion square, Permission Research and potentially others. | No | No | US, Canada, UK, France, Germany and unspecified others. |
| Hitwise | 1997 | Rank, market share | not disclosed | Hitwise ISP agreements | ISPs | No | No | US, UK, Australia, New Zealand, Singapore |
| Netcraft | 1995 | Rank | not disclosed | Self selection | Browser Toolbar (IE, Firefox) | No | No | World-wide |
| Nielsen//NetRatings | 1997 | | not disclosed (15,000 in Italy) | Mostly RDD? | Opt-in Panels | No | No? | World-wide |
| Quantcast | 2006 | Rank | "1.5 million U.S. Internet users, working to grow that by an additional one million in the near term" [1] | Mixed | panels, ISPs, web site publishers | Yes – demographic profile; measurement method. | Site name and ranking for top million sites can be downloaded | US focus. International data provided for publisher submitted sites. |
| Ranking.com | 1998 | Rank, Trust Gauge, Links | 215,000 | Self selection | Browser Toolbar (IE / Windows only) | Yes | No | English centric due to English only toolbar. |

[1] E-mail from Krista Thomas, Vice President, Marketing Communications, Quantcast

This data was compiled from information on the services' respective websites, augmented in a few cases by queries to a company. Please do contact us with updates and clarifications. Last update: April 2007.

Additional Web Statistics Data Sources

The following are additional sources of web analytics data. Bear in mind that each of these sources is probably subject to limitations based on many of the issues discussed earlier in this article.

Blog measurement

BlogBabel

BlogBabel, covering the Italian and Spanish markets, aggregates third party blog metrics to arrive at a BlogBabel ranking.

FeedBurner

FeedBurner, now a Google property, manages the RSS feeds for many blogs. RSS feed usage statistics are available to site owners and, if enabled by the site owner, via a FeedBurner API.

Technorati

Technorati's authority ranking of a blog is defined as the number of incoming links to the blog in the last 6 months. The 100 most popular blogs are listed in order.

Click Fraud

Click fraud, also called "invalid clicks", is a term used in pay per click marketing to refer to clicks on links made with the intent to defraud the advertiser paying for the promotional link. One company, Click Forensics, publishes a Click Fraud Index™. As you might imagine, not everyone thinks these numbers are accurate.

Web Analytics Suppliers

By virtue of tracking most, if not all, clicks on a client's website, suppliers of hosted web analytics systems have an excellent view of how a segment of internet sites is performing. Data sampling is limited to companies which have chosen to use one of these hosted services.

Fireclick

Fireclick provides web analytics services to clients in a variety of market segments. Selected Business and Marketing Conversion metrics derived from Fireclick customer data are published on a weekly basis. Business measures include Shopping Trolley / Cart Abandonment, global conversion rates and visit information. Marketing conversion data includes e-mail, keyword and affiliate program tracking.
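For readers unfamiliar with these business measures, the sketch below shows how a conversion rate and a cart abandonment rate are commonly defined. These are generic definitions assumed for illustration, not Fireclick's published formulas, and the weekly figures are hypothetical.

```python
def conversion_rate(orders, visits):
    """Share of visits that end in an order."""
    return orders / visits if visits else 0.0


def cart_abandonment_rate(carts_created, orders):
    """Share of shopping carts that were started but never checked out."""
    return 1 - orders / carts_created if carts_created else 0.0


# Hypothetical weekly figures for a single site.
visits, carts_created, orders = 20_000, 1_500, 450
print(f"conversion: {conversion_rate(orders, visits):.2%}")
print(f"cart abandonment: {cart_abandonment_rate(carts_created, orders):.1%}")
```

Published benchmarks are only comparable with your own numbers if both sides use the same definitions, for example counting visits rather than unique visitors in the denominator.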

Shiny Stats

A Web Analytics provider based in Italy, Shiny Stats publishes a daily ranking of top sites based on visits to sites using Shiny Stats tracking. Data can be viewed by category or by a free text search on a site's description in the Shiny Stats system.

Audiweb

Aggregate data from major portals and media properties in Italy is published by Audiweb. (In Italian. Registration required.)

Google Analytics

Google Analytics began to support benchmarking of a site's web statistics against various industry categories in March 2008. Benchmarking data is only available for sites which have opted in to this program. (added 2008-03-06)

Showdown: How do Web Ranking Services Rank each other?

For each of the public ranking services listed in the left column, see how they rank across different services. Sort by a column to rank the rankers by a specific ranker!

| Ref. | Ranker | Alexa Rank | Compete | Netcraft | Quantcast | Ranking.com |
|---|---|---|---|---|---|---|
| 1. | Alexa | na | 2763 | 911 | 7509 | 54 |
| 2. | Compete | 2789 | 12058 | 305916 | 2313 | 22387 |
| 3. | comScore | 7772 | 79780 | 132068 | 233255 | 35429 |
| 4. | Hitwise | 7141 | 58920 | 185040 | 132543 | 7384 |
| 5. | Netcraft | 4227 | 104099 | 175 | 98094 | 8963 |
| 6. | Nielsen//NetRatings | 5825 | 73290 | 134677 | 264770 | 76523 |
| 7. | Quantcast | 2768 | 8129 | 135517 | 20670 | 21534 |
| 8. | Ranking.com | 19147 | 7379 | 230187 | 229094 | 14293 |

Survey 2007-04-16. Compete: value for March 2007.
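Where the table cannot be sorted interactively, the same comparison can be done with a short script. The sketch below copies the ranks from the table above and orders the rankers by any one service's column; the names `COLUMNS`, `RANKS` and `rank_by` are introduced here for illustration only.

```python
# Ranks copied from the table above (survey of 2007-04-16); None marks "na".
COLUMNS = ["Alexa", "Compete", "Netcraft", "Quantcast", "Ranking.com"]
RANKS = {
    "Alexa":               [None, 2763, 911, 7509, 54],
    "Compete":             [2789, 12058, 305916, 2313, 22387],
    "comScore":            [7772, 79780, 132068, 233255, 35429],
    "Hitwise":             [7141, 58920, 185040, 132543, 7384],
    "Netcraft":            [4227, 104099, 175, 98094, 8963],
    "Nielsen//NetRatings": [5825, 73290, 134677, 264770, 76523],
    "Quantcast":           [2768, 8129, 135517, 20670, 21534],
    "Ranking.com":         [19147, 7379, 230187, 229094, 14293],
}


def rank_by(column):
    """Return ranker names ordered by their rank in one service's column.

    Entries without a value are placed last.
    """
    idx = COLUMNS.index(column)
    return sorted(
        RANKS,
        key=lambda name: (RANKS[name][idx] is None, RANKS[name][idx] or 0),
    )


for column in COLUMNS:
    print(f"{column:12} -> {', '.join(rank_by(column)[:3])}")
```

Sorting column by column makes the disagreement between services easy to see: the top three rankers differ depending on whose numbers are used.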

Technical Website Performance Benchmarks

At first glance, technical benchmarks may seem to be more of an IT purview. Yet marketing professionals should be concerned as well. A slow loading website makes for a frustrating user experience. Google advises that page loading time will be considered when assigning an AdWords Quality Score to a landing page, certainly a point of acute interest to search marketing practitioners. It isn't too hard to imagine page loading time impacting organic search results as well. Many companies provide website technical performance measurement and monitoring; the Apdex (Application Performance Index) membership is a good list. Few offer publicly available benchmark data. Section added 2008-03-23.
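As background on the Apdex index mentioned above, the following minimal sketch computes an Apdex score from a list of response times. It follows the standard Apdex definition: samples at or below a target threshold T count as satisfied, samples between T and 4T count as tolerating at half weight, and slower samples count as frustrated. The sample timings and the 2 second threshold are hypothetical.

```python
def apdex(response_times, threshold):
    """Compute an Apdex score for response times (seconds) and a target threshold T.

    Satisfied:  time <= T
    Tolerating: T < time <= 4T (counted at half weight)
    Frustrated: time > 4T (counted as zero)
    """
    if not response_times:
        raise ValueError("at least one sample is required")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical page load times in seconds, for illustration only.
samples = [0.8, 1.2, 1.9, 2.5, 3.1, 4.0, 7.5, 9.0, 1.1, 0.6]
score = apdex(samples, threshold=2.0)
print(f"Apdex(T=2.0s) = {score:.2f}")
```

The score always falls between 0 (every user frustrated) and 1 (every user satisfied), which makes it convenient for comparing sites, provided the same threshold is used for each.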

AlertSite Market Index

AlertSite offers limited benchmarking data for the Computer, Financial Services, Information Services, Manufacturing, Retail and Telecommunications markets in the US.

Gomez Website Performance Benchmarks

Gomez calculates technical performance benchmarks for response time, availability, and consistency for selected websites in some of the most popular business sectors on the web. Data is available for Canada, China, Germany, the UK and the US. Gomez simulates typical business transactions, such as looking for a hotel room, from diverse geographic regions. To understand the exact steps Gomez measures for a given sector, you'll probably need to contact Gomez.

Keynote Industry Benchmarks

Keynote provides a similar service to Gomez in the US and UK. Data limited to "top firms" (mostly national) is also available for the Benelux, French, German, Portuguese and UK markets. According to Keynote, sites selected

should be part of the index based on the following criteria: online brand awareness, third-party published traffic to the site, ability to be measured by Keynotes measurement computers and percentage of revenues driven via the online channel.

Conclusion

As the above overview demonstrates, there are many sources of Internet statistics. Unfortunately, most of these numbers fail to offer any degree of confidence as to their reliability – rendering them practically useless. Few of the documented methods employ scientific rigor (e.g. they exclude all Internet users who don't use Internet Explorer, etc.); worse still, many suppliers don't document their methods to any degree that lends itself to outside validation.

In the absence of anything better, there's a strong temptation to think these statistics might be better than nothing. Yet, numbers derived without scientific rigor are worse than nothing as they provide a false sense of confidence in business decision making, a confidence which lacks a solid foundation based in reality.

It is our hope that industry associations like the IAB and the Web Analytics Association (WAA) will spotlight the need for modern methods and accountability in internet statistics gathering and reporting.

Feedback and Comments

We welcome your feedback on this article. If you have information to help us clarify or improve any part of it, please contact us directly. If you would like to leave a public comment, please use our related web statistics blog posting. Should you require web analytics or search engine optimization consulting, please let us know.

Table of Contents

Part I:

  1. Web statistics for internet market research: pick a number, any number
    1. How to perform competitor research using web statistics while avoiding lies, damned lies, and ...statistics?
    2. Statistics reliability
    3. Is there a statistician in the House?
    4. Methodology transparency
    5. Common measurement approaches
      1. User-centric internet usage measurement
      2. Website-centric usage measurement
      3. Network-centric usage measurement

Part II:

  2. Internet market research using web statistics: pick a number, any number, continued.
    1. The measurement companies
      1. Alexa
      2. Compete
      3. comScore
      4. Quantcast
      5. Nielsen//NetRatings
      6. Hitwise
      7. Netcraft
      8. Ranking.com
    2. Website ranking services at a glance
    3. Additional Web Statistics Data Sources
      1. Blog measurement
        1. BlogBabel
        2. FeedBurner
        3. Technorati
      2. Click Fraud
      3. Web Analytics Suppliers
        1. Fireclick
        2. Shiny Stats
        3. Audiweb
        4. Google Analytics
    4. Showdown: How do Web Ranking Services Rank each other?
    5. Technical Website Performance Benchmarks
      1. AlertSite Market Index
      2. Gomez Website Performance Benchmarks
      3. Keynote Industry Benchmarks
    6. Conclusion
    7. Feedback and Comments