This area focuses on resources to enhance the functionality of the web analytics tool AWStats.
These resources have been developed based on our client needs. As a contribution, we offer them here. Some may even make it into a future version of AWStats!
The information here is provided on a “worked for us”, as-is basis for your testing, verification and potential adoption.
ExtraSection Samples
AWStats has an excellent custom report syntax called ExtraSection which enables an organization to both extend standard AWStats and add organization-specific reports. Below we offer ExtraSection samples useful for sites involved in search engine optimization, web marketing and/or monitoring of traffic from external sites.
Web server log analysis can be memory and CPU resource intensive. AWStats documentation notes that each ExtraSection reduces AWStats speed by about 8%. Proceed with caution.
New report sections will appear for web log data processed after the AWStats configuration file has been updated. To retroactively generate reports, you must delete the AWStats statistics files and regenerate them as well as the reports that run off of them.
AWStats and Search Engine Optimization
Except for a few “destination” sites on the net, most sites need to include merit-based Search Engine Optimization (SEO) techniques in their ongoing site design, monitoring and enhancement activities. As installed, AWStats provides:
- A count of how many pages were seen by visitors from each detected search engine, by search engine. Over 100 search engines are detected.
  When a visitor arrives from a search engine, only the first page they see is counted. Successive pages will count as internal site referrals.
- An aggregate listing of the most popular search words (key words) for all search engines
- An aggregate listing of the most popular phrases (groups of key words) for all search engines
Organizations focused on merit-based search engine optimization will also want to monitor:
- Search engine crawling activity
  - How often a search engine crawls your site
  - What pages are crawled
- Successful user searches by search engine
  - Key words
  - Key word phrases
Search Engine Activity Example
Monitoring of successful search key words and phrases by search engine shows in detail what words are driving traffic to your site. Differences among search engines will lead you to evaluate multiple factors including crawling activity and your site’s design and content.
To see what key words and phrases are driving traffic from a particular search engine, add the following code to the ExtraSection part at the bottom of your AWStats configuration file.
You’ll need to change all occurrences of 5= to the next free sequential number if you already have existing ExtraSections.
ExtraSectionName5="Google Searches - Top 50"
ExtraSectionCodeFilter5="200 304"
ExtraSectionCondition5="REFERER,(.*www\.google.*)"
ExtraSectionFirstColumnTitle5="Search"
ExtraSectionFirstColumnValues5="REFERER,p=([^&]+)||REFERER,q=([^&]+)||REFERER,as_p=([^&]+)||REFERER,as_q=([^&]+)"
ExtraSectionFirstColumnFormat5="<a href='http://www.google.com/search?q=%s' title='Click to execute search'>%s</a>"
ExtraSectionStatTypes5=PHBL
ExtraSectionAddAverageRow5=0
ExtraSectionAddSumRow5=1
MaxNbOfExtra5=50
MinHitExtra5=1
2006-04-25: added link to open search terms in Google. Change google.com to your most used variant.
The key piece of the puzzle here is exemplified by the p= in REFERER,p=([^&]+). It is a query word delimiter. While most search engines have just one, Google is more complex – four have been seen. Common delimiters are p=, q=, key=, query=. Check the AWStats search_engine.pm file to find the syntax for the search engine of your choice.
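As a rough illustration of how these delimiter patterns behave, the following Python sketch applies the four Google patterns from the sample above to a referrer string. Python is used only for convenience; the function name and the sample referrers are hypothetical, and trying the patterns in the order listed is our assumption rather than documented AWStats behavior.

```python
import re

# The four Google query delimiters from the ExtraSection sample above,
# tried in the same left-to-right order as the || alternatives (assumed order).
GOOGLE_QUERY_PATTERNS = [
    r"p=([^&]+)",
    r"q=([^&]+)",
    r"as_p=([^&]+)",
    r"as_q=([^&]+)",
]


def extract_search_terms(referrer, patterns=GOOGLE_QUERY_PATTERNS):
    """Return the first captured query value, or None if no delimiter matches."""
    for pattern in patterns:
        match = re.search(pattern, referrer)
        if match:
            # The value is returned as found in the URL, still URL-encoded.
            return match.group(1)
    return None
```

Because the patterns are not anchored, q= also matches inside as_q=, so the last two alternatives are rarely reached in this sketch. The captured value is still URL-encoded; urllib.parse.unquote_plus can decode it for display.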
Download a text file containing AWStats ExtraSection samples for Google, Yahoo, Ask and MSN and more.
Ideally, ExtraSectionCondition5 would be an AND conditional, perhaps using &&, taking all *.google.* traffic but not mail.google.* which is gmail. However, we have not yet found a way. Write if you have a suggestion which works.
ExtraSectionCondition5="REFERER,(.*google.*)&&REFERER,^http:\/\/([^mail\.google\.])"
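One reason the attempt above cannot work is that [^mail\.google\.] is a character class: it matches a single character that is none of m, a, i, l, g, o, e or a dot, so it would reject images.google.com as well as mail.google.com. A possible alternative, which we have not verified inside AWStats, is a negative lookahead within a single condition, similar to the one used in the referring sites sample elsewhere on this page. The Python sketch below only checks the regular expression itself; the pattern and function name are hypothetical.

```python
import re

# Hypothetical single-condition alternative: the negative lookahead rejects
# hosts starting with "mail.google." while "google." must still appear in
# the host part of the referrer (the part before the first slash).
GOOGLE_NOT_GMAIL = re.compile(r"^https?://(?!mail\.google\.)[^/]*google\.")

# The character class from the attempt above, for comparison.
CHARACTER_CLASS_ATTEMPT = re.compile(r"^http:\/\/([^mail\.google\.])")


def is_google_search_referrer(referrer):
    """True for *.google.* referrers other than mail.google.*."""
    return GOOGLE_NOT_GMAIL.search(referrer) is not None
```

If this works in your AWStats version, the condition might be written as ExtraSectionCondition5="REFERER,^https?:\/\/(?!mail\.google\.)[^\/]*google\." but treat that as an untested suggestion.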
Search Engine Crawling Example
Monitoring of search engine crawling activity is important to establish both that a search engine knows of your site and can “see” all of your pages.
ExtraSectionName1="Google crawls - Top 50"
ExtraSectionCodeFilter1="200 304"
ExtraSectionCondition1="UA,(.*Googlebot.*)"
ExtraSectionFirstColumnValues1="URL,(.*)"
ExtraSectionFirstColumnFormat1="<a href='http://www.mysite.com%s' title='Item Crawled'>%s</a>"
ExtraSectionStatTypes1=PHBL
ExtraSectionAddAverageRow1=0
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=50
MinHitExtra1=1
Some internet users and robots pretend to be Googlebot when they navigate a site by spoofing the user agent value sent to the site. To ensure more accurate tracking of Googlebot visits, you could change the ExtraSectionCondition1="UA,(.*Googlebot.*)" value above to ExtraSectionCondition1="HOST,(\.googlebot\.com$)". This will track all site items downloaded by someone or something from Google, i.e. Googlebot, accessing your website from a host in the *.googlebot.com domain. For Yahoo! use ExtraSectionCondition1="HOST,(\.inktomisearch\.com$)"; for Microsoft use ExtraSectionCondition1="HOST,(msnbot\.msn\.com)"; for Ask use ExtraSectionCondition1="HOST,(egspd.*\.ask\.com)". Ideally, you could/should also specify the user agent, but AWStats does not yet support two conditions.
We needed to modify AWStats to use the HOST parameter as an Extra Section condition. We changed the line
if ($HostResolved =~ /$conditiontypeval/) { $conditionok=1; last; }
in awstats.pl to
if ($field[$pos_host] =~ /$conditiontypeval/) { $conditionok=1; last; }
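To get a feel for what the HOST patterns above accept, the following Python sketch runs them against a few host names. It assumes the host field holds a resolved host name rather than a raw IP address; the sample host names and the function name are hypothetical.

```python
import re

# HOST patterns quoted in the text above, keyed by search engine.
HOST_CONDITIONS = {
    "Google": r"\.googlebot\.com$",
    "Yahoo!": r"\.inktomisearch\.com$",
    "Microsoft": r"msnbot\.msn\.com",
    "Ask": r"egspd.*\.ask\.com",
}


def crawler_for_host(host):
    """Return the search engine whose HOST pattern matches, or None."""
    for name, pattern in HOST_CONDITIONS.items():
        if re.search(pattern, host):
            return name
    return None
```

The Google and Yahoo! patterns end with $, so a host such as googlebot.com.evil.example is rejected. The Microsoft and Ask patterns are not anchored, so msnbot.msn.com.evil.example would still match; adding a trailing $ may be worth testing.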
If AWStats is run on a different host than your site, you’ll need to replace http://www.mysite.com with the name of your site. Otherwise, just remove http://www.mysite.com.
The key piece of the puzzle here is Googlebot which identifies the Google crawler. It changes from search engine to search engine. Check the robots.pm file to find the crawler for the search engine of your choice. In addition to Googlebot, the primary crawlers are Yahoo! Slurp (Yahoo!), Ask Jeeves/Teoma (Ask), msnbot (MSN). See our related article Search Engine Crawlers: Who’s visiting my site and why?.
Download a text file containing AWStats ExtraSection samples for Google, Yahoo and MSN and more.
By now, you may be saying, “Great, but I need detailed reports for each search engine”. Unfortunately, this is not possible with the current ExtraSection syntax.
Search Engine sitemap.xml Example
Google recently instituted the concept of an xml site map description file. Particularly useful for sites with complex (convoluted?) navigation schemes, it is also useful to help search engines quickly identify new and updated content. See our article “The Google Webmaster Dashboard, a.k.a. Google Sitemaps” for more detail on this excellent search engine optimization tool.
This ExtraSection helps monitor who is using your site map.
ExtraSectionName13="sitemap.xml.gz downloads by Useragent"
ExtraSectionCodeFilter13="200 304"
ExtraSectionCondition13="URL,(^\/sitemap\.xml\.gz)"
ExtraSectionFirstColumnTitle13="UA"
ExtraSectionFirstColumnValues13="UA,(.*)"
ExtraSectionStatTypes13=HBL
ExtraSectionAddAverageRow13=0
ExtraSectionAddSumRow13=1
MaxNbOfExtra13=10
MinHitExtra13=1
Similar reports can be created for Yahoo’s URL List (urllist.txt, urllist.gz) and Amazon’s A9 / Alexa Site Info (siteinfo.xml).
AWStats and Top Referrals
AWStats provides a nice referral report, indicating both the domain and page which brought in a visitor. Often marketing campaigns need to report on aggregate traffic from a domain. The following ExtraSection will list each domain except our own. To use it, change www\.mysite\.com to your domain, adding a \ before each dot.
ExtraSectionName1="Referring Sites by domain - Top 25"
ExtraSectionCodeFilter1="200 304"
# Filter on ANY REFERER except "mysite". Change mysite to your domain name.
ExtraSectionCondition1="REFERER,^(?!http:\/\/www\.mysite\.com)"
ExtraSectionFirstColumnTitle1="Site"
ExtraSectionFirstColumnValues1="REFERER,^[hH][tT][tT][pP]:\/\/([^\/]+)\/"
ExtraSectionFirstColumnFormat1="<a href='http://%s/' rel='nofollow' title='http://%s/ [new window]'>%s</a>"
ExtraSectionStatTypes1=PHL
ExtraSectionAddAverageRow1=1
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=25
MinHitExtra1=1
2006-01-13: Added logic to filter out own site referrals thanks to Jean-luc Halleux.
2005-09-20: Added rel=’nofollow’ to referrer link to minimize referrer spam problems. This is not needed in recent versions of AWStats as it is inserted in the global page meta tags.
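The two regular expressions in this sample can be tried outside AWStats. The Python sketch below is only an approximation of how the condition and the first column value combine; the function name and sample referrers are hypothetical.

```python
import re

# Condition: any referrer that does not start with our own site.
NOT_OWN_SITE = re.compile(r"^(?!http://www\.mysite\.com)")
# First column value: the host part of an http referrer.
DOMAIN = re.compile(r"^[hH][tT][tT][pP]://([^/]+)/")


def referring_domain(referrer):
    """Return the referring host, or None if the referrer is filtered out."""
    if not NOT_OWN_SITE.search(referrer):
        return None  # referral from our own site
    match = DOMAIN.search(referrer)
    return match.group(1) if match else None
```

As written, the value pattern only matches referrers that begin with http:// and contain a slash after the host name. Referrers using https:// or lacking a trailing slash do not produce a domain in this sketch, so you may want to adjust the pattern if such referrers matter to you.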
AWStats and Top RSS Readers/Spiders
If your site offers RSS feeds, you may want to track which browsers and spiders are downloading them. The following logic will show the top Readers and Spiders for any file ending in .xml, .rdf or .rss. By changing the URL parameter to a specific filename, you can create a “content group” to track a single feed.
ExtraSectionName2="Top 30 RSS Readers/Spiders"
ExtraSectionCodeFilter2="200 304"
ExtraSectionCondition2="URL,\.xml|\.rdf|\.rss"
ExtraSectionFirstColumnTitle2="RSS Reader/Spider"
ExtraSectionFirstColumnValues2="UA,(.*)"
ExtraSectionStatTypes2=HBL
ExtraSectionAddAverageRow2=1
ExtraSectionAddSumRow2=1
MaxNbOfExtra2=30
MinHitExtra2=1
Added 2005-12-22.
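Note that the condition above is not anchored, so it matches these extensions anywhere in the URL, not only at the end. The Python sketch below compares it with a hypothetical anchored variant; the sample URLs are made up for illustration.

```python
import re

# Condition from the sample above: matches the extension anywhere in the URL.
SAMPLE_CONDITION = re.compile(r"\.xml|\.rdf|\.rss")
# Hypothetical stricter variant: the extension must end the URL.
ANCHORED_CONDITION = re.compile(r"\.(xml|rdf|rss)$")


def matching_urls(condition, urls):
    """Return the URLs that satisfy the given compiled condition."""
    return [url for url in urls if condition.search(url)]


SAMPLE_URLS = ["/feed.rss", "/news/index.rdf", "/sitemap.xml.gz", "/index.html"]
```

With the sample condition, /sitemap.xml.gz is counted as a feed; with the anchored variant it is not. If your URL field can include a query string, the anchored form would miss requests such as /feed.xml?x=1, so test against your own logs before adopting it.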
Monitor specific pages
Often a site may want to keep an eye on a specific page. This example will list stats for any page which contains javascript in the URL. One line will appear for each page.
ExtraSectionName24="Pages with javascript in name"
ExtraSectionCodeFilter24="200 304"
# Filter on specific URL, including possible jsessionid
ExtraSectionCondition24="URL,(^\/.*javascript.*\.html)"
ExtraSectionFirstColumnTitle24="URL"
ExtraSectionFirstColumnValues24="URL,(.*)"
ExtraSectionStatTypes24=PBL
ExtraSectionAddAverageRow24=0
ExtraSectionAddSumRow24=0
MaxNbOfExtra24=1
MinHitExtra24=1
Added 2006-03-22.
AWStats – Monitor top file downloads
The following example will allow you to track the top downloads for a list of file types. Modify the list to suit your needs.
# To do: Ideally parameterize from not page list.
ExtraSectionName15="Downloads (diff,doc,pdf,rtf,sh,tgz,zip) - Top 10"
ExtraSectionCodeFilter15="200 304"
ExtraSectionCondition15="URL,(.*((\.diff)|(\.doc)|(\.pdf)|(\.rtf)|(\.sh)|(\.tgz)|(\.zip)))"
ExtraSectionFirstColumnTitle15="Download"
ExtraSectionFirstColumnValues15="URL,(.*)"
ExtraSectionFirstColumnFormat15="%s"
ExtraSectionStatTypes15=HBL
ExtraSectionAddAverageRow15=0
ExtraSectionAddSumRow15=1
MaxNbOfExtra15=10
MinHitExtra15=1
Added 2006-03-22.
AWStats Known Limits
ExtraSections have the following current limitations:
- Only one user defined column may be specified in addition to the pages, hits, bandwidth and last visit options.
- Column order, sort order and width are fixed. See our section on sort order for a work-around.
- No graphs are produced.
- Calculated fields, such as summing the bandwidth used for a particular file, are not supported.
- In the “FRAMES” version of AWStats, where the Navigation frame is on the left, the ExtraSections appear in a list called Extra/Marketing but not in any apparent order.
- i18n internationalization support is not present.
- Not all request statuses are available. Error reports on status 301 (permanent redirect) or 404 (not found) do not work.
You can contribute enhancements to remove these limits.
Track pages translated by Google Translate
Companies operating in multiple markets may want to track which website pages are being translated by a translation service such as Google Translate. See our discussion including an ExtraSection example.
Added 2006-06-23.
Perl REGEXP resources for AWStats
Regular expression syntax is obscure at best for those who do not use it on a regular basis. We have found a few quick tutorials which may help you.
- Regular-Expressions.info Suggested by Christian Lederer. (2006-01-13)
Have you created AWStats ExtraSections you want to share?
If you have created ExtraSections which provide non site-specific analysis, paste a copy in our feedback form and we will post the syntax here if we deem it to be generally applicable.
AWStats Enhancements of Interest
Many contributors make plugins and patches for AWStats. Those we find more interesting are described here.
Rolling 12 month view
By default, AWStats reports the current month with overall statistics for the current year to date. It becomes more difficult to spot overall trends at the beginning of a new calendar year as there are very few months of data. You need to open a second instance of AWStats to view the previous year’s data.
A contribution by rkodey solves this problem by changing the default view to the last twelve months.
The original patch is posted as AWStats patch ID 1103597. You’ll need to match the correct patch version with your AWStats version:
| AWStats Version | Patch Version | Patched awstats.pl |
|---|---|---|
| 6.2 | awstats.pl-1.783_last_12_months.patch | |
| 6.4 | awstats.pl-1.814_last_12_months.patch | |
| 6.5 | awstats.pl-1.857_last_12_months.patch | awstats.p.l.gz |
| 6.6 (1.887) | awstats.pl-1.857_last_12_months.patch | |
See our instructions for patching a file if you are unfamiliar with patches.
For your convenience, we’ve provided the patched AWStats 6.5 version we’re using. Download the file, unzip it, back up your current AWStats program, then replace your old awstats.pl file. MS Windows users may want 7-Zip or gzip to uncompress the file.
See AWStats patch ID 1103597 for complete instructions. Updated: 2006-11-27.
Unique visitors and Number of visits available in Countries table
Thanks to a contribution from Josep Ruano, it is now possible to see a breakdown of visits and unique visitors by country in the countries table. (AWStats 6.5+)
To enable extended country reporting, you must modify your awstats.mysite.conf configuration file, adding the letters U and V (users and visits) to ShowDomainsStats:
# Show domains/country chart
# Context: Web, Streaming, Mail, Ftp
# Default: PHB, Possible column codes: PHB
ShowDomainsStats=UVPHB
The change is retroactive and is effective immediately.
Sort Tables by Columns
nettoyeur25 posted a nice enhancement to AWStats which adds sortable tables to ExtraSections. This is accomplished by a JavaScript plugin.
- Download the Javascript source code, placing it in your Web Statistics server root directory (or modify the path in awstats.pl, after applying the patch below, to specify a better location).
- Download and apply our awstats.pl.sortable.diff patch to awstats.pl, after making a backup of your current copy.
We made several changes to nettoyeur25’s original instructions:
- We encapsulated ExtraSections data in table body statements (TBODY) to keep summary data information from being included in the sort.
- We added column sorts to the Countries, Hosts, Page-URL, Robots, Search Phrases and Worms reports. We did not yet update all of the full detail reports. Check back for an updated patch.
See the plugin documentation for further details.
If you want to see this code in action, take a look at SeoMoz’s Search Engine Optimization Blog Web Analytics survey. We suggested they enhance their comparison tables with this code – and it has really improved the comparison tables.
We have not verified compatibility with PDF output produced by htmldoc. If you do check this, let us know.
Added 2006-10-19.
Filter Out HEAD Requests
By default, AWStats tracks several types of http requests. Besides the common GET and POST requests, AWStats also processes HEAD requests – requests to get HTTP header information without fetching the entire html page or other object.
These requests are used by legitimate tools such as the W3 link validation tool. Unfortunately, HEAD requests are often deployed by those with less than virtuous intentions, such as spammers trying to add referral links to blogs.
One reader, Che Dong, suggested filtering out HEAD requests. Fortunately, this is easy to do. Download and apply our awstats.pl.head.diff patch to awstats.pl, after making a backup of your current copy. For AWStats 6.5. Added 2006-10-19.
AWStats 6.6 users should comment out line number 6322, || $field[$pos_method] eq 'HEAD'. You may want to comment out other non-POST and GET requests as well. Added 2006-11-27.
Track Visitors by IP and Useragent
By default AWStats tracks visitors based on IP/host address. This can lead to improper visitor recognition, for example when an ISP changes a user’s IP during a navigation session. While the ultimate solution is to use a SESSION ID, a patch by Antoine EMERIT attempts to improve recognition by adding the User Agent (browser and operating system, including version numbers) to the IP.
This patch also includes logic to recognize visitors on non-page requests. While this can be useful for search engine cache page views (a user will request images from our server but the page will be served from the search engine), it will also count as a visit when a file on our server is downloaded from another site. Since the visitor didn’t actually view content on our site, it may not be a good idea to count them as a visitor in this case. For this reason, we are not actually using this patch. Added 2006-10-19.
Enhanced Object Detection
Antezeta has enhanced AWStats to improve recognition of operating systems, robots and search engines. Some of the changes described below have been included in version 6.5. The latest robots and search engine changes must be applied manually.
Robot Detection
Consult our list of additional robots and download our enhanced robots.pm database to recognize additional robots.
Search Engine Detection
We have updated our AWStats search_engines.pm search engine data base to improve the accuracy of search engine recognition. Consult our list of updates and download the data base. Updated 2006-05-23.
AWStats currently aggregates search referrals from each country variant of most major search engines into a single total for that search engine, i.e. referrals from google.ca and google.co.uk will all show up as Google. Current exceptions are AOL.de, AOL.fr and all ASK.com variants. We are considering a future enhancement to break out Google, Yahoo and MSN searches by country.
Browser Detection
We have updated our AWStats browsers.pm browsers data base to improve the accuracy of browser recognition. Download our browsers.pm.tgz browser database. Updated 2006-10-15 to include the Mozilla SeaMonkey browser. Installation is similar to that for robots.pm; you’ll need to add the SeaMonkey icon to your AWStats browser icon directory. This version is based on AWStats 6.6 and does not appear to be backwardly compatible. For AWStats 6.5 and earlier, we have an older version available.
AWStats Operating System Detection
As Linux is starting to become a more common client platform, we have made enhancements to group Linux systems as Windows and Mac are in the main AWStats report and to break down Linux systems by distribution in the operating system report. These enhancements are included in the final version of AWStats 6.5. This has been accomplished by adding the following distributions to the operating_systems.pm file. The additional known systems are:
- Centos
- Debian
- Fedora
- Mandrake
- Mandriva
- Red Hat
- Suse
- Ubuntu
- All others will be listed as Linux (unknown/unspecified).
For each distribution we also added a logo and, as documentation, a link to the operating system home page. The link will appear in the AWStats operating systems detail report. We also added links for each of the other currently known operating systems. These links will appear in the main AWStats report if the operating system is not part of the Windows, Macintosh or Linux families; otherwise they appear in the detail report.
AWStats 6.5 also adds a breakdown of BSD operating systems, not included in the patches here.
We have not yet verified Mandrake/Mandriva in a user agent string; detection may not work.
To install:
- Update the operating system list
  - Adjusting the syntax to match your operating system, back up the existing AWStats operating system database by moving
    awstats/wwwroot/cgi-bin/lib/operating_systems.pm to awstats/wwwroot/cgi-bin/lib/operating_systems.pm.bck
  - Download and extract updated operating_systems.pm in the AWStats lib directory.
    Now part of AWStats 6.5. Should be backwardly compatible. Last updated: 2005-08-20.
- Add icons for the new operating systems
  This step is optional. Reports will still work without the additional icons.
  - In the icons directory, copy
    linux.png to lin.png
  - Download and extract awstats_os_icons.tgz in the AWStats icons directory used for your reports.
- Update the main awstats.pl program. There are two options.
  This step is optional, and potentially risky. If you do not do this step, each of the individual Linux distributions will appear independently in the main report rather than being grouped together as an operating system family.
  - Use the diff of our changes against your current version. This is the safest approach for almost all users as we worked with a CVS version of AWStats 6.5.
    - Back up your current version of awstats.pl, moving
      awstats/wwwroot/cgi-bin/awstats.pl to awstats/wwwroot/cgi-bin/awstats.pl.bck
    - Apply the differences indicated in awstats.pl.diff either manually, or using the patch command if available on your system:
      patch --dry-run < {path to downloaded file}/awstats.pl.diff
      patch < awstats.pl.diff
  - Alternatively,
    - move
      awstats/wwwroot/cgi-bin/awstats.pl to awstats/wwwroot/cgi-bin/awstats.pl.bck
    - Download and extract the updated awstats.pl in the AWStats cgi-bin directory.
Known Issues
- Linux systems have only been broken down by distribution. Further investigation is required to see if version strings exist for each distribution and browser combination to make additional breakdown feasible.
- The operating system detail report list is currently reverse sorted by name. Ideally it would be sorted by top hits, pages or visits.
Top Level Domain Detection
The .cat top level domain was launched in 2005 to provide a domain specific to the Catalan region of Europe.
Francesc Roca Tugas has prepared a flag to support the .cat domain in AWStats.
- Right click to save the cat.png image to your AWStats icon/flags folder, i.e. awstats/wwwroot/icon/flags or similar.
- Locate the AWStats domains.pm file (usually in the lib directory under the awstats.pl program). Make a backup copy of the file. With a text editor, replace the line
  'bz','Belize','ca','Canada','cc','Cocos (Keeling) Islands',
  with
  'bz','Belize','ca','Canada','cat','Catalan Linguistic and Cultural Community','cc','Cocos (Keeling) Islands',
  and save the file.
How to patch a file
Many enhancements entail modifications to the standard AWStats program. A patch file simply contains lines to add and/or delete from an existing text file, such as the main program awstats.pl. If you’re curious, you can simply open an uncompressed patch file in a text editor.
- From the command prompt, change your working directory to that containing the file to patch, i.e. awstats.pl.
- Download the patch file to the current directory.
- Uncompress the patch file, i.e. gunzip awstats.pl.search.patch.gz. MS Windows users may want 7-Zip or gzip to uncompress the file.
- Back up the file to patch, i.e. cp awstats.pl awstats.pl.bck.
- Apply the patch, i.e. patch -i <patch file, i.e. awstats.pl.search.patch>. MS Windows users can download a version of patch for their systems.
In some cases you may see a message that patching the first hunk (section) failed. This will happen if the patch contains introductory comments right after the AWStats internal version number. In most cases, you can ignore the error. If you are not sure, revert to your back up file.
You can also use a dry run syntax, i.e. patch --dry-run -i <patch file, i.e. awstats.pl.search.patch>, to test if the patch can be applied without errors.
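The backup, dry run and apply steps above could be scripted. The following Python sketch assumes GNU patch is available on the PATH and that the file to patch and the uncompressed patch file both sit in the current working directory; the function names are hypothetical.

```python
import shutil
import subprocess


def patch_commands(patch_file):
    """Return the dry-run and the real patch invocations as argument lists."""
    return [
        ["patch", "--dry-run", "-i", patch_file],
        ["patch", "-i", patch_file],
    ]


def apply_patch(target, patch_file):
    """Back up target, test the patch with a dry run, then apply it."""
    # Same backup naming as the manual steps: awstats.pl -> awstats.pl.bck
    shutil.copy2(target, target + ".bck")
    dry_run, real = patch_commands(patch_file)
    # check=True raises CalledProcessError when patch exits with an error,
    # so the real run is skipped if the dry run fails.
    subprocess.run(dry_run, check=True)
    subprocess.run(real, check=True)
```

This sketch is stricter than the manual procedure: it stops on any failed hunk, including a harmless first-hunk failure of the kind described above. A typical call would be apply_patch("awstats.pl", "awstats.pl.search.patch").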
AWStats 6.5 New Features
AWStats 6.5 was recently released. Antezeta takes a look at new functionality in 6.5.
Exclude Spam Referrers
Some unscrupulous sites attempt to increase their Search Engine rankings and general visibility by automatically creating links to their sites on other sites. The primary target is blog sites which publish the latest referring URLs. A secondary target is sites which publish their web analytics statistics.
Consider a fictitious example: A site called www.dreamingdamsels.xxx has an automated program which requests the home page from www.mysite.com. If www.mysite.com publishes the most recent referrers on their site, then www.dreamingdamsels.xxx has just created a link to www.dreamingdamsels.xxx from www.mysite.com. Similarly, the automated program will try to make enough requests from www.dreamingdamsels.xxx to become one of the top referrers, landing in the AWStats Web Analytics Top Referrals report. If a site publishes its web analytics reports, www.dreamingdamsels.xxx will appear on the site. In either case, the end game is to procure an automatic, free link from your site to theirs, in a parasitic approach to Search Engine Optimization.
Thanks to a contribution from Rod Begbie, AWStats version 6.5 has a referral spam filtering feature.
To enable the filtering, add SkipReferrersBlackList to your awstats.mydomain.conf configuration file:
# Use SkipReferrersBlackList if you want to exclude records coming from a SPAM
# referrer. Parameter must receive a local file name containing rules applied
# on referrer field. If parameter is empty, no filter is applied.
# An example of such a file is available in lib/blacklist.txt
# The site that offered an updated version of this list is no longer available.
# Change : Effective for new updates only
# Example: "/mylibpath/blacklist.txt"
# Default: ""
#
SkipReferrersBlackList="/usr/share/awstats/wwwroot/cgi-bin/lib/blacklist.txt"
Change the path of blacklist.txt to match that on your system. The site that previously offered current blacklist files is no longer available, and we have not yet found a replacement.
Notes:
- You should review the filters to ensure there won’t be false positives. The longer the exclusion string, the better. If it is short, you risk filtering out valid traffic. For example, referrers which include analysis….
- The “change log” file in AWStats beta version 1.845 2005/09/19 notes the new feature as SkipReferrersBlackList. The correct syntax is without the second s in this version. Successive versions use SkipReferrersBlackList.
- Performance hit: every line of your web server log file will be checked against every line in the blacklist file. A small log sample took 5 times longer to run with the default blacklist enabled.
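The false positive and performance notes above can be illustrated with a small Python sketch. It assumes each blacklist rule is treated as a regular expression, which you should check against the format of your own blacklist file; the rules, referrers and function name are hypothetical.

```python
import re


def filter_referrers(referrers, rules):
    """Split referrers into (kept, skipped) lists.

    Every referrer is tested against every rule until one matches, which is
    why the cost grows with log size multiplied by blacklist size.
    """
    compiled = [re.compile(rule) for rule in rules]
    kept, skipped = [], []
    for referrer in referrers:
        if any(rule.search(referrer) for rule in compiled):
            skipped.append(referrer)
        else:
            kept.append(referrer)
    return kept, skipped


LEGITIMATE = "http://www.pokerface-photography.example/links.html"
SPAM = "http://texas-holdem-poker-online.example/"
```

With the short rule poker, both the spam referrer and the legitimate photography site are skipped. With the longer rule texas-holdem-poker-online\.example, only the spam referrer is skipped.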
There are several tactics for managing spam referrals:
- Deny access to your site from known spam referrers. This is usually done in the web server configuration file. For Apache, modify either the main configuration file or a top level directory .htaccess file. See www.spywareinfo.com/articles/referer_spam for more information.
- If you must publish referrers on your site, use rel="nofollow" in the link syntax, i.e. <a href="http://www.wwf.org/" rel="nofollow">World Wide Fund for Nature</a> to tell search engines to ignore the link. This negates any value a link might get from your site without impacting end users.
- Don’t publish web statistics on your site unless nofollow is used in the referral links or in the robots meta tag (<meta name="robots" content="noindex,nofollow">). Recent versions of AWStats include nofollow in the robots meta tag.
Hourly, daily, (and yearly?) reporting period support debuts
Traditionally AWStats has followed a monthly reporting model – except for the monthly history, report sections present an overview of activity for a given month – either the current month to date or a previous month. While this works for many sites, a common need is to see what is happening on a more granular level. With version 6.5, sites will be able to run hourly, daily, monthly and/or yearly reports.
Expanding on a previous unsupported work-around documented in the AWStats FAQ (ID FAQ-COM600), version 6.5 introduces a new configuration option, DatabaseBreak.
DatabaseBreak automates the process of creating the correct AWStats intermediary statistics files necessary for hourly, daily, monthly and yearly report generation. While the functionality is still rough around the edges, we note what is possible with the current implementation.
1. Generate hourly, daily, monthly and yearly statistics database files.
Currently, support to generate the intermediary files works well.
For sites using the command line interface, there is one change to make: the new option DatabaseBreak should be specified for each reporting granularity desired. DatabaseBreak can take values of year, month, day and hour.
awstats.pl -config=antezeta_com -configdir=/etc/awstats -update -debug=0 -LogFile=access_log -DatabaseBreak=month
awstats.pl -config=antezeta_com -configdir=/etc/awstats -update -debug=0 -LogFile=access_log -databasebreak=day
awstats.pl -config=antezeta_com -configdir=/etc/awstats -update -debug=0 -LogFile=access_log -DatabaseBreak=year
awstats.pl -config=antezeta_com -configdir=/etc/awstats -update -debug=0 -LogFile=access_log -DatabaseBreak=hour
This will create statistics database files for the configuration file awstats.antezeta_com.conf.
| File | Description |
|---|---|
| awstats2005.antezeta_com.txt | 2005 Yearly file |
| awstats082005.antezeta_com.txt | August 2005 Monthly |
| awstats08200515.antezeta_com.txt | 15 August 2005 Daily |
| awstats0820051500.antezeta_com.txt | 15 August 2005 Hourly from midnight 00 to 01 am |
| awstats0820051501.antezeta_com.txt | 15 August 2005 Hourly from 01am to 02am |
| … | …additional hourly files … |
| awstats0820051522.antezeta_com.txt | 15 August 2005 Hourly from 22 to 23 (10 to 11pm) |
| awstats0820051523.antezeta_com.txt | 15 August 2005 Hourly from 23 to 00 (11pm to midnight) |
DatabaseBreak is case insensitive; month is the default value.
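The naming scheme in the table above can be expressed as a short Python sketch, which may help when scripting cleanup or regeneration of statistics files. The pattern is inferred only from the file names listed in the table; the function name is hypothetical.

```python
from datetime import datetime

# Date stamp layout per DatabaseBreak value, inferred from the table above:
# month first, then year, then day, then hour.
_STAMP_FORMATS = {
    "year": "%Y",
    "month": "%m%Y",
    "day": "%m%Y%d",
    "hour": "%m%Y%d%H",
}


def stats_filename(config, when, database_break="month"):
    """Return the statistics file name for a config, moment and break value."""
    # DatabaseBreak is case insensitive, so normalize before the lookup.
    stamp = when.strftime(_STAMP_FORMATS[database_break.lower()])
    return f"awstats{stamp}.{config}.txt"
```

For example, stats_filename("antezeta_com", datetime(2005, 8, 15, 22), "hour") yields the 22 to 23 hourly file name shown in the table.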
Do not put DatabaseBreak in your awstats.mysite.conf file. Specifying DatabaseBreak on the command line does not currently seem to override a configuration file value; no statistics file will be generated if the command line value does not agree with the configuration file value.
We have placed all of the statistics files in the same directory. While this does not seem to confuse AWStats, you could assign DirData="__VarDirData__" in your AWStats configuration file, assigning an appropriate value to VarDirData each time you generate statistics with different DatabaseBreak values, i.e. export VarDirData=/awstatsdata/month, to keep each set of statistics files in separate directories.
We have not yet verified statistics generation using the CGI on demand update. It should work with the databasebreak=hour&hour=18&day=22&month=08&year=2005 syntax described below.
2. Run reports
on-demand CGI reports
The on-demand CGI report drop down interface has not yet been updated to take advantage of the new statistics files, but there is a URL work-around. To generate
- monthly reports: use the drop down list as usual.
- daily reports: insert databasebreak=day&day=XX& after the ? in the URL address. XX is a two digit number for the day of interest.
- i.e. http://www.mysite.com/awstats.pl?databasebreak=day&day=22&month=08&year=2005&config=mysite
- hourly reports: insert databasebreak=hour&hour=YY&day=XX& after the ? in the URL address. XX is a two digit number for the day of interest. YY is a two digit number for the hour of interest.
- i.e. http://www.mysite.com/awstats.pl?databasebreak=hour&hour=18&day=22&month=08&year=2005&config=mysite
- yearly reports: do not yet seem to be supported. We tried
- databasebreak=year&year=2005
but got the message “Never updated (See ‘Build/Update’ on awstats_setup.html page)”. This is unfortunate, as the current yearly reporting cannot recognize returning visitors across monthly boundaries. A yearly statistics file would change this to a yearly boundary problem.
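The URL patterns above can be assembled programmatically. This is a minimal JavaScript sketch, not part of AWStats: the helper names pad2 and reportUrl are illustrative, and the parameter order simply mirrors the example URLs above.

```javascript
// Hypothetical helper: build an on-demand report URL for a DatabaseBreak interval.
function pad2(n) {
  // AWStats expects two-digit values such as 08 for August
  return String(n).padStart(2, "0");
}

function reportUrl(base, config, interval, date) {
  // date: { year, month, day, hour }; only the fields the interval needs are used
  const parts = [];
  if (interval === "hour") {
    parts.push("databasebreak=hour", "hour=" + pad2(date.hour), "day=" + pad2(date.day));
  } else if (interval === "day") {
    parts.push("databasebreak=day", "day=" + pad2(date.day));
  } else if (interval !== "month") {
    // year is rejected because yearly reports do not yet seem to be supported
    throw new Error("Unsupported interval: " + interval);
  }
  parts.push("month=" + pad2(date.month), "year=" + date.year, "config=" + config);
  return base + "?" + parts.join("&");
}

const base = "http://www.mysite.com/awstats.pl";
console.log(reportUrl(base, "mysite", "day", { year: 2005, month: 8, day: 22 }));
console.log(reportUrl(base, "mysite", "hour", { year: 2005, month: 8, day: 22, hour: 18 }));
```

For the monthly case the sketch omits databasebreak entirely, which corresponds to the default month value.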
To start AWStats reports in databasebreak mode without entering a long URL, use an intermediary html file with JavaScript to create the URL and, with a redirect, start AWStats for you. We have created an example AWStats html start file you can save and modify to fit your needs.
- Right-click to save it to disk. Open the file in a text editor.
- Change all occurrences of antezeta_com to the name of your AWStats configuration file.
- Change the AWStats program path from /awstats/awstats.pl to your path, e.g. /cgi-bin/awstats.pl, if different.
Known issues: The script will attempt to start AWStats using yesterday as the starting date, as long as today is not the first of the month. More elegant logic would set the month to the previous month and the day to the last day of that month (28, 29, 30 or 31). In the case of January, the year would be reduced by one. There may also be a need to prefix single-digit days and months with a 0, e.g. 08 for August. This has not been done. If someone wants to add these enhancements, write us and we will post them here.
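One possible way to handle the month and year rollover and the zero prefix is sketched below in JavaScript. This is not the code in the downloadable start file; the function names twoDigits and previousDay are illustrative. It relies on the standard Date constructor normalizing a day value of 0 to the last day of the previous month.

```javascript
// Zero-prefix single-digit values, e.g. 8 -> "08"
function twoDigits(n) {
  return String(n).padStart(2, "0");
}

// Return yesterday's date parts relative to `today` (a Date in local time).
function previousDay(today) {
  // Day 0 rolls back to the last day of the previous month, and
  // January 0 rolls back to 31 December of the previous year.
  const d = new Date(today.getFullYear(), today.getMonth(), today.getDate() - 1);
  return {
    year: String(d.getFullYear()),
    month: twoDigits(d.getMonth() + 1), // getMonth() is zero-based
    day: twoDigits(d.getDate()),
  };
}

const y = previousDay(new Date());
console.log("databasebreak=day&day=" + y.day + "&month=" + y.month + "&year=" + y.year);
```

Because the Date object does the calendar arithmetic, no table of month lengths or leap-year test is needed.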
To run hourly or daily reports on historical data, you will need to generate hourly and daily statistics files for the historical data. You will not need to regenerate existing monthly statistics files.
Static report generation
Once the statistics files have been generated, two changes are necessary when generating static reports:
- Set -DatabaseBreak=interval, where interval is hour, day, month or year, on the awstats_buildstaticpages.pl command line (or awstats.pl as the case may be).
The year option does not yet seem to be supported in reporting.
- If using awstats_buildstaticpages.pl, set the -builddate and / or -dir options so hourly, daily, monthly and yearly report output will have unique names.
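To illustrate the unique-naming point, here is a small JavaScript sketch that derives a distinct -dir value per interval and period. It uses only the options named above plus -config; the function name staticPagesArgs, the output root /var/www/reports and the directory layout are assumptions made for this example, and selecting the reporting period itself is not shown.

```javascript
// Hypothetical helper: arguments for one awstats_buildstaticpages.pl run.
function staticPagesArgs(config, interval, when, outputRoot) {
  // when: zero-padded strings, e.g. { year: "2005", month: "08", day: "22", hour: "18" }
  const fieldsFor = {
    month: ["year", "month"],
    day: ["year", "month", "day"],
    hour: ["year", "month", "day", "hour"],
  };
  const fields = fieldsFor[interval];
  if (!Array.isArray(fields)) {
    // year is left out because yearly reporting does not yet seem to be supported
    throw new Error("Unsupported interval: " + interval);
  }
  // Concatenate the relevant date parts so each period gets its own directory
  const stamp = fields.map((f) => when[f]).join("");
  return [
    "-config=" + config,
    "-DatabaseBreak=" + interval,
    "-dir=" + outputRoot + "/" + interval + "/" + stamp,
  ];
}

const when = { year: "2005", month: "08", day: "22", hour: "18" };
console.log(staticPagesArgs("mysite", "hour", when, "/var/www/reports").join(" "));
```

Keeping the interval in the path means hourly, daily and monthly pages for the same date never overwrite one another.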
GeoIP data location becomes configurable (and must be configured)
If you are using the MaxMind GeoIP plugin, you will need to specify the full path for GeoIP.dat:
LoadPlugin="geoip GEOIP_STANDARD /usr/local/share/GeoIP/GeoIP.dat"
While you’re updating your GeoIP file path, see our Windows and Linux AWStats GeoIP installation instructions for information on MaxMind’s newly available GeoLiteCity database.
There are a few other new options in AWStats which we’ll be looking at shortly.
What is your experience with AWStats 6.5? Write us with your feedback.
City, Organization and Country GeoIP Lite plugins
The GeoIP Lite plugins provide country and city information about users (“hosts”) connecting to your website. Organization information, either a large company or an ISP, is available using the AS Numbers database. Review our Windows and Linux AWStats GeoIP installation instructions.
Additional AWStats resources
- AWStats documentation online
- AWStats project page
- AWStats user community support mailing list. Before you post a question, you should search the archives to ensure it hasn’t already been addressed.
- AWStats developer community – patches to fix and enhance AWStats
Articles on AWStats
- Carlos, Sean, Analyzing Web Logs with AWStats (O’Reilly Media), 2005-12-01 and 2006-01-09. Two part step-by-step guide to getting up and running with AWStats.
AWStats Limitations
AWStats is an excellent tool for small to medium sized websites. Larger sites may want to consider more advanced commercial tools at least to have better visitor recognition and user path (click stream) navigation analysis.
Clickstream Analysis
A few open source clickstream (user path navigation) analysis tools are available, although none currently integrates with AWStats. We have written rudimentary StatViz and Pathalizer installation and configuration instructions.
AWStats can show where visitors went after a certain page, or how they arrived at a certain page, using ExtraSections. Last updated: 2006-04-06.
# Assumes default page is "/" and is always referenced as /, not index.html etc.
# Assumes default page extension is html. This will thus exclude directory pages which appear as \
# Change html to your page suffix if different, i.e. htm.
ExtraSectionName25="Navigation from Home Page - Top 25"
ExtraSectionCodeFilter25="200 304"
ExtraSectionCondition25="REFERER,http:\/\/www\.mysite\.com\/"
ExtraSectionFirstColumnTitle25="URL"
ExtraSectionFirstColumnValues25="URL,(.*html$)"
ExtraSectionFirstColumnFormat25="%s"
ExtraSectionStatTypes25=PHBL
ExtraSectionAddAverageRow25=0
ExtraSectionAddSumRow25=1
MaxNbOfExtra25=25
MinHitExtra25=1
# Assumes default page is always linked to as "/". Some sites need to add index.html or default.asp as the case may be.
ExtraSectionName26="Navigation to Home Page from within site - Top 25"
ExtraSectionCodeFilter26="200 304"
ExtraSectionCondition26="URL,(^\/$)"
ExtraSectionFirstColumnTitle26="REFERER"
ExtraSectionFirstColumnValues26="REFERER,^http:\/\/www\.mysite\.com\/(.*)"
ExtraSectionFirstColumnFormat26="%s"
ExtraSectionStatTypes26=PHBL
ExtraSectionAddAverageRow26=0
ExtraSectionAddSumRow26=1
MaxNbOfExtra26=25
MinHitExtra26=1
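Before adding sections like these to a configuration file, it can help to try the regular expressions against a few sample values. The JavaScript sketch below copies the patterns from sections 25 and 26 above. AWStats itself evaluates them as Perl regular expressions, but patterns this simple behave the same way in JavaScript. The helper name firstColumnValue and the sample URLs are illustrative.

```javascript
// Patterns copied from ExtraSectionFirstColumnValues25 and ...Values26 above
const urlIsHtmlPage = /(.*html$)/;
const refererWithinSite = /^http:\/\/www\.mysite\.com\/(.*)/;

// Return the text captured by the first parenthesized group, or null if no match.
// The captured text is what would appear as a row in the report's first column.
function firstColumnValue(pattern, field) {
  const m = pattern.exec(field);
  return m ? m[1] : null;
}

const samples = [
  [urlIsHtmlPage, "/products/index.html"],
  [urlIsHtmlPage, "/products/"], // directory page: no match
  [urlIsHtmlPage, "/images/logo.gif"], // not an html page: no match
  [refererWithinSite, "http://www.mysite.com/about.html"],
  [refererWithinSite, "http://www.example.org/links.html"], // external: no match
];
for (const [pattern, field] of samples) {
  console.log(field + " -> " + JSON.stringify(firstColumnValue(pattern, field)));
}
```

Note that the first pattern accepts any URL ending in html, so pages with suffixes such as shtml are counted as well.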
Books on Web Analytics
AWStats uncovers a wealth of data about your website. Yet, to the untrained eye, report interpretation can be overwhelming. The following books on the field of Web Analytics can help you better interpret existing AWStats reports and provide inspiration for new reports through the extra section extensibility support.
- Peterson, Eric T., Web Analytics Demystified: A Marketer’s Guide to Understanding How Your Web Site Affects Your Business (Celilo Group Media and CafePress, 2004), 240 pages.
- Peterson, Eric T., Web Site Measurement Hacks (O’Reilly Media, 2005), 352 pages.
- Sterne, Jim, Web Metrics: Proven Methods for Measuring Web Site Success (John Wiley & Sons, Inc, 2002), 430 pages.
- Inan, Hurol, Measuring the Success of your Website – A Customer-centric Approach to Website Management (Pearson Education Australia Pty Ltd., 2002), 232 pages.
- Novo, Jim, Drilling Down: Turning Customer Data Into Profits With A Spreadsheet (Booklocker, 3rd edition, 2004), 356 pages. Focuses on customer behavior and the customer lifecycle. Software is available for download from the companion site, the Drilling Down project.
- Duane Wessels, Web Caching (O’Reilly Media), ISBN: 1-56592-536-X, 318 pages. Improper cache management, all too common, can impact both correct content delivery and web statistics.
Additional Antezeta featured books
2 responses so far ↓
1 Dave // Oct 6, 2009 at 20:31:06
Would like to know how to add an extra report to track 1. clicks on a specific banner image and overall stats on visits to pages within a specific sub-directory on the site.
I appreciate any help.
Dave
2 Laptop Repair Birmingham // Feb 4, 2010 at 14:37:13
I’m looking for away to get an email every time Google bot searches my site. I’m thinking this post might lead me it the right direction using awstats. Could the author come up with any suggests to my requirement ? thanks