
Accurate recognition of Internet robots is essential for separating automated traffic from real human page views when processing your Web Analytics reports. AWStats takes a modular approach to robot recognition: all robot definitions are kept in a Perl module, robots.pm, separate from the main AWStats program. We have been adding the robots we come across, more than 150 to date. Where possible, we also provide a link to each robot's home page, which appears in the AWStats reports.
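To illustrate what robot recognition involves, here is a minimal Python sketch. It is not AWStats code, which is written in Perl. The sketch checks each user-agent string against an ordered list of patterns and splits hits into robot and human traffic. The pattern list, the display names and the functions identify_robot and split_traffic are all hypothetical, and the list is far shorter than the real robots.pm. The first-match-wins order is an assumption of this sketch.

```python
import re

# Hypothetical, illustrative patterns; NOT copied from the real robots.pm.
ROBOT_PATTERNS = [
    ("googlebot", "Googlebot"),
    ("msnbot", "MSNBot"),
    ("slurp", "Yahoo Slurp"),
    ("ia_archiver", "Alexa (IA Archiver)"),
]

# Compile once; case-insensitive matching is a choice of this sketch.
COMPILED = [(re.compile(p, re.IGNORECASE), name) for p, name in ROBOT_PATTERNS]


def identify_robot(user_agent):
    """Return the robot's display name, or None for an apparent human visitor.

    Patterns are tried in order and the first match wins.
    """
    for regex, name in COMPILED:
        if regex.search(user_agent):
            return name
    return None


def split_traffic(user_agents):
    """Separate user-agent strings into (robot_hits, human_hits)."""
    robots, humans = [], []
    for ua in user_agents:
        (robots if identify_robot(ua) else humans).append(ua)
    return robots, humans


if __name__ == "__main__":
    sample = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
    ]
    robots, humans = split_traffic(sample)
    print(len(robots), len(humans))  # 2 1
```

A user agent that matches nothing has been tested against every pattern in the list, which illustrates why a longer robot list adds processing time.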
To update AWStats robot detection, simply replace your existing robots.pm file with our version.
Be aware that adding robots to your Web Analytics processing increases the time needed to process your logs; this is the price of greater accuracy. Recognition of newly added robots is not retroactive: to see new robots in historical data, you must reprocess your logs.
To avoid false positives, we often have to match strings that include spaces, written with the regex \s. In principle _ should also work, but we could not get it to. Microsoft's IIS replaces spaces in the User-Agent with the + character. We considered using the regular expression (\s|\+) to match both spaces and the + used by IIS, but decided against the additional performance overhead. IIS users should therefore manually change \s to \+. We may eventually investigate why AWStats did not treat the underscore as equivalent, as the current solution is less than ideal.
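The IIS problem can be reproduced with a small Python sketch. The pattern yahoo!\sslurp and the function matches are hypothetical examples, not entries copied from robots.pm, and case-insensitive matching is a choice of this sketch. Perl treats \s (whitespace) and \+ (a literal plus) the same way Python's re module does for these cases.

```python
import re

# Hypothetical robot pattern written with \s for the space, as described above.
SPACE_PATTERN = r"yahoo!\sslurp"
# The same pattern hand-edited for IIS logs: \s changed to \+ (a literal plus).
IIS_PATTERN = r"yahoo!\+slurp"
# The alternative that was considered: match either a space or a plus.
EITHER_PATTERN = r"yahoo!(\s|\+)slurp"

# User agent as it appears with ordinary spaces (e.g. in an Apache log).
SPACE_UA = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
# The same user agent as IIS logs it, with every space replaced by '+'.
IIS_UA = SPACE_UA.replace(" ", "+")


def matches(pattern, user_agent):
    """Case-insensitive search; True if the pattern occurs anywhere."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


if __name__ == "__main__":
    for pattern in (SPACE_PATTERN, IIS_PATTERN, EITHER_PATTERN):
        print(pattern, matches(pattern, SPACE_UA), matches(pattern, IIS_UA))
    # yahoo!\sslurp True False
    # yahoo!\+slurp False True
    # yahoo!(\s|\+)slurp True True
```

A pattern written with \s matches only the space-separated form, the \+ version matches only the IIS form, and the (\s|\+) alternative matches both at the cost of an extra alternation on every check.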
First make a copy of your existing robots.pm, then download our updated robots.pm and save it in the AWStats lib directory. Consider also backing up your AWStats statistics (intermediary) files, which are usually in the directory set by the AWStats DirData parameter. The library should be backward compatible.
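The backup-then-replace steps can be scripted. This Python sketch assumes you supply the location of the downloaded file, your AWStats lib directory and your DirData directory yourself, since these depend on your installation. The function names and the robots.pm.bak naming scheme are hypothetical; the sketch never overwrites an earlier backup on repeated runs.

```python
import shutil
from pathlib import Path


def next_free_path(path):
    """Return path, or path.1, path.2, ... so an existing backup is never overwritten."""
    candidate = path
    n = 1
    while candidate.exists():
        candidate = path.with_name(f"{path.name}.{n}")
        n += 1
    return candidate


def install_robots_pm(new_robots_pm, awstats_lib_dir):
    """Back up the existing robots.pm, then copy the downloaded one into place.

    Returns the backup path, or None if no robots.pm existed yet.
    """
    lib_dir = Path(awstats_lib_dir)
    target = lib_dir / "robots.pm"
    backup = None
    if target.exists():
        backup = next_free_path(lib_dir / "robots.pm.bak")
        shutil.copy2(target, backup)  # copies content and timestamps
    shutil.copy2(new_robots_pm, target)
    return backup


def backup_data_dir(dir_data, archive_base):
    """Zip the whole DirData directory (the intermediary statistics files).

    Returns the path of the created .zip archive.
    """
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(dir_data))
```

Backing up DirData before reprocessing logs means the previous statistics can be restored if the new robots.pm produces unexpected results.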
See also our updated AWStats Search Engine Database, our Browsers Database and our other AWStats web analytics resources!
Last updated: 2006-10-15
Several sites regularly document what is known about the various web robots, including suggestions on which robots may be worth blocking from your site because their intentions are not ethical.
We have enhanced the current Robots database:
Added:
The above changes are already included in the 2005-11-26 version of AWStats 6.5.
The following changes, posted here on 2005-12-15, may not yet be included in 6.5:
2005-12-22: More additions.
2006-01-13: More additions.
2006-01-24
2006-01-27: 22 robots from a list supplied by Moizes Gabor
2006-02-01: More additions, most from a list provided by Moizes Gabor [ mojzi -a-t- free mail hu ]
2006-05-15: Added more robots; made two changes:
2006-05-17
2006-05-20: 80+ robots, many from a list supplied by Moizes Gabor [ mojzi -a-t- free mail hu ]
2006-05-23: 10 robots
2006-05-27: 4 robots
2006-06-13: 14 robots
2006-06-26: 7 robots
2006-08-25: 17 robots
2006-09-07: 6 robots
2006-10-15: 38 robots
Help us improve the quality and accuracy of this information by sending us feedback.
If you find this document useful and would like to provide a translation in your native language, write to us.
Let Antezeta help you in the selection, implementation and usage of a Web Analytics solution!
Contact us to find out more about this topic and the rest of the Web Ecosystem.