Search Engine Crawlers: Who's visiting my site and why?

Need to put all the pieces of the search marketing puzzle together?

Organizations implementing search engine optimization (SEO) strategies will sooner or later want to monitor search engine crawling activity. Before a web page can appear in search results, its content has to be discovered through a crawling, or spidering, process. This is done by software that automatically navigates the web, finding and downloading web content for the search engine to parse, index and rank.


A “spider”, also known as a “crawler”, “robot” or simply “bot”, finds and retrieves web pages. Once a search engine finds your site, either through a link from another site or through a submission form, the “spider” will begin to crawl your site.

Monitoring search engine crawling activity can provide an early sign that SEO efforts are working, or an early warning of site issues that are impeding content discovery.

Some commercial web analytics tools that report from web server log files, such as ClickTracks, provide reports an organization can use to verify:

Open source web analytics tools such as Analog and AWStats can provide a subset of this information.

Major Search Engine bots

The following is a list of the major search engine and search-engine-related service bots we have come across, with a brief note on their usage. Depending on your website's priorities, use this information to monitor specific crawling activity.

When tracking search engine robots, it is important not only to track the robot's name in the user agent data, but also to verify the robot's host name. Some users and robots spoof the user agent of a well-known search engine robot when navigating your site.
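One common way to verify a robot's host name is a forward-confirmed reverse DNS lookup: resolve the visiting IP address to a host name, check that the name falls under the search engine's domain, then resolve that name forward and confirm it maps back to the same IP address. The Python sketch below is a minimal illustration under stated assumptions: it uses the standard `socket` module (whose `gethostbyname_ex` handles IPv4 only), and the function name `verify_crawler_ip` and its parameters are hypothetical. The resolvers are passed in as arguments so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Both resolvers return (hostname, aliases, ip_addresses), the shape used by
# socket.gethostbyaddr and socket.gethostbyname_ex (IPv4 only for the latter).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def verify_crawler_ip(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse_lookup: Resolver = socket.gethostbyaddr,
    forward_lookup: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Hypothetical helper: forward-confirmed reverse DNS check for a crawler IP."""
    # Step 1: reverse lookup (PTR record) of the visiting IP address.
    try:
        hostname = reverse_lookup(ip)[0].rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False

    # Step 2: the host name must be the engine's domain or a subdomain of it.
    suffixes = [s.lower().lstrip(".") for s in allowed_suffixes]
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False

    # Step 3: the host name must resolve forward to the same IP address.
    try:
        _, _, addresses = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addresses
```

For example, `verify_crawler_ip(visitor_ip, ["googlebot.com"])` would check a visitor claiming to be Googlebot against the host name suffix this page gives for Google. The forward step matters because whoever controls an IP address block also controls its reverse DNS records, so a reverse lookup on its own can be faked.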

Google

Note: Google Wireless Transcoder · Google Mobile's phone browser proxy. Not a bot, as its traffic is based on human page requests. Ref: http://www.google.com/xhtml.

All Google bots use a host name ending in googlebot.com. Google has documented how to verify that Googlebot is really from Google rather than a spoofed user agent.

Yahoo!

Microsoft MSN / Windows Live

As of November 2006, all Microsoft bots use a host name ending in search.live.com. Microsoft has documented how to verify that msnbot is really from Live Search rather than a spoofed user agent.
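The host name suffixes given for Google and Microsoft can be turned into a simple spoofing rule: if a request's user agent claims to be a known crawler but the visitor's host name is not under that crawler's domain, treat the visit as spoofed. This Python sketch hard-codes only the two suffixes stated on this page (googlebot.com and search.live.com); the table `EXPECTED_DOMAINS` and both function names are hypothetical, and the suffixes should be checked against each engine's current documentation, since they can change. The `hostname` argument is assumed to come from a forward-confirmed reverse DNS lookup, not from a reverse (PTR) record alone.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: user agent token -> expected host name suffixes.
# Suffixes are the ones stated on this page and may be out of date.
EXPECTED_DOMAINS: Dict[str, Tuple[str, ...]] = {
    "googlebot": ("googlebot.com",),
    "msnbot": ("search.live.com",),
}


def claimed_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler token found in the user agent string, if any."""
    ua = user_agent.lower()
    for token in EXPECTED_DOMAINS:
        if token in ua:
            return token
    return None


def is_spoofed(user_agent: str, hostname: str) -> bool:
    """True when the user agent claims a known crawler but the host name does not match."""
    token = claimed_crawler(user_agent)
    if token is None:
        return False  # not claiming to be one of the crawlers in the table
    host = hostname.rstrip(".").lower()
    # Accept the domain itself or any subdomain of it.
    return not any(host == d or host.endswith("." + d) for d in EXPECTED_DOMAINS[token])
```

Requests that claim no known crawler are left alone here; they are ordinary visitors (or unlisted bots) rather than evidence of spoofing.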

Ask


Additional Robot Resources

Comprehensive bot listings are maintained by several sites:

Note (web analytics system configuration tip): Crawler reports are not usually predefined in web analytics systems. Fortunately, they are easy to add. The key is to report on “Pages by User Agent”, where the user agent is the name of the search engine bot. Embedded tag solutions may not be able to track non-human traffic, so check for this limitation before committing to a particular solution. Embedded tag solutions have some significant advantages over web log analysis systems, but, despite what you might be told, tracking crawlers is not one of them.
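As a rough illustration of a “Pages by User Agent” report built from a web server log rather than a tag-based system, the Python sketch below counts page requests per crawler. It assumes the Apache/NGINX “combined” log format, treats any request that is not for a common image, stylesheet or script file as a page, and matches a small, hypothetical list of user agent tokens (googlebot, msnbot, slurp for Yahoo!, teoma for Ask). Adjust all three assumptions to your own logs; the function name `pages_by_bot` is also hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed "combined" log format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical example tokens; extend with the bots you care about.
BOT_TOKENS = ("googlebot", "msnbot", "slurp", "teoma")

# Requests for these file types are not counted as pages.
ASSET_EXTENSIONS = (".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico")


def pages_by_bot(lines: Iterable[str]) -> Counter:
    """Count page requests per crawler token found in the user agent."""
    counts: Counter = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip malformed or differently formatted lines
        path = m.group("path").split("?", 1)[0].lower()  # ignore query strings
        if path.endswith(ASSET_EXTENSIONS):
            continue
        agent = m.group("agent").lower()
        for token in BOT_TOKENS:
            if token in agent:
                counts[token] += 1
                break  # count each request once
    return counts
```

Because the report keys on the user agent string, it will also count spoofed visits; for accurate crawler figures, combine it with the host name verification described in the Major Search Engine bots section.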

Related Resources in this Website

Antezeta provides additional resources on this site that may be of interest to companies pursuing search engine optimization and web analytics strategies to better leverage their Internet presence.


To better understand the nuances of Search Engine Optimization and Web Marketing, let Antezeta help you with your Search Engine Marketing Needs!

Contact us today to find out more about this topic and the rest of the Web Ecosystem!


