A small percentage of search engine users may view a web site through the search engine's saved copy of its pages, the cached version. The cached copy served to the user usually contains links to embedded objects on the original site: images, CSS stylesheets, JavaScript, etc. Organizations focusing on web marketing activities, such as search engine optimization, will want to track all search engine activity, including cached page views.
Referrers from the search engine's cached copy will show up in the site's web server log files, including the keywords and keyword phrases used to find the cached copy. In some cases, the user will click through to the original website, viewing a real page with cache referring information in the web server log file.
Cache views are more difficult for Web Analytics software to recognize, but it can be done.
A Web Analytics tool must dissect the search engine referring URL as in this example:
http://64.233.179.104/search?q=cache:l5D4yOKeZaYJ:www.antezeta.com/search-engines-site-localization-duplicate-content.html+google+dialect&hl=en&ct=clnk&cd=9
| Item | Description |
| http://64.233.179.104/ | A known Google IP address. |
| search | The Google service. Others you may see include translate_c. |
| q=cache:l5D4yOKeZaYJ: | Indicates a query made against an item in the cache. The cache ID is a 12-character alphanumeric string. |
| www.antezeta.com | Domain containing item matching query terms |
| search-engines-site-localization-duplicate-content.html | Object matching query terms (html page, pdf...) |
| google dialect | Query words entered by user |
| hl=en | Google Interface Human Language code (English) |
| ct=clnk | Not needed |
| cd=9 | Not needed |
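The breakdown in the table above can be sketched in Python. This is only an illustration of the dissection, not the logic added to AWStats; the function name `parse_cache_referrer` and the returned dictionary keys are hypothetical, and allowing `-` and `_` in the cache ID is an assumption beyond the "alphanumeric" description above.

```python
import re
from urllib.parse import urlsplit, parse_qs

# "cache:<12-char id>:<host>/<path>", optionally followed by the search words.
# Allowing "-" and "_" in the ID is an assumption; the table only says alphanumeric.
CACHE_QUERY = re.compile(
    r"cache:([A-Za-z0-9_-]{12}):([^/\s]+)/?(\S*)(?:\s+(.*))?$"
)

def parse_cache_referrer(referrer):
    """Return the parts of a Google cache referrer, or None if it is not one."""
    query = parse_qs(urlsplit(referrer).query)  # parse_qs turns "+" into spaces
    match = CACHE_QUERY.match(query.get("q", [""])[0])
    if match is None:
        return None
    cache_id, domain, path, words = match.groups()
    return {
        "cache_id": cache_id,
        "domain": domain,
        "path": path,
        "keywords": words.split() if words else [],
        "language": query.get("hl", [None])[0],
    }
```

Applied to the example referrer, this sketch yields the cache ID `l5D4yOKeZaYJ`, the domain `www.antezeta.com`, the page name, the keywords `google` and `dialect`, and the language code `en`. An ordinary search referrer without `cache:` in its `q` parameter returns `None`.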
In some cases, a user may view a search engine's cached copy of a page without entering search words in a search engine. How? Through a search engine browser toolbar. Such a referrer will look like this example:
http://72.14.207.104/search?sourceid=navclient&ie=UTF-8&rls=GGLG,GGLG:2005-50,GGLG:en&q=cache:http%3A%2F%2Fwww.antezeta.com%2Fawstats.html
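A toolbar referrer of this shape carries a percent-encoded page URL instead of a cache ID and search words. The following Python sketch shows one way to recover that URL. The function name `toolbar_cache_target` is hypothetical, and treating `sourceid=navclient` as the toolbar marker is an assumption drawn from the example above.

```python
from urllib.parse import urlsplit, parse_qs

def toolbar_cache_target(referrer):
    """Return the page URL requested from cache via a toolbar, or None."""
    query = parse_qs(urlsplit(referrer).query)  # also percent-decodes values
    q = query.get("q", [""])[0]
    # Assumption: sourceid=navclient marks a toolbar request, as in the example.
    is_toolbar = query.get("sourceid", [""])[0] == "navclient"
    if is_toolbar and q.startswith("cache:http"):
        return q[len("cache:"):]
    return None
```

For the example referrer, the sketch returns `http://www.antezeta.com/awstats.html`. No keywords can be reported for such a view, since the user never entered any.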
We have added logic to the AWStats Web Analytics application Search Engine Recognition Module to better recognize Search Engine Cache query terms, page views and click-throughs to a site.
Each of the other major search engines also lets users view a cached copy of a website's pages. As Google is the dominant player in our main market, Italy, we have focused most of our effort on implementing tracking for Google's cache. We will eventually add logic to AWStats to better track traffic from the Yahoo!, Ask and Microsoft caches.
In many markets, international traffic is of great interest. Tracking user requests to translate a website's pages, combined with geographic location (geolocation) data, can inform decisions about translating a website into other languages. The AWStats custom report facility, ExtraSections, lets a site track which of its pages users have translated with Google's translation services. Similar ExtraSections can be created for the other major online machine translation services, such as Babelfish and Free Translation:
ExtraSectionName1="Google Translate Referrals - Top 50"
ExtraSectionCodeFilter1="200 304"
ExtraSectionCondition1="REFERER,(.*\/translate_c\/?)"
ExtraSectionFirstColumnTitle1="Language pair and Page"
ExtraSectionFirstColumnValues1="REFERER,(langpair=([^&]+).*u=([^&]+))||REFERER,u=([^&]+)"
ExtraSectionStatTypes1=PHL
ExtraSectionAddAverageRow1=2
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=50
MinHitExtra1=1
Not all referrals contain a language pair. The hit count is not very meaningful, because each object on a translated page (images, CSS, etc.) counts as a hit. The page count reflects users who clicked a link in the translated page and arrived at the original site. The above syntax could be improved to remove the langpair= and u= strings from the output. Contributions welcome!
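To illustrate the suggested improvement outside AWStats, the Python sketch below captures only the values of `langpair` and `u`, so the parameter names never reach the output. It is not a drop-in ExtraSection change. The function name `translate_label` and the sample referrer in the usage note are hypothetical.

```python
import re
from urllib.parse import unquote

# Capture only the parameter values; "[?&]" avoids matching names that
# merely end in "u", such as a hypothetical "menu=" parameter.
LANGPAIR = re.compile(r"[?&]langpair=([^&]+)")
PAGE = re.compile(r"[?&]u=([^&]+)")

def translate_label(referrer):
    """Return "<language pair> <page>" or just "<page>", without parameter names."""
    if "/translate_c" not in referrer:
        return None
    page = PAGE.search(referrer)
    if page is None:
        return None
    label = unquote(page.group(1))
    pair = LANGPAIR.search(referrer)  # not all referrals carry a language pair
    if pair:
        label = unquote(pair.group(1)) + " " + label
    return label
```

Given a hypothetical referrer such as `http://translate.google.com/translate_c?hl=en&langpair=it%7Cen&u=http://www.antezeta.com/awstats.html`, the sketch produces `it|en http://www.antezeta.com/awstats.html`. When `langpair` is absent, only the page URL is returned.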
Some sites may want to prevent search engines from serving copies of their pages. This is especially the case for content that quickly goes out of date. A site may also want better control over how its pages are displayed. Fortunately, it is very easy to allow search engines to index a site's pages but not deliver cached copies. Simply add
<meta name="robots" content="noarchive" />
in the <head> section of each page. The change will not be effective until the search engine re-crawls and indexes the page.
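When the tag has to be rolled out across many pages, a quick check can confirm that it is actually present. The following is a minimal Python sketch using the standard library's `html.parser`; the class name `RobotsMetaParser` and the function name `has_noarchive` are hypothetical.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

def has_noarchive(html):
    """Return True if the page asks search engines not to serve a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives
```

The check only looks at the page markup. It says nothing about whether the search engine has already re-crawled the page and dropped its cached copy.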
Antezeta provides additional resources in this site which may be of interest to companies pursuing search engine optimization and web analytics strategies to better leverage their Internet presence.
To better understand the nuances of Search Engine Optimization and Web Marketing, let Antezeta help you with your Search Engine Marketing Needs!
Contact us today to find out more about this topic and the rest of the Web Ecosystem!
Was this resource helpful? If so, feel free to put a link to this page on your site! Just copy this code:
<a href="http://www.antezeta.com/search-engine-cache.html">
Tracking Search Engine Cache Views</a>