Several months ago a client inspired me to write a comprehensive guide to keeping website content out of search engines. Usually website owners are focused on the opposite side of search engine optimization: ensuring web content is well indexed. Yet, as many can attest, search engines can be all too efficient at finding documents they shouldn’t. Hence the need to understand what options exist, how they work and which search engines support them.
One problem with the techniques available up until now is that options for digital media have been limited. The official way to keep video, audio and PDF files out of search engines was the robots.txt protocol, which is not a very efficient tool for setting indexing options file by file.
Google, acutely aware of the growing popularity of video, image and other non-HTML file types, has responded to the gap by introducing a way to add indexing instructions to the HTTP headers via an “X-Robots-Tag” directive. Any of the Google-supported meta robots values may be specified. While the “X-Robots-Tag” directive is an excellent tool, I suspect usage will be limited: most website administrators are probably not too familiar with Apache’s mod_headers or Microsoft’s custom headers.
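For administrators who would rather set the header from application code than from server configuration, here is a minimal sketch using Python's standard-library WSGI support. The extension-to-directive mapping, the function names and the placeholder response body are all assumptions made for illustration; only the `X-Robots-Tag` header name and the meta robots values come from the announcement.

```python
from wsgiref.simple_server import make_server

# Hypothetical policy: which robots directives to send for which file types.
ROBOTS_BY_EXTENSION = {
    ".pdf": "noindex, nofollow",
    ".mp4": "noindex",
}


def robots_header_for(path):
    """Return an ("X-Robots-Tag", value) tuple for the path, or None."""
    lowered = path.lower()
    for extension, directives in ROBOTS_BY_EXTENSION.items():
        if lowered.endswith(extension):
            return ("X-Robots-Tag", directives)
    return None


def app(environ, start_response):
    """Tiny WSGI app that attaches X-Robots-Tag to matching requests."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    robots = robots_header_for(path)
    if robots is not None:
        headers.append(robots)
    start_response("200 OK", headers)
    # Placeholder body; a real application would stream the requested file.
    return [b"placeholder body\n"]


def serve(host="127.0.0.1", port=8000):
    """Run the sketch locally; call serve() by hand to try it out."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Requesting a path ending in `.pdf` from this sketch should produce a response carrying `X-Robots-Tag: noindex, nofollow`, while an ordinary HTML path gets no such header.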
A second common problem with search engine indexing has been the delay between when a page is removed from a website and when it is finally removed from a search engine’s index. Google is addressing this problem as well by introducing a meta tag attribute called unavailable_after. With this meta tag, sites can specify when a page should be removed from search results. Unfortunately, Google says this tag currently applies only to “web search”, which is a bit strange, as they have also said that web search has become “universal search”, integrating images, video and maps into the standard document search.
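As a rough illustration, the tag could be generated from a page's expiry time as sketched below. The `googlebot` meta name and the day-month-year timestamp layout follow my reading of Google's announcement and should be treated as assumptions to verify against their documentation; the function name is hypothetical.

```python
from datetime import datetime, timezone

# English month abbreviations, spelled out to avoid locale-dependent output.
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def unavailable_after_meta(expires):
    """Build an unavailable_after meta tag from a timezone-aware datetime.

    The timestamp layout (DD-Mon-YYYY HH:MM:SS GMT) is an assumption
    modelled on the example in Google's blog entry.
    """
    if expires.tzinfo is None:
        raise ValueError("expires must be timezone-aware")
    utc = expires.astimezone(timezone.utc)
    stamp = "{:02d}-{}-{:04d} {:02d}:{:02d}:{:02d} GMT".format(
        utc.day, _MONTHS[utc.month - 1], utc.year,
        utc.hour, utc.minute, utc.second)
    return '<meta name="googlebot" content="unavailable_after: {}">'.format(stamp)
```

Normalizing to GMT before formatting sidesteps time zone abbreviations, which are one of the less reliable parts of the older date formats.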
There are a few enhancements I’d like to see:
- Google’s X-Robots-Tag and unavailable_after blog entry refers to the ambiguous and obsolete RFC 850 date format. I’d like to see Google refer to a current specification, such as the IETF’s RFC 3339, to ensure proper date parsing.
- Google’s webmaster tools console should show site pages and their expiration dates, offering confirmation that unavailable_after has been properly set.
- Google’s online documentation does not yet seem to reflect these new indexing options.
- Yahoo!, Microsoft and Ask: please continue the cooperation you’ve shown with sitemaps and the noodp meta tag. Please adopt the X-Robots-Tag and unavailable_after indexing directives.
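To make the date-format point concrete: RFC 850 dates carry a two-digit year and a time zone abbreviation, whereas an RFC 3339 timestamp spells out the full year and a numeric offset (or `Z` for UTC). The sketch below shows how little code is needed to emit and parse the latter with Python's standard library; the helper names are my own and it handles only the common RFC 3339 profile shown here.

```python
from datetime import datetime, timezone


def to_rfc3339(moment):
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    text = moment.astimezone(timezone.utc).isoformat(timespec="seconds")
    # isoformat() writes UTC as "+00:00"; RFC 3339 also allows the shorter "Z".
    return text.replace("+00:00", "Z")


def parse_rfc3339(text):
    """Parse a timestamp such as 2007-08-25T15:00:00Z into a datetime."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)
```

A round trip through these two helpers returns the original instant, with no guessing about century or time zone.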
Related article in this site: How to prevent Search Engines from indexing parts of your website: 6 ways.