Antezeta Web Marketing

Reflections on search engine optimization, web analytics and web marketing

Now there are 6 ways to keep website content out of search engines

by sean

Several months ago a client inspired me to write a comprehensive guide to keeping website content out of search engines. Usually website owners focus on the opposite side of search engine optimization: ensuring web content is well indexed. Yet, as many can attest, search engines can be all too efficient at finding documents they shouldn’t. Hence the need to understand which options exist, how they work and which search engines support them.

One problem with the techniques available up until now is that options for digital media have been limited. The official way to keep video, audio and PDF files out of search engines was through the robots.txt protocol, which is not a very efficient tool for setting indexing options at the level of individual files.

Google, acutely aware of the growing popularity of video, image and other non-HTML file types, has responded to the gap by introducing a way to add indexing instructions to the HTTP headers via an “X-Robots-Tag” directive. Any of the Google supported meta robots values may be specified. While the “X-Robots-Tag” directive is an excellent tool, I suspect usage will be limited: most website administrators are probably not too familiar with Apache’s mod_headers or Microsoft’s custom headers.
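For sites where the header is easier to set in application code than in the web server configuration, here is a minimal Python sketch of the idea. It only decides which response header to attach for a given URL path; the function name `robots_headers` and the extension list are my own illustrative assumptions, not anything Google prescribes, and the sketch ignores query strings.

```python
from pathlib import PurePosixPath

# Hypothetical site policy: file types we want kept out of the index.
NOINDEX_EXTENSIONS = {".pdf", ".mp3", ".mp4", ".avi", ".mov"}


def robots_headers(url_path: str) -> dict[str, str]:
    """Return the extra HTTP response headers for a URL path.

    Non-HTML files cannot carry a meta robots tag, so the indexing
    instruction travels in the X-Robots-Tag response header instead.
    """
    suffix = PurePosixPath(url_path).suffix.lower()
    if suffix in NOINDEX_EXTENSIONS:
        return {"X-Robots-Tag": "noindex"}
    # Everything else is left to the default indexing behaviour.
    return {}


print(robots_headers("/downloads/price-list.pdf"))
print(robots_headers("/index.html"))
```

Whatever framework serves the file would then merge the returned dictionary into the response headers. Other meta robots values, such as `nofollow`, could be substituted for `noindex` in the same way.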

A second common problem with search engine indexing has been the delay between when a page is removed from a website and when it is finally removed from a search engine’s index. Google is addressing this problem as well by introducing a meta tag directive called unavailable_after. With this meta tag, sites can specify when a page should be removed from search results. Unfortunately, Google says this tag currently applies only to “web search”, which is a bit strange as they also said that web search has become “universal search”, integrating images, video and maps into the standard document search.
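As a rough illustration, the tag could be generated from a page's expiry time with a few lines of Python. This is a sketch under assumptions: the helper name `unavailable_after_meta` is hypothetical, and the date layout (day-month-year followed by a time and a zone name) follows my reading of the example in Google's announcement, so check it against Google's own documentation before relying on it.

```python
from datetime import datetime, timezone

# English month abbreviations, spelled out so the result does not
# depend on the locale of the machine generating the page.
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def unavailable_after_meta(expires: datetime) -> str:
    """Build a googlebot meta tag carrying an unavailable_after date.

    `expires` must be a timezone-aware datetime; it is converted to
    UTC and labelled GMT in the output.
    """
    if expires.tzinfo is None:
        raise ValueError("expires must be timezone-aware")
    utc = expires.astimezone(timezone.utc)
    stamp = (f"{utc.day:02d}-{_MONTHS[utc.month - 1]}-{utc.year} "
             f"{utc:%H:%M:%S} GMT")
    return f'<meta name="googlebot" content="unavailable_after: {stamp}">'


expiry = datetime(2007, 8, 25, 15, 0, 0, tzinfo=timezone.utc)
print(unavailable_after_meta(expiry))
# prints: <meta name="googlebot" content="unavailable_after: 25-Aug-2007 15:00:00 GMT">
```

The resulting line belongs in the head section of the page that should expire.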

There are a few enhancements I’d like to see:

  • Google’s X-Robots-Tag and unavailable_after blog entry refers to the ambiguous and obsolete RFC 850 date format. I’d like to see Google refer to a current specification, such as the IETF standards-track RFC 3339, to ensure proper date parsing.
  • Google’s webmaster tools console should show site pages and their expiration dates, offering confirmation that unavailable_after has been properly set.
  • Google’s online documentation does not yet seem to reflect these new indexing options.
  • Yahoo!, Microsoft and Ask: please continue the cooperation you’ve shown with sitemaps and the noodp meta tag. Please adopt the X-Robots-Tag and unavailable_after indexing directives.
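To make the date format point from the first item concrete, here is a small Python sketch. RFC 850 dates carry a two-digit year, so a parser has to guess the century, whereas an RFC 3339 timestamp spells out the full year and the offset. The helper names are my own, and the parser only handles the UTC "Z" form that the formatter produces.

```python
from datetime import datetime, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 timestamp in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def from_rfc3339(text: str) -> datetime:
    """Parse the UTC 'Z' form produced by to_rfc3339 (not every RFC 3339 variant)."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def guess_century(two_digit_year: str) -> int:
    """Show the guess Python makes for a two-digit year such as RFC 850 uses."""
    return datetime.strptime(two_digit_year, "%y").year


expiry = datetime(2007, 8, 25, 15, 0, 0, tzinfo=timezone.utc)
print(to_rfc3339(expiry))   # prints: 2007-08-25T15:00:00Z
print(guess_century("07"))  # prints: 2007
print(guess_century("94"))  # prints: 1994
```

Python resolves two-digit years with a fixed pivot (69 to 99 become 1969 to 1999, 0 to 68 become 2000 to 2068). Other parsers may pick a different pivot, which is exactly the ambiguity a four-digit year avoids.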

Related article in this site: How to prevent Search Engines from indexing parts of your website: 6 ways.

Originally published July 28th, 2007

