Antezeta Web Marketing

Reflections on search engine optimization, web analytics and web marketing

7 sources of link intelligence data and key link analysis considerations

by sean · 6 Comments

It may seem like a cliché but on the web no website is an island. Any site worth its salt will have accumulated inbound links and will most certainly contain outbound links to other resources on the web. Indeed, one can easily say that without links to interconnect websites, there wouldn’t be a worldwide web.

For search engines, such as Google, incoming links provide a strong signal as to the authority of a website. If multiple websites link to a specific website for a given topic, there is a good chance the cited website is highly relevant to that topic. Google and other search engines identify the theme of a web page by analyzing the page’s content and the text of its incoming links – the underlined text you click on to arrive at a page. Links, especially inbound links, are thus among the most significant of the over 200 factors Google considers in its ranking algorithms. Inbound links from related sites in a business’ sector are also an excellent source of highly qualified direct traffic.

Where do you find link popularity data?

So if links are so important to a website’s visibility, where can a web marketing professional find competitive link data? The general approach is to make a list of sites (and their pages) already ranking well for keywords and phrases of interest. The list will most likely represent a business’ competitors. Once the list of URLs has been compiled, it is time to find out who is linking to these URLs. There are multiple link analysis resources on the web. Later in this document we will examine different sources of link data, considering their relative strengths and weaknesses.

How to value a link, or why not all links are created equal

Seasoned SEO professionals know that not all links are created equal. Google Engineers often state at conferences and other forums that Google has a data-driven culture. SEO professionals thus need to think like Google and evaluate each link to understand its relative value. Top search engines, like Google, continuously refine their algorithms – Google itself makes 300-400 changes a year. Thus link valuation is not an exact science. Yet there are basic attributes of each link which will contribute to the overall value. Let’s consider the major attributes, starting with one of the most important, the nofollow link attribute. Keep in mind that the actual weighting done by first tier search engines is more complex than what is outlined here – it will take other factors into account, such as the link’s position on the page (top / bottom/ template / content), the words in proximity to a link, and so forth.

The nofollow link saga

Some links may show up in link reporting systems but are discounted by search engines due to use of a Nofollow directive. Nofollow tells search engines that a linking site doesn’t vouch for the quality of the link’s target destination. Ideally a search engine should ignore the link, but actual behavior does seem to vary to a degree from one search engine to another. It is probably safe to say that a nofollowed link is, in most cases, worth less than a regular link.

Nofollow was adopted as a solution to rampant spamming of blog comments. Many unscrupulous people began to leave fake comments on blog posts as a pretext to insert a link or two to a site or sites they wanted to promote. As time progressed, all pretext was lost and comments became more blatant – sometimes just long lists of links. By automatically adding rel="nofollow" to links in blog comments, blog authors ostensibly remove any ulterior motive for someone to drop a link in blog comments, called a link drop in the trade. Nofollow usage has expanded to most sites with User Generated Content (UGC). Wikipedia is probably the best known example.

The nofollow tag (more correctly, attribute) can be specified at the link level by adding rel="nofollow" to the link syntax, or at the page level via a <meta name="robots" content="nofollow" /> tag or through the use of an X-Robots-Tag HTTP header.
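To make the three levels concrete, here is a minimal Python sketch that checks all of them for a page you have already downloaded. The class and function names (NofollowScanner, nofollow_report) are my own invention for illustration, and the header check is deliberately simplistic: it ignores crawler-specific forms such as "googlebot: nofollow".

```python
from html.parser import HTMLParser


def _tokens(value):
    """Split a comma-separated robots directive list into lowercase tokens."""
    return [t.strip().lower() for t in value.split(",")]


class NofollowScanner(HTMLParser):
    """Collects links and records link-level and meta-level nofollow."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # set by <meta name="robots" content="...nofollow...">
        self.links = []             # (href, nofollow at link level)

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            if "nofollow" in _tokens(attrs.get("content", "")):
                self.page_nofollow = True
        elif tag == "a" and "href" in attrs:
            rel = attrs.get("rel", "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel))


def nofollow_report(html, headers=None):
    """Return (href, is_nofollowed) for every link in the page.

    `headers` is an optional dict of HTTP response headers.
    """
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    scanner = NofollowScanner()
    scanner.feed(html)
    header_nofollow = "nofollow" in _tokens(headers.get("x-robots-tag", ""))
    page_level = scanner.page_nofollow or header_nofollow
    return [(href, link_nf or page_level) for href, link_nf in scanner.links]
```

A link counts as nofollowed here if any one of the three mechanisms applies, which matches how the directive is described above but says nothing about how a given search engine actually weighs it.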

How common is the nofollow link attribute?

Here the correct answer is “it depends” on your sample. In a real world data sample I just looked at, of almost 17,000 pages only about 5.5% used nofollow of any type for the links I checked. 15 pages used a meta robots nofollow tag and only one used the X-Robots-Tag http header. Yet if you look at just blog comments, the percentage is going to be high.

Link Anchor Text

A link’s anchor text, the underlined link text a user clicks on, is important in helping search engines confirm the theme or topic of the target page. Text which contains keywords rather than a bland “click here” will be more useful both to end users and search engine algorithms.
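As a rough illustration, the following Python sketch pulls the anchor text of every link on a page that points at a given host and flags bland anchors. The GENERIC_ANCHORS list and the function names are assumptions made for the example, not a standard of any kind.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative list of "bland" anchors; extend it for your own language and market.
GENERIC_ANCHORS = {"click here", "here", "read more", "more", "link"}


class AnchorTextCollector(HTMLParser):
    """Collects the visible text of links pointing at one target host."""

    def __init__(self, target_host):
        super().__init__()
        self.target_host = target_host.lower()
        self._current = None  # text pieces of the link being read, or None
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            host = (urlparse(href).hostname or "").lower()
            if host == self.target_host:
                self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace so nested markup does not matter.
            self.anchors.append(" ".join("".join(self._current).split()))
            self._current = None


def anchor_texts(html, target_host):
    """Return (anchor text, is_generic) for each link to target_host."""
    collector = AnchorTextCollector(target_host)
    collector.feed(html)
    return [(text, text.lower() in GENERIC_ANCHORS) for text in collector.anchors]
```

Only absolute links are matched, since a relative href carries no host name; that is fine when scanning third-party pages for links to your own site.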

This is an ongoing SEO theme – much of Search Engine Optimization activity is often also site usability improvement!

Site and link age: well seasoned or fresh as a whipper snapper?

On the web the age of the linking site is a signal of the stability of the linking site: has it resisted the test of time or is it a fly by night operation set up just the other day? Major search engines like Google will most likely maintain a profile on each website they know about. This profile will contain information from the domain registration, including changes in domain ownership over time. It isn’t by chance that Google is also a domain registrar (see no. 895). This site profile will extend to page and link attributes. Top search engines will note when a page was first found on a site and when a link was added. The easiest way for an SEO professional to get domain information in an automated fashion is to buy it from a company like domaintools.com or through Amazon’s Alexa Web Information Services. Occasional information needs can be satisfied through whois lookups (Linux users have this at the command line) or through various browser toolbars, including Alexa’s, which access domaintools.com or Alexa data.
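For the occasional whois lookup, the creation date can be pulled out of the raw whois text with a few lines of Python. This is only a sketch: whois output is not standardized, so the field labels in the pattern below are assumptions that will match some registries and miss others, and domain_age_years is a name I made up for the example.

```python
import re
from datetime import date, datetime

# Assumed field labels; real whois output varies from registry to registry.
_CREATED = re.compile(
    r"^\s*(?:Creation Date|Created(?: On)?|Registered(?: on)?)\s*:\s*(\d{4}-\d{2}-\d{2})",
    re.IGNORECASE | re.MULTILINE,
)


def domain_age_years(whois_text, today=None):
    """Return the domain age in years, or None if no creation date is found."""
    match = _CREATED.search(whois_text)
    if match is None:
        return None
    created = datetime.strptime(match.group(1), "%Y-%m-%d").date()
    today = today or date.today()
    return round((today - created).days / 365.25, 1)
```

The whois text itself can come from the command line tool (for example by saving its output to a file), which keeps the parsing step independent of how the lookup is made.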

Linking page language

Implied in the recommendation that incoming links come from thematically related sites and include keywords relevant to the target page is that the links come from pages in the same language as the target page. While there are many occasions where a site will link to a source of information in another language, this is less than ideal – there is no guarantee that the reader will understand the other language. You can rest assured that search engines are good at language recognition and take language into consideration when weighing links. This is just another example of what is good for end users is good for SEO and vice-versa.

A link analysis program has several options for analyzing incoming link language. One signal is the top level domain, e.g. a .fr domain will often host web content in French. Unfortunately, this model, while useful, breaks down very quickly. A Swiss domain, .ch, can easily host content in the national languages of French, German, Italian or Romansh. More sophisticated approaches combine TLD analysis with a textual analysis of a web page’s content, looking for patterns that are common to a specific language. This can be done programmatically, often by using interfaces to N-gram and similar analysis. Google provides an interface to their language detection facility, not a bad place to start! Yahoo has mentioned that they consider the http-equiv header or meta tag Content-Language, e.g. <meta http-equiv="Content-Language" content="en" /> where en is the language code. You can check for this, but I doubt it is very reliable.
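To give a feel for textual analysis, here is a deliberately crude Python sketch. It counts common function words rather than doing true N-gram analysis, so treat it as a stand-in for the real thing; the word lists and the guess_language name are my own assumptions.

```python
import re

# Tiny, hand-picked stopword lists; far too small for production use.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "for"},
    "fr": {"le", "la", "les", "et", "des", "est", "une", "pour"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "für"},
    "it": {"il", "la", "che", "di", "è", "per", "una", "non"},
}


def guess_language(text):
    """Return the language code whose stopwords appear most often, or None."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

In practice the result would be combined with the other signals mentioned above, such as the top level domain and any declared Content-Language, rather than trusted on its own.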

Is the page with the link indexed in Search Engines?

Actually, there is probably one particular element which is more important than the nofollow status of a link. Is the page containing a link, or that could contain a link, actually indexed by the search engines you want to target? If you are targeting Yahoo! and you obtain link information from Yahoo!, then barring any recent changes to the linking site, this is a non-issue. Yet just because a page containing an interesting link has been crawled by Yahoo! doesn’t necessarily mean it has been crawled by Google, Microsoft or other search engines.

You could in theory automatically cross reference Yahoo link data with Google’s index but Google is parsimonious when it comes to sharing the data it has collected about our websites. See my rant on Google’s API access policies for details.

Usage of Google Page Rank in evaluating links

Neophytes easily allow themselves to be distracted by the PageRank siren call. Yes, you can look up Google’s PageRank for pages you want to target. You can even do this automatically. But keep in mind that Google’s publicly published PageRank values have some severe limitations. As mentioned earlier, Google and other search engines consider many factors, such as the link anchor text keywords and site age, when weighting a link. These factors are completely independent of a PageRank value. That PageRank is devoid of a thematic context is an enormous flaw in its ability to determine the value of a page to any particular query.

It is also important to note that Google’s publicly published PageRank values are not what Google uses internally. We do know that the Google Toolbar PageRank is only updated every three or four months. We also know that Google will sometimes penalize a site for improperly selling links by lowering the displayed PageRank value. Yet in many cases the site doesn’t actually experience a drop in traffic. The lowered value serves to dissuade the purchase of links from that site.

Google’s Toolbar PageRank is certainly an interesting signal in an SEO tool kit. But it is far from a reliable one and it is certainly not the all encompassing value many would like to think it is.

By the way, don’t even think about using Alexa’s rank and reach values. Alexa data samples are highly influenced by sample self-selection – in particular a preponderance of webmasters installing the Alexa toolbar to influence their site’s rankings or otherwise manipulating Alexa statistics.

Organize Links by Category, e.g. Blogs, Directories, Forums, Articles

Once a list of interesting sites and pages to target for links has been identified, it can be very useful to organize them into categories. Categories help you to understand at a glance the types of issues you are likely to encounter when looking for a link on the site or page in question. Blogs often have nofollow issues. Forums often won’t allow you to specify link text, etc. Other categories which may be useful include collaborative bookmarks, free web space, classified advertising, events, education, government, generalist portals, groups, personal profiles, short URLs… you get the idea.
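A first pass at categorization can be automated with simple URL pattern matching, leaving only the leftovers for manual review. The Python sketch below is one way to do it; the hint strings in CATEGORY_HINTS are guesses of mine, not a tested taxonomy, and substring matching will produce the odd false positive.

```python
from urllib.parse import urlparse

# (category, substrings that hint at it); the first matching category wins.
CATEGORY_HINTS = [
    ("blog", ("blog", "wordpress", "blogspot", "typepad")),
    ("forum", ("forum", "viewtopic", "showthread")),
    ("directory", ("directory", "dmoz")),
    ("education", (".edu",)),
    ("government", (".gov",)),
]


def categorize(url):
    """Guess a link category from the host name and path of a URL."""
    parsed = urlparse(url)
    haystack = ((parsed.hostname or "") + parsed.path).lower()
    for category, hints in CATEGORY_HINTS:
        if any(hint in haystack for hint in hints):
            return category
    return "other"
```

Anything that lands in "other" still needs a human look, and the hint list is worth tuning against your own data before trusting the counts.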

Finding “easy” links: sites that mention us without a link

Part of link analysis involves looking for the low hanging fruit, relatively easy opportunities to obtain new links. What better than to look for sites which have cited us but didn’t get around to linking to us? A link analysis program can check a list of pages for citations and note if the citations are links or not. Once citations without links are found, a polite email or phone call to the right person, suggesting the citation be turned into a link (in part to help readers of the page), is often all it takes, and voilà, a new link is born.
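The citation check described above can be sketched in Python as follows. It assumes you already have the HTML of each candidate page, and it uses a plain substring match for both the brand name and the domain, which is crude but good enough for a first pass; the names CitationChecker and citation_status are mine.

```python
from html.parser import HTMLParser


class CitationChecker(HTMLParser):
    """Notes whether a page mentions a brand and whether it links to a domain."""

    def __init__(self, brand, domain):
        super().__init__()
        self.brand = brand.lower()
        self.domain = domain.lower()
        self.mentioned = False
        self.linked = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = (dict(attrs).get("href") or "").lower()
            if self.domain in href:
                self.linked = True

    def handle_data(self, data):
        if self.brand in data.lower():
            self.mentioned = True


def citation_status(html, brand, domain):
    """Classify a page as 'linked', 'mention without link' or 'not cited'."""
    checker = CitationChecker(brand, domain)
    checker.feed(html)
    if checker.linked:
        return "linked"
    return "mention without link" if checker.mentioned else "not cited"
```

Pages that come back as "mention without link" are the ones worth a polite email.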

Buying links

If links are important for the visibility of a web site in the greater web ecosystem, it shouldn’t be too much of a surprise that a market in buying and selling links has emerged. There are many link brokers, such as Text Link Ads and Link Lift.

It should be noted that buying and selling links is controversial. Search engines will tell you that it violates their guidelines as they rely on editorially chosen a.k.a. natural linking for their algorithms. They will tell you that paid links should be disclosed as such – to end users by using language such as sponsored ads – and to search engines through the use of nofollow. Indeed, in some jurisdictions, full disclosure may actually be a legal requirement.

While this all sounds great on paper, the real world situation is less clear. The very same search engines which tell you not to buy links to enhance your direct traffic and search engine visibility are also the largest sellers of text link ads. Just think that to this day Google still turns a blind eye to Ask.com's AdWords arbitrage. When Google Japan was caught in the very act of buying links, it became clear to all that the focus on paid links is perhaps a misplaced and ill-advised battle. Yes, the engines are mostly transparent (Yahoo!’s Express Include may be an exception along with Google Japan) but it does give one pause for reflection. Should the web follow the traditional Anglo-Saxon division between commercial and editorial divisions in the press sector? Or should it follow Hollywood’s lead with product placement – advertising can pay the bills, support the development of web content, yet hopefully isn’t too invasive?

In an ideal world, I’d go with the transparency argument. I find product placement annoying. Yet a cynic could easily argue that the press has, for example, underplayed the destructive effect of the automobile on the environment (pollution, wars for oil resources) so as to avoid biting the hand that was feeding them. I think the jury is still out on this one. But you’ve been warned.

Sources for Link intelligence

Any company crawling the web is a potential source of link data, but in reality the size and complexity of the web limits this activity to a very few players, mostly the major search engines.

Google

If our goal is to be highly visible in Google for a series of keywords and keyword phrases, it would be nice to know what links Google’s crawler, googlebot, has found. After all, if Google hasn’t found a link, it is rather unlikely that the link could contribute to a page’s ranking in Google. Google has a special search operator, link:<URL>, which returns pages linking to the URL provided with link:. As an example, link:www.google.com/analytics would return a list of web pages linking to www.google.com/analytics (it isn’t necessary to specify the http:// part).

There are a few caveats. The returned links will probably include internal links from a site. There is no way to exclude this information; you need to filter it out yourself. The returned links are also just a random subset of all of the links Google knows about. If you’re looking for comprehensive data you’ve come to the wrong place. Google does not want to facilitate the reverse engineering of link information on third party sites.
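Filtering out the internal links yourself is straightforward once the results are in a list. Here is a minimal Python sketch; it treats www.example.com and example.com as the same site but considers any other subdomain external, which is an assumption you may want to change, and the function names are made up for the example.

```python
from urllib.parse import urlparse


def _site(host):
    """Normalize a host name, treating a leading 'www.' as optional."""
    host = (host or "").lower()
    return host[4:] if host.startswith("www.") else host


def external_links(target_url, linking_urls):
    """Keep only linking URLs hosted on a different site than target_url.

    Both arguments must be absolute URLs, including the http:// part.
    """
    target_site = _site(urlparse(target_url).hostname)
    return [
        url for url in linking_urls
        if _site(urlparse(url).hostname) != target_site
    ]
```

For example, given a list containing http://www.antezeta.it/blog/ and http://www.example.com/links as pages linking to http://www.antezeta.it/, only the second would be kept.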

Google does provide fairly comprehensive link information to website owners for their own sites through the Google Webmaster Tools. A comparison of one’s own information in GWT and the link: operator will provide a sense of what is missing from the public link results.

More information on the limits of Google’s link operator can be found in a recent video by Google Engineer Matt Cutts.

Yahoo! and Yahoo!’s Site Explorer

Yahoo! offers a great tool for competitive link analysis, the Yahoo! Site Explorer. Simply enter a URL to analyze and a few clicks later, Yahoo! provides a list of incoming links. The list can be exported for analysis in a spreadsheet program. Yahoo! also offers an API to facilitate programmatic data retrieval. So far, so good. One significant limitation with the Yahoo Site Explorer is that Yahoo only provides up to 1000 links to any given URL. Since some websites may include a link in a sitewide template, such as a blog’s blogroll, it isn’t uncommon to see the 1000 links coming from just a few domains. This issue can be mitigated somewhat by analyzing inbound links to multiple site pages, but still this isn’t ideal.
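To see how badly sitewide links skew an exported list, it helps to count links per linking domain. A small Python sketch, assuming the export has already been read into a list of URLs (links_per_domain is a name I made up):

```python
from collections import Counter
from urllib.parse import urlparse


def links_per_domain(linking_urls):
    """Count how many linking URLs come from each host name."""
    return Counter((urlparse(url).hostname or "").lower() for url in linking_urls)
```

Calling most_common(10) on the result shows at a glance whether a handful of blogrolls are eating up most of the 1000 link allowance.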

As with Google, the link information Yahoo! provides is pretty basic: linking source URL and page title. The web interface includes document format, e.g. HTML, and size, data which is not currently available in the API. To perform link analysis, at a minimum we need to know the link anchor text and if a nofollow directive is in effect via rel="nofollow", a meta robots tag or an X-Robots-Tag HTTP header. Unfortunately Yahoo! doesn’t provide this information, which is necessary to evaluate the quality of a link.

In our experience, data returned from the website interface and the API can vary. In particular, some linking pages seem to be duplicated as Yahoo! currently truncates URLs at 192 characters. Some parameter heavy URLs (which should be avoided by the way) will easily exceed this length.

Tattler for Yahoo Site Explorer

Greg Boser’s Tattler is a small Windows-based tool to facilitate link analysis data collection from Yahoo’s Site Explorer. You start by specifying one or more URLs. For each returned result, you can, with a click, retrieve its backlink data. The final data set can be saved in spreadsheet format for further analysis.

Majestic SEO

Several years ago, while updating robot recognition in the Web Analytics tool AWStats, I came across the robot MJ12bot which led me to a search engine project in the UK, Majestic-12. Majestic-12 has developed a sophisticated database of link information in order to gain better insight into ranking algorithms. In February 2008 Majestic began to supply this link intelligence to SEO professionals as a way to share with the web community and to further fund development of the Majestic-12 search engine.

Basic site intelligence information is available from the Majestic SEO site or through browser plugins. As with Google’s Webmaster Tools, Majestic SEO provides site owners with free access to extensive data for sites they control. Data on competitors is available for a fee – varying from about 50p to ~£600 in the case of link information for a well known site like google.com. Majestic SEO uses a credit system which provides volume discounts.

Data provided and why it counts

Majestic SEO augments basic link information with valuable anchor text and image alt text. Majestic SEO also checks for the presence of nofollow at the link and page meta tag levels. They don’t yet include checking for the X-Robots-Tag http header, although I haven’t yet seen much use of this in the wild nor do I expect much. Majestic also offers a primitive ranking value, ACRank, although it isn’t really clear how useful this is currently.

Update frequency

At the time of this writing, data was just refreshed; Majestic tells me that most of their index was crawled in the last 12 months. Majestic is aiming at achieving a monthly update schedule.

Web coverage

Majestic knows of 539 billion unique URLs and claims to have crawled about 77 billion unique web pages of those URLs, not a bad number. The geographic coverage is wide, Majestic tells me “we crawl pretty much all URLs we can get our hands on“.

SEOmoz LinkScape

SEOmoz launched their commercial link intelligence service LinkScape to much fanfare in October 2008.

As I’ve discussed in my guide to keyword selection (in Italian), research tools, despite how glitzy the user interface may be, are only as solid as their underlying data.

Therein lies the problem with SEOmoz’s LinkScape. It isn’t clear where the data is coming from, the geographic and linguistic representation, nor the freshness. SEOmoz lists potential sources, but this list is too ambiguous to be meaningful. It is worth noting that access to the listed Ask.com API was shut off several years ago, but when it worked it did not include data from Ask’s foreign language search sources. I imagine SEOmoz’s medium to long-term goal is to be self-sufficient in link data collection (see the whois record for dotnetdotcom.org), but they are apparently not there yet.

Similar to Yahoo!’s Site Explorer, LinkScape limits the number of links it will show to any given URL, although at a more generous 3000. On the Linkscape home page, SEOmoz says it knows of 38 billion URLs. Their FAQ says 36+ billion pages; I assume they mean URLs, not all of which have been crawled. This compares to Majestic SEO’s 350 billion URLs, of which 77 billion have been crawled. Since many pages on the web may be simply duplicate URLs of the same content or of poor quality, a billion here and a billion there may not be so relevant. What is important is appropriate geographic and linguistic coverage for the markets you’re interested in. While I believe LinkScape is a promising SEO tool, I have the feeling the product, and its marketing, need to mature a bit.

Access to Linkscape is available by paying a monthly or annual fee which includes access to other SEO resources as well.

Alexa

Amazon’s Alexa division actively crawls the web, graciously providing the data which powers the Internet Archive (thank you Amazon!). Limited link data, based on what is available via a web interface and toolbar implementations, is available free of charge. The web interface and toolbars indicate the number of domains linking to a site – not the overall number of links. Full inbound link lists for a site are available through an API for a nominal cost – $0.15 for 1,000 requests. Similar to domaintools.com, Alexa provides domain registration information where publicly available.

Exalead

Exalead, an international search engine based in France, supports the link: operator for web based searches. Their search engine serves as a test bed and showcase for Exalead’s Enterprise search technology. I am not aware of a publicly available API. Updated 2009-07-10: Exalead changed their syntax from +link to link.

Gigablast

Similar to Exalead, Gigablast has a public search engine which also serves in part to showcase their enterprise search technology. Their link: operator syntax doesn’t seem to work right now, which is too bad as they do offer an XML API: just append &raw=8 to the query (limited to 1,000 queries a day).

Microsoft MSN Live Search

Once upon a time Microsoft supported the linkdomain: search operator. They disabled it and then later began offering backlink information solely for verified sites in their Live Search Webmaster Center. By default Microsoft shows both inbound and internal links to any page in your site. Results can be filtered by subdomain and / or directory, useful to exclude internal links. You may query links to a specific page or URL pattern. Up to 1000 results may be exported for further analysis. The data file contains the URL, page title, Microsoft’s “Page score”, region and language codes. The online help discusses tracking results over time but this functionality is not included.

It is too bad that Microsoft doesn’t use this opportunity to win the hearts and minds of the web master community by offering back link data for any site, as Yahoo! does. At least this should be an anti-trust consideration for any purchase of Yahoo! by Microsoft :-) .

Ask.com / Teoma

Ask.com has never provided backlink information, although at one point they had a great API for accessing search results. Through Keith Hogan, a VP at Ask, Ask has recently been communicating much more with the webmaster community. Hopefully Ask will demonstrate its support of the webmaster community by reactivating their API and by providing back link data.

Other Major Search Engines

You may find link information at other search engines, but often they are just rebranded versions of Google or Yahoo!.

International Search Engines

Naver (네이버), Korea

Naver is a very innovative portal and search engine in Korea. If I understand correctly, Yahoo! Answers is based on work first done at Naver. That said, I couldn’t find an advanced search form for Naver’s search engine. Neither link: nor linkdomain: operators seem to work.

Rambler (Рамблер), Russia

Rambler does offer an advanced search form, but there’s no mention of back link or link search support. Neither link: nor linkdomain: operators seem to work.

Yandex (Яндекс), Russia

Yandex offers an advanced search form but again, there’s no mention of back link or link search support. Neither link: nor linkdomain: operators seem to work.

Baidu (百度), China

Baidu offers advanced search features and an excellent description of advanced search operators but alas, there’s no mention of a link: like operator.

Sogou (搜狗), China

If you noticed a pattern repeating here, you’re in luck. Sogou’s advanced search help does discuss site and filetype operators but there’s no mention of a link: operator.

Link showdown: number of incoming links to antezeta.it

The following chart provides a quick overview of the different link services, including the number of links they show for a specific domain, antezeta.it.

Who | Competitor data | Verified sites data | Syntax example | API | Antezeta.it links | Notes
--- | --- | --- | --- | --- | --- | ---
Alexa | Yes. Via a web interface, or, for a small fee, via an API. | Not necessary | http://www.alexa.com/site/linksin/antezeta.it | Yes | 109 | Link count is for domains. Free interface just notes one link per domain. Pay as you go web service provides more comprehensive data.
Exalead | Yes | Not necessary | http://www.exalead.it/search/results?q=+link:www.antezeta.it | No | 332 |
Google.com | Yes, but limited to a subset of Google’s complete link knowledge | Free via Google Webmaster Tools | http://www.google.com/search?q=link:www.antezeta.it | Sort of. Deprecated SOAP search API. JSON no. Google Webmaster Tools API doesn’t (yet) support link data. | 76 | link: count includes ~35 internal links; GWT shows 19,167 inbound links. You may select links to a specific URL.
Majestic SEO | Yes; basic data is free, detailed data requires payment | Free with registration | http://www.majesticseo.com/search.php?q=www.antezeta.it | Yes | 19,370 |
Microsoft | No | Free via Microsoft Webmaster Central | NA | No | 12,100 | Only data for one’s own sites is available.
SEOmoz LinkScape | Yes. Link detail data requires a subscription. | Not offered. | http://www.seomoz.org/linkscape/intel/basic/?uri=www.antezeta.it | Yes | 561 | Links (from 72 domains) are to the home page.
Yahoo Site Explorer | Yes. Up to 1000 links for any given URL or domain. | Not necessary | http://siteexplorer.search.yahoo.com/search?p=http%3A%2F%2Fwww.antezeta.it&bwm=i&bwmo=d&bwmf=s | Yes | 26,951 | You may select links to a specific URL or site wide.

Next steps

If you reached this point, it should be clear that plain link data available from the sources listed above becomes much more useful when enriched with additional details – nofollow use, page language, domain age, contact details, etc. If you need support in gathering and enriching link data for your web marketing efforts, maybe we should talk! You might consider subscribing to this blog’s feed, and, why not, sphinning the article.

Hey mister, can you spare me a link?

Naturally, if this article was useful, a link of your choice would be a nice donation to keep the candles burning… thanks!

Originally published April 28th, 2009


6 responses so far ↓

  • 1 Foot In Mouth // Apr 28, 2009 at 16:49:41

    Wow, what a monster of an article! Thanks Sean for putting in so much time to put together this article, it had quite a few resources I hadn’t heard of yet.

    I know that you mentioned using citations as link opportunities, but I was curious. Have you seen any examples where a non linked url prompted a search engine spider to crawl the page? I.e. does having your URL listed as a non link give any value?

  • 2 Seo Notes // Apr 30, 2009 at 18:24:51

    Hi Sean,

    this is a good article that shows the relevanz of SEO Tools for Link Data and review links.

  • 3 referenceur // May 4, 2009 at 15:29:03

    I liked your article too much and I would like to translate it to french just to spread and share and I’ll link to you as a source if u permit too.
    thx for the infos and continue.

  • 4 sean // May 4, 2009 at 16:02:16

    @referenceur

    Thank you for your request!

    I would welcome authorized translations of this guide. As I indicated in an email to you, I would like to know where you plan on publishing the article before I grant approval.

    Should anyone else want to translate this in to languages other than English, Italian or German, feel free to contact me for more information.

  • 5 Sam With Traffic Is King // Jun 16, 2009 at 4:16:03

    You have no idea how much clarity I have now on Nofollow tags. There was so much floating around, I’m glad I found this site.

  • 6 Arif Site // Dec 16, 2009 at 12:47:43

    Why is the result of backlinks from each search engine is also different . and about the pagerank is also known get from backlink counts. The question is why is less backlink have more pagerank. Please reply me on my email. Can I get your YM?
