Pages

Google Algorithm Tweaking; the canonical tag

Search quality highlights: new monthly series on algorithm changes
http://insidesearch.blogspot.com/2011/12/search-quality-highlights-new-monthly.html briefly; 


Multiple URLs, one page
Duplicate content comes in different forms, but a major scenario is multiple URLs that point to the same page. This can come up for lots of reasons. An ecommerce site might allow various sort orders for a page (by lowest price, highest rated…), the marketing department might wanttracking codes added to URLs for analytics. You could end up with 100 pages, but 10 URLs for each page. Suddenly search engines have to sort  through 1,000 URLs.
This can be a problem for a couple of reasons.
  • Less of the site may get crawled. Search engine crawlers use a limited amount of bandwidth on each site (based on numerous factors). If the crawler only is able to crawl 100 pages of your site in a single visit, you want it to be 100 unique pages, not 10 pages 10 times each.
  • Each page may not get full link credit. If a page has 10 URLs that point to it, then other sites can link to it 10 different ways. One link to each URL dilutes the value  the page could have if all 10 links pointed to a single URL.
Using the new canonical tag
Specify the canonical version using a tag in the head section of the page as follows:
http://www.example.com/product.php?item=swedish-fish"/>
That’s it!
  • You can only use the tag on pages within a single site (subdomains and subfolders are fine).
  • You can use relative or absolute links, but the search engines recommend absolute links.
This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag.
  • Links to all URLs will be consolidated to the one specified as canonical.
  • Search engines will consider this URL a “strong hint” as to the one to crawl and index.
Canonical URL best practices
The search engines use this as a hint, not as a directive, (Google calls it a “suggestion that we honor strongly”) but are more likely to use  it if the URLs use best practices, such as:
  • The  content rendered for each URL is very similar or exact
  • The canonical URL is the shortest version
  • The URL uses easy to understand parameter patterns (such as using ? and %)