Duplicate Content
Sunday, 14 January 2007
One more time: why do engines care?

Search engines filter duplicate content pages out of their search results for two main reasons. The first is to make a search more relevant to the user. The second is that they don’t want to spend resources indexing pages that are substantially similar.

That said, there still seems to be some confusion in the SEO world over ‘duplicate content’ and how search engines treat it. Right away I would like to say: RELAX. If you are doing sneaky things like filling up a site with dodgy content that YOU KNOW is duplicate, then worry. Most people who have duplicate content issues are honest website owners and aren’t at risk of any penalty.

Is it really a penalty?

That is the crux of the main misconception. Generally speaking, it is a ‘duplicate content filter’, not a ‘penalty’ per se, even though many in the SEO world agonize over it as one. At various points in the indexing and retrieval process, documents (web pages) are scored, and duplicates are ultimately removed from the results for a given query.
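To make the "filter, not penalty" idea concrete, here is a minimal Python sketch of the cheapest kind of duplicate check: hash a page's normalized text and decline to index a second page whose hash has already been seen. The `fingerprint` and `ToyIndex` names, the crude tag-stripping normalization and the choice of SHA-256 are all assumptions for illustration; no search engine is documented to work exactly this way. Note that the duplicate is simply not stored; nothing is done to the site that published it.

```python
import hashlib
import re


def fingerprint(html_text: str) -> str:
    """Return a SHA-256 hex digest of the page's normalized text.

    Normalization is deliberately crude (an assumption for this sketch):
    replace tags with spaces, collapse whitespace, lowercase.
    """
    text = re.sub(r"<[^>]+>", " ", html_text)         # drop markup
    text = re.sub(r"\s+", " ", text).strip().lower()  # normalize spacing and case
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ToyIndex:
    """Hypothetical index that stores each distinct text only once."""

    def __init__(self) -> None:
        self.first_url: dict[str, str] = {}  # fingerprint -> first URL seen

    def add(self, url: str, html_text: str) -> bool:
        """Index the page and return True, or return False if it is a duplicate."""
        fp = fingerprint(html_text)
        if fp in self.first_url:
            return False  # filtered: same text already indexed under another URL
        self.first_url[fp] = url
        return True


if __name__ == "__main__":
    index = ToyIndex()
    print(index.add("http://www.example.com/", "<p>Hello  World</p>"))            # True
    print(index.add("http://example.com/index.html", "<div>hello world</div>"))  # False
```

Only exact copies (after normalization) are caught this way: changing a single word produces a completely different hash, which is why near-duplicate techniques are also needed.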
When a search engine robot crawls the web, it reads the pages and stores the information in its database. At various stages of the indexing and retrieval process, the engine checks each document against its existing index(es) for potential duplication. Documents are scored on a variety of factors, including descriptions, authority, document age, content structure (phrase scoring) and more. For example, when Google uses phrase-based scoring to determine duplication, one method is outlined below:
When the user (searcher) queries the index, the engine then attempts to filter out any remaining duplication and serve up the document it considers the best resource or authority for the submitted query.
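The similarity check and the query-time filter described above can be sketched together in Python. This example uses word shingling with Jaccard similarity, a well-known generic near-duplicate technique; it is not the Google phrase-based method mentioned above. The 3-word shingle size, the 0.5 similarity threshold and the `score` field (a hypothetical stand-in for whatever relevance and authority signals an engine combines) are all illustrative assumptions.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of intersection over size of union (0.0 if both sets are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_duplicates(results: list[tuple[str, float, str]],
                      threshold: float = 0.5) -> list[str]:
    """Return URLs best score first, dropping any document that is too
    similar to a higher-scored document already kept.

    results holds (url, score, text) tuples; score is a hypothetical
    stand-in for an engine's relevance/authority signals.
    """
    kept: list[tuple[str, set]] = []  # (url, shingles) of documents already shown
    for url, _score, text in sorted(results, key=lambda r: r[1], reverse=True):
        sh = shingles(text)
        if any(jaccard(sh, seen) >= threshold for _, seen in kept):
            continue  # filtered out, not penalized: a higher-scored copy is shown
        kept.append((url, sh))
    return [url for url, _ in kept]


if __name__ == "__main__":
    results = [
        ("http://example.com/widgets", 0.7,
         "Blue widgets are the best widgets for your home garden"),
        ("http://www.example.com/widgets", 0.9,
         "Blue widgets are the best widgets for your home and garden"),
        ("http://other.example.org/gadgets", 0.8,
         "Red gadgets ship free worldwide this week only"),
    ]
    print(filter_duplicates(results))
    # ['http://www.example.com/widgets', 'http://other.example.org/gadgets']
```

The two widget pages share 7 of their 10 distinct shingles (similarity 0.7), so only the higher-scored copy survives; the lower-scored one is just left out of this result set.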
Types of dupes and what to do

There are a variety of ways the average website can run into duplicate content filtering problems without even knowing it. Here are some common ones:

- Websites with identical pages: sometimes a company or individual will actually try to compete with themselves by creating other versions of their site on a different domain name. Not a good idea. Affiliate sites with the same look and feel that contain identical content are certainly not a good idea either. Whether you run one site or many, create unique content throughout.
- Scraped content: content taken verbatim from another site. This is obviously not a good idea.
- Distribution of articles: do not publish articles you are using for distribution on your own site. Some people will say to let the search engines index it first and then distribute it; this will not work. If you have two articles, put one on your site and the other into circulation.
- Home page URLs: having multiple home page naming conventions, with backlinks pointing to those multiple versions of the root domain. The best way to tackle these is with 301 redirects. Here are some examples of what I mean: http://www.example.com
- Print-friendly pages: believe it or not, our little bot friends can follow the links to the printer-friendly page and consider it duplicate content. For this, be sure to use robots.txt to forbid them from crawling those pages.
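For the home page URL problem, here is a minimal sketch of the 301 fix as a plain WSGI app using only Python's standard library. The canonical host `www.example.com` and the set of home page aliases (`/index.html`, `/index.htm`, `/index.php`) are assumptions, since the right list depends on how your server names its default page. On many sites the same rule is written in the web server's rewrite configuration instead of application code.

```python
from urllib.parse import urlsplit
from wsgiref.util import request_uri

CANONICAL_HOST = "www.example.com"  # hypothetical canonical host
HOME_ALIASES = {"/", "/index.html", "/index.htm", "/index.php"}  # assumed aliases


def redirect_target(url: str):
    """Return the canonical URL to 301 to, or None if url is already canonical."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path in HOME_ALIASES:
        path = "/"  # every home page alias collapses to the root
    target = f"http://{CANONICAL_HOST}{path}"
    if parts.query:
        target += "?" + parts.query
    return None if target == url else target


def app(environ, start_response):
    """Minimal WSGI app: 301 non-canonical URLs, placeholder body otherwise."""
    target = redirect_target(request_uri(environ))
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Home page content"]
```

Because every variant answers with a permanent redirect to one URL, links pointing at `example.com/` or `/index.html` all resolve to a single home page for the crawler to index.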
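For print-friendly pages, a robots.txt rule can be checked with Python's standard `urllib.robotparser`, which applies the basic prefix matching of the robots.txt convention. The `/print/` directory is a hypothetical layout; adjust the `Disallow` line to wherever your printer-friendly copies actually live.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes printer-friendly copies live under /print/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

_rules = RobotFileParser()
_rules.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules above allow user_agent to fetch url."""
    return _rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_crawlable("http://www.example.com/articles/duplicate-content.html"))  # True
    print(is_crawlable("http://www.example.com/print/duplicate-content.html"))     # False
```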
I hope this has at the very least given you a better idea of what all the fuss over duplicate content is about, and taught you a few ways to avoid it. A fun tool for checking a site or document for duplicate content is www.copyscape.com. Give it a whirl if you are concerned about content you may put on your site. Also read up on Google and duplicate content: guidelines right from the Plex.

Related Google patent applications:
- Detecting duplicate and near duplicate files
- Detecting query-specific duplicate documents
- Methods and apparatus for estimating similarity (the more recent one)