How Is Duplicate Content Defined And Does It Matter?

By: Donald Saunders

The argument over how duplicate content is defined, and whether it really matters, continues to rage, and there is no sign that it is going to go away. So just how do you define duplicate content, and should we be concerned about it?

It is widely felt that duplicate content is important. One well-known and highly respected search engine optimization expert recently wrote an article opposing this view, but even a quick look at the mass of material published on the subject recently shows that this is very much a minority opinion.

However, if we agree that duplicate content is important, then how should we define it? For example, if I compose an original article for an article directory and then rewrite that same article for submission to a second article directory, how are the search engines going to check my two articles and decide whether or not they contain duplicate content? The simple truth is that we do not know. However, here is this writer's opinion.

When duplicate content checking was first introduced by the search engines, it was a simple case of comparing one web page as a whole with another, and no attempt was made to dissect the two pages and compare individual page elements. Back then it was possible to use identical content and merely add an introductory and a concluding paragraph to one of the two pages to escape the attention of the duplicate content filters. Unfortunately for many, those days are now a distant memory.
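To see why padding a page was enough to defeat a whole-page check, here is a minimal sketch. It assumes the comparison was an exact match on a fingerprint of the normalized page text, which is only a guess at how early filters worked; the function names and sample text are hypothetical.

```python
import hashlib


def page_fingerprint(text: str) -> str:
    """Hash the whole page after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_whole_page_duplicate(page_a: str, page_b: str) -> bool:
    """Treat two pages as duplicates only if their fingerprints are identical."""
    return page_fingerprint(page_a) == page_fingerprint(page_b)


# Hypothetical page body, plus a copy padded with an intro and a conclusion.
body = "It is reasonably simple to find a good deal on a cell phone."
padded = "A short introduction. " + body + " A short conclusion."

print(is_whole_page_duplicate(body, body.upper()))  # same text, different case
print(is_whole_page_duplicate(body, padded))        # same text, padded
```

Because the fingerprint covers the page as a whole, any added paragraph changes it, even though the body text is untouched.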

The search engines now dissect the two pages and examine individual elements, and here is the core of the present discussion. It is generally believed that attention is now largely restricted to the central content of a page rather than the structure of the page. Many webmasters use templates when creating their pages, which set the structure of each page, including things like headers, footers and navigation bars. This is widely considered to be acceptable, and the search engines do not see it as duplicate content. What the search engines are examining is the informational content contained within the body of the page. But just how do they examine this page content?

Some people believe that this checking is carried out at 'block' level (that is to say at the level of individual sentences or paragraphs), while other people believe that filters search for phrases or even for individual words. None of us really knows the answer although it might seem reasonable to conclude that the most likely basis for comparison would be to use either sentence or phrase matching.

Sentence matching is relatively clear-cut and merely involves cutting both pages up into chunks based upon the punctuation on the page. For example, take a look at this sentence:

It is reasonably simple to find a good deal on a cell phone, providing you know where to shop.

This could be classed either as a single sentence or as two, depending upon whether you use the traditional definition, in which only a full stop marks the end of a sentence, or adopt a more flexible approach and also break on other punctuation marks, such as commas.
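The two ways of cutting up that sentence can be sketched in a few lines. This is only an illustration of the idea, not a description of what any search engine does; the function name and the exact sets of punctuation marks are assumptions.

```python
import re


def split_sentences(text: str, flexible: bool = False) -> list[str]:
    """Cut text into chunks at punctuation.

    Strict mode breaks only on . ! ?
    Flexible mode (an assumption for illustration) also breaks on , ; :
    """
    pattern = r"[.!?,;:]+" if flexible else r"[.!?]+"
    return [chunk.strip() for chunk in re.split(pattern, text) if chunk.strip()]


example = ("It is reasonably simple to find a good deal on a cell phone, "
           "providing you know where to shop.")

print(split_sentences(example))                 # one chunk
print(split_sentences(example, flexible=True))  # two chunks
```

Under the strict rule the example is a single unit for comparison; under the flexible rule each half could be matched separately.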

Phrase matching is a little more complex. What constitutes a phrase? Should a phrase be 2 or 3 or 4 or 20 words?

Just for now let's assume that we are going to define a phrase as 3 words. If this is the case then the following phrases would all be viewed as duplicate content if they appeared on two pages which were being checked:

In the end
The answer is
In those days
Take a look
Did you know

These five phrases are all typical everyday phrases that could appear on pages about building a greenhouse, learning to swim, making money online or anything else you care to mention. Now there are some people who contend that the search engines do examine pages down to this level. Indeed, when I asked the staff of one popular duplicate checker (DupeCop) how their system examined duplicate content, they replied:

"DupeCop compares both individual words and 3-word phrases. It also ignores all punctuation and scans across sentences"

I was not surprised therefore that when I ran several articles through this system (comparing articles on the subject of gun dogs against articles about Christmas décor) I discovered that they showed an average of 25% duplicate content!
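A rough sketch shows how unrelated articles can score so highly under this kind of check. It follows the description quoted above (3-word phrases, punctuation ignored, scanning across sentences) but is not DupeCop's actual code; the function names, the scoring formula and the two sample texts are all invented for illustration.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and keep only word tokens, dropping all punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrases(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Every run of n consecutive words, scanning straight across sentences."""
    w = words(text)
    return {tuple(w[i:i + n]) for i in range(len(w) - n + 1)}


def percent_shared(a: str, b: str, n: int = 3) -> float:
    """Percentage of a's distinct n-word phrases that also occur in b."""
    pa, pb = phrases(a, n), phrases(b, n)
    if not pa:
        return 0.0
    return 100.0 * len(pa & pb) / len(pa)


# Invented snippets on unrelated topics that share only everyday phrases.
gun_dogs = ("In the end, the answer is patience. "
            "Take a look at how your dog responds to the whistle.")
decor = ("Did you know that less is more? In the end the answer is "
         "a simple wreath. Take a look at the tree.")

print(percent_shared(gun_dogs, decor))        # 3-word phrases
print(percent_shared(gun_dogs, decor, n=1))   # individual words
```

Everyday phrases such as "in the end" and "take a look" are enough to produce a sizeable overlap between texts that have nothing else in common.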

Against this background, I think it would be ridiculous to believe that the search engines would filter down this far. But how low would the filters be set? At 4, 5, 6 words? Well, your guess would be as good as mine.
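The effect of the phrase length on such a filter can be explored with a small experiment. This is a sketch under the assumption that a filter counts shared word runs of a fixed length; the helper names and sample texts are hypothetical.

```python
import re


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Distinct runs of n consecutive words, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_by_length(a: str, b: str, sizes=range(3, 7)) -> dict[int, int]:
    """Count the phrases common to both texts for each phrase length."""
    return {n: len(ngrams(a, n) & ngrams(b, n)) for n in sizes}


# Invented snippets on unrelated topics.
a = "In the end, the answer is patience. Take a look at how your dog responds."
b = "In the end the answer is a simple wreath. Take a look at the tree."

for n, count in shared_by_length(a, b).items():
    print(f"{n}-word phrases in common: {count}")
```

The number of accidental matches falls as the phrase length grows, which is why the choice of threshold matters so much.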

Over the years I have written hundreds of articles and closely monitored the results in terms of duplicate content penalties, as far as it is possible for any of us to do this. On the basis of my own experience I am confident that filtering is not conducted down to the level of short phrases but almost certainly stops at the sentence level. As a consequence, as long as you are changing content down to sentence level, you ought not to have a problem in escaping the filters. As a matter of fact, even if a couple of sentences are duplicated you should be fine.

Article Source: http://00articles.com

WebMarketingCentre.com provides information on article writing and article submission and is also an article directory where you can pick up free articles for your website or ezine and to which you can submit articles on a wide variety of topics including article writing (webmarketingcentre.com/Category/Article-Writing/307) and much more.

