Duplicate Content: Search Engine Policies to Detect It

Duplicate Content

Google uses the term "duplicate content" for substantial blocks of content that either match other content exactly or are appreciably similar to it, whether within the same domain or on another website. Most duplicate content is not malicious. Common examples include:

1 - Discussion forums that generate both regular pages and simplified pages for mobile devices.

2 - Store items that are displayed or linked through several different URLs.

3 - Printer-friendly versions of web pages.
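
Google does not publish how it decides that two blocks are "appreciably similar". As an illustration only, one common technique for measuring textual overlap is w-shingling with Jaccard similarity. The sketch below assumes 3-word shingles and an arbitrary 0.8 threshold; neither value comes from Google, and this is not Google's algorithm.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts: treat the whole word list as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 1.0 means identical shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.8) -> bool:
    """Flag two blocks as near-duplicates when their shingle overlap reaches threshold."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


# Hypothetical example: an article and its printer-friendly copy.
article = "Red running shoes, size 42, free shipping on all orders."
print_version = article + " Printed copy."
print(is_near_duplicate(article, print_version))
```

A regular page and its printer-friendly version share almost all of their shingles, which is why such pairs count as duplicates even though they are not byte-for-byte identical.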

If your website has multiple pages with largely identical content, there are several ways to tell Google which URL you prefer. This practice is called "canonicalization".
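
One of those canonicalization methods is the rel="canonical" link element placed in each page's head. The minimal sketch below assumes the site's templates call a small helper (the name canonical_link_tag is hypothetical) so that every duplicate variant of a page emits the same tag pointing at the preferred URL.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>.

    Every duplicate variant of the page (print version, index.htm, and so on)
    should emit this same tag, pointing at the one preferred URL.
    """
    # Escape the URL so characters such as "&" are valid inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


# Hypothetical preferred URL shared by all variants of one article.
print(canonical_link_tag("http://www.example.com/pagina/"))
```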

Beyond this basic definition, content is sometimes deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic. Google recognizes this as a deceptive practice that degrades the user experience, because searchers end up seeing substantially the same content repeated across a set of search results. For that reason, Google tries to index and show pages with distinct content.

Because of this filtering, if your site has both a regular version and a printer-friendly version of each article, and neither version is blocked with a noindex meta tag, search engines will choose one of them to list. In cases where the engines conclude that duplicate content is being shown with the intent to manipulate rankings and deceive users, they will also adjust the indexing and ranking of the sites involved. As a result, the site's ranking may suffer, or the site may even be removed from the Google index entirely, so that it no longer appears in search results.

Fortunately, there are steps site owners can take in advance to address duplicate content issues and make sure visitors see the content the webmaster intends them to see.

1 - Use 301 redirects: if you have restructured your site, use 301 redirects ("RedirectPermanent") in your .htaccess file (on Apache servers) to send users, Googlebot, and other spiders to the new URLs.

2 - Be consistent: Google recommends keeping your internal linking consistent. For example, do not link to http://www.example.com/pagina/, http://www.example.com/pagina, and http://www.example.com/pagina/index.htm interchangeably; pick one form and use it everywhere.

3 - Use top-level domains: whenever possible, use country-code top-level domains for country-specific content. This helps search engines serve the most appropriate version of a document. For example, Google is more likely to know that http://www.example.de contains Germany-focused content than it is for http://www.example.com/de or http://de.example.com.

4 - Syndicate carefully: if you syndicate your content on other sites, search engines will always show the version they consider most appropriate for users in each specific search, which may or may not be the version you prefer. It helps to make sure that each site republishing your content (social networks, for example) includes a link back to the original article. You can also ask those who use your syndicated material to add the noindex meta tag so that search engines do not index their version of the content.

5 - Minimize boilerplate repetition: instead of including a long copyright notice at the bottom of every page, include a brief summary and a link to a page with more details. In addition, you can use a parameter handling tool to specify how you want Google to treat URL parameters.

6 - Avoid publishing stubs: users do not like landing on "empty" pages, so avoid placeholders. For example, do not publish pages for which you do not yet have any content. If you must create such pages for some reason, use the noindex meta tag to keep them out of the index.

7 - Understand your content management system: make sure you know how content is displayed on your website. Blogs, forums, and related systems often show the same content in several places and formats; for example, a blog entry may appear on the blog's home page, on its own article page, and on archive pages for the same label.

8 - Minimize similar content: as mentioned earlier, some similarity is common, but if you have many similar pages, consider expanding each page with unique content or consolidating them into a single page.
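
For step 1 (301 redirects) above, the source describes Apache's .htaccess file. As an illustration of the same idea at the application level, here is a minimal Python WSGI sketch. The REDIRECTS table and its paths are hypothetical, and on an Apache server this mapping would normally live in .htaccess rather than in application code.

```python
# Hypothetical mapping from pre-restructure paths to their new locations.
REDIRECTS = {
    "/old-section/article.html": "/blog/article/",
    "/pagina/index.htm": "/pagina/",
}


def app(environ, start_response):
    """Minimal WSGI app: answer moved paths with a 301, everything else with 200."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 Moved Permanently tells browsers and crawlers the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Page at {path}".encode("utf-8")]
```

To try it locally, pass app to wsgiref.simple_server.make_server and request one of the old paths; the response carries a Location header with the new URL.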
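
Steps 2 (consistent internal links) and 5 (URL parameters) above both come down to always writing a page's URL the same way. The sketch below shows one possible house style for internal links; the default-document names and the list of ignorable parameters are assumptions chosen for illustration, not rules from Google.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration only: which file names count as default
# documents, and which query parameters never change the page content.
DEFAULT_DOCUMENTS = ("index.htm", "index.html")
IGNORED_PARAMS = {"sessionid", "ref"}
IGNORED_PREFIXES = ("utm_",)


def normalize_internal_url(url: str) -> str:
    """Rewrite a URL into one consistent form for use in internal links."""
    parts = urlsplit(url)

    # Directory style: drop default documents and add a trailing slash to
    # extensionless paths, so /pagina, /pagina/ and /pagina/index.htm agree.
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment in DEFAULT_DOCUMENTS:
        path = path[: -len(last_segment)]
    elif last_segment and "." not in last_segment:
        path += "/"

    # Keep only parameters that change the content, preserving their order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
        and not key.lower().startswith(IGNORED_PREFIXES)
    ]

    # Scheme and host are case-insensitive; the fragment is never sent to the server.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )


for link in (
    "http://www.example.com/pagina/",
    "http://www.example.com/pagina",
    "http://www.example.com/pagina/index.htm",
):
    print(normalize_internal_url(link))
```

All three spellings from step 2 collapse to http://www.example.com/pagina/. Note that urlencode re-encodes the surviving parameter values, so a space in a value comes back as "+".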
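
For step 4 (syndication) above, a site owner could spot-check a republished copy for the two safeguards mentioned: a link back to the original and a robots noindex meta tag. This is a minimal sketch using Python's standard html.parser; the original URL is hypothetical, and the exact string match on href is a simplifying assumption (relative or differently spelled links would be missed).

```python
from html.parser import HTMLParser


class SyndicationAudit(HTMLParser):
    """Collect link targets and robots meta directives from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: set[str] = set()
        self.robots: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.hrefs.add(attributes["href"])
        elif tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            # The content attribute holds comma-separated directives such as "noindex, follow".
            content = attributes.get("content") or ""
            self.robots.update(d.strip().lower() for d in content.split(","))


def check_syndicated_copy(html: str, original_url: str) -> dict[str, bool]:
    """Report whether a republished copy links to the original and/or opts out of indexing."""
    audit = SyndicationAudit()
    audit.feed(html)
    audit.close()
    return {
        "links_to_original": original_url in audit.hrefs,
        "noindex": "noindex" in audit.robots,
    }
```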
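
For step 6 (placeholder pages) above, the noindex meta tag can be added automatically whenever a page has no real content yet. In this minimal sketch, render_head is a hypothetical template helper, and treating whitespace-only body text as "empty" is an assumption.

```python
from html import escape


def render_head(title: str, body_text: str) -> str:
    """Render a minimal <head>, adding noindex when the page has no real content yet."""
    lines = [f"<title>{escape(title)}</title>"]
    if not body_text.strip():
        # Placeholder page: ask search engines not to index it.
        lines.append('<meta name="robots" content="noindex">')
    return "<head>\n" + "\n".join(lines) + "\n</head>"


print(render_head("Coming soon", ""))
```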
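
For steps 7 and 8 above, a quick way to learn where a CMS shows the same content is to fingerprint the text of each URL and group identical fingerprints. This sketch assumes the visible text of each page has already been extracted into a dictionary; the URLs are hypothetical. It only catches exact duplicates after whitespace and case normalization, so near-duplicates need a similarity measure instead.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(text: str) -> str:
    """Hash the page text after collapsing whitespace and ignoring case."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_exact_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of two or more URLs whose text is identical after normalization."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[content_fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl of a blog: the same entry on its own page and on an archive page.
pages = {
    "http://www.example.com/blog/post-1/": "Hello   World",
    "http://www.example.com/archive/": "hello world",
    "http://www.example.com/category/news/": "Something else",
}
print(group_exact_duplicates(pages))
```

Each group is a candidate for a 301 redirect, a rel="canonical" tag, or merging into a single page.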

Google has stated that it does not recommend blocking crawler access to duplicate content on a website, whether with a robots.txt file or by other methods. If search engines cannot crawl pages with duplicate content, they cannot automatically detect that these URLs point to the same content, and will effectively have to treat them as separate, unique pages. A better solution is to let search engines crawl these URLs but mark them as duplicates using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to excessive crawling of your website, you can also adjust the crawl rate setting in Search Console.
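
To check that a robots.txt file is not hiding duplicate URLs from crawlers, Python's standard urllib.robotparser can evaluate the rules offline. The rules and URLs below are hypothetical. This parser implements the classic prefix-matching rules and does not support Google's "*" and "$" path wildcards, so treat its answers as an approximation of Googlebot's behavior.

```python
from urllib.robotparser import RobotFileParser


def blocked_duplicates(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt rules would stop user_agent from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules that block the printer-friendly duplicates.
rules = "User-agent: *\nDisallow: /print/\n"
duplicates = [
    "http://www.example.com/pagina/",
    "http://www.example.com/print/pagina/",
]
print(blocked_duplicates(rules, duplicates))
```

Any URL this reports is a duplicate that crawlers cannot see, so they also cannot see a rel="canonical" tag on it; removing the Disallow rule and canonicalizing instead follows the advice above.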

Keep in mind that duplicate content on a site is not, by itself, grounds for action against that site, unless it appears that the duplicate content is there to deceive users and manipulate search engine results. If your website has duplicate content issues and you do not follow the advice above, Google and other search engines still do a good job of choosing the most suitable version to show in search results.

Finally, if a search engine's review concludes that a site engaged in deceptive practices and the site has been removed from search results, the site owner should review the site carefully.

In other cases, the algorithm may select the URL of an external website that is hosting your content without permission. If you believe that another site is duplicating your content in violation of copyright law, contact that site's host to request removal of the content. You can also ask Google to remove the infringing page from its search results by filing a request under the United States Digital Millennium Copyright Act (DMCA).
