Understanding Google’s Duplicate Content Policy
The words “duplicate content penalty” strike fear in the hearts of marketers. People with no SEO experience use this phrase all the time. Most have never read Google’s guidelines on duplicate content. They just somehow assume that if something appears twice online, asteroids and locusts must be close behind. Duplicate content can cause issues, so it is worth understanding the finer details and putting them in context.
Three Myths about Duplicate Content
1: All Duplicate Content is harmful
There are two types of duplicate content to be aware of: non-malicious duplication and deliberately deceptive duplication. How the content is created and marked up makes the difference. See Google’s exact classification of the two below.
2: Scrapers Will Hurt Your Site
Don’t worry too much about scrapers. Pay closer attention to links pointing at your website from unknown sources, and only disavow links when it is absolutely necessary and the correct protocol has been followed.
3: Republishing Your Guest Posts on Your Own Site Will Hurt Your Site
As a general rule, I prefer that the content on my own site be strictly original. But this comes from a desire to add value, not from the fear of a penalty.
Knowing the Rules of Duplicate Content is Important
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.
Non-Malicious Duplicate Content Explained
Examples of non-malicious duplicate content could include:
- Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
- Store items shown or linked via multiple distinct URLs
- Printer-only versions of web pages
How to Avoid Duplicate Content Issues
If your site contains multiple pages with largely identical content, there are a number of ways you can indicate your preferred URL to Google. (This is called “canonicalization”.) Google’s documentation provides more information about canonicalization.
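As a brief illustration of canonicalization (the URLs here are placeholders, not from the original article), a duplicate page can point search engines at the preferred version with a `rel="canonical"` link element in its head:

```html
<!-- On the duplicate page, e.g. a printer-friendly or tracking-parameter URL -->
<!-- example.com is a placeholder domain -->
<link rel="canonical" href="https://www.example.com/article" />
```

Google treats this as a strong hint, not a command, about which URL should appear in search results.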
However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic. Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results.
Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has a “regular” and “printer” version of each article, and neither of these is blocked with a noindex meta tag, we’ll choose one of them to list. In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.
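If you would rather a duplicate such as the printer version never be indexed at all, it can carry the standard robots noindex meta tag. A minimal sketch:

```html
<!-- In the <head> of the printer-only version of the page -->
<meta name="robots" content="noindex" />
```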
There are some steps you can take to proactively address duplicate content issues, and ensure that visitors see the content you want them to.
- Use 301s: If you’ve restructured your site, use 301 redirects (“RedirectPermanent”) in your .htaccess file to smartly redirect users, Googlebot, and other spiders. (In Apache, you can do this with an .htaccess file; in IIS, you can do this through the administrative console.)
- Be consistent: Try to keep your internal linking consistent. For example, don’t link to http://www.example.com/page/, http://www.example.com/page, and http://www.example.com/page/index.htm for the same content.
- Use top-level domains: To help us serve the most appropriate version of a document, use top-level domains whenever possible to handle country-specific content. We’re more likely to know that http://www.example.de contains Germany-focused content, for instance, than http://www.example.com/de or http://de.example.com.
- Syndicate carefully: If you syndicate your content on other sites, Google will always show the version we think is most appropriate for users in each given search, which may or may not be the version you’d prefer. However, it is helpful to ensure that each site on which your content is syndicated includes a link back to your original article. You can also ask those who use your syndicated material to use the noindex meta tag to prevent search engines from indexing their version of the content.
- Use Search Console to tell us how you prefer your site to be indexed: You can tell Google your preferred domain (for example, http://www.example.com rather than http://example.com).
- Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details. In addition, you can use the Parameter Handling tool to specify how you would like Google to treat URL parameters.
- Avoid publishing stubs: Users don’t like seeing “empty” pages, so avoid placeholders where possible. For example, don’t publish pages for which you don’t yet have real content. If you do create placeholder pages, use the noindex meta tag to block these pages from being indexed.
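The 301 approach in the first bullet above can be sketched as follows for Apache (assuming mod_alias and mod_rewrite are enabled; the paths and domain are placeholders):

```apache
# .htaccess — permanently redirect an old URL to its new home
# (/old-page and /new-page are placeholder paths)
Redirect 301 /old-page https://www.example.com/new-page

# Or, with mod_rewrite, consolidate all non-www traffic onto the www host:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```

On IIS, the equivalent permanent redirect is configured through the administrative console rather than an .htaccess file, as the bullet notes.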
Keep calm and carry on marketing….
Googlebot visits most sites every day. If it finds a copied version of something a week later on another site, it knows where the original appeared. Googlebot doesn’t get angry and penalize. It moves on. That’s pretty much all you need to know.
Remember, Google has 2,000 math PhDs on staff. They build self-driving cars and computerized glasses. They are really, really good. Do you think they’ll ding a domain because they found a page of unoriginal text?
A huge percentage of the internet is duplicate content. Google knows this. They’ve been separating originals from copies since 1997, long before the phrase “duplicate content” became a buzzword in 2005.
Curated from 3 Myths About Duplicate Content and Google’s guidelines for duplicate content.