SEO Basics: What is Duplicate Content?

If you’ve worked in SEO for any amount of time, you’ve probably come across the term “duplicate content”. And although it may cause many content professionals to gasp and clutch their pearls, it isn’t always considered black hat SEO. But, before you march ahead and start posting duplicate content all over your site, you need to know when duplicate content is OK and when it should be avoided.

What is duplicate content?

In terms of search engine optimization (SEO), Google considers duplicate content to be, “…substantive blocks of content within or across domains that either completely match other content or are appreciably similar.”

Essentially, duplicate content is when you:

  • Plagiarize content that has been published by someone else (we’re talking entire blog posts or articles, not quotes from resources that you link to)
  • Publish the same blocks of content (word for word) on different landing pages throughout your own website
  • Repost content that has already been published with very few changes on a new URL
  • Purposely and deceptively publish content that does not belong to you

Duplicate content generally only counts when the content is an exact copy of an existing page or the content is a copy of another page with very few (and very minor) changes made to the writing.
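To make "a copy with very few changes" concrete, here is a minimal sketch that scores how alike two blocks of text are using Python's standard difflib module. The function names and the 0.9 cutoff are assumptions made for illustration. Google does not publish a similarity threshold, and this is not how Google measures duplication.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a 0.0-1.0 score of how alike two texts are, compared word by word."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def looks_duplicate(text_a: str, text_b: str, threshold: float = 0.9) -> bool:
    """Flag two texts as near-duplicates.

    The threshold is an arbitrary illustrative cutoff, not a published value.
    """
    return similarity(text_a, text_b) >= threshold


# Hypothetical page copy, used only to exercise the functions above.
original = "Our handmade oak table seats six people and ships free across Canada"
tweaked = "Our handmade oak table seats eight people and ships free across Canada"
unrelated = "Read our guide to choosing a standing desk for a small home office"

print(looks_duplicate(original, tweaked))
print(looks_duplicate(original, unrelated))
```

A check like this is only useful as a rough audit of your own pages, for example to spot landing pages that differ by a single word.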

Is duplicate content bad for SEO?

Many SEO newbies would wholeheartedly exclaim, “Yes!”, but that’s not completely true.

Duplicate content can be harmful to SEO, but only when it’s not properly labeled. It can actually be useful for product landing pages and user resources like how-to guides and articles.

When websites are penalized for duplicate content issues, it’s most often because they’re intentionally scraping and republishing tons of other people’s content. It’s not the websites that have a couple of similar landing pages for their products.

But, even if you aren’t maliciously using duplicate content, you should still tell Google how you would like it to be treated so that it can understand how a page fits into your website. You can do this through your HTML using canonical tags.

What are canonical tags?

If it’s impossible for you to avoid duplicate content for one reason or another, all hope is not lost. You just have to make sure that you label your content properly to avoid potential penalties.

This is where the almighty canonical tag (rel=canonical in your HTML) comes in. This tag tells search engines which version of a page is the master page, so they know which URL you would prefer to have indexed and ranked.

When you create duplicate pages on your site (either accidentally or on purpose), you can add a canonical tag to those and direct the tag to the URL of your master page.

Let’s use my site as an example. In the HTML of my homepage, you can find my canonical tag, which takes the standard form <link rel="canonical" href="https://bfostercreative.com/">.

Because my homepage is the only version of that URL that I have, my canonical tag is self-referential, meaning it canonicalizes to itself. In other words, https://bfostercreative.com/ canonicalizes to https://bfostercreative.com/.

If I were to create another version of my homepage based on the sections I have, I might create something like https://bfostercreative.com/skills to go straight to my skills section.

While the URL would be different, the page content would just be a duplicate of my homepage. In order to avoid duplicate content issues, I would canonicalize that URL to my master page (https://bfostercreative.com/).

That way, Google would know that I purposely created two identical pages with two separate URLs, but that one is my master (or main) page.

Basically, canonical tags help you to tell Google that you aren’t trying to beat the system by getting identical pages to rank multiple times.
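If you want to check which master page a URL points to, the canonical tag can be read straight out of the page's HTML. The sketch below uses only Python's standard library. The class and function names are hypothetical, and the two HTML snippets are made-up stand-ins for the homepage and the /skills page described above, not the site's real markup.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def is_self_referential(page_url: str, html: str) -> bool:
    """True when a page canonicalizes to its own URL."""
    return find_canonical(html) == page_url


# Illustrative markup only; both pages declare the homepage as the master page.
HOMEPAGE_HTML = (
    '<html><head><link rel="canonical" href="https://bfostercreative.com/" />'
    "</head><body></body></html>"
)
SKILLS_HTML = HOMEPAGE_HTML  # a duplicate page canonicalized to the homepage

print(is_self_referential("https://bfostercreative.com/", HOMEPAGE_HTML))
print(is_self_referential("https://bfostercreative.com/skills", SKILLS_HTML))
```

The homepage reports itself as its own master page, while the duplicate at /skills does not, which is exactly the relationship the canonical tag is meant to express.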

How does duplicate content hurt SEO?

Duplicate content has a lot of the same downsides as keyword cannibalization and keyword stuffing.

When duplicate content is harmful, it can:

  • Dilute your organic traffic by letting several near-identical pages show in search results instead of concentrating the benefits on one URL
  • Stunt SEO growth by spreading out clicks and rank across multiple URLs
  • Reduce the benefits of backlinks and internal links
  • Make you look like an unoriginal content thief

Plus, for larger websites with 100+ pages, duplicate content can make it harder for Google to crawl and index your site properly, and it can eat into your crawl budget (roughly, the number of pages Google is willing to crawl on your website in a given period).

Any time you create a new URL (even if it only adds something, like a tracking parameter, to an existing URL), Google considers that to be a new page. Every new page is crawled and indexed separately. If you have a bunch of pages with the same content and Google has to crawl and index all of them, your new content could be crawled and indexed late or not at all, and you would lose out on its organic search benefits.
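One way to see how many of your URLs serve the same content is to group them by a fingerprint of the page text. This is a rough sketch that assumes you have already collected each page's text yourself. The function names and the example.com URLs are hypothetical, and it only catches copies that are identical after case and whitespace are normalized.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_exact_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Return lists of URLs whose page text is identical after normalization."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only groups with more than one URL are duplicates worth reviewing.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl results: URL -> extracted page text.
pages = {
    "https://example.com/table": "Handmade oak table. Ships free.",
    "https://example.com/table?ref=ad": "Handmade  oak table.  Ships free.",
    "https://example.com/desk": "Standing desk for small offices.",
}

for group in group_exact_duplicates(pages):
    print(group)
```

Each group it prints is a set of URLs that should probably share one canonical URL.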

When is it OK to have duplicate content?

Duplicate content isn’t always bad. While you should always canonicalize, duplicate content can be useful when:

You’re using the exact same content, but for two different URLs

This could be because you have an ad campaign running and you want to track it through the URL, or you have a URL that includes a language subfolder (like en-us) and one that doesn’t.

You’re using the same content for two URLs but it’s slightly different

Duplicate content only counts if the content has very few changes. This could mean that you change currency, or have different URLs based on different names or spellings for the same product in different countries (like yogurt, yogourt, yoghourt, or yoghurt).

But, keep in mind that if you want those different spellings to rank as different keywords, you should just write new content and have it be crawled and indexed separately.

You’re running ads on social media and the URLs reflect the campaign

Facebook, Twitter, and Instagram can all add tracking details to the end of a URL. These new versions of your URL are technically considered new pages, which you can’t help. But you can canonicalize them to your master page to avoid issues.
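As an illustration of canonicalizing campaign URLs, the sketch below strips common tracking parameters from a URL and builds the canonical tag for the clean version. The parameter list (the utm_ prefix, fbclid, gclid) is an assumption covering a few widely used trackers, not a complete list, and the function names and the sample campaign URL are hypothetical.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; extend this for the platforms you advertise on.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Return the URL without tracking parameters or a fragment."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(TRACKING_PREFIXES)
        and name.lower() not in TRACKING_NAMES
    ]
    # The fragment is dropped on purpose: it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the canonical tag that points a tracked URL at its master page."""
    clean_url = strip_tracking(url)
    return f'<link rel="canonical" href="{escape(clean_url, quote=True)}" />'


# Hypothetical campaign URL of the kind a social platform might generate.
ad_url = "https://bfostercreative.com/?utm_source=facebook&utm_campaign=spring&fbclid=abc123"
print(canonical_link_tag(ad_url))
```

Parameters that change what the page shows, such as a product filter, are kept, so only the tracking noise is removed.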

Pro tip: Make a plan for how to handle duplicate content that comes up in your SEO sales funnel so that it’s always canonicalized properly.

What happens when people copy my content?

Google is smarter than most people think. It can tell when a piece of content was published, so it knows when someone publishes a fresh, original piece of content versus when someone steals it. You only really need to be concerned about duplicate content that you create, whether it’s on purpose or not.

Controlling what other people scrape and publish from your content is almost impossible, and it usually does more harm to them than to you. Websites that exist only to copy and paste content that others have written don’t rank well and they don’t last for long, so in reality, it’s best to just let Google handle them unless they’re a legitimate publishing website or another writer who has plagiarized your piece.

Unfortunately, a lot of amateur bloggers copy and repost content from other writers without knowing just how bad it can be for their own site, even if they credit the work back to the original post. Most of the time, simply reaching out and asking that they either take it down or only post an excerpt from the original content works like a charm.

To duplicate content or not to duplicate content

If you’re duplicating content for SEO purposes, that’s a good indication that you should just create fresh, original content instead. You’ll always rank better with original, SEO-friendly content that has been written for a specific audience and purpose than you will from copy/pasting your existing pages.

But, if you’re creating a copy of another page for an ad campaign or contest, then duplicating an existing page could be right for you.

If you’re having a hard time figuring out how to handle duplicate content, it might be time to bring in an expert.