For as long as most marketers can remember, digital marketing has been all about doing more. But these days, success isn’t quite as simple as ratcheting up spend or creating 10 new blog posts. You could even argue that in the midst of rising ad costs and recent Google algorithm updates, more can be a little bit too much — especially when it comes at the cost of quality.
In the modern search landscape, quality is the key to organic success. Quantity, on the other hand, can hurt more than it helps at times. So while a lot of sites are focused on what to publish in the months ahead, some of the biggest opportunities for organic growth are hiding in pages that are live and indexed.
Why? Over time, all that legacy content camping out can lead to a couple of costly problems:
In either case, SEO pruning can help win back traffic lost to content cannibalization and turn previously published URLs into a source of organic growth. Here, we’ll break down different types of cannibalization, how to identify page-pruning SEO opportunities, and best practices to help apply the right solutions.
Let’s say that the pages of a website are dishes on a menu. Some of those menu items are probably a lot better than others, right? There might be some amazing dishes listed, but it’s just not very easy to find them when the menu is filled with lackluster and duplicative options.
In the simplest sense, content pruning SEO is an exercise in identifying the bad dishes and choosing whether to improve them, combine their ingredients to make a better dish, or take them off the menu.
There are different criteria Google uses to gauge the quality of content, and those standards can help you understand when a page is probably falling short for users.
The topic of content cannibalization is one of those fun SEO gray areas. The core idea is that URLs from the same site sometimes compete against one another in organic search because they share a similar topic or focus. The underlying implication is that it’s a bad thing, but whether or not that’s true is a topic of much discussion.
You might ask three different, very intelligent SEOs for their thoughts and get three very different answers. The first might tell you that the solution is generally to consolidate content and make it more comprehensive. The second might advise you to let nature take its course because there’s the possibility of indented results. The third might say that indented results are actually bad.
We say… it depends.
We’ve seen and driven some pretty undeniable results by pruning low-quality content and consolidating URLs into pieces that do a better job of meeting user needs. We’re also firm believers that pruning has the potential to shoot a site in the foot.
Our hope is that you’ll be well-equipped to identify which scenarios lead to which outcomes by the end of this article.
When we say “content cannibalization” is a problem, we’re specifically talking about instances where two or more indexable URLs share keywords and address the same user intent. Because at that point, it’s effectively duplicate content. Sure, the words are different, but is either URL providing unique value for the user? No.
There’s no formal penalty for duplicate content, just like there isn’t one for content cannibalization. Still, most SEOs would tell you that duplicate, indexable content on a website is going to hurt performance. Even Google tells large websites, “Eliminate duplicate content to focus crawling on unique content rather than unique URLs.”
Of course there are nuances — it’s SEO! But at the end of the day, the same-keywords-same-intent scenario is really the meat of the argument.
Understanding the SEO benefits of pruning starts with a solid grasp of what cannibalization means from an SEO perspective and why it happens in the first place. There are three distinct forms of cannibalization to call out here, each of which can be detrimental if left unchecked.
Folks who have been in content marketing for a while probably remember short blog posts and landing pages that targeted even just a single long-tail keyword. (Yes, that classic 300-word listicle titled 5 Tips for X.)
For years, the goal of a lot of content programs was to have as many posts and landing pages that covered as many keywords as possible. But in the background, Google wasn’t exactly subtle about its goals to cut down on thin, duplicative, and less-than-useful content — even going as far back as the first Panda release. The art of using keywords in SEO content has evolved in tandem.
Today, Google has a much better understanding of semantics and how terms relate to one another in the context of different topic areas. It realizes that all of those pages with slight keyword differentiation are talking about the same thing. Since only one of the pages can rank #1, the pages are effectively competitors.
Here’s how John Mueller put it in a 2019 Reddit AMA:
“We just rank the content as we get it. If you have a bunch of pages with roughly the same content, it’s going to compete with each other, kinda like a bunch of kids wanting to be first in line, and ultimately someone else slips in ahead of them :). Personally, I prefer fewer, stronger pages over lots of weaker ones—don't water your site's value down.”
— John Mueller, Google
Take a look at the cluster of keywords below, which make up just a small portion of related terms for “dog grooming tips.”
If these target keywords had been split out into different blog posts in the long-tail days, it would probably have gone something like this:
This presents Google with multiple relevant options to surface when a user searches for any of the terms. Why? Because each of the articles clearly caters to the same user intent: seeking informational content to help with grooming a dog at home. Even worse, there’s a good chance all of the URLs have relatively sparse and/or similar on-page content.
If one of the URLs is a great article that ranks toward the middle of SERP one for its target keyword, the other URLs are still there fighting for its spot. They’re going to have at least some negative impact if they’re indexable. Since Google limits the number of results shown from a single domain via host clustering, it might even swap in one of the stragglers instead (note that Google makes some exceptions to host clustering, specifically for indented results).
On the flip side, if all six pages are low-quality, simply cutting five of them or removing them from the index won’t solve the issue either. The leftover page might not have internal competition, but it wouldn’t be any more helpful or relevant to the user’s search query.
Content consolidation in combination with pruning has the potential to stop internal competition while improving the content itself. Keywords and information that were once separate join forces to create an improved experience. Meanwhile, link equity that was split across multiple pages is transferred to a single URL.
The consolidated content makes Google’s job easier by cutting down on relevant results from the same domain. Plus, the quality of content is likely to improve with regard to intent and related search terms.
In the dog grooming example above, we can spot that there might be cannibalization just by looking at keywords and article titles. But what truly creates problematic cannibalization isn’t keywords, it’s semantics. That’s because the value and relevance of content with respect to search terms come down to a user’s intent (sometimes also referred to as “search intent” — as in, a user’s intent as they browse their options in search).
For Google to provide a great experience, it has to figure out what users actually mean when they search for different things. In this case, all three of the existing pages assume the same user intent is behind any of the keywords in the cluster. That’s what makes them competitors and calls for content consolidation.
The thing is, that’s not always the case. What if users search for “dog grooming at home” and mean something different? Maybe they’re really looking for a mobile dog groomer.
In that case, this query isn’t synonymous with others in the cluster and the existing content doesn’t have much to offer. Not one of the three URLs enables a user to schedule a grooming appointment.
With this type of misalignment, consolidation isn’t going to help rank for “dog grooming at home.” If Google knows the user isn’t looking for informational content, it may instead surface a more product-focused SERP.
If the site can’t provide the right experience, it probably doesn’t make much sense as a target keyword. If it can, this could be the target keyword for a new page tailored to the correct intent. Whether it should be the target would depend on the cluster of related terms that share the same intent.
Would targeting “dog grooming at home” with a new service-focused page call for removing the keyword from the existing articles? Not inherently. The new page still wouldn’t be in competition with the others. Google will choose which type of experience best matches a user’s intent and surface it.
There are even some keywords where Google is unsure what users mean or has determined they mean more than one thing. “Dog grooming at home” is actually one of them! On the SERP, organic results are a mix of informational and product-focused content.
Mixed SERPs like this are interesting scenarios where having pages for different user intents could lead to indented results on page one that don’t inherently compete for clicks. As long as keeping the pages live doesn’t have a negative business impact, the site can cater to both.
When everything Google can index from a site is great, Google is more likely to assume the site is also pretty great. Content pruning can come in handy when it comes to keeping content quality high.
We can look at it through the menu analogy. Getting a lackluster dish once or twice at a typically tasty restaurant is pretty forgivable. Being served bad dishes time after time is enough to make someone question the restaurant altogether.
A lot of low-quality or “thin” content on a site is like having a lot of bad dishes on a menu. It leads to what you might call “authority cannibalization.”
We want to keep Google’s attention on the good stuff and away from sparse, duplicative, or low-value pages. The best way to do that is to make sure they don’t exist in the first place — or that they exist with the proper signals to keep Google from crawling them.
Content pruning takes away these lackluster options and curates the pages Google should use to assess the site. Plus, users won’t be able to land on bad experiences that might hurt brand perception and conversion.
That takes more than just cutting, which is why “content pruning” as a term can be a little misleading. Beyond just getting rid of content or consolidating it with other pages, a site needs to communicate what Google should and shouldn’t consider part of the index.
“So it might be that we found a part of your website where we say we’re not so sure about the quality of this part of the website because there’s some really good stuff here. But there’s also some really shady or iffy stuff here as well… and we don’t know like how we should treat things overall. That might be the case.”
— John Mueller, Google Search Advocate
For Google to index a new page published on a site or register changes to existing URLs, it needs to crawl them. Crawl budget is effectively the amount of time and resources Google is willing to spend crawling the same domain in a given period. A larger site might not have enough crawl budget for Google to crawl everything.
“The web is a nearly infinite space, exceeding Google's ability to explore and index every available URL. As a result, there are limits to how much time Googlebot can spend crawling any single site. The amount of time and resources that Google devotes to crawling a site is commonly called the site's crawl budget. Note that not everything crawled on your site will necessarily be indexed; each page must be evaluated, consolidated, and assessed to determine whether it will be indexed after it has been crawled.”
It’s important to use crawl budget efficiently by giving Google guidance as to what it should crawl, and just as importantly, what it should not crawl. Basics like a clean, up-to-date sitemap (crawl) and thoughtful use of robots.txt (don’t crawl) will help keep Google’s attention on high-value and high-quality pages.
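As a minimal sketch of that guidance, a robots.txt file can point crawlers at the sitemap and wall off low-value URL spaces. The domain and paths below are placeholders, not recommendations for any specific site:

```txt
# Hypothetical robots.txt sketch — domain and paths are placeholders.
# Keep crawlers out of low-value URL spaces (don't crawl)...
User-agent: *
Disallow: /search/
Disallow: /tag/

# ...and point them at the pages worth crawling
Sitemap: https://www.example.com/sitemap.xml
```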
Most domains don’t have exclusions specific to content and/or landing pages, since they’re often created for organic search. If the site has a lot of low-quality, indexable content, then a portion of crawl budget is still going to pages that may hurt authority.
The promise of organic traffic might drive the decision to get out those digital shears, but the benefits of content pruning extend to a site’s users and internal team. That makes a lot of sense, because a good user experience usually makes for good SEO.
SEO pruning doesn’t just set the bar higher for the content Google sees, it creates a better experience for every user who visits the site. When every page provides value, it makes users want to engage with the website, product, and brand as a whole.
Internally, there’s also a big, unspoken benefit for content teams: when new content has to match or exceed the quality of what’s already published, the bar keeps rising as existing content improves.
"Content pruning is like trimming the fat off of your website's content. You get rid of the stuff that could be better and focus on making the remaining pages top-notch. Doing this makes a massive difference in how users engage with your site, product, or brand. Here's a cool thing that often gets overlooked: when your existing content is the best it can be, it raises the bar for any new content you create in the future. This creates a positive feedback loop where improving existing content drives future content improvement."
— Melissa Popp, Content Strategy Director at Rickety Roo
Okay, let’s go back to our restaurant analogy and order a salad. If every ingredient in the salad showed up on a separate plate, it probably wouldn’t be the easiest or most enjoyable thing to eat.
When very closely related topics are broken out into individual pages with sparse information, it’s essentially serving a user’s salad on multiple plates. It’s a much better dining experience to serve a salad in a single bowl, and the same can be said for information.
The way a site or page groups, presents, or links information is known as information architecture (IA) — and it goes all the way down to the information on an individual page. Content consolidation leads to improved IA by nature because it brings ingredients into the same bowl. When done right, it layers them in the tastiest way and makes the menu a little easier to navigate.
“Regardless of how much great information there is on a site, how a business organizes content determines whether or not it's actually valuable to user discovery, engagement, and comprehension. When we talk about IA, we tend to talk about it on a holistic level as a self-service mapping for users and a framework to learn how the user engages with the site, content, and product. But we can zoom in and apply the same thinking we apply to the site to a given URL. How are we organizing information to best help users find what they need? How can what we know about the user’s goals serve as the scaffolding for the page contents?”
— Cole Audett, Senior Product Designer at Creative Market
"An ideally organized website (articles and webpages) benefits users and website owners. Users appreciate getting all the information about a topic on a single website and linked correctly with other pages, rather than finding each aspect of the topic separately. This can help in pushing them down the conversion funnel, so having the right personas in place is important. One can focus on the UX of the website to make the architecture clearer, such as adding a summary of the topics at the top of an article or implementing the right markup schemas to help generate sitelinks. Moreover, better IA will make website management easier and reduce cannibalization, which will further prevent competition between the same website pages."
— Navneet Kaur, Freelance Technical SEO Consultant
Most email, social, and paid marketing campaigns use landing and content pages. Understandably, channel-focused marketers don’t always know which content is high quality, where it fits into the funnel, etc. So they do what performance marketers do best — test and experiment.
Unless there’s a process where the content team curates and recommends which URLs to prioritize, there are likely some URLs in use across channels that aren’t the best reflection of the brand or very likely to move a user down the funnel.
Better content and landing pages have a downstream effect, because distribution-focused marketing channels rely on landing pages. It’s a simple idea, but one that often gets overlooked. When the content that’s getting dished out is stronger, the ad, email, or social post is more likely to perform well.
SEO pruning and content consolidation eliminate a lot of the low-value, low-appeal, and low-conversion pages that can make their way into other marketing channels and eat up marketing budgets.
“In an ideal world, all of the content we link to, pull from, or promote in channels is valuable for users. So by nature, cleaning up a content library has similar benefits for virtually any channel as it does for the site as a whole. The goal is to remove or improve content that doesn’t provide a great experience, which makes for a stronger pool of landing pages. Other channels can only promote high-quality pages, increasing the likelihood of clicks, leads, conversion, and positive brand awareness across campaigns. Without periodic pruning, there’s potential for budget and impressions to go toward pages that probably shouldn’t be on the site, only to find out they’re falling short.”
— Alberto Cañas, VP of Marketing at Physician’s Choice
To prune or not to prune? Let’s lend some clarity to the process with a quick guide to pruning SEO best practices. Think: data to pull, tools to use, and solutions to apply.
When should you prune content on a website? It’s usually a good time for page pruning when investigating why organic traffic is down or if there’s a larger SEO content audit already in motion. Otherwise, a cadence of once a year will generally suffice.
Why? Pruning any more frequently than every six months risks making further changes before the impact of the previous pruning cycle has an opportunity to fully materialize. Those results are invaluable to fine-tuning the next round.
Plus, seasonality is a real thing! Looking at a smaller window of time like three or six months runs the risk of cutting out seasonal traffic peaks. This is particularly important when pruning eCommerce SEO content, because a lot of direct-to-consumer brands create seasonal content to capitalize on shopping peaks. (7 Festive Ideas for Holiday Decor, Mother’s Day Gift Ideas for the Mom Who Has Everything, etc.)
A 12-month window ensures that seasonal clicks aren’t overlooked. SEOs just need to take care to flag newly published URLs (live for fewer than three months), since clicks and backlinks will look deflated in the months after a page is first indexed.
It’s possible to spot potential cannibalization on the SERP and confirm it using a number of SEO tools that help identify SEO content cannibalization.
If you know exactly what you’re looking for, Google Search Console can be of help. For example, if two underperforming URLs from a content audit seem too close for comfort, you can filter down to the exact URLs and compare the queries from which they drive traffic (Performance > Search Results > Queries).
When this yields multiple overlapping keywords, poor average positions, and URLs that seem to share the same intent, there’s a good chance there’s a cannibalization issue.
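That query comparison can also be done on exported data. Here’s a hedged sketch of the idea — the query lists below are made up for illustration, and the simple set math stands in for however your GSC export is actually structured:

```python
# Hypothetical sketch: compare the query lists two URLs drive traffic
# from (e.g. pulled from separate GSC exports). All data is made up.

def query_overlap(queries_a, queries_b):
    """Return shared queries and overlap relative to the smaller set."""
    set_a, set_b = set(queries_a), set(queries_b)
    shared = set_a & set_b
    smaller = min(len(set_a), len(set_b)) or 1  # avoid division by zero
    return shared, len(shared) / smaller

url_a_queries = ["dog grooming at home", "groom dog at home", "dog grooming tips"]
url_b_queries = ["dog grooming at home", "how to groom a dog", "dog grooming tips"]

shared, ratio = query_overlap(url_a_queries, url_b_queries)
# shared -> {"dog grooming at home", "dog grooming tips"}
```

A high overlap ratio alone isn’t proof of a problem — it’s a prompt to check whether the two pages really serve the same intent.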
Identifying potential issues at scale is much easier with other tools. For example, in Ahrefs you can run and export a report of all of the keywords where multiple URLs from a domain are ranking.
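The same flagging logic is easy to run over any exported rank report. The `(keyword, url)` row format below is an assumption for illustration, not a real Ahrefs schema:

```python
from collections import defaultdict

# Hypothetical sketch: flag keywords where multiple URLs from the same
# domain rank, using made-up rows from an exported rank report.
rows = [
    ("dog grooming at home", "/blog/dog-grooming-tips"),
    ("dog grooming at home", "/blog/groom-your-dog"),
    ("dog nail trimming", "/blog/dog-nail-care"),
]

def find_cannibalized(rows):
    urls_by_keyword = defaultdict(set)
    for keyword, url in rows:
        urls_by_keyword[keyword].add(url)
    # Keep only keywords where two or more URLs rank
    return {kw: sorted(urls) for kw, urls in urls_by_keyword.items() if len(urls) > 1}
```

Running `find_cannibalized(rows)` on the sample data flags only “dog grooming at home,” since it’s the one keyword with two ranking URLs.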
Not every instance of cannibalization is a reason for action. Making the right decision is about more than simply identifying cannibalization. It’s also about determining the impact on the business, which outweighs any impact specific to SEO.
Sometimes, the right solution is doing nothing — and that’s ok. That’s where having the right metrics for solid decision-making comes into play.
One of the biggest problems that pops up in pruning is the tendency of SEOs or content managers to take a definition of “value” that is too narrowly focused on organic search. That’s why looking at the right metrics is crucial to making sure that pages worth keeping don’t get cut:
For any of these metrics, SEOs need to determine what qualifies as acceptable value for a URL relative to the rest of the site. That baseline gets set by looking at performance across the board and identifying the outliers at the bottom.
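One way to encode that multi-metric baseline: only treat a page as a pruning candidate when it falls below the floor on every metric, so a page that’s weak in search but strong elsewhere (say, conversions) is kept. All the numbers and floors below are made-up assumptions:

```python
# Hypothetical sketch: a URL is a pruning candidate only if it misses
# the floor on *every* metric. Data and floors are illustrative only.
pages = {
    "/blog/a": {"clicks": 5,   "backlinks": 0,  "conversions": 0},
    "/blog/b": {"clicks": 900, "backlinks": 12, "conversions": 30},
    "/blog/c": {"clicks": 8,   "backlinks": 1,  "conversions": 25},
}

def pruning_candidates(pages, floors):
    candidates = []
    for url, metrics in pages.items():
        # Below the floor on every metric -> candidate for pruning
        if all(metrics[m] < floor for m, floor in floors.items()):
            candidates.append(url)
    return candidates

floors = {"clicks": 50, "backlinks": 2, "conversions": 5}
```

In the sample data, `/blog/c` has weak search metrics but healthy conversions, so it survives — which is exactly the kind of page a search-only definition of “value” would wrongly cut.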
Just like an overgrown backyard needs different tools to clean up, content pruning calls for a varied toolkit of solutions.
When to choose this option: different URLs compete for similar terms with the same intent and consolidating content would create fewer, more relevant pages.
We’ve covered that URLs are good candidates for consolidating content when:
By taking relevant, quality information from the other pages and thoughtfully integrating it into the strongest page (or the one that’s most important to the business), the content improves for this cluster of related terms. 301 redirects from the retired pages to the remaining page will ensure link equity is passed on from any backlinks or internal links pointing at them.
This is a good opportunity to call out that consolidation doesn’t inherently mean making content longer — nor does making content longer mean it’s automatically better. Consolidation should bring the most relevant content together to create the best possible experience based on the user’s intent.
When to choose this option: page is no longer needed and does not have on-page content that would help create a better experience through consolidation.
Sometimes, there’s really no reason to keep a page or its content. Even if the page has backlinks or internal links pointing to it, PageRank is passed on via a redirect. If you’ve heard the SEO lore that ~15% of PageRank is lost with a redirect, Google changed that in 2016.
Still, SEOs should exercise caution if a page has a significant amount of linking domains or high-DA backlinks from authoritative sites (DA = 80+). Even if PageRank isn’t lost, there’s still the issue of the time it takes for the process to see itself out.
Examples of pages that might call for removal without consolidation include:
When to choose this option: page should remain live and linkable, but has little SEO value or needs to remain private.
It’s possible to use the meta robots tag to serve a “noindex,nofollow” or “noindex,follow” directive, which tells Google to keep a page out of the index. The difference is that the latter allows Google to follow any links on the page, though over time Google will eventually treat a long-standing noindexed page as “nofollow” anyway.
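In practice, that’s a single tag in the page’s `<head>`:

```html
<!-- Keep the page out of the index, but let Google follow its links -->
<meta name="robots" content="noindex,follow">

<!-- Keep the page out of the index and don't follow its links -->
<meta name="robots" content="noindex,nofollow">
```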
Is a noindex enough for content pruning? If the page needs to stay live and isn’t likely to pass PageRank, it’s a good solution.
You could consider controlling indexation with the meta robots tag for pages like:
When to choose this option: consolidate PageRank from duplicative, business-critical pages that need to stay live, but would compete in organic search.
Canonicalization isn’t necessarily the ideal pruning tool to keep pages from getting crawled, but it can help in cases of keyword cannibalization. This is especially true when consolidation doesn’t make sense or pages need to be kept live.
That’s because canonicalization doesn’t keep Google from crawling a page. The canonical tag helps Google identify which page to prioritize when determining what to index. It’s a recommendation, not a rule. If canonicalization is used the wrong way, then there’s a chance Google will ignore the recommendation.
If Google takes the recommendation, then it will generally index the canonical URLs instead of the original URLs. This means it’s likely Google won’t crawl the original URLs as frequently. However, it still checks in periodically to understand whether the recommendation has changed. It may also choose to surface a non-canonical page when it’s a better answer for the user’s query.
The nice thing is that canonicalization passes PageRank. So if cannibalizing pages need to stay live, it’s still possible to keep them out of the index and lend their signals of quality to a preferred URL.
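Setting the canonical is also a one-line tag in the duplicate page’s `<head>` — the URLs below are placeholders from the dog grooming example:

```html
<!-- On the duplicate page, point Google at the preferred URL -->
<link rel="canonical" href="https://www.example.com/blog/dog-grooming-tips">
```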
The types of pages where it might make sense to set a different canonical URL include:
Content pruning doesn’t happen in a vacuum. URLs that are candidates for pruning might be a part of paid ad feeds, email campaigns, partnership requirements, etc. It’s always good to share the URLs with the marketing team in advance and give the product team visibility, depending on your team structure.
Since SEOs don’t work with UTM parameters on a regular basis, it’s also worth noting that redirects will strip UTMs unless they’re configured to pass them through. For paid marketers, that can result in underreporting, which makes it look like campaigns are underperforming.
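As one hedged example of passing parameters through, a server-level 301 can carry the query string along. This sketch assumes an nginx server, and the paths are hypothetical:

```nginx
# 301 that preserves the query string (UTMs included), so campaign
# URLs keep reporting correctly after the old page is retired
location = /blog/groom-your-dog {
    return 301 /blog/dog-grooming-tips$is_args$args;
}
```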
Remember these final cleanup items to ensure your content pruning doesn’t actually cause SEO problems. We’ve linked to some resources that can help!
Content pruning is a great tool, but the best way to cut down on cutting down is by creating user-friendly, SEO-optimized content in the first place. In a lot of cases, that means creating fewer pieces of content with greater relevance and quality. For a lot of marketers, that philosophy might feel like a bit of a paradigm shift.
At the end of the day, it’s driven by the same goal that’s always informed successful marketing: to understand what the user needs and design a great experience around it. That’s what content pruning is all about — and so are we.