Removing content from Google can be a tricky process. There are important questions to answer, different use cases, and a variety of methods to resolve the issue (which vary by content type, scale, speed requirements, etc.). Every step you take means new decisions to make!
You may also be seeing different advice from different sources (or advice based on different use cases). Fortunately, we are here to help with a thorough explanation of what to do in the most common situations, whether you are trying to deindex a page, an image, or an entire site!
First and foremost, let’s review the common use cases for wanting to remove “stuff” - content, images, web pages, or PDFs - and what tools (or combinations of tools) are useful in those scenarios.
Generally speaking, the speed of the solution is paramount.
Here’s one where you can’t (or won’t) move or delete the content but you need it off Google quickly. What do you do?
Sometimes you want to deindex a PDF or page because it’s duplicate content. In this case:
If the issue is just that you have lots of low-quality pages indexed, you’d want a fast solution that doesn’t take too long to implement. This commonly occurs on URL variations (e.g. URL parameters), and /cart and /account pages (that already require a password to access.) In this situation:
You can also utilize the Meta Robots or X-Robots tags (instead of the Robots.txt file), if you wish. It’s simply a tradeoff: less monitoring, but more potential for crawl budget issues. There is no perfect answer - just the best, most reasonable one for you!
Next, let's review the tools of the trade! Keep in mind that some of these tools can and should be used together. We’ll break down those appropriate combinations in the deindexation use cases outlined below.
But first, one specific combo should NOT be used: don’t use the Robots.txt Disallow command in conjunction with any page-level tool (e.g. the X-Robots header, meta robots, or canonical tags). Disallow controls crawling, NOT indexing. And if Google can’t crawl the page, it can’t read your noindex or canonical tags in order to abide by them. Learn more about this, and other common search engine crawling & indexing issues, in this in-depth guide.
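To make that conflict concrete, here is a minimal sketch that checks whether a URL is blocked by a robots.txt file, using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URLs are hypothetical, and Python's parser only approximates how Googlebot matches rules, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules let the given crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks the /cart folder for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
"""

for url in (
    "https://www.example.com/cart/checkout",
    "https://www.example.com/blog/post",
):
    if is_crawlable(ROBOTS_TXT, url):
        print(url, "-> crawlable, so a noindex or canonical tag can be read")
    else:
        print(url, "-> blocked, so a noindex or canonical tag will never be seen")
```

If a page you are trying to noindex shows up as blocked, lift the Disallow rule first so the page-level instruction can actually be read.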
Sometimes the obvious solution is the right one. If you need something to go away - ASAP - you can just delete it!
This tool is available via Google Search Console (GSC), the tool that enables SEOs to quickly identify and fix issues. Use this magical power with care - don’t remove other, valuable pages by accident! This is a “temporary” fix - 6 months is the cap. Use it in conjunction with another method so the page/PDF/image in question stays deindexed.
People tend to ignore this tool because of its temporary nature - which I’d argue is a mistake. Other than removing the content entirely (not always an option!), this is the *fastest* way to get something deindexed. So use it, and use it well.
Use this tool for content:
Send an instruction to “noindex” a page via a single line of code added to the HEAD section of your HTML. Keep in mind that this is a longer-term (AKA likely not very quick) solution, since you have to wait around for Google to re-crawl the page to *see* this tag. It is, however, very sustainable. If your use case isn’t urgent, this is a great tool, and you could always request that Google recrawl the page sooner (via GSC.)
Meta robots will not help with your crawl budget if that’s a concern (it generally only is for very large websites, when there are a LOT of noindexed pages.)
All the same rules as the Meta Robots tag, but sent via an HTTP response header instead of via the HTML. It is much less common, and will likely require development help to implement.
It’s a great tool for deindexing items like PDFs, which don’t have HTML to add a meta tag to.
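As a rough illustration of the idea, the sketch below decides which extra response headers to send for a requested path, adding `X-Robots-Tag: noindex` for PDFs. The function name, the extension list, and the example paths are all hypothetical. In practice this logic usually lives in your web server or CDN configuration rather than in application code, so treat this as a picture of the rule, not a drop-in implementation.

```python
# Hypothetical rule: every PDF on the site should carry a noindex header.
NOINDEX_EXTENSIONS = (".pdf",)


def extra_headers(path: str) -> list[tuple[str, str]]:
    """Return the extra response headers to send for the requested path."""
    # Ignore any query string so "/file.pdf?download=1" is still treated as a PDF.
    clean_path = path.split("?", 1)[0].lower()
    if clean_path.endswith(NOINDEX_EXTENSIONS):
        return [("X-Robots-Tag", "noindex")]
    return []


print(extra_headers("/downloads/old-brochure.pdf"))
print(extra_headers("/blog/post"))
```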
The Disallow command doesn’t actually control indexation - but for run-of-the-mill pages you don’t want indexed, it may be effective. If there are any internal links to the resource in question - or any external links! - this method will fall short.
That said, it’s helpful for controlling crawl budget (a concern for large websites.)
If you canonicalize a page to a different page, you are telling search engines that it’s a duplicate, and therefore less worthy of indexing. It will often result in the duplicate dropping out of the index - but this is by no means guaranteed.
*This is primarily useful for duplicate content, where you actively want to pass credit to the “original” source of the content - on or off your website.
So we’ve got several effective tools, fortunately. I’d add, however, that these 2 are - by far - the most used, for a reason. If you just aren’t sure what to do, use these 2 (together, ideally, or the first at minimum.)
Does speed matter? (Do you need to solve this indexation issue QUICKLY?) The Remove URL tool is your go-to. Just make sure to follow this step up with another solution as well, so your fix “sticks.”
Here’s how to actually use these tools to solve these problems (step-by-step).
Log into the Google Search Console and create a “New Request” under the Temporary Removals section. You can find this via the Removals item in the left navigation bar.
The URL Removal Tool enables you to remove a whole site, a page, or a section of your website from SERPs (so tread carefully!)
First navigate to GSC, and the Remove URL tool within it:
Click the big red "New Request" button. Add your URL to the "Enter URL" field and click Next to confirm.
Use the “Remove this URL only” option if you are deindexing one-off pages.
If you want to deindex, say, all pages that live under the /cart folder, add that URL path to the Enter URL box, and change the radio button to the “Remove all URLs with this prefix” option. *Beware this option, as you might accidentally remove pages you want to keep. Basically: make sure nothing you want indexed lives in that folder. If nothing does, you are set!
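Before submitting a prefix removal, it can help to preview which of your URLs the prefix would catch. Here is a minimal sketch, assuming the tool matches on a plain string prefix and that you have a list of indexed URLs to hand (the list below is hypothetical, e.g. something you exported from a crawl).

```python
def urls_matching_prefix(prefix: str, urls: list[str]) -> list[str]:
    """Return every URL that starts with the given prefix."""
    return [url for url in urls if url.startswith(prefix)]


# Hypothetical list of indexed URLs.
INDEXED_URLS = [
    "https://www.example.com/cart",
    "https://www.example.com/cart/checkout",
    "https://www.example.com/cartoons/best-of-2020",
    "https://www.example.com/blog/post",
]

for url in urls_matching_prefix("https://www.example.com/cart", INDEXED_URLS):
    print("Would be removed:", url)
```

Note that the /cartoons page gets caught by the /cart prefix here, which is exactly the kind of accident to look for before you hit submit.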
A noindex meta robots tag or x-robots header is a method of communication to tell the search engines to remove the page from their indexes.
<meta name="robots" content="noindex">
The primary difference between the two - for your purposes - is that the former works for pages (URLs) whereas the x‑robots response works for pages AND non-HTML file types like PDFs. X-Robots is less common, and may or may not be available directly in your CMS (i.e. you likely have to work with your engineering team to get it working.)
Make sure that your pages are crawlable - since the bots will have to crawl your page to see the tags. You can refer here for detailed information on the crawling and indexing process of search engine bots.
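Once a tag or header is in place, you may want to confirm that a page really sends a noindex instruction. The sketch below checks both places for a page you have already fetched: the HTML (for a robots meta tag) and the response headers (for X-Robots-Tag). The function and class names are hypothetical, and the parsing is simplified: it ignores crawler-specific header prefixes such as `googlebot:`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str, headers: dict) -> bool:
    """Return True if the meta robots tag or X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_directives = []
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives += [d.strip().lower() for d in value.split(",")]
    return "noindex" in parser.directives or "noindex" in header_directives


page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(has_noindex(page, {}))
print(has_noindex("", {"X-Robots-Tag": "noindex, nofollow"}))
```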
When there are multiple versions of a page - meaning the page copy can be reached via multiple URLs or URL variations - the SEO best practice is to “consolidate” these pages by canonicalizing them to a master URL. (NOTE that this assumes you can’t just delete & redirect the duplicate content.)
When implementing this methodology, remember that the bots may choose to ignore you - especially if the pages aren’t true duplicates. A canonical is a “recommendation” you give to Google, not a “command” it must follow.
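For reference, a canonical is declared with a `<link rel="canonical" href="...">` element in the HEAD of the duplicate page. Below is a minimal sketch that reads the canonical URL out of a page's HTML so you can confirm each duplicate points at the master URL you intended. The sample page and the example.com URL are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def canonical_url(html: str):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical duplicate page that points at its master URL.
DUPLICATE_PAGE = """<html><head>
<link rel="canonical" href="https://www.example.com/shoes/">
</head><body>Duplicate copy</body></html>"""

print(canonical_url(DUPLICATE_PAGE))
```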
Again - the disallow isn’t *directly* an indexation tool, but it can be effective (especially when scale matters!) when used in conjunction with other methods (specifically the Remove URL tool, and link removal.)
The issue is likely that Google isn't crawling these URLs (or PDFs, etc.) to see the new instructions, and therefore isn't doing the work to get rid of them.
The easiest solution here is to:
This way Google will quickly find each URL, crawl it, see the updated instructions, and know what action to take ASAP.
*Don't forget to remove the file from GSC/your site once the process is complete (so as to reduce the error rate of your sitemap files. Long term, you don't want that to become an issue!)
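If you go the route of a temporary XML sitemap listing the affected URLs (the note about sitemap files suggests that approach, but it is an assumption here), a file like that can be generated with a few lines of Python. This is a minimal sketch using the standard library; the URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return an XML sitemap (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical URLs that carry the updated instructions.
URLS_TO_RECRAWL = [
    "https://www.example.com/cart?item=1&size=2",
    "https://www.example.com/downloads/old-brochure.pdf",
]

print(build_sitemap(URLS_TO_RECRAWL))
```

Save the output as a UTF-8 file, and remember to take it down again once the URLs have dropped out of the index.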
Think about why this issue occurred in the first place - and look to resolve the cause of the problem, not just the symptom.
Make your SEO processes failsafe so that you reduce the risks of encountering such problems in the future. For example, if you have a bunch of internal links to a page that shouldn’t be indexed, nofollow the links to it. If that page also has external links, reach out to those websites and ask them to remove the links, or disallow them as well. After all, Google is only indexing these pages because it believes someone wanted it to. So remove Google’s reason to care!
If you need to monitor indexation over time, here are some helpful tools: