Removing content from Google can be a tricky process. There are important questions to answer, different use cases, and a variety of methods to resolve the issue (which vary by content type, scale, speed requirements, etc.). Every step you take means new decisions to make!
You may also be seeing different advice from different sources (or advice based on different use cases). Fortunately, we’re here to help with a thorough explanation of what to do in the most common situations - whether you’re trying to deindex a page, an image, or an entire site!
Use this magical power with care - don’t noindex other, valuable pages by accident!
This is a “temporary” fix - 6 months is the cap. Use it in conjunction with another method so the page/PDF/image in question *stays* deindexed.
People tend to ignore this tool because of its temporary nature - which I’d argue is a mistake. Other than removing the content entirely (not always an option!), this is the *fastest* way to get something deindexed. So use it, and use it well.
Send an instruction to “noindex” a page via a single line of code added to the <head> section of your HTML.
Keep in mind that this is a longer-term (AKA likely not very quick) solution, since you have to wait for Google to re-crawl the page to *see* this tag. It is, however, very sustainable. If your use case isn’t urgent, this is a great tool, and you can always request that Google recrawl the page sooner (via GSC).
Meta robots will not help with your crawl budget if that’s a concern (it generally only is for very large websites, when there are a LOT of noindexed pages).
The Disallow command doesn’t actually control indexation - but for run-of-the-mill pages you don’t want indexed, it may be effective. If there are any internal links to the resource in question - or any external links! - this method will fall short.
That said, it’s helpful for controlling crawl budget (a concern for large websites).
If you canonicalize a page to a different page, you are telling search engines that it’s a duplicate, and therefore less worthy of indexing. It will often result in noindexing - but this is by no means guaranteed.
This is primarily useful for duplicate content, where you actively want to pass credit to the “original” source of the content - on or off your website.
So we’ve got several effective tools, fortunately. I’d add, however, that these two are - by far - the most used, for a reason. If you just aren’t sure what to do, use these two (together, ideally, or the first at minimum):
Meta Robots Noindex tag
Remove URL tool
“Does speed matter? (Do you need to solve this indexation issue QUICKLY?) The Remove URL tool is your go-to. Just make sure to follow this step up with another solution as well, so your fix *sticks*.”
Deindex Use Cases
Next, let’s review the common use cases for wanting to remove “stuff” - content, images, web pages, or PDFs - and what tools (or combinations of tools) are useful in those scenarios.
Confidential Information OR Hacked Site Pages
Generally speaking, the speed of the solution is paramount.
Remove the offending content. Delete it, move it, etc.
Use the Remove URL tool (if you have GSC access & permissions) or the Remove Content tool (if you don’t).
Urgent But Not Confidential
Here’s one where you can’t (or won’t) move or delete the content but you need it off Google quickly. What do you do?
Use the Remove URL tool - either for a specific URL or a specific folder
If it’s a few pages, use the Meta Robots Noindex tag.
Monitor and confirm the fix over the next few days.
If it’s many, many pages (and therefore not worth the manual work), use the Robots.txt Disallow command.
Monitor and confirm the (temporary) fix over the next few days.
Then set yourself a calendar reminder to check on this again in 6(ish) months.
Depending on how many links there are to this content, you may need to keep monitoring - and re-upping - the Remove URL request. Remove links to the asset to help reduce this problem!
Sometimes you want to deindex a PDF or page because it’s duplicate content. In this case:
If it’s just one, or just a few pages, use the meta robots tag (or the X-Robots tag in the case of PDFs)
If it’s many (100s+) pages, a Robots.txt Disallow is likely easier (though it requires ongoing monitoring). Here again - be careful not to accidentally block valuable pages!
If you need Google to finalize the process sooner, use the Remove URL tool as well.
LOTS of pages need to be deindexed (Indexation Bloat!)
If the issue is just that you have lots of low-quality pages indexed, you’ll want a fast solution that isn’t a huge lift to implement. This commonly occurs with URL variations (e.g. URL parameters) and with /cart and /account pages (which already require a password to access). In this situation:
Look for patterns in the URL structure. You can likely noindex anything and everything in the /cart, /account, etc. folders.
Block these folders via the Remove URL tool
Disallow these folders in the Robots.txt file
Monitor the results over time to ensure it doesn’t occur again.
You can also utilize the Meta Robots or X-Robots tags (instead of the Robots.txt file), if you wish. It’s simply a tradeoff: less monitoring, but more potential for crawl budget issues. There is no perfect answer - just the best, most reasonable one for you!
How to Use these Deindexation Tools
Here’s how to actually use these tools to solve these problems (step-by-step).
Remove URL Tool Instructions
Log into Google Search Console and create a “New Request” under the Temporary Removals section. You can find this via the Removals item in the left navigation bar.
The URL Removal Tool enables you to remove a whole site, a page, or a section of your website from SERPs (so tread carefully!)
Add your URL and click Next to confirm.
Use the “Remove this URL only” option if you are deindexing one-off pages. If you want to deindex, say, all pages that live under the /cart folder, add that URL path to the Enter URL box, and change the radio button to the “Remove all URLs with this prefix” option.
Noindex Tag Instructions
A noindex meta robots tag or x-robots header is a method of communication to tell the search engines to remove the page from their indexes.
Example of a meta robots noindex instruction:
<meta name="robots" content="noindex">
Example of x‑robots noindex tag in the header response:
HTTP/1.1 200 OK
X-Robots-Tag: noindex
The primary difference between the two - for your purposes - is that the former works for pages (URLs) whereas the x‑robots response works for pages AND non-HTML file types like PDFs. X-Robots is less common, and may or may not be available directly in your CMS (i.e. you may have to work with your engineering team to get it working).
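If you do have server-level access, the header can often be set in the web server config rather than the CMS. Here’s a minimal sketch for Apache (assuming the mod_headers module is enabled - the setup is illustrative, so check with your engineering team):

```apache
# .htaccess - sends "X-Robots-Tag: noindex" with every PDF response
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```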
Make sure that your pages are crawlable since the bots will have to crawl your page to see the tags. You can refer here for detailed information on the crawling and indexing process of search engine bots.
Canonical Tag Instructions
When there are multiple versions of a page - meaning the page copy can be reached via multiple URLs or URL variations - the SEO best practice is to “consolidate” these pages by canonicalizing them to a master URL. (NOTE that this assumes you can’t just delete & redirect the duplicate content.)
When implementing this methodology, remember that the bots may choose to ignore you - especially if the pages aren’t true duplicates. Rather, a canonical is a “recommendation” you give to Google, not a “command” they must follow.
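For reference, a canonical is also a single line in the page’s <head>. A minimal sketch (the URLs here are placeholders):

```html
<!-- Placed on the duplicate page; points search engines to the preferred version -->
<link rel="canonical" href="https://www.example.com/original-page/">
```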
Robots.txt Disallow Instructions
Again - the disallow isn’t *directly* an indexation tool, but it can be effective (especially when scale matters!) when used in conjunction with other methods (specifically the Remove URL tool, and link removal.)
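For reference, Disallow rules live in the robots.txt file at your site’s root. A minimal sketch (the folder names are illustrative - match them to your own URL structure):

```txt
# robots.txt - blocks crawling (NOT indexing!) of these folders for all bots
User-agent: *
Disallow: /cart/
Disallow: /account/
```

One caveat: a disallowed URL can’t be crawled, so Google will never see a meta robots noindex tag on it - don’t rely on both methods for the same URLs.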
Are you doing all the right things, and still not seeing these assets get flushed out of the search index?
The issue is likely that Google isn’t crawling these URLs (or PDFs, etc.) to *see* the new instructions, and therefore isn’t doing the work to get rid of them.
The easiest solution here is to:
1) Create a static XML sitemap listing the old/bad URLs
2) Upload that file to your website
3) Submit the file to your site's GSC account and monitor results for (typically) ~1-2 months (sometimes as quickly as a few weeks)
4) Remove the file from GSC/your site once the process is complete (so as to reduce the error rate of your sitemap file)
This way Google will quickly find each URL, crawl it, see the updated instructions, and know what action to take ASAP.
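A minimal static sitemap for this clean-up purpose might look like this (the URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- List each old/bad URL you want Google to recrawl -->
  <url><loc>https://www.example.com/old-page-1/</loc></url>
  <url><loc>https://www.example.com/old-page-2/</loc></url>
</urlset>
```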
***Don't forget to remove the file from GSC/your site once the process is complete (so as to reduce the error rate of your sitemap files). Long term, you don't want that to become an issue!
Finally, here are some quick tips “for the road”:
Think about why this issue occurred in the first place - and look to resolve the cause of the problem, not just the symptom. Make your SEO processes failsafe so that you reduce the risks of encountering such problems in the future.
For example, if you have a bunch of internal links to a page that shouldn’t be indexed, nofollow the links to it. If that page also has external links, reach out to those websites and ask them to remove the links, or disavow them as well. After all, Google is only indexing the page because they believe someone wanted them to - so remove Google’s reason to care!
If you need to monitor indexation over time, here are some helpful tools:
A site: operator search: search Google for “site:url-here” or “site:domain-name.com/folder”. If it’s indexed, you’ll see it.
GSC: you can “Inspect URL” for the page in question, or look in the performance report for impressions/clicks to that URL. You can also monitor the Coverage report, and look for any spikes in indexation (that you didn’t expect and don’t want).
A handy-dandy calendar reminder: Sometimes it’s the simple things! Just have Google remind you to check on it periodically.
Site monitoring tools: Paid tools like Content King can be set up to do the monitoring work for you, and alert you if/when it becomes an issue.
If you need to remove a whole site from Google’s index (e.g. deindex the whole thing!), the quickest path is to claim it in Google Search Console, then use the Remove URL tool - for the whole domain, not just a page path. Be sure to also Disallow crawling of the domain (via the robots.txt file) and monitor it over time (roughly every 6 months) - just in case. Here are the settings you’ll want to choose (replacing our domain name with yours!)
Work With Us
We’ll help teach, mastermind, and carry out SEO roadmaps that check all the boxes.