
How to Deindex "Stuff" from Google Quickly & Effectively

Published on: February 17, 2021
Updated on: September 29, 2021
by
Tory Gray
Begüm Kaya

Removing content from Google can be a tricky process. There are important questions to be answered, different use cases, and a variety of methods to resolve the issue (which vary by content type, scale, speed requirements, etc.). Every step you take means new decisions to make! 

You may also be seeing different advice from different sources (or based on different use cases.) Fortunately, we are here to provide a thorough explanation of what to do in the most common specific situations - whether you are trying to deindex a page, an image, or an entire site!

Deindex Use Cases

First and foremost, let’s review the common use cases for wanting to remove “stuff” - content, images, web pages, or PDFs - and what tools (or combinations of tools) are useful in those scenarios. 

Confidential Information OR Hacked Site Pages

Generally speaking, the speed of the solution is paramount. 

  1. Remove the offending content. Delete it, move it, etc.
  2. Use the Remove URL tool (if you have GSC access & permissions) or the Remove Content tool (if you don’t.)
Did your website get hacked?

Urgent But Not Confidential

Here’s one where you can’t (or won’t) move or delete the content but you need it off Google quickly. What do you do? 

  1. Use the Remove URL tool - either for a specific URL or a specific folder
  2. If it’s a few pages, use the Meta Robots Noindex tag. 
  3. Monitor and confirm the fix over the next few days. 
  4. If it’s many, many pages (and therefore not worth the manual work), use the Robots.txt Disallow command. 
  5. Monitor and confirm the (temporary) fix over the next few days.
  6. Then set yourself a calendar reminder to check on this again in 6(ish) months.
  7. Depending on how many links there are to this content, you may need to keep monitoring - and re-upping - the Remove URL request. Remove links to the asset to help reduce this problem!

Duplicate Content 

Sometimes you want to deindex a PDF or page because it’s duplicate content. In this case: 

  1. If it’s just one, or just a few pages, use the meta robots tag (or the X-Robots tag in the case of PDFs)
  2. If it’s many (100s+) pages, a Robots.txt Disallow is likely easier (requires ongoing monitoring.) Here again - be careful to not accidentally block valuable pages!
  3. If you need to help it finalize the process sooner, use the Remove URL tool as well. 

Indexation Bloat (LOTS of pages need to be deindexed!)

If the issue is just that you have lots of low-quality pages indexed, you’d want a fast solution that doesn’t take too long to implement. This commonly occurs on URL variations (e.g. URL parameters), and /cart and /account pages (that already require a password to access.) In this situation: 

  1. Look for patterns in the URL structure. You can likely noindex anything and everything in the /cart, /account, etc. folders.
  2. Block these folders using the Remove URL tool
  3. Disallow these folders in the Robots.txt file
  4. Monitor the results over time to ensure it doesn’t occur again. 

You can also utilize the Meta Robots or X-Robots tags (instead of the Robots.txt file), if you wish. It’s simply a tradeoff: less monitoring, but more potential for crawl budget issues. There is no perfect answer - just the best, most reasonable one for you!
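The steps above can be sketched in a robots.txt file. The /cart and /account paths here are illustrative - substitute whatever folder patterns your own site exposes:

```txt
# robots.txt - block crawling of low-value folders (illustrative paths)
User-agent: *
Disallow: /cart/
Disallow: /account/
```

Remember that Disallow controls crawling, not indexing - pair it with the Remove URL tool, as described above, to get the pages out of the index quickly.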

The Tools of Deindexation

Next, let's review the tools of the trade! Keep in mind that some of these tools can and should be used together. We’ve broken down the appropriate combinations in the deindexation use cases outlined above. 

But first: one specific combo should NOT be used: don’t use the Robots.txt Disallow command in conjunction with any page-level tool (e.g. x-robots header, meta robots, or canonical tags.) Disallow controls crawling, NOT indexing - and if Google can’t crawl the page, they can’t read your noindex or canonical tags in order to abide by them. Learn more about this, and other common search engine crawling & indexing issues, in this in-depth guide.

Remove (or Move) the Content Itself

Sometimes the obvious solution is the right one. If you need something to go away - ASAP - you can just delete it! If outright deletion isn’t an option, you can also:

  • Put it into Draft mode (or another non-live mode)
  • Put a password requirement in front of it
  • Move it to a different URL entirely
    • Don’t set up a redirect in this scenario - and make sure the new page isn’t indexable.

Remove URL Tool

This tool is available via Google Search Console (GSC), the tool that enables SEOs to quickly identify and fix issues. Use this magical power with care - don’t noindex other, valuable pages by accident! This is a “temporary” fix - 6 months is the cap. Use it in conjunction with another method so the page/PDF/image in question stays deindexed. 

People tend to ignore this tool because of its temporary nature - which I’d argue is a mistake. Other than removing the content entirely (not always an option!), this is the *fastest* way to get something deindexed. So use it, and use it well. 

Remove URL Tool Dashboard
GSC's URL Removal Dashboard

Remove Content Tool

Use this tool for content: 

  1. That you *don’t* own, or for which you don’t have access (or high enough permissions) in GSC. (If you *do* have GSC access, use the Remove URL tool instead.)
  2. That has already been removed. As in, it’ll only work if and when the content is gone. 

Meta Robots Noindex

Send an instruction to “noindex” a page via 1 single line of code added to the HEAD section of your HTML. Keep in mind that this is a longer-term (AKA likely not very quick) solution, since you have to wait around for Google to re-crawl the page to *see* this tag. It is, however, very sustainable. If your use case isn’t urgent, this is a great tool, and you could always request that Google recrawl the page sooner (via GSC.)

Meta robots will not help with your crawl budget if that’s a concern (it generally only is for very large websites, with a LOT of pages to noindex.)

X-Robots Noindex

All the same rules as the Meta Robots tag, but sent via an HTTP header response instead of via the HTML. It is much less common, and will likely require development help to implement.

It’s a great tool for deindexing items like PDFs, which don’t have HTML to add a meta tag to. 
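As a sketch, assuming an Apache server with mod_headers enabled, a noindex header can be attached to every PDF via the .htaccess file (nginx has an equivalent add_header directive):

```apache
# Serve an X-Robots-Tag: noindex header with every PDF response
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```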

Robots.txt Disallow (Sort of...)

The Disallow command doesn’t actually control indexation - but for run-of-the-mill pages you don’t want indexed, it may be effective. If there are any internal links to the resource in question - or any external links! - this method will fall short. 

That said, it’s helpful for controlling crawl budget (a concern for large websites.)

Canonicalizing to other resources (Again, sort of.)

If you canonicalize a page to a different page, you are telling search engines that it’s a duplicate, and therefore less worthy of indexing. It will often result in noindexing - but this is by no means guaranteed. 

*This is primarily useful for duplicate content, where you actively want to pass credit to the “original” source of the content - on or off your website. 

So we’ve got several effective tools, fortunately. I’d add, however, that these 2 are - by far - the most used, for a reason. If you just aren’t sure what to do, use these 2 (together, ideally, or the first at minimum.)

  • Meta Robots Noindex tag
  • Remove URL tool
“Does speed matter? (Do you need to solve this indexation issue QUICKLY?) The Remove URL tool is your go-to. Just make sure to follow this step up with another solution as well, so your fix ‘sticks.’”

How to Use these Deindexation Tools

Here’s how to actually use these tools to solve these problems (step-by-step).

Remove URL Tool Instructions

Log into the Google Search Console and create a “New Request” under the Temporary Removals section. You can find this via the Removals item in the left navigation bar. 

The URL Removal Tool enables you to remove a whole site, a page, or a section of your website from SERPs (so tread carefully!)

First navigate to GSC, and the Remove URL tool within it: 

GSC Left Navigation - Find The URL Removals Section
Click "Removals" in GSC's Index Navigation

Click the big red "New Request" button. Add your URL to the "Enter URL" field and click Next to confirm.

Using the Remove URL Tool in GSC
How to Use GSC's Remove URL Tool

Use the “Remove this URL only” option if you are deindexing one-off pages.

If you want to deindex, say, all pages that live under the /cart folder, add that URL path to the Enter URL box, and change the radio button to the “Remove all URLs with this prefix” option. *Beware this option, as you might accidentally noindex other pages. Basically: make sure nothing you want indexed lives in that folder. If that's the case, you are set!

Noindex Tag Instructions

A noindex meta robots tag or x-robots header is a method of communication to tell the search engines to remove the page from their indexes.

Example of a meta robots noindex instruction:

<meta name="robots" content="noindex">

Example of x‑robots noindex tag in the header response:

HTTP/1.1 200 OK
X-Robots-Tag: noindex

The primary difference between the two - for your purposes - is that the former works for pages (URLs) whereas the x‑robots response works for pages AND non-HTML file types like PDFs. X-Robots is less common, and may or may not be available directly in your CMS (e.g. you likely have to work with your engineering team to get it working.)

Make sure that your pages are crawlable - since the bots will have to crawl your page to see the tags. You can refer here for detailed information on the crawling and indexing process of search engine bots.
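If you want to spot-check that a page actually serves its noindex signal, here’s a minimal sketch in Python (stdlib only; fetching the page and its headers, e.g. with urllib, is left out - pass in whatever you retrieved):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content of any <meta name="robots"> tags on a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "robots":
                self.directives.append((attrs.get("content") or "").lower())

def has_noindex(html_text, headers=None):
    """True if a page signals noindex via a meta robots tag or an
    X-Robots-Tag response header (the latter covers PDFs, etc.)."""
    headers = headers or {}
    # Check the HTTP header first - it applies to non-HTML files too
    if "noindex" in (headers.get("X-Robots-Tag") or "").lower():
        return True
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return any("noindex" in d for d in parser.directives)
```

If this returns False for a page you thought was noindexed, the tag likely isn’t being served - which would explain why Google hasn’t removed it.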

Learn all about robots noindexing & how to do it here.

Canonical Tag Instructions

When there are multiple versions of a page - meaning the page copy can be reached via multiple URLs or URL variations - the SEO best practice is to “consolidate” these pages via canonicalizing them to a master URL. (NOTE that this assumes you can’t just delete & redirect the duplicate content.)

Learn all about canonicalization & how to do it here.

When implementing this methodology, remember that the bots may choose to ignore you - especially if the pages aren’t true duplicates. Rather, a canonical is a “recommendation” you give to Google, not a “command” they must follow. 
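As a quick illustration (example.com is a placeholder), the canonical tag goes in the HEAD of the duplicate page and points at the master URL:

```html
<link rel="canonical" href="https://www.example.com/master-page/">
```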

Robots.txt Disallow Instructions

Again - the disallow isn’t *directly* an indexation tool, but it can be effective (especially when scale matters!) when used in conjunction with other methods (specifically the Remove URL tool, and link removal.)

You can learn all about the robots.txt file & how to do it here.
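For reference, a Disallow rule is just a line in the robots.txt file at your domain root. For instance, to block crawling of internal search results and a sort parameter (hypothetical patterns - adjust to your own URLs):

```txt
User-agent: *
# Block crawling of internal search results and sort-parameter variants
Disallow: /search/
Disallow: /*?sort=
```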

Are you doing all the right things, and still not seeing these assets get flushed out of the search index? 

The issue is likely that Google isn't crawling these URLs (or PDFs, etc.) to see the new instructions, and therefore isn't doing the work to get rid of them.

The easiest solution here is to:

  1. Create a static XML sitemap with the old/bad URLs list
  2. Upload that list to your website
  3. Submit the file to your site's GSC account and monitor results for (typically) ~1-2 months (sometimes as quickly as a few weeks)
  4. Remove the file from GSC/your site once the process is complete (so as to reduce the error rate of your sitemap file)

This way Google will quickly find each URL, crawl it, see the updated instructions, and know what action to take ASAP.
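A minimal static sitemap for this purpose might look like the following (the URLs are placeholders for your own old/bad URLs):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/old-page-1/</loc></url>
  <url><loc>https://www.example.com/old-page-2/</loc></url>
</urlset>
```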


Finally, here are some quick tips “for the road”: 

Think about why this issue occurred in the first place - and look to resolve the cause of the problem, not just the symptom. 

Make your SEO processes failsafe so that you reduce the risk of encountering such problems in the future. For example, if you have a bunch of internal links to a page that shouldn’t be indexed, nofollow the links to it. If that page also has external links, reach out to those websites and ask them to remove the links, or disavow them. After all, Google is only indexing the page because they believe someone wanted it indexed - so remove Google’s reason to care!
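For instance, an internal link to a page you don’t want indexed can be nofollowed like so:

```html
<a href="/cart/" rel="nofollow">View cart</a>
```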

If you need to monitor indexation over time, here are some helpful tools: 

  • GSC: you can “Inspect URL” for the page in question, or look in the performance report for impressions/clicks to that URL. You can also monitor the Coverage report, and look for any spikes in indexation (that you didn’t expect and don’t want.)
  • A handy-dandy calendar reminder: Sometimes it’s the simple things! Just have Google remind you to check on it periodically. 
  • Site monitoring tools: Paid tools like Content King can be set up to do the monitoring work for you, and alert you if/when it becomes an issue. 
  • If you need to remove a whole site from Google’s index (e.g. deindex the whole thing!) the quickest path is to claim it in Google Search Console, then use the Remove URL tool - for the whole domain, not just a page path. Be sure to also Disallow crawling of the domain (via the robots.txt file) and monitor it over time, roughly every 6 months - just in case. Here are the settings you’ll want to choose (replacing our domain name with yours!)
Google Calendar Reminder to check on indexation status

Happy deindexing!
