
How To Properly Serve Up 404s on Single Page Application (SPA) Pages for SEO

Published on: 
February 17, 2021
Updated on: 
October 5, 2021
by
Colin Gray
Sam Torres

Updated Aug 16, 2021 to add a new option (Serve a 404 Component) and update our takeaway recommendations accordingly.

{NOTE: The Gray Dot Company offers premier SEO Consulting services, and as a part of that service, we have senior software engineering expertise available to support the proper execution of our technical requirements for those inevitable “tricky” situations. The following is an example of that work.}

I think it's safe to say that most people in digital fields know that a "404 Page" or "Page Not Found" is what you see when a URL is wrong. Once you've learned a thing or two about SEO, you will also learn about "301 Redirects". But these HTTP status codes - which Google relies on heavily when it indexes content - usually get tossed out the window by sites built with popular Single Page Application (aka SPA) frameworks like React, Vue, and Angular.

Why? When you write your site as a SPA, one of the first steps is to move the "URL routing" logic out of the server (backend) and into the web client. The SPA becomes responsible for determining what page is shown to the user. We've gone from servers sending static files, to dynamic server-side rendering (SSR), to client-side rendering.

Translation: Out of the box, 404 error pages in SPAs don’t work properly, creating problems for SEO. Here we’ll review the pros and cons of the various workarounds, including the ones we recommend (and the popular solution we DON’T recommend).

Read our full Guide to 404 FAQs, SEO, and proper 404 functionality, learn how to audit JS for SEO, and explore other common JavaScript SEO issues.

Nuts-and-Bolts SPA 404 Issue Details 

  • When a user comes to your site, they will be handed the exact same HTML regardless of which URL they are visiting. They will download your JavaScript application code, and your SPA will render the correct page. As they browse the site, SPA applications use pushState and the History API to make it seem like you are browsing multiple pages, when actually you are viewing one JS application (and technically one “page”, in terms of the number of index.html pages served).
  • We're not going to go into the many implications this has for site speed and Google's ability to index the content (that’s a post for another day!); here we're just going to touch on one thing: the loss of the 404.
  • The web server no longer has any way of knowing which URLs are valid - all of that logic has been moved to the browser! In SEO terms, it means that every request will return a "200 OK" HTTP status code. 
  • In less technical terms: this is very, very bad for SEO. Google and other search engines have no way of telling which pages are valid and which are not. 
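To make the failure concrete, here's a minimal sketch in plain JavaScript (the handler and HTML shell are hypothetical) of what a naive SPA setup effectively does: the response never depends on the path, so every request - valid or not - gets a 200 and the same shell.

```javascript
// Naive SPA serving, boiled down: the response never consults the URL path.
// (Illustrative sketch -- real servers do this via a "catch-all" rewrite rule.)
const SHELL = '<!doctype html><div id="app"></div><script src="/static/app.js"></script>';

function naiveSpaResponse(urlPath) {
  // urlPath is never looked at -- this is the root of the SEO problem
  return { status: 200, body: SHELL };
}

console.log(naiveSpaResponse('/no-such-page-anywhere').status); // 200
```

Whatever nonsense URL a crawler requests, the status is 200 - exactly what the solutions below set out to fix.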
AirBnb's 404 Page Design

Now that we know the problem, let's discuss some 404 solutions for SPAs

The Redirect Solution

This is the solution that seems to be recommended most frequently across the web: create an actual 404 URL (‘/404-not-found/’ or '/404.html'...) and redirect pages that should error to that page. 

However, that’s not a desirable solution for SEO - unless, of course, you want your 404 page itself to start ranking in Google!

Pros: It’s… easy, I guess?

Cons: It doesn’t work, and it’s not good for SEO. So don’t do it!

The Whitelist Solution

If your site is relatively small (which begs the question: how did you get talked into using a SPA!?) or if the dynamic pages are easily determined from a few database queries, you can often create a sitemap very easily. (If you generate this with a web crawler, do note that you may accidentally miss pages, including orphan pages or pages hidden behind a login. If you aren’t careful, this can result in accidentally deindexed pages.)

First: Make sure you think about your static assets folder - that's your JavaScript, CSS, and images. That folder should just serve files, never your SPA content. This should be easy to configure.

Next, the simplest approach: Take each URL and create a corresponding folder in your HTTP document root, and put a copy of `index.html` in all of those folders. Then turn off whatever "match all URLs" configuration you have in nginx or apache, and configure it to just serve the static HTML file. Bingo bango, you will have reliable 404s. This is a bit cumbersome because now you have a file/folder structure to maintain, but if "simpler is better" is your mantra, this might appeal to you.

Alternatively, you could write some configuration code that controls which URLs return 200 and which return 404. This is how I do things when using the Whitelist approach. You will need to add a block of config code that explicitly returns a 200 only for your whitelisted URLs.

Here's an example using Nginx:

# like I said, special consideration for the /static/ folder.
# Just having the block here is enough - all we really want
# to do is override the 'catchall' behavior
location /static/ {}
location = /index.html {}
location = / {}
# the `try_files` directive says: for this location, serve
# the file /index.html from document root.
location = /valid-page1 {
  try_files /index.html /index.html;
}
location = /valid-page2 {
  try_files /index.html /index.html;
}
location = /more/valid/pages {
  try_files /index.html /index.html;
}
# and finally the catchall match - all other URLs will
# return a 404, and show an error page to the user
location / {
  return 404;
}

Here’s an example for Apache:

ErrorDocument 404 "404ohno"
RewriteEngine On
RewriteBase /
# ignore actual files and directories
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# whitelist / and /valid-page1
RewriteCond %{REQUEST_URI} !^/$
RewriteCond %{REQUEST_URI} !^/valid-page1$
# all other URLs will receive 404
RewriteRule ^ - [R=404,L]
# our whitelist URLs will get to this line;
# return our SPA code, but ignore static files
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.html [L]

Pros: Relatively small level of effort (LOE), and it gets the job done. 

Cons: Harder to maintain over time. 

Pro Tip: Consider adding a dynamic way to keep the URL Path Whitelist up to date so that new pages added to your site don’t 404 without manually updating the file. 
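One way to do that, sketched below, is to regenerate the nginx whitelist from your sitemap at deploy time. The sitemap content and output format here are illustrative; adapt the template to your real config.

```javascript
// Stand-in sitemap content -- in practice, read your real sitemap.xml at deploy time.
const sitemapXml = `
<urlset>
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/valid-page1</loc></url>
  <url><loc>https://example.com/valid-page2</loc></url>
</urlset>`;

// Turn each <loc> path into an nginx whitelist block, ending with the catch-all 404.
function nginxWhitelist(xml) {
  const paths = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => new URL(m[1]).pathname);
  const blocks = paths.map((p) => `location = ${p} {\n  try_files /index.html /index.html;\n}`);
  return blocks.join('\n') + '\nlocation / {\n  return 404;\n}';
}

console.log(nginxWhitelist(sitemapXml));
```

Write the output to an include file and reload nginx as part of your deploy, and new pages stop 404ing without anyone hand-editing config.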

The Blacklist Solution

Very similar to the Whitelist approach, but instead of having a list of valid URLs, you can get a list of URLs that you know people are accessing but are not valid on your site. 
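A minimal sketch of the idea (the blacklist entries are hypothetical): known-bad paths get a 404, and everything else falls through to the SPA shell with a 200.

```javascript
// Hypothetical list of URLs known to be invalid (from logs, crawl reports, etc.)
const blacklist = new Set(['/old-campaign', '/deleted-product', '/typo-page']);

function statusForPath(urlPath) {
  return blacklist.has(urlPath) ? 404 : 200;
}

console.log(statusForPath('/deleted-product')); // 404
console.log(statusForPath('/brand-new-page')); // 200
```

Note the trade-off this makes explicit: any invalid URL you haven't listed yet still returns a 200.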

Pros: This way you will continue to have dynamic URLs - so if your site is adding and removing pages often (and so a whitelist approach is not tenable), this might be a good "quick fix" for you.

Cons:

  • Generating this list is tricky. A SPA doesn't send requests to your server for every URL, so you can't scour Google Analytics or your logs for 404 responses! (a 404 component can help!)
  • It’s somewhat harder to maintain; you’ll need to seek other sources of potential 404 issues to add to your Blacklist.
  • Finally, once you know which URLs are 404ing - why do extra work to force them to issue 404s when you could just redirect them instead? (The goal of issuing 404 responses is to find errors to fix; they aren't an end goal in themselves.)

Isomorphic Rendering

This is a big hammer, so tread lightly! 

Here's how the solution works: 

  • Create a web application server that is capable of emulating a web browser - Node.js is a likely candidate for this approach. 
  • For every request that comes in, you render the page on the server and use the rendered HTML to determine whether the page is valid (200) or not (404). 
  • This approach is really processor-intensive, which is a weak spot for Node.js - it's good at high-I/O workloads, but not high-CPU ones.

Pros: One-time cost. Once you get that server set up, it is "set and forget". Also, you can then serve that rendered HTML, which will benefit both your users and Googlebot. 

Cons:
You will incur performance overhead on every request - though this can be mitigated somewhat on public pages by memoizing HTML responses that are the same for everyone. 

Pro Tip: We don’t recommend this approach due to the site performance issues and server requirements.

Simple Server Side

This solution requires up-front work and ongoing updates, but it has more potential than any other solution. It involves creating a web application server that mirrors the routing in your SPA.

Any dynamic URLs are validated (for instance, the URL `/users/12345` would check that the User with id 12345 exists on the server), and any static pages need to be hard-coded. Pages that pass this check return a 200 status; pages that fail return a 404 and show an error page to the user.
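For example, the mirrored route table might look like this sketch, where the routes and the `userExists` lookup are hypothetical stand-ins for your real SPA router and database query:

```javascript
// Static pages are hard-coded; dynamic routes are validated with a lookup.
const staticPages = new Set(['/', '/about', '/pricing']);

// Stand-in for a real database query
function userExists(id) {
  return ['12345', '67890'].includes(id);
}

function routeStatus(urlPath) {
  if (staticPages.has(urlPath)) return 200;
  const match = urlPath.match(/^\/users\/(\d+)$/);
  if (match) return userExists(match[1]) ? 200 : 404;
  return 404; // anything the router doesn't recognize is a 404
}

console.log(routeStatus('/users/12345')); // 200
console.log(routeStatus('/users/99999')); // 404
```

The key property: the server answers with the correct status code for every URL, without rendering any JavaScript.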

Another step that can be taken with this approach is to render some simple HTML (we recommend HTML 5 semantic markup) that closely resembles the markup that will eventually be rendered by the SPA. This HTML is returned immediately, so Googlebot has something to crawl even if it doesn't render the JavaScript that powers your SPA. It is important, though, to render HTML content that matches what the SPA will eventually render; otherwise, Google may penalize your page.

Pros: Faster. Better. Sustainable. More maintainable.

Cons:
Larger upfront effort level.

Pro Tip 1: This is our default, A-1 recommendation.

Pro Tip 2: You only need to validate public content. Any content that requires a login can just be served with 200s, because Googlebot will never see or index those pages!

Discord's 404 Page Template

Finally: Serve a 404 Component

The solutions above rely heavily on a correctly - and thoroughly - configured HTTP server. You will also need to configure your SPA to show "Not Found" error messages to the end user. These "Error Components" can be a tool to help improve your HTTP configuration. There are two situations it improves:

  • Fallthrough pattern matching: for instance, React Router supports a catch-all `<Route path="*">` to display an error component.
  • API 404 responses: if you are showing dynamic content and the API returns a 404, you should show an error page.

The error component you use can and should log the current URL somehow. You could have an API endpoint to track them, send a Google Analytics event, or send a request to your server with `404=true` appended to the end of the URL, so that you could track these in your logs.
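For that last option, here's a small helper (hypothetical, but it shows the wrinkle worth handling) that appends the `404=true` marker correctly whether or not the URL already has a query string:

```javascript
// Tag a URL so the resulting server hit is easy to find in access logs.
function tag404(url) {
  return url + (url.includes('?') ? '&' : '?') + '404=true';
}

console.log(tag404('/missing-page')); // /missing-page?404=true
console.log(tag404('/search?q=widgets')); // /search?q=widgets&404=true
```

Your error component can call this and fire the request (or analytics event) when it mounts.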

Taking this a step further: if you have implemented the "Simple Server Side" solution, and are confident that you are serving 404s correctly, this Error Component only needs to call `window.location.reload()`. This will tell the browser to reload the current URL, bypassing the SPA router, and your HTTP server will correctly issue a 404 response and error page.

This article does a good job describing the many places error pages are shown, and goes into detail creating generic, reusable solutions for React and React Router.

Pros:

  • Improved user experience in some edge use cases
  • Improved error logging
  • If you have Analytics installed, you have a clear list of which URLs are being accessed by users. We recommend sending an event to the DataLayer when the 404 module is rendered, to make it easy to separate and segment audiences in your Analytics platform.

Cons:

  • Does not issue a 404 status code as a standalone solution. Use it only in conjunction with one of the other solutions.

Final Thoughts

  1. Good: If you need a quick-and-dirty solution, the Whitelist (or Blacklist) is the way to go.
  2. Better: Add the 404 component.
  3. Best: If and when you are ready to build the “right” solution, we recommend the Simple Server Side setup + the 404 component.
Learn more about common JavaScript SEO issues & how to overcome them.

And a quick aside!

It is our (highly opinionated) view that most sites should not be built using SPA frameworks! A static site builder or basic CMS serves most sites' needs much better. SPAs are fantastic for sites with a high degree of interactivity and dynamism - chat or other social messaging features, for example. They are poorly suited for sites that have lots of static content or rely on Google search results, like eCommerce sites.
*Note that this opinion can be attributed to BOTH the software engineering experts and SEO experts at TGDC.
