Learn more from an Opinionated Developer – co-founder and board member Colin Gray takes a deep dive into important software engineering topics of interest. In this post, Colin shares his wisdom on how to properly serve a 404 header status code on Single-page Applications.
I think it's safe to say that most people in digital fields know that a "404 Page" or "Page Not Found" is what you see when your URL is wrong. Once you've learned a thing or two about SEO, you will also learn about "301 Redirects". But these HTTP status codes - which Google relies heavily on when it indexes content - usually get tossed out the window by sites implemented using popular Single Page Application (aka SPA) frameworks like React, Vue, Laravel and Angular.
Why? When you write your site as a SPA, one of the first steps is to move the "URL routing" logic out of the server (backend) and into the web client. The SPA becomes responsible for determining what page is shown to the user. We've gone from servers sending static files, to dynamic server-side rendering (SSR), to client-side rendering.
Translation: Out of the box, 404 error pages in SPAs don’t work properly, creating problems for SEO. Here we’ll review the pros and cons of the various workarounds, including the ones we recommend (and the popular solution we DON’T recommend).
Read our full Guide to 404 FAQs, SEO, and proper 404 functionality, learn how to audit JS for SEO, and explore other common JavaScript SEO issues.
This is the solution that seems to be recommended most frequently across the web. Basically, it boils down to creating an actual 404 URL ('/404-not-found/' or '/404.html'...) and redirecting any page that should error to that URL.
However, that’s not a desirable solution for SEO - unless, of course, you want your 404 page itself to start ranking in Google!
Pros: It’s… easy, I guess?
Cons: It doesn’t work, and it’s not good for SEO. So don’t do it!
If your site is relatively small (which raises the question: how did you get talked into using a SPA!?) or if the dynamic pages are easily determined from a few database queries, you can often create a sitemap very easily. (If you generate this with a web crawler, do note that you may miss pages, including orphan pages or pages hidden behind a login. If you aren’t careful, this can result in accidentally deindexed pages.)
First: Make sure you think about your static assets folder - that's your JavaScript, CSS, and images. That folder should just serve files, never your SPA content. This should be easy to configure.
Next, the simplest approach: Take each URL and create a corresponding folder in your HTTP document root, and put a copy of `index.html` in all of those folders. Then turn off whatever "match all URLs" configuration you have in nginx or apache, and configure it to just serve the static HTML file. Bingo bango, you will have reliable 404s. This is a bit cumbersome because now you have a file/folder structure to maintain, but if "simpler is better" is your mantra, this might appeal to you.
Alternatively, you could write some configuration code that controls which URLs return 200 and which return 404. This is how I do things when using the Inclusion list approach. You will need to add a block of config code that explicitly returns a 200 only for your listed URLs.
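To make the inclusion-list idea concrete, here is a minimal sketch of the decision logic in TypeScript. In practice this logic would usually live in your nginx or Apache configuration. The paths in the list and the function names are hypothetical, chosen only for illustration.

```typescript
// Hypothetical inclusion list: these paths are placeholders, not real pages.
const INCLUDED_PATHS: ReadonlySet<string> = new Set([
  "/",
  "/about",
  "/pricing",
  "/contact",
]);

// Strip the query string, fragment, and trailing slashes so that
// "/about/", "/about?ref=x", and "/about" all match the same entry.
function normalizeIncludedPath(rawUrl: string): string {
  const pathOnly = rawUrl.split(/[?#]/)[0] ?? "";
  const trimmed = pathOnly.replace(/\/+$/, "");
  return trimmed === "" ? "/" : trimmed;
}

// Return 200 only for explicitly listed URLs; everything else is a 404.
function statusFromInclusionList(rawUrl: string): 200 | 404 {
  return INCLUDED_PATHS.has(normalizeIncludedPath(rawUrl)) ? 200 : 404;
}
```

With this sketch, `statusFromInclusionList("/about/")` yields 200, while a path that is not in the list, such as `/no-such-page`, yields 404. The normalization step is a design choice: without it, harmless variations like a trailing slash would 404.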
Pros: Relatively small LOE, and it gets the job done.
Cons: Harder to maintain over time.
Pro Tip: Consider adding a dynamic way to keep the URL Path Inclusion List up to date so that new pages added to your site don’t 404 without manually updating the file.
Very similar to the Inclusion approach, but instead of having a list of valid URLs, you can get a list of URLs that you know people are accessing but are not valid on your site.
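As a rough sketch, the exclusion check is the inclusion check inverted: everything is a 200 unless it matches a known-bad entry. The paths, prefixes, and function name below are assumptions for illustration; your real list would come from your logs or crawl reports.

```typescript
// Hypothetical exclusion list: exact paths known to be invalid on the site.
const EXCLUDED_PATHS: ReadonlySet<string> = new Set([
  "/old-landing-page",
  "/wp-login.php",
]);

// Hypothetical prefixes for whole retired sections of the site.
const EXCLUDED_PREFIXES: readonly string[] = ["/archive/2015/"];

// Return 404 only for known-bad URLs; every other URL still reaches the SPA.
function statusFromExclusionList(rawUrl: string): 200 | 404 {
  const path = rawUrl.split(/[?#]/)[0] ?? "";
  if (EXCLUDED_PATHS.has(path)) {
    return 404;
  }
  for (const prefix of EXCLUDED_PREFIXES) {
    if (path.indexOf(prefix) === 0) {
      return 404;
    }
  }
  return 200;
}
```

Note the trade-off this makes visible: any invalid URL that nobody has added to the list yet still gets a 200.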
Pros: This way you will continue to have dynamic URLs - so if your site is adding and removing pages often (and so the inclusion list approach is not tenable), this might be a good "quick fix" for you.
Cons:
This is a big hammer, so tread lightly!
Here's how the solution works:
Pros: One-time cost. Once you get that server set up it is "set and forget". Also, you can then serve that rendered HTML, which will benefit your users and Google Analytics.
Cons: You will incur performance overhead on every request - though this can be mitigated somewhat on public pages by memoizing HTML responses that are the same for everyone.
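The memoization mentioned above could look something like the following sketch. It assumes a hypothetical synchronous `render` function that turns a path into HTML, and the `maxEntries` cap is an arbitrary illustrative value. A real renderer would likely be asynchronous and need cache invalidation.

```typescript
// A renderer turns a URL path into an HTML string (hypothetical signature).
type HtmlRenderer = (path: string) => string;

// Wrap a renderer so repeated requests for the same public path reuse
// the previously rendered HTML instead of rendering again.
function memoizeRenderer(render: HtmlRenderer, maxEntries = 500): HtmlRenderer {
  const cache = new Map<string, string>();
  return (path: string): string => {
    const cached = cache.get(path);
    if (cached !== undefined) {
      return cached;
    }
    const html = render(path);
    if (cache.size >= maxEntries) {
      // Map keeps insertion order, so the first key is the oldest entry.
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) {
        cache.delete(oldest);
      }
    }
    cache.set(path, html);
    return html;
  };
}
```

This only makes sense for pages that are identical for every visitor; anything personalized should bypass the cache.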
Pro Tip: We don’t recommend this approach due to the site performance issues and server requirements.
This solution requires up-front work and ongoing updates, but it has more potential than any other solution. It involves creating a web application server that mirrors the routing in your SPA.
Any dynamic URLs are validated (for instance, the URL `/users/12345` would check that a User with the id 12345 exists on the server), and any static pages need to be hard-coded. Pages that pass this check return a 200 status. Pages that fail it return a 404 and show an error page to the user.
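Here is a minimal sketch of that validation step, assuming a hypothetical in-memory set of user ids standing in for a real database query. The route table, page list, and names are illustrative only; your server would mirror whatever routes your SPA actually defines.

```typescript
// Hypothetical stand-in for a database lookup of existing users.
const EXISTING_USER_IDS: ReadonlySet<string> = new Set(["12345", "67890"]);

// Hard-coded static pages (placeholders for illustration).
const STATIC_PAGES: ReadonlySet<string> = new Set(["/", "/about", "/contact"]);

interface DynamicRoute {
  // Pattern with one capture group for the record id.
  pattern: RegExp;
  // Returns true when the record behind the URL exists.
  exists: (id: string) => boolean;
}

// Each entry mirrors one dynamic route from the SPA router.
const DYNAMIC_ROUTES: readonly DynamicRoute[] = [
  { pattern: /^\/users\/(\d+)$/, exists: (id) => EXISTING_USER_IDS.has(id) },
];

// Decide the HTTP status for a path before the SPA shell is served.
function validateRoute(path: string): 200 | 404 {
  if (STATIC_PAGES.has(path)) {
    return 200;
  }
  for (const route of DYNAMIC_ROUTES) {
    const match = route.pattern.exec(path);
    if (match) {
      return route.exists(match[1] ?? "") ? 200 : 404;
    }
  }
  // Nothing matched: the SPA has no such route.
  return 404;
}
```

In this sketch, `/users/12345` is valid because the id exists, while `/users/99999` matches the route pattern but fails the existence check and gets a 404.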
Another step that can be taken with this approach is to render some simple HTML (we recommend HTML5 semantic markup) that closely resembles the markup that will eventually be rendered by the SPA. This HTML is returned immediately, so Googlebot will have something to crawl even if it doesn't render the JavaScript that powers your SPA. It is important, though, that this HTML matches the rendered content. Otherwise, Google will penalize your page.
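A sketch of what that server-rendered placeholder might look like follows. The `PageSummary` shape, the function names, and the `app` mount-point id are all assumptions; use whatever element your SPA actually mounts into and whatever fields your pages really have.

```typescript
// Hypothetical minimal description of a page's main content.
interface PageSummary {
  title: string;
  description: string;
}

// Escape text so that page data cannot break out of the markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render simple semantic HTML inside the (assumed) SPA mount point,
// so the SPA can replace it once the JavaScript loads.
function renderFallbackHtml(page: PageSummary): string {
  return [
    '<div id="app">',
    "  <main>",
    "    <article>",
    `      <h1>${escapeHtml(page.title)}</h1>`,
    `      <p>${escapeHtml(page.description)}</p>`,
    "    </article>",
    "  </main>",
    "</div>",
  ].join("\n");
}
```

The title and description should come from the same data the SPA uses, so the placeholder and the final rendered page say the same thing.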
Pros: Faster. Better. Sustainable. More maintainable.
Cons: Larger upfront effort level.
Pro Tip 1: This is our default, A-1 recommendation.
Pro Tip 2: You only need to validate public content. Any content that requires a login can just be served with 200s, because Googlebot will never see or index those pages!
The solutions above rely heavily on having a correctly and thoroughly configured HTTP server. You will also need to configure your SPA to show "Not Found" error messages to the end user. These "Error Components" can be a tool to help improve your HTTP configuration. There are two situations they improve:
The error component you use can and should log the current URL somehow. You could have an API endpoint to track them, send a Google Analytics event, or send a request to your server with `404=true` appended to the end of the URL, so that you could track these in your logs.
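For the last option, the URL your Error Component requests could be built with a small helper like this one. The function name is hypothetical, and this is only one way to append the flag; it keeps any existing query string and fragment intact.

```typescript
// Append `404=true` to a URL so that the request stands out in server logs.
// Handles URLs that already have a query string and/or a fragment.
function withNotFoundFlag(url: string): string {
  const hashIndex = url.indexOf("#");
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? "" : url.slice(hashIndex);
  const separator = base.indexOf("?") === -1 ? "?" : "&";
  return `${base}${separator}404=true${fragment}`;
}
```

For example, `withNotFoundFlag("/missing?x=1")` produces `/missing?x=1&404=true`. The Error Component could then request that URL in the background, and you can search your access logs for `404=true` to find routes the server is not handling yet.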
Taking this a step further: if you have implemented the "Simple Server Side" solution, and are confident that you are serving 404s correctly, this Error Component only needs to call `window.location.reload()`. This will tell the browser to reload the current URL, bypassing the SPA router, and your HTTP server will correctly issue a 404 response and error page.
This article does a good job describing the many places error pages are shown, and goes into detail creating generic, reusable solutions for React and React Router.
Pros:
Cons:
It is our (highly opinionated) opinion that most sites should not be built using SPA frameworks! A static site builder or basic CMS serves most sites' needs much better. SPAs are fantastic for sites that have a high degree of interactivity and dynamism, such as sites with chat or other social messaging features. They are poorly suited for sites that have lots of static content or rely on Google search results, like eCommerce sites.
*Note that this opinion can be attributed to BOTH the software engineering experts and SEO experts at TGDC.
Thanks for joining us alongside an Opinionated Developer. Reach out to Gray Dot Company if you need support with Strategy + Tech + Data.