A lot has changed in the last five years with what used to be called Google Webmaster Tools and is now Google Search Console. Google has released significantly more data that promises to be extremely useful for SEOs. And since we lost so much keyword data in Google Analytics long ago, we've come to rely on Search Console more than ever.
The “Search Analytics” and “Links to your site” sections are two of the main features that did not exist in the old Webmaster Tools.
While we may never feel completely satisfied with Google's tools, and we may call on its help from time to time, to its credit Google has developed more help documents and support resources to help Search Console users locate and fix errors.
Even though fixing crawl errors isn't as fun as creating content or watching your keywords jump in the rankings, this category of SEO work is still very important.
Taking a look at Portent's picture of how the pieces of Internet marketing fit together, fixing crawl errors in Search Console fits neatly into the infrastructure piece:
If you can develop good habits and practice preventative maintenance, checking for crawl errors weekly will be perfectly adequate to keep them under control. However, if you completely ignore these (pesky) errors, things can quickly go from bad to worse.
Crawl error design
One change that has evolved in recent years is the design of the Crawl Errors view within Search Console.
The Crawl Errors report is divided into two main sections: site errors and URL errors.
Categorizing errors in this way is quite useful because there is a clear difference between site-level errors and page-level errors. Site-level problems can be more catastrophic, with the potential to damage the overall usability of your site.
URL errors, on the other hand, are specific to individual pages and are therefore less urgent.
The quickest way to access your crawl errors is from the dashboard. The main dashboard gives you a quick preview of your site, showing you three of the most important management tools: Crawl Errors, Search Analytics, and Sitemaps.
You can quickly see your crawl errors from here. Even if you just take a look every day, you'll be miles ahead of most site administrators.
1.- Site errors
The site errors section shows errors that affect your website as a whole. These are high-level errors, so don't skip them.
In the Crawl Errors dashboard, Google will display errors from the last 90 days.
If you have any type of activity in the last 90 days, the snippet will look like this:
If you've been 100% bug-free for the last 90 days, it will look like this:
That's the goal – getting a “Well done!” from Google. As SEOs, we don't usually get any validation from Google, so savor this rare moment if it happens to you one day.
How often should I check the site for errors?
In an ideal world, you would log in daily to make sure there are no problems. It may get monotonous since most days everything is fine, but wouldn't you kick yourself if you missed critical site errors by not logging in?
At a minimum, you should check every 90 days to look for previous errors so you can keep an eye out in the future – but frequent, regular checks are best.
We'll talk about creating alerts and automation later, but just keep in mind that this section is critical and needs to be 100% error-free every day. There is no gray area here.
A) DNS errors
Meaning
DNS errors are important – and the implications for your site are huge.
DNS (Domain Name System) errors are the first and most prominent error because if Googlebot is having DNS issues, it means it cannot connect to your domain due to a DNS timeout issue or a DNS lookup issue.
Your domain is likely hosted at a common domain company, such as Namecheap or GoDaddy, or with your web hosting company. Sometimes your domain is hosted separately from your web hosting company, but other times the same company handles both.
Are they important?
Although Google states that many DNS issues still allow Google to connect to your site, if you're getting a severe DNS issue you should act immediately.
There may be high latency issues that allow Google to crawl the site, but provide a poor user experience.
A DNS issue is extremely important as it is the first step to accessing your website. You should take quick action if you encounter DNS issues that are preventing Google from connecting to your site.
How to Solve DNS Problems
First of all, Google recommends using its Fetch as Google tool to see how Googlebot crawls your page. You'll find it under Crawl -> Fetch as Google in Search Console.
1.- If you're only looking for the DNS connection status and are trying to act quickly, you can use the plain fetch option. The slower fetch and render is useful for getting a side-by-side comparison of how Google sees your site compared to a regular user.
2.- Check with your DNS provider. If Google can't fetch and render the page correctly, you'll need to take further action. Check with your DNS provider to see where the problem lies. It could be an issue on the DNS provider's end, or it could be worse.
3.- Make sure your server displays a 404 or 500 error code. Instead of returning a failed connection, your server should return a 404 (not found) or 500 (server error) code. These codes are more accurate than having a DNS error.
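If you want a quick sanity check from your own machine in addition to Fetch as Google, a small script can cover points 2 and 3 above. This is only a rough sketch using Python's socket module and the requests library; example.com and the made-up path are placeholders.

```python
# A minimal sketch (not an official Google tool): resolve the domain and check
# what status code the server returns for a page that should not exist.
# "example.com" is a placeholder - swap in your own domain.
import socket
import requests

domain = "example.com"

# Point 2-style check: does the domain name resolve at all?
try:
    ip = socket.gethostbyname(domain)
    print(f"DNS lookup OK: {domain} -> {ip}")
except socket.gaierror as err:
    print(f"DNS lookup failed for {domain}: {err}")

# Point 3-style check: a missing page should return 404/410/500, not a dead connection.
try:
    resp = requests.get(f"https://{domain}/this-page-should-not-exist", timeout=10)
    print(f"Status code for a missing page: {resp.status_code}")
except requests.exceptions.RequestException as err:
    print(f"Request failed entirely: {err}")
```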
Other tools
ISUP.me – Lets you know instantly if your site is down for everyone, or just for you.
Web-Sniffer.net – Shows you the current HTTP request(s) and response header. Helpful for point #3 above.
B) Server errors
Meaning
A server error often means that your server takes too long to respond and the request times out. The Googlebot that is trying to crawl your site can only wait a certain amount of time to load your website before it gives up. If it takes too long, Googlebot will stop trying.
Server errors are different from DNS errors. A DNS error means that Googlebot can't even look up the URL due to DNS issues, while server errors mean that although Googlebot can connect to your site, it can't load the page due to server errors.
Server errors can happen if your site is overloaded with more traffic than the server can handle. To avoid this, make sure your hosting provider can scale up to accommodate sudden bursts of website traffic. Everyone wants their website to go viral, but not everyone is ready for it!
Are they important?
Like DNS errors, a server error is extremely urgent. It's a fundamental mistake and hurts your site overall. You should take immediate action if you detect server errors in Search Console for your site.
Making sure Googlebot can connect to DNS is an important first step, but you won't get much further if your website doesn't actually appear. If you're having server errors, Googlebot won't be able to find anything to crawl and will give up after a certain amount of time.
How to Fix Server Errors
If your website is working fine by the time you see this error, it may mean that there were server errors in the past. Although the error may be resolved for now, you should still make some changes to prevent it from happening again.
This is Google's official recommendation to fix server errors:
“Use Fetch as Google to check if Googlebot can currently crawl your site. If Fetch as Google returns your home page content without problems, you can assume that Google can access your site correctly.”
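Fetch as Google tells you whether Google can reach the site; if you also want a rough read on response time from your end, a quick script can time the request and give up after a fixed limit. This is a minimal sketch with Python's requests library; the 10-second cutoff and the URL are illustrative assumptions, not Google's actual crawl timeout.

```python
# Minimal sketch: time how long the server takes to answer, and treat a slow or
# timed-out response the way a crawler would (give up). URL and limit are placeholders.
import requests

url = "https://example.com/"
try:
    resp = requests.get(url, timeout=10)
    print(f"{url} answered {resp.status_code} in {resp.elapsed.total_seconds():.2f}s")
except requests.exceptions.Timeout:
    print(f"{url} took longer than 10 seconds - a crawler would likely give up too")
except requests.exceptions.ConnectionError as err:
    print(f"Could not connect to {url}: {err}")
```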
Before troubleshooting the server error, you need to diagnose specifically what type of server error you are receiving, as there are many types:
- Timeout
- Truncated headers
- Connection reset
- Truncated response
- Connection refused
- Connect failed
- Connect timeout
- No response
Addressing how to correct each of them is outside the scope of this article, but you should consult the Google Search Console help to diagnose specific errors.
C) Robots.txt errors
A robots.txt error means that Googlebot cannot retrieve your robots.txt file, located at [yourdomain.com]/robots.txt.
Meaning
One of the most surprising things about a robots.txt file is that it is only necessary if you don't want Google to crawl certain pages.
Google Search Console Help states:
“You only need a robots.txt file if your site includes content that you don't want search engines to index. If you want search engines to index everything on your site, you don't need a robots.txt file, not even an empty one. If you don't have a robots.txt file, your server will return a 404 error when Googlebot requests it and we will continue crawling your site. There will be no problem."
Are they important?
This is quite important. For small, static websites without many recent changes or new pages, it's not particularly urgent. But the issue still needs to be addressed.
However, if your site is publishing or changing new content daily, this is an urgent problem. If Googlebot cannot load your robots.txt file, it is not crawling your website and is not indexing new pages and changes.
How to Fix Robots.txt Problems
Make sure the robots.txt file is configured correctly. Check which pages you are telling Googlebot not to crawl, as all others will be crawled by default. Double check the almighty “Disallow: /” line and make sure this line DOES NOT EXIST unless for some reason you don't want your website to appear in Google search results.
If your file appears to be in order and you are still receiving errors, use a server header checking tool to see if your file is returning a 200 or 404 error.
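If you prefer scripting this check to using a web-based header checker, here is a minimal sketch with Python's requests library (example.com is a placeholder) that reports the robots.txt status code and flags a site-wide Disallow rule:

```python
# Minimal sketch: fetch robots.txt, confirm it returns 200 or 404 (not a 5xx),
# and flag a site-wide "Disallow: /" rule. example.com is a placeholder.
import requests

resp = requests.get("https://example.com/robots.txt", timeout=10)
print(f"robots.txt status code: {resp.status_code}")

if resp.status_code == 200:
    lines = [line.strip().lower() for line in resp.text.splitlines()]
    if "disallow: /" in lines:
        print("Warning: 'Disallow: /' found - the entire site is blocked from crawling")
elif resp.status_code == 404:
    print("No robots.txt file - Google will crawl the site as usual")
else:
    print("Unexpected status - Googlebot may hold off on crawling until this is fixed")
```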
The interesting thing about this problem is that it is better to not have robots.txt at all than to have one that is not configured correctly. If you don't have any, Google will crawl your site as usual. If you have errors, Google will stop crawling until you fix this file.
For just a few lines of text, the robots.txt file can have catastrophic consequences for your website. Be sure to check it often.
2.- URL errors
URL errors are different from site errors because they only affect specific pages on your site, not your website as a whole.
Google Search Console will show the top URL errors by category: desktop and smartphone. For large sites, this may not be enough data to show all errors, but for most sites it captures every known issue.
Many site owners have run into the problem of seeing a large number of URL errors and getting scared. The important thing to remember is a) Google ranks the most important errors first and b) Some of these errors may have already been resolved.
If you've made some drastic changes to your site to fix errors or believe that many of the URL errors are no longer occurring, one tactic to employ is to mark all errors as fixed and check back in a few days.
Doing this will remove the errors from your dashboard for now, but Google will bring back the errors the next time it crawls your site in the next few days. If you had actually fixed these errors in the past, they will not appear again. If the errors still exist, you know that they are still affecting your site.
A) Soft 404
A soft 404 error is when a page shows up as 200 (found) when it should show up as 404 (not found).
Meaning
Just because your 404 page looks like a 404 page doesn't mean it actually is one. The user-visible aspect of a 404 page is the content of the page. The visible message should let users know that the page they requested has disappeared. Often site owners will have a helpful list of related links for users to visit or a fun 404 response.
The flip side of a 404 page is the response visible to the crawler. The HTTP header response code should be 404 (not found) or 410 (gone).
As a quick reminder, an HTTP exchange is simple: the browser or crawler requests a URL, and the server answers with a status code (and, usually, the page content).
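Here is a minimal sketch of that exchange using Python's requests library (both URLs are placeholders); for soft 404s, what matters is the status code that comes back, not what the rendered page looks like.

```python
# Minimal sketch: print the crawler-visible status code for a real page and for a
# made-up page. Both URLs are placeholders.
import requests

for url in ["https://example.com/real-page", "https://example.com/page-that-does-not-exist"]:
    resp = requests.get(url, timeout=10)
    # A missing page that answers 200 here is what Google flags as a soft 404;
    # it should answer 404 (not found) or 410 (gone) instead.
    print(f"GET {url} -> HTTP {resp.status_code}")
```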
If you display a 404 page and it is listed as Soft 404, it means that the HTTP header response code does not return the 404 (not found) response code. Google recommends “that you always return a 404 (not found) or 410 (gone) response code in response to a request for a non-existent page.”
Another situation where soft 404 errors can appear is if you have pages that are redirecting to unrelated pages, such as the home page. Google doesn't seem to explicitly state where the line is drawn on this, only mentioning it in vague terms.
Officially, Google says this about soft 404s:
“Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the home page, instead of returning a 404) can be problematic.”
Although this gives us some direction, it is not clear when it is appropriate to redirect an expired page to the home page and when it is not.
In practice, from my own experience, if you are redirecting large amounts of pages to the home page, Google may interpret those redirected URLs as soft 404s instead of true 301 redirects.
Conversely, if you redirect an old page to a closely related page, you are unlikely to trigger the soft 404 warning in the same way.
Are they important?
If the pages listed as soft 404 errors aren't critical pages, and a handful of soft 404s isn't eating up your crawl budget, they are not an urgent item to fix.
If you have crucial pages on your site listed as soft 404s, you'll want to take steps to fix them. Important product pages, category pages, and other key pages should not show up on the soft 404 list if they are live pages. Pay special attention to the critical, money-making pages on your site.
If you have a large number of soft 404 errors relative to the total number of pages on your site, you need to take quick action: letting these soft 404s exist may be eating up your (precious?) Googlebot crawl budget.
How to Solve Soft 404 Errors
For pages that no longer exist:
- Let the page 404 or 410 if it is gone and is not receiving significant traffic or links. Make sure the server header response is 404 or 410, not 200 (a short sketch follows this list).
- Redirect (301) each old page to a relevant related page on your site.
- Don't redirect large amounts of dead pages to your home page. They should have a 404 or be redirected to appropriate similar pages.
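Purely as an illustration of the status codes involved (this assumes a Python/Flask site; the routes and target URLs are hypothetical), the fixes above boil down to answering 410 or 404 for pages that are truly gone and 301-redirecting moved pages to their closest equivalent:

```python
# Illustrative sketch for a hypothetical Flask site; routes and targets are made up.
from flask import Flask, redirect

app = Flask(__name__)

@app.route("/discontinued-product")
def discontinued_product():
    # Gone for good, with no traffic or links worth keeping: answer 410 (or 404).
    return "This product has been discontinued.", 410

@app.route("/old-blue-widgets")
def old_blue_widgets():
    # Moved: 301 to the closest related page, not to the home page.
    return redirect("/blue-widgets", code=301)
```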
For pages that are live and are not supposed to be soft 404s:
- Make sure there is an adequate amount of content on the page, as thin content can trigger a soft 404 error.
- Make sure your page content does not appear to represent a 404 page while serving a 200 response code.
Soft 404 errors are strange. They lead to a lot of confusion because they tend to be an odd hybrid of 404 pages and normal pages, and what's causing them isn't always clear. Make sure the most critical pages on your site aren't throwing soft 404 errors.
B) 404
A 404 error means that Googlebot tried to crawl a page that doesn't exist on your site. Googlebot finds 404 pages when other sites or pages link to that non-existent page.
Meaning
404 errors are probably the most misunderstood crawling error. Whether it is an intermediate SEO or the CEO of the company, the most common reaction is fear and aversion to 404 errors.
Google states this clearly in its guidelines:
“Generally, 404 errors don't affect your site's ranking on Google, so you can safely ignore them.”
I'll be the first to admit that “you can safely ignore them” is a pretty misleading statement for beginners. No – you can't ignore them if they are 404 errors for crucial pages on your site.
(Google does walk the talk in this sense: google.com/searchconsole returns a 404 instead of a helpful redirect to google.com/webmasters.)
Distinguishing between times when you can ignore an error and times when you need to stay late at the office to fix something comes from deep review and experience, but Rand from Moz offered some timeless advice on 404s back in 2009:
“When faced with 404 errors, my thinking is that unless the page:
- A) Receives important links to it from external sources,
- B) Is receiving a substantial amount of visitor traffic, and/or
- C) Has an obvious URL that visitors/links intended to reach,
It's okay to let it 404.”
The hard work comes in deciding what counts as an important external link and a substantial amount of traffic for your particular URL on your particular site.
Annie Cushing also prefers Rand's method and recommends:
“Two of the most important metrics to keep in mind are backlinks to make sure you don't miss the most valuable links and total landing page views in your analytics software. You can have others, like looking at social metrics. Whatever you decide those metrics are, you want to export them all and analyze them in Excel.”
Another thing to consider, not mentioned above, is offline marketing campaigns, podcasts, and other media that use memorable tracking URLs. It could be that your new magazine ad won't come out until next month, and the marketing department forgot to tell you about a seemingly unimportant URL (example.com/offer-20) that's about to be printed in tens of thousands of magazines. Another reason for interdepartmental synergy.
Are they important?
This is probably both the trickiest and the simplest of all the errors.
404 errors are very urgent if important pages on your site are showing up as 404s. Conversely, as Google says, if a page is long gone and doesn't meet the quality criteria above, let it be.
As painful as it might be to see hundreds of errors in your Search Console, you just have to ignore them. Unless you get to the root of the problem, they will continue to appear.
How to fix 404 errors
If your important page appears as a 404 error and you do not want this, follow these steps:
1.- Make sure the page is published from your content management system and not in draft or deleted mode.
2.- Make sure the 404 error URL is the correct page and not another variation.
3.- Check whether this error shows up on the www vs. non-www version of your site and the http vs. https version of your site (a quick check script follows below).
4.- If you do not want to revive the page, but want to redirect it to another page, make sure it is redirected to the most appropriate related page.
In short: if your page is dead and you want it live, recreate the page. If you don't want that page to exist, 301 redirect it to the correct page.
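For step 3 in particular, a quick script can check all four common variants of a URL at once. This is a minimal sketch with Python's requests library; the domain and path are placeholders.

```python
# Minimal sketch: check the status code of the www/non-www and http/https variants
# of one path. Domain and path are placeholders.
import requests

path = "/important-page"
bases = [
    "https://example.com",
    "https://www.example.com",
    "http://example.com",
    "http://www.example.com",
]

for base in bases:
    resp = requests.get(base + path, timeout=10, allow_redirects=False)
    # 200 = fine, 301/302 = redirecting (check the Location header),
    # 404 = this is the variant throwing the error.
    print(f"{base + path} -> HTTP {resp.status_code}")
```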
How to Prevent Old 404 Errors from Reappearing in the Crawl Errors Report
If your 404 error URL is destined not to return, let it die. Just ignore it, as Google recommends. But to prevent it from showing up in your crawl error report, you'll need to do a few more things.
As yet another indication of the power of links, Google will only show the 404 error in the first place if your site or an external website is linking to the 404 page.
In other words, if you type yoursite.com/unicornios-azules into your browser, it will not show up in the crawl errors dashboard unless someone also links to it.
To find the links to your 404 error page, go to Crawl > Crawl Errors:
Next, click the URL you want to fix:
Look on your page for the link. It's often faster to look at the source code of your page and find the link in question there:
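If the linking page is long or the link is buried in a template, a small script can confirm whether the dead URL still appears in the page source. This is a minimal sketch with Python's requests library; both URLs are placeholders.

```python
# Minimal sketch: fetch the page Search Console says is linking to the dead URL
# and check whether the link is still in its HTML. Both URLs are placeholders.
import requests

dead_url = "https://example.com/old-dead-page"
linking_page = "https://example.com/some-blog-post"

html = requests.get(linking_page, timeout=10).text
# Also check the relative form of the URL, since internal links are often relative.
if dead_url in html or dead_url.replace("https://example.com", "") in html:
    print(f"{linking_page} still links to the dead page - update or remove that link")
else:
    print(f"No link to the dead page found in {linking_page} (it may already be removed)")
```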
It's painstaking work, but if you really want to stop old 404s from showing up in your dashboard, you'll have to remove the links to that page from every page that links to it – even other websites.
What's really (not) fun is when you're getting links from old sitemaps. You'll have to let those old sitemaps 404 entirely in order to remove them completely; don't redirect them to your live sitemap.
C) Access denied
Access denied means that Googlebot cannot crawl the page. Unlike a 404, where Googlebot reaches a page that doesn't exist, here Googlebot is prevented from crawling the page in the first place.
Meaning
Access denied errors commonly block Googlebot in one of these ways:
You require users to log in to view a URL on your site, so Googlebot is blocked.
Your robots.txt file blocks Googlebot from individual URLs, entire folders, or the entire site.
Your hosting provider is blocking Googlebot from your site, or the server requires users to authenticate through a proxy.
Are they important?
Like 404 and soft 404 errors, if the blocked pages are important for Google to crawl and index, you should take immediate action.
If you do not want this page to be crawled and indexed, you can safely ignore access denied errors.
How to Fix Access Denied Errors
To fix access denied errors, you'll need to remove the item that's blocking Googlebot access:
Remove the login from the pages you want Google to crawl, whether it's an in-page or popup login message
Check your robots.txt file to make sure the pages listed in it are intended to be blocked from crawling and indexing
Use the robots.txt checker to view warnings in your robots.txt file and to test individual URLs in the file
Use a user-agent switcher plugin for your browser, or the Fetch as Google tool, to see how your site appears to Googlebot (a small script sketch follows this list)
Crawl your website with Screaming Frog; it will prompt you to log in to pages if the page requires it
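For the user-agent check mentioned above, here is a minimal sketch using Python's requests library that requests the same URL as a regular browser and as Googlebot and compares the status codes; the URL and the user-agent strings are illustrative only.

```python
# Minimal sketch: request one URL with two different user-agents and compare codes.
# A 401/403 that appears only for the Googlebot request points to server-side blocking.
import requests

url = "https://example.com/members-only-page"
user_agents = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for name, ua in user_agents.items():
    resp = requests.get(url, headers={"User-Agent": ua}, timeout=10)
    print(f"As {name}: HTTP {resp.status_code}")
```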
Although not as common as 404 errors, access denied issues can hurt your site's ranking ability if the wrong pages are blocked. Be sure to keep an eye out for these errors and quickly fix any pressing issues.
D) Not followed
Meaning
Not to be confused with a “nofollow” link directive, a “Not followed” error means that Google was unable to follow that particular URL.
In most cases, these errors arise from problems with Flash, JavaScript, or redirects.
Are they important?
If you are dealing with unfollowed URL errors on a high priority URL, then yes, these are important.
If the problems arise from old URLs that are no longer active, or from parameters that are not indexed, the priority level on these is lower, but you should still analyze them.
How to solve unfollowed URL problems
Google identifies the following as features that Googlebot and other search engines may have trouble crawling:
- JavaScript
- Cookies
- Session ID
- Frames
- DHTML
- Flash
Use the Lynx text browser or the Fetch as Google tool, using fetch and render, to view the site as Google would. You can also use a Chrome plugin, such as User-Agent Switcher, to imitate Googlebot while browsing pages.
If, like Googlebot, you are not seeing the pages load or not seeing important content on the page because of some of the technologies above, then you have found your problem. Without visible content and links to crawl on the page, some URLs cannot be followed. Make sure to dig deeper and diagnose the issue in order to fix it.
For parameters with crawling issues, be sure to review how Google is currently handling your parameters. Specify changes in the URL Parameters tool if you want Google to treat them differently.
For not followed errors related to redirects, be sure to fix any of the following that apply:
- Check redirect chains. If there are too many “hops,” Google will stop following the redirect chain (a short hop-counting sketch follows this list)
- Where possible, update your site architecture to allow every page on your site to be reached from static links, rather than relying on redirects implemented in the past
- Do not include redirected URLs in your sitemap, include the destination URL
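To spot-check redirect chains, a short script can follow a URL and count the hops. This is a minimal sketch using Python's requests library; the URL is a placeholder.

```python
# Minimal sketch: follow a URL's redirects and count the hops. A long chain (or a
# loop) is what Googlebot eventually gives up on. The URL is a placeholder.
import requests

resp = requests.get("https://example.com/very-old-page", timeout=10, allow_redirects=True)

print(f"Hops: {len(resp.history)}")
for hop in resp.history:
    print(f"  {hop.status_code} {hop.url}")
print(f"Final: {resp.status_code} {resp.url}")
# Ideally, links and sitemap entries point straight at the final URL, with at most one hop.
```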
Google used to include more details in the not followed errors section, but much of that additional data is now available through the Search Console API.
Other tools
- Screaming Frog SEO Spider is a great tool to scan your live site and unearth redirect errors. This tool will show you at scale how redirects are configured and whether they are correctly configured as 301 redirects or if they are configured as something else.
- Moz Pro Site Crawl
- Raven Tools Site Auditor
E) Server errors and DNS errors
Under the URL errors section, Google lists server errors and DNS errors – the same categories as in the site errors report. Google's guidance is to handle them the same way you would handle the site-level server and DNS errors.
The difference here is that these errors affect only individual URLs, not the site as a whole. If you have isolated configurations for individual URLs, such as minisites or a different setup for certain URLs on your domain, those errors may show up here.
Now that you're an expert on these URL errors, here's a handy URL error chart that you can print out and stick on your desk or bathroom mirror.
I get it – some of this technical SEO stuff can be boring. No one wants to individually inspect seemingly unimportant URL errors, or conversely, have a panic attack looking at thousands of errors on their site.
However, with experience and repetition, you will gain the mental muscle memory of knowing how to react to mistakes: which ones are important and which ones can be safely ignored.
If you haven't already, I invite you to read the official Google documentation for Search Console and keep these URLs handy for future questions:
- Webmaster Central Help Forum
- Webmaster FAQ: Crawling, indexing, and ranking
- Webmaster Central Blog
- Google Search Console Help: Crawl Errors report
This guide covered only the Crawl Errors section of Search Console. Search Console is a beast of a data source on its own, so to learn more about how to make the best use of the tool in its entirety, check out these other guides:
- The Ultimate Guide to Using Google Search Console as a Powerful SEO Tool
- The Ultimate Guide to Google Webmaster Tools
- Yoast Search Console series
Google has generously given us one of the most powerful (and free!) tools for diagnosing website errors. Not only will fixing these errors help you improve your Google rankings, it will help you provide a better user experience to your visitors and help you achieve your business goals faster.