At first glance, Google Search Console (GSC) can seem a bit discouraging: it offers so many reports, functions, and metrics that at first make little sense. But you can’t just ignore it. Google Search Console is one of the most important tools for every website owner and webmaster.
If you are already familiar with Google Search Console, it is time to level up your knowledge. This post will help you maintain a healthy, well-ranked website. It gets a bit technical, but it will be worth it if you read it carefully. In this post, you will learn about all the Index Coverage issues, their causes, and how to fix them without getting overwhelmed.
What Is the Index Coverage Report?
The Index Coverage Report provides an overview of all the pages of your website that Googlebot tried to crawl and index. If Googlebot runs into any issue during crawling or indexing, the report lists the relevant errors. You will also receive an email about new index coverage issues straight to your inbox.
If you are experiencing Index Coverage issues and want a fix for each of them, you are in the right place. For clarity, I will discuss each Index Coverage issue individually, along with all the possible fixes.
The Index Coverage Report shows the following four statuses:
- Error
- Valid with warnings
- Valid
- Excluded
For websites with fewer than 500 pages, Google recommends skipping the Index Coverage Report and using Google’s site: search operator instead.
I strongly disagree: if organic traffic is essential for your business, use the Index Coverage Report rather than the site: operator, because the report provides detailed information and helps you debug index issues far more effectively.
You can find your own Index Coverage Report using the following steps:
- Log in to Google Search Console
- Choose a property
- Under Index, click Coverage in the left navigation
The Index Coverage Report statuses provide the following information:
- Error: It means some pages couldn’t be indexed for some reason.
- Valid with warnings: It means some pages have indexed successfully, but they have some issues that need to be addressed.
- Valid: It shows the pages that have been indexed successfully.
- Excluded: It means the pages that couldn’t be indexed because search engines got a clear signal that these pages shouldn’t be indexed.
Each status consists of one or more types, and below you will learn what each type means and how to fix it. Here are all the Index Coverage issues with their possible solutions:
Valid URLs

As discussed earlier, Valid URLs are pages that have been successfully indexed. They are further divided into two types:
- Submitted and indexed: These URLs were submitted through your XML sitemap and subsequently indexed. No action is required.
- Indexed, not submitted in sitemap: These URLs were not submitted through your XML sitemap, but Google indexed them anyway.
Check whether these URLs should be indexed; if so, add them to your XML sitemap. If not, and they risk wasting crawl budget, disallow them in your robots.txt.
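One way to audit this is to cross-check your sitemap against your robots.txt. The sketch below uses only Python's standard library; the sitemap contents, robots rules, and example.com URLs are made up for illustration:

```python
from urllib import robotparser
import xml.etree.ElementTree as ET

# Hypothetical sitemap and robots.txt contents used for illustration.
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/</loc></url>
  <url><loc>https://example.com/cart/checkout</loc></url>
</urlset>"""

ROBOTS_TXT = ["User-agent: *", "Disallow: /cart/"]

def sitemap_urls(sitemap_xml):
    """Extract all <loc> URLs from a sitemap document."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.findall("sm:url/sm:loc", ns)]

def disallowed(urls, robots_lines, agent="Googlebot"):
    """Return the sitemap URLs that robots.txt blocks for the given agent."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_lines)
    return [u for u in urls if not rp.can_fetch(agent, u)]

urls = sitemap_urls(SITEMAP)
print(disallowed(urls, ROBOTS_TXT))  # ['https://example.com/cart/checkout']
```

A URL that appears in both your sitemap and your disallow rules is sending Google mixed signals, so it is worth resolving one way or the other.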
Valid URLs with warnings
As mentioned earlier, Valid URLs with warnings are the URLs that have been indexed, but there are still some issues that need to be addressed.
The Valid with warnings status contains only one type: “Indexed, though blocked by robots.txt.” Normally, Google doesn’t index URLs that are blocked by robots.txt, but when it finds enough links pointing to a blocked URL, it may index it anyway.
If you don’t want such pages indexed, update your robots.txt and consider applying a robots noindex directive (note that Google must be able to crawl a URL in order to see the noindex).
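To verify that the signal actually went out, you can check a page for a noindex directive in either of the two places Google looks: the X-Robots-Tag HTTP header or a robots meta tag. A minimal sketch with Python's standard library; the sample page is hypothetical:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the content of any <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives.append(a.get("content", "").lower())

def has_noindex(headers, html):
    """True if either the X-Robots-Tag header or a robots meta tag says noindex."""
    header = headers.get("X-Robots-Tag", "").lower()
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header or any("noindex" in d for d in parser.directives)

# Hypothetical page used for illustration.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex({}, page))  # True
```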
Excluded URLs

Excluded URLs are pages that were not indexed because search engines received clear instructions not to index them. If you didn’t exclude these URLs on purpose, there may be an underlying issue.
The Excluded status contains the following 15 types:
Alternative page with the proper canonical tag
Canonical tags prevent duplicate URLs from being indexed. These URLs are duplicates of other URLs and are correctly canonicalized to the preferred version.
There is no need for any action for this.
Blocked by page removal tool
A URL removal request hides URLs from Google’s search results, but only for about 90 days; after that, Google may surface them again. The URL removal tool is the quickest way to get URLs out of Google’s search results.
If you don’t want Google to index these URLs at all, send it a clear signal, for example via a robots noindex directive, and make sure these URLs are recrawled before the 90 days expire.
Blocked due to unauthorized request 401
This means Google received a 401 HTTP response when requesting these URLs, meaning it was not authorized to access them. This usually happens with a staging environment that has been made inaccessible to the outside world using HTTP authentication.
Check whether there are any important URLs among them. If so, investigate, because this can be a serious SEO issue. Internal and external links are often the cause. Find out how Google discovered your staging environment, and remove any reference to it.
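One way to find such references is to scan your own pages' links for the staging hostname. A rough sketch; staging.example.com and the sample anchors are assumptions, not your real hostnames:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def staging_links(html, staging_host="staging.example.com"):
    """Return links that point at the (assumed) staging hostname."""
    collector = LinkCollector()
    collector.feed(html)
    return [l for l in collector.links if urlparse(l).netloc == staging_host]

# Hypothetical page fragment: one public link, one leaked staging link.
page = ('<a href="https://example.com/blog/">Blog</a>'
        '<a href="https://staging.example.com/blog/">Preview</a>')
print(staging_links(page))  # ['https://staging.example.com/blog/']
```

Running a check like this over your crawled pages can reveal where the staging environment leaked into the public site.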
Blocked by robots.txt
Google did not index these URLs because they are blocked by your robots.txt file, and Google did not find signals strong enough to index them anyway.
Make sure you do not have any important URLs among them.
Crawl anomaly

A crawl anomaly means Google’s crawlers received a response code in the 4XX or 5XX range that does not have its own type in the Index Coverage Report.
Use the URL Inspection tool to see whether you can replicate the issue. If you can, investigate further. If everything works fine, wait a while; the issue may be temporary and resolve itself.
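The "no dedicated type" distinction above can be sketched as a small classifier. The mapping below is my rough reading of how the report buckets status codes, not an official specification:

```python
def coverage_bucket(status):
    """Roughly map an HTTP status code to the Index Coverage issue it tends to trigger."""
    if status == 404:
        return "Not found (404)"
    if status == 401:
        return "Blocked due to unauthorized request (401)"
    if 300 <= status < 400:
        return "Page with redirect"
    if status >= 500:
        return "Server error (5xx)"
    if 400 <= status < 500:
        return "Crawl anomaly"  # other 4XX codes have no dedicated type
    return "OK"

for code in (200, 301, 404, 410, 503):
    print(code, coverage_bucket(code))
```

A 410 (Gone), for example, has no type of its own in this rough model, so it falls into the crawl anomaly bucket.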
Discovered – currently not indexed
Google discovered these URLs but has not crawled them yet, so they are not indexed.
There are a few possible reasons: the URLs may simply be queued for crawling, or Google backed off because your site was overloaded, slow, or frequently unavailable. If you investigate and find no issue, keep an eye on them; it may be a temporary delay.
Crawled – currently not indexed
Google crawled these URLs but chose not to index them, either because it considered them unimportant or because their content was thin or duplicated.
Sometimes, Google takes time after crawling to index URLs. So, you should wait for some time if you don’t find any issue.
Plus, check whether these URLs contain enough internal links to be indexed.
Duplicate, Google chose different canonical than user
Google considers these URLs duplicates. Even though you canonicalized them to your preferred URL, Google ignored your canonical and applied a different one. This often happens on multi-language sites with highly similar pages and thin content.
Use the URL Inspection tool to check which canonical Google chose. Google may have selected a different canonical because that page has more links and more content.
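To see which canonical a page actually declares, you can extract the tag yourself before comparing it with Google's choice in the URL Inspection tool. A minimal sketch using Python's standard library; the sample page and URL are hypothetical:

```python
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Find the href of the first <link rel="canonical"> tag."""
    def __init__(self):
        super().__init__()
        self.canonical = None
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel", "").lower() == "canonical" and self.canonical is None:
            self.canonical = a.get("href")

def declared_canonical(html):
    """Return the canonical URL a page declares, or None if it declares none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical

# Hypothetical page used for illustration.
page = ('<html><head><link rel="canonical" '
        'href="https://example.com/shoes/"></head></html>')
print(declared_canonical(page))  # https://example.com/shoes/
```

If the declared canonical and Google's chosen canonical disagree, that mismatch is exactly what this report type is flagging.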
Duplicate without user-selected canonical
According to Google, these URLs are duplicates of other URLs (for example, a PDF that is a 100% copy of another PDF). Google therefore excludes them, judging that they are not the preferred versions.
To fix this issue, add a canonical tag pointing to the preferred version of each URL.
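For reference, a canonical tag is a single line in each duplicate page's head; the URL below is a placeholder for your preferred version:

```html
<!-- In the <head> of every duplicate variant, point at the preferred URL -->
<link rel="canonical" href="https://example.com/preferred-page/" />
```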
Excluded by ‘noindex’ tag
These URLs were not indexed by Google because of a noindex directive, either in the HTTP header or in the HTML source.
Check whether there are any important pages among them. If so, remove the noindex directive and run the URL Inspection tool to request indexing. If you want these pages to stay inaccessible to the public, use HTTP authentication instead.
Duplicate, submitted URL not selected as canonical
When you submit URLs through your XML sitemap without setting a canonical URL for them, Google may consider them duplicates of other URLs.
To fix this issue, add proper canonical URLs to the preferred version of URLs.
Not found (404)
These URLs are not included in your XML sitemap, but Google found them anyway, and they return an HTTP 404 status code. Google may have discovered them because they existed previously, or through links on other sites.
If any important URLs are among them, restore the content at those URLs, or 301 redirect each URL to the most relevant alternative.
Page with redirect
These URLs redirect elsewhere, so Google does not index them.
No action is required for this.
Page removed because of legal complaint
Google received a legal complaint about these pages, so it does not index them.
Review the content of these pages so you can resolve the issue.
Soft 404

These URLs don’t return an HTTP 404 status code, but their content gives the impression that they are, in fact, 404 pages, for example by displaying a “Page can’t be found” message.
Make sure these URLs return a proper 404 HTTP status code if they really are 404 pages. If they are not, make sure both the content and the status code reflect that.
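A soft 404 is simply a mismatch between the status code and what the content says. Below is a crude heuristic check; the error phrases are guesses you would tune for your own templates:

```python
# Hypothetical phrases that suggest a page is really an error page.
ERROR_PHRASES = ("page can't be found", "page not found", "404")

def looks_like_soft_404(status, body):
    """Flag a page that returns 200 but whose content reads like an error page."""
    reads_like_error = any(p in body.lower() for p in ERROR_PHRASES)
    return status == 200 and reads_like_error

print(looks_like_soft_404(200, "<h1>Page can't be found</h1>"))  # True
print(looks_like_soft_404(404, "<h1>Page can't be found</h1>"))  # False: a real 404
```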
Following are the 8 types of Error URLs:

Redirect error

Google did not crawl these URLs because of a redirect error. The possible reasons for this error are:
- The redirect URL is too long.
- Redirect Loops
- Redirect chains are too long. Google follows a maximum of 5 redirects in a single crawl.
If you encounter any of the above reasons for redirect errors, then try to resolve them.
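The chain and loop cases can be illustrated by simulating a crawler that gives up after five hops, as Google does. The redirect map below is invented for illustration:

```python
# Hypothetical redirect map: URL -> the Location it redirects to.
REDIRECTS = {
    "/old": "/older",
    "/older": "/oldest",
    "/oldest": "/final",
    "/loop-a": "/loop-b",
    "/loop-b": "/loop-a",
}

def follow(url, redirects, max_hops=5):
    """Follow redirects like a crawler that gives up after max_hops."""
    seen = [url]
    for _ in range(max_hops):
        if url not in redirects:
            return "ok", url, len(seen) - 1      # landed on a real page
        url = redirects[url]
        if url in seen:
            return "loop", url, len(seen)        # revisited a URL: redirect loop
        seen.append(url)
    return "too_many_redirects", url, max_hops   # chain longer than the crawler follows

print(follow("/old", REDIRECTS))     # ('ok', '/final', 3)
print(follow("/loop-a", REDIRECTS))  # ('loop', '/loop-a', 2)
```

The practical fix is the same in both cases: point every old URL directly at its final destination in a single hop.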
Submitted URL blocked by robots.txt
Google was blocked by robots.txt from crawling these URLs, even though you submitted them through your XML sitemap. This is similar to two types we covered earlier.
Here is what makes this one different:
- If these URLs had been indexed, they would be listed under “Indexed, though blocked by robots.txt.”
- If these URLs had not been submitted through the XML sitemap, they would be listed under “Blocked by robots.txt.”
- If these URLs should not be accessible to Google, remove them from the XML sitemap.
- If these URLs are important and you don’t want them blocked by robots.txt, select the URL and click the TEST ROBOTS.TXT BLOCKING button on the right-hand side to find the blocking directive.
Server error (5xx)
Google doesn’t crawl pages showing 5XX error.
Investigate why these URLs return a 5XX error. Sometimes the error is temporary because the server was busy.
Submitted URL marked ‘noindex’
These URLs got noindex directive either in HTTP header or HTML source, although you submitted them through XML sitemap.
- If you don’t want them indexed, remove them from the XML sitemap.
- If they are important to you, remove the noindex directive.
Submitted URL returns unauthorized request (401)
Google could not access these URLs because it received a 401 HTTP response, even though you submitted them through the XML sitemap. The only difference between this error and “Blocked due to unauthorized request (401)” is that these URLs were submitted through the XML sitemap.
If the 401 HTTP status code is intentional, remove these URLs from the XML sitemap. If not, give Google access to these URLs.
Submitted URL seems to be a Soft 404
Google considers these URLs “soft 404s,” even though you submitted them through your XML sitemap. They may return status code 200 while the content on the page gives the impression of a 404. This type is identical to the Soft 404 type above, except that here the URLs were submitted through the XML sitemap.
- If these URLs are real 404s, make sure they return a proper 404 HTTP status code and remove them from the XML sitemap.
- If they are not real 404s, make sure both the content and the status code reflect that.
Submitted URL not found (404)
You submitted these URLs through the XML sitemap, but they do not exist. The only difference between this error and “Not found (404)” is that these URLs were submitted through the XML sitemap.
If these URLs are not important, then remove them from the XML sitemap. If they are important, then restore the content or 301 redirect the URL to the relevant alternative.
Submitted URL has crawl issue
Google encountered crawl issues, although you submitted these URLs through the XML sitemap. Sometimes, these crawl issues are temporary and are resolved on their own.
Use the URL inspection tool to replicate the issue. If you do not find any issue, then this can be temporary.
We have now covered all the Index Coverage issues and their possible solutions. Find the issue that applies to your site in the sections above, and you should be able to resolve it without difficulty. Some of the terms are technical, but they are explained simply enough that you can identify your issue and its fix.
Two Runs is a website development and website design company. We can help you with your Index Coverage issues and show you how to resolve them without overwhelming yourself. Our team of experts is always keen to help, so you can fuel the growth of your website without any hurdles.