Google: How to Optimize Your Website's Crawling Budget

On Google's official webmaster blog, Gary Illyes wrote about crawl budget and how it affects your website. Prioritizing the pages that should be indexed can help those pages achieve higher rankings. Two factors determine a website's crawl budget:

1. The crawl rate limit

Crawling is Googlebot's main priority, but it is designed not to degrade the experience of users visiting the site. The crawl rate limit represents the number of simultaneous parallel connections Googlebot may use to crawl the site, as well as the time it has to wait between fetches.

The crawl rate depends on how quickly the website responds to requests. You can also limit the crawl rate in Google Search Console.

2. Crawl demand

Crawl demand represents Google's interest in a website. URLs that are more popular on the Internet tend to be crawled more frequently to keep them fresh in Google's index. Google also tries to prevent URLs from becoming stale in the index.

If a website moves to a new address, crawl demand could increase to reindex content under the new URLs.

The crawl rate limit and crawl demand define the crawl budget as the number of URLs that Googlebot can and wants to crawl.

How to optimize your crawl budget

Having many low-value URLs can negatively affect a site's crawling and indexing. These are some of the low-value URL types that should be excluded from crawling:

  1. Pages with session IDs: If the same page is accessible under multiple session IDs, use the rel=canonical attribute on these pages to show Google the preferred version of the page. The same applies to all duplicate-content pages on your site, for example print versions of web pages. The duplicates will then be ignored in favor of the canonical URL (see the example markup after this list).
  2. Faceted navigation (filtering by color, size, etc.): Filtering pages by color, size, and other criteria can also produce a lot of duplicate content. Use your site's robots.txt file to keep these filter URLs from being crawled (a sample robots.txt excerpt follows this list).
  3. Soft 404 pages: Soft 404s are error pages that display a “this page was not found” message but return the incorrect HTTP status code “200 OK.” These error pages should return the HTTP status code “404 Not Found” instead.
  4. Infinite spaces: For example, if your website has a calendar with a “next month” link, Google could follow these “next month” links forever. If your website contains automatically generated pages that don't really add new content, add the rel=nofollow attribute to the links pointing to them.
  5. Low-quality and spam content: Check whether there are pages on your website that aren't very good. If your website has many such pages, removing them may lead to better rankings.
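
To make points 1, 2, and 4 concrete, here is a minimal sketch of what the markup can look like. The example.com URL, the product page, and the /calendar/next-month path are assumptions chosen purely for illustration, so adapt them to your own URL structure.

    <!-- Point 1: declare the preferred version of a page that is also reachable via session-ID URLs -->
    <link rel="canonical" href="https://www.example.com/product/blue-widget">

    <!-- Point 4: keep Googlebot from following an endless "next month" calendar link -->
    <a href="/calendar/next-month" rel="nofollow">Next month</a>

For point 2, a robots.txt excerpt can keep faceted-navigation filter URLs out of the crawl; the color and size parameter names are likewise only examples:

    User-agent: *
    Disallow: /*?color=
    Disallow: /*?size=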

If you don't exclude these page types, you'll waste crawl activity and server resources on unimportant pages that have no value. Excluding them helps ensure that Googlebot spends its crawl budget on the important pages of your site.

What does this mean for your website rankings on Google?

You probably won't have to worry about crawl budget. If Google indexes your pages the same day they are published (or a day later), you don't have to do anything.

Google crawls most websites efficiently. If you have a very large site with tens of thousands of URLs, it becomes more important to prioritize what to crawl and how many server resources to devote to crawling.

Crawling is not a ranking factor. Google's ranking algorithms use many factors, and crawl rate is not one of them.