How To Control Googlebot’s Interaction With Your Website


Google’s Search Relations team provides insights into controlling Googlebot’s interactions with webpages on the latest ‘Search Off The Record’ podcast.

Highlights

    • You can’t block Googlebot from crawling specific sections of an HTML page.
    • The data-nosnippet HTML attribute or an iframe can control how content appears in search snippets.
    • A disallow rule in robots.txt or firewall rules using Googlebot’s IP addresses can block Googlebot from a site.

    The topics covered were how to block Googlebot from crawling specific sections of a page and how to prevent it from accessing a site altogether.

    Google’s John Mueller and Gary Illyes answered the questions examined in this article.

    Blocking Googlebot From Specific Web Page Sections

    When asked how to stop Googlebot from crawling specific webpage sections, such as “also bought” areas on product pages, Mueller said it isn’t possible.

    “The short version is that you can’t block crawling of a specific section on an HTML page,” Mueller said.

    He went on to offer two potential strategies for dealing with the issue, neither of which, he stressed, is an ideal solution.

    Mueller suggested using the data-nosnippet HTML attribute to prevent text from appearing in a search snippet.
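    As a sketch of how that might look in practice, the data-nosnippet attribute (which Google supports on span, div, and section elements) wraps the section you want kept out of snippets, while the content remains crawlable and indexable:

```html
<div>
  <p>Main product description: this text can appear in a search snippet.</p>
  <!-- Content inside data-nosnippet is still crawled and indexed,
       but Google won't display it in the search result snippet. -->
  <div data-nosnippet>
    <p>Customers also bought: related products listed here.</p>
  </div>
</div>
```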

    Alternatively, you could use an iframe or JavaScript with the source blocked by robots.txt, although he cautioned that’s not a good idea.

    “Using a robotted iframe or JavaScript file can cause problems in crawling and indexing that are hard to diagnose and resolve,” Mueller stated.

    He reassured everyone listening that if the content in question is being reused across multiple pages, it’s not a problem that needs fixing.

    “There’s no need to block Googlebot from seeing that kind of duplication,” he added.

    Blocking Googlebot From Accessing A Website

    In response to a question about preventing Googlebot from accessing any part of a site, Illyes provided an easy-to-follow solution.

    “The simplest way is robots.txt: if you add a disallow: / for the Googlebot user agent, Googlebot will leave your site alone for as long as you keep that rule there,” Illyes explained.
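    In a robots.txt file at the site root, that rule looks like this:

```
User-agent: Googlebot
Disallow: /
```

    Keep in mind that robots.txt blocks crawling, not indexing: a URL that Google discovers through links elsewhere can still appear in results without its content.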

    For those seeking a more robust solution, Illyes offered another method:

    “If you want to block even network access, you’d need to create firewall rules that load our IP ranges into a deny rule,” he said.

    See Google’s official documentation for a list of Googlebot’s IP addresses.
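    Google publishes Googlebot’s IP ranges as a JSON file of prefixes. As an illustrative sketch only, the script below turns entries in that documented shape into firewall deny rules; the two sample prefixes are placeholders (fetch the live file in practice, since the ranges change), and the iptables/ip6tables syntax assumes a Linux firewall:

```python
import json

# Placeholder data in the shape of Google's published Googlebot IP-range
# JSON file: a "prefixes" list whose entries carry either an "ipv4Prefix"
# or an "ipv6Prefix" key. The actual ranges change over time, so the live
# file should be fetched rather than hardcoded.
sample = {
    "prefixes": [
        {"ipv4Prefix": "66.249.64.0/27"},
        {"ipv6Prefix": "2001:4860:4801:10::/64"},
    ]
}

def deny_rules(data):
    """Turn each published prefix into an iptables/ip6tables DROP rule."""
    rules = []
    for prefix in data["prefixes"]:
        if "ipv4Prefix" in prefix:
            rules.append(f"iptables -A INPUT -s {prefix['ipv4Prefix']} -j DROP")
        elif "ipv6Prefix" in prefix:
            rules.append(f"ip6tables -A INPUT -s {prefix['ipv6Prefix']} -j DROP")
    return rules

for rule in deny_rules(sample):
    print(rule)
```

    Loading the deny rules at the firewall blocks even network-level access, whereas robots.txt only asks a well-behaved crawler not to fetch pages.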

    In Summary

    Though it’s impossible to prevent Googlebot from accessing specific sections of an HTML page, methods such as using the data-nosnippet attribute can offer control.

    When considering blocking Googlebot from your site entirely, a simple disallow rule in your robots.txt file will do the trick. However, more extreme measures like creating specific firewall rules are also available.

    Search Engine Journal

    Matt G. Southern
