Monday, August 20, 2007

Vanessa Fox Talks In Detail About Googlebot

As a Web developer I'll never cease my quest for the holy grail: understanding just how Google takes all those URLs it crawls and turns them into a tidy little list of search engine results pages (SERPs).

I know I have a pretty good understanding--relative to that of the layman--but the more I learn, the more I find that I don't know squat. But when Vanessa Fox, founder of Google Webmaster Central, writes a blog post entitled All About Googlebot, I know I am going to learn something valuable.

I haven't been reading Fox's blog for long. But when I have read something she has written or heard one of her speeches, I have found her refreshingly open and honest about Google's practices. And while Google may need to protect its "secret sauce," we Web developers know that many of the changes the company makes to its algorithms are efforts to stop black-hat practices. So it's nice to have someone like Vanessa Fox out there to lend some insight to those of us who are doing things the right way and for the right reasons. Here are some Q&As from the recent Search Engine Strategies conference that she shared in her blog:


  1. If my site is down for maintenance, how can I tell Googlebot to come back later rather than to index the "down for maintenance" page?

    You should configure your server to return a status of 503 (Service Unavailable) rather than 200 (OK). That lets Googlebot know to try the pages again later.
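
For example, if your site happens to run on a Python stack, a maintenance handler might look like this minimal Flask sketch. Flask, the catch-all route, and the one-hour Retry-After value are my assumptions for illustration, not anything Google prescribes:

```python
from flask import Flask

app = Flask(__name__)

# Catch-all: while the site is down, every URL answers with a 503.
@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def maintenance(path):
    # 503 tells Googlebot the outage is temporary; the Retry-After
    # header (in seconds) hints when it should check back.
    return "Down for maintenance", 503, {"Retry-After": "3600"}
```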

  2. What should I do if Googlebot is crawling my site too much?

    You can contact us -- we'll work with you to make sure we don't overwhelm your server's bandwidth. We're experimenting with a feature in our webmaster tools for you to provide input on your crawl rate, and have gotten great feedback so far, so we hope to offer it to everyone soon.

  3. Is it better to use the meta robots tag or a robots.txt file?

    Googlebot obeys either, but meta tags apply to single pages only. If you have a number of pages you want to exclude from crawling, you can structure your site in such a way that you can easily use a robots.txt file to block those pages (for instance, put the pages into a single directory).
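
To illustrate the two options (the /private/ directory is my hypothetical example): a meta robots tag sits in the <head> of each individual page you want excluded,

```html
<meta name="robots" content="noindex, nofollow">
```

while a single robots.txt file at the site root can block a whole directory of pages at once:

```
User-agent: *
Disallow: /private/
```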

  4. If my robots.txt file contains a directive for all bots as well as a specific directive for Googlebot, how does Googlebot interpret the line addressed to all bots?

    If your robots.txt file contains a generic or weak directive plus a directive specifically for Googlebot, Googlebot obeys the lines specifically directed at it.
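
In other words, given a robots.txt file like the sketch below (the paths are hypothetical), Googlebot skips the generic section entirely and obeys only its own: it would crawl everything except /private/, while other compliant bots would stay out of the site altogether.

```
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /private/
```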