Monday, September 6, 2010

Robots.txt - informing search engines

The robots.txt file tells search engines how to interact with a website: which areas they may crawl and list in their index, and which they should stay out of.

The file must reside in the root directory of your website, so the URL (web address) of your robots.txt file should look like this: www.yoursite/robots.txt

The following example shows the format and a typical layout:

User-agent: *
Sitemap: http://www.yoursite/sitemap.xml.gz
Disallow: /secure/
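
To see how a crawler might read the rules above, here is a minimal sketch using Python's standard urllib.robotparser module. The agent name "AnyBot" and the page paths are made up for illustration, and site_maps() assumes Python 3.8 or newer. Real search engines use their own parsers, so treat this as a rough check rather than a guarantee of how any particular spider behaves.

```python
from urllib.robotparser import RobotFileParser

# The example file from above, held in a string so nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Sitemap: http://www.yoursite/sitemap.xml.gz
Disallow: /secure/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "AnyBot" is a hypothetical crawler name; the * group applies to it.
print(parser.can_fetch("AnyBot", "http://www.yoursite/index.html"))         # True
print(parser.can_fetch("AnyBot", "http://www.yoursite/secure/login.html"))  # False

# Sitemap lines are collected separately from the allow/disallow rules.
print(parser.site_maps())  # ['http://www.yoursite/sitemap.xml.gz']
```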

To exclude ALL robots from the server:

User-agent: *
Disallow: /

To exclude a single robot from parts of a server:

User-agent: Named Bot
Disallow: /private/
Disallow: /images-saved/
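
The per-robot rules above can be tried out with a short Python sketch. It uses the standard urllib.robotparser module, and "ExampleBot" and "OtherBot" are hypothetical names standing in for "Named Bot" and for any other crawler. The helper allowed() is also just an illustration, not part of any library.

```python
from urllib.robotparser import RobotFileParser

# Same layout as above, with a hypothetical bot name in place of "Named Bot".
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/
Disallow: /images-saved/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on the example site."""
    return parser.can_fetch(agent, "http://www.yoursite" + path)


# The named robot is kept out of the two listed directories only.
print(allowed("ExampleBot", "/private/report.html"))   # False
print(allowed("ExampleBot", "/images-saved/a.png"))    # False
print(allowed("ExampleBot", "/index.html"))            # True

# No group mentions other robots, so they are not restricted.
print(allowed("OtherBot", "/private/report.html"))     # True
```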

The Robot Tag in Source Code

Where a robots.txt file can’t be uploaded to the web server, a robots meta tag can be included in the head section of individual HTML pages instead:
<META NAME="robots" CONTENT="index,follow">
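
As a rough sketch of how a program could read that tag, the Python snippet below pulls the directives out of a page with the standard html.parser module. The class name RobotsMetaParser, the function robots_directives and the sample page are all made up for this example; it only shows the idea and is not how any specific search engine does it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives.extend(
                part.strip().lower()
                for part in content.split(",")
                if part.strip()
            )


def robots_directives(html: str) -> list:
    """Return the robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><META NAME="robots" CONTENT="index,follow"></head></html>'
print(robots_directives(page))  # ['index', 'follow']
```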

Robots.txt resources:
http://tools.seobook.com/robots-txt/

Robots.txt File Generator
http://tools.seobook.com/robots-txt/generator/

Analyze robots.txt
http://tools.seobook.com/robots-txt/analyzer/

Improving on Robots Exclusion Protocol
http://googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html

About /robots.txt
http://www.robotstxt.org/robotstxt.html


Tip: keep comments in the robots.txt file to a minimum. The format allows them (lines starting with #), but they can sometimes be misinterpreted and cause problems with search spiders.