Monday, September 6, 2010

Robots.txt - informing search engines

The robots.txt file tells search engine crawlers how to interact with a website: which parts they may crawl and list in their index, and which parts they should stay out of.

The file must reside in the root directory of your website, so its URL (web address) should look like this: www.yoursite/robots.txt

The example below shows the format and a typical layout:

User-agent: *
Sitemap: http://www.yoursite/sitemap.xml.gz
Disallow: /secure/
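
One way to check how a set of rules is read is to run it through a parser before uploading the file. The following is a minimal sketch using Python's standard urllib.robotparser module; the crawler name "ExampleBot" and the page paths are made up for illustration, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The example rules shown above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Sitemap: http://www.yoursite/sitemap.xml.gz
Disallow: /secure/
"""


def load_rules(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "ExampleBot" is a hypothetical crawler name; the pages are illustrative.
secure_allowed = rules.can_fetch("ExampleBot", "http://www.yoursite/secure/login.html")
home_allowed = rules.can_fetch("ExampleBot", "http://www.yoursite/index.html")

# site_maps() is available from Python 3.8 onward.
sitemaps = rules.site_maps()

print(secure_allowed, home_allowed, sitemaps)
```

Here the page under /secure/ is reported as blocked, the home page as allowed, and the sitemap address is read regardless of where the Sitemap line sits in the file.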

To exclude ALL robots from the server:

User-agent: *
Disallow: /

To exclude a single robot from parts of a server:

User-agent: Named Bot
Disallow: /private/
Disallow: /images-saved/
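
The difference between the two layouts above can be sketched the same way. This example again assumes Python's urllib.robotparser; "ExampleBot" stands in for the named robot and "OtherBot" for any other crawler, and both names are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Build a parser from robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Rules that shut every robot out of the whole server.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Rules aimed at one robot only ("ExampleBot" is a hypothetical name).
BLOCK_ONE = """\
User-agent: ExampleBot
Disallow: /private/
Disallow: /images-saved/
"""

everyone = parse_rules(BLOCK_ALL)
single = parse_rules(BLOCK_ONE)

results = {
    # With "Disallow: /", no robot may fetch any page.
    "all_blocked": not everyone.can_fetch("OtherBot", "http://www.yoursite/index.html"),
    # The named robot is kept out of the listed directories...
    "named_blocked": not single.can_fetch("ExampleBot", "http://www.yoursite/private/a.html"),
    # ...but may still fetch everything else.
    "named_allowed_elsewhere": single.can_fetch("ExampleBot", "http://www.yoursite/index.html"),
    # Robots that are not named are unaffected by the group.
    "others_unaffected": single.can_fetch("OtherBot", "http://www.yoursite/private/a.html"),
}

for name, ok in results.items():
    print(name, ok)
```

Note that a group naming a single robot places no restriction on other robots; to restrict them as well, a separate "User-agent: *" group is needed.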

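The Robots Meta Tag in Source Code

Where a robots.txt file can’t be uploaded to the web server, a robots meta tag can be included in the head section of the individual HTML pages instead:
<META NAME="robots" CONTENT="index,follow">
The value index,follow tells robots to index the page and follow its links; use noindex or nofollow to restrict either behaviour.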
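
To confirm which directives a page actually carries, the tag can be read back out of the HTML. The sketch below uses Python's standard html.parser module; the class and function names are illustrative, and it only looks at the generic "robots" name, not at tags addressed to a specific crawler.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for item in attr_map.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><META NAME="robots" CONTENT="index,follow"></head><body></body></html>'
found = robots_directives(page)
print(sorted(found))  # prints ['follow', 'index']
```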

Robots.txt resources:

Robots.txt File Generator

Analyze robots.txt

Improving on Robots Exclusion Protocol

About /robots.txt

Tip: keep comments in the robots.txt file to a minimum. The format does allow comments (anything following a # character on a line), but they can sometimes be misinterpreted and cause problems with a search spider.