Friday, November 13, 2015

Looking at robots.txt for SEO

What is robots.txt?

Robots.txt is a plain text file containing rules, known as 'robots directives', that control how search engines behave as they crawl your site. You can create one using any text editor, such as Notepad.



Every website may include a robots.txt file located at the root level of the server, accessible at domain.com/robots.txt. This file is the first thing search engines look for when they start to identify and crawl your website. As a robots directive, it tells a search engine what it can and cannot do on the website. The file can apply the same rules to all search engines or set separate requirements for individual ones.

What should you exclude in your robots.txt file?

The robots.txt file can exclude areas of the site that you don't want search engines to discover, crawl, or index. Without it, those pages can be published in the search engine results pages, allowing the general public to find and access sections of the site that may not be appropriate for them. These sections might include directories of programming elements, a secure or admin section of the site, or components of your content management system, depending on which one you're using; blocking them stops robots from crawling and indexing those elements.

Robots.txt Example:


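A minimal robots.txt might look something like the sketch below. The paths and crawler names are illustrative, not taken from any real site:

```
# Rules that apply to all crawlers
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/

# Additional rules for one specific crawler
User-agent: Googlebot
Disallow: /private/
```

Each `User-agent` block names a crawler (or `*` for all of them), and the `Disallow` lines beneath it list the paths that crawler should not fetch. An empty `Disallow:` line, or omitting the directive entirely, allows everything.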
It is important to be aware that some useful sections of the site can be inappropriately blocked. A common example is the /images directory. On average, five to seven percent of search visits may begin with an image-based search, with the user clicking through to the actual website. When the images directory is blocked, search engines cannot discover, identify, and list the images related to your business in their image search indexes. For many businesses, leaving this directory crawlable therefore supports Search Engine Optimisation activities.

Sitemaps Location



Another thing you may see in a robots.txt file is a pointer to the location of your Sitemap. A Sitemap is a digital directory written in Extensible Markup Language (XML) that lists all the pages of your website you wish to have indexed by search engines. Putting a pointer to this location in your robots.txt file allows search engines to automatically discover your Sitemap and any additional Sitemap files you have included, such as a geo Sitemap or an images Sitemap. Helping search engines discover these is beneficial for your site, so consider adding a pointer to either a master Sitemap file or to the individual Sitemap files for your website.
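The Sitemap pointers described above are single lines anywhere in robots.txt; the URLs below are placeholders:

```
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-images.xml
```

You can list one master Sitemap index file or several individual Sitemap files; crawlers will fetch each URL listed.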

The robots directive not only blocks certain robots' actions and controls which pages are crawled and indexed, but can also influence crawl speed. If the site has hosting issues, for example, where a crawl may slow down the site's performance and affect the user experience, you can set speed controls that determine how quickly a robot approaches and works its way through your website.
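One way to express such a speed control is the Crawl-delay directive, which some crawlers (Bing and Yandex, for example, though notably not Google) honour. A sketch, with an illustrative delay value:

```
User-agent: bingbot
# Ask this crawler to wait roughly 10 seconds between requests
Crawl-delay: 10
```

Crawlers that do not support the directive simply ignore it, which is why the webmaster tools mentioned below are often the more reliable place to manage crawl rate.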

Google Webmaster Tools (GWT) and Bing Webmaster Tools (BWT) allow you to remove individual pages or files through single requests. Crawling speed can also be managed within GWT and BWT, in addition to being set in the robots.txt file. You can even monitor the traffic, speed, and amount of data used by the search engines over a period of time to determine whether crawling is causing any performance issues for your website.

HTML Meta Directives

If you want to specify rules per HTML page, you can do this using HTML meta directives. These tell search engines how to treat a specific page.

Google, Bing, and Yahoo have implemented a number of HTML Meta directives including the following:
  • NOINDEX META Tag – Tells a crawler not to index a page.
  • NOFOLLOW META Tag – Tells a crawler not to follow the links on a page.
  • NOSNIPPET META Tag – Tells a crawler not to display a snippet for the page in search results.
  • NOARCHIVE META Tag – Tells a crawler not to show a cached link for the page.
  • NOODP META Tag – Tells a crawler not to use the title and snippet from the ODP (Open Directory Project) for the page.
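As an example, the directives above are placed in a robots meta tag in the head of the individual page; the combination shown here is purely illustrative:

```html
<!-- In the <head> of the page you want to control -->
<meta name="robots" content="noindex, nofollow">
```

Multiple directives can be combined in a single comma-separated `content` attribute, and `name="robots"` can be replaced with a specific crawler name (such as `googlebot`) to target one search engine only.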
Controlling how major search engines index your content can be a frustrating and time-consuming exercise, but it can be done. By applying the techniques above, you can have confidential content removed quickly and prevent it from showing on search results pages.

Search Group, a local Perth SEO company, helps keep your business visible online through effective SEO and Internet marketing strategies. Visit www.searchroup.com.au for details about our services.