Nov 15 2009

10 common mistakes using robots.txt on your blog

Robots.txt is a plain text file located in the root of a website. It lets the site administrator define which content is allowed and which is disallowed for the bots that visit the site.

All major search engines, such as Google, Yahoo and MSN, support the Robots Exclusion Protocol. There are several things every website owner needs to understand to make crawling of their site easy. Below are 10 common mistakes to avoid when creating a robots.txt file.

1. Placing robots.txt outside the root directory – This is one of the most common mistakes webmasters make: they upload the robots.txt file to the wrong place. It must reside in the root of the domain and must be named “robots.txt”. A robots.txt file uploaded to a subdirectory is not valid, because bots check for robots.txt only in the root of the domain.

User-agent: *
Disallow:
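
To illustrate where bots look, here is a minimal Python sketch that derives the robots.txt location from any page URL using the standard urllib.parse module. The function name robots_txt_url and the example.com address are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/blog/2009/11/post.html"))
```

Whatever page you start from, the result points at the root of the domain, which is why a copy in a subdirectory is never read.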

2. Wrong syntax in robots.txt – Another common problem is that the webmaster used the wrong syntax when creating the robots.txt file. Always double-check the file with a tool such as Robots.txt Checker.
Here is an example of a wrong entry:

User-agent: *
Disallow: private.html

The path is missing its leading slash. Always start a file or directory name with a slash character (example: /private.html).
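
One way to see the effect is to feed both versions to Python's standard urllib.robotparser. This is only a sketch of how that one parser behaves, and other bots may treat the malformed rule differently. The helper is_allowed and the example.com URL are invented for the illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, agent="*"):
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


without_slash = "User-agent: *\nDisallow: private.html"
with_slash = "User-agent: *\nDisallow: /private.html"
page = "http://www.example.com/private.html"

# The rule without a leading slash never matches the page's path.
print(is_allowed(without_slash, page))
# The rule with a leading slash blocks the page.
print(is_allowed(with_slash, page))
```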

3. Adding comments at the end of a line instead of at the beginning – If you wish to include comments in your robots.txt file, put each one on its own line and precede it with a # sign, like this:

# Here are my comments about this entry.
User-agent: *
Disallow:

4. An empty robots.txt file is almost like not having one – If you have created a robots.txt file in your root directory and there is nothing in it, that is much the same as not having one. Because nothing is disallowed and no User-agent is given, everything is allowed for every bot.

5. Blocking pages that you need indexed – If you block bots and pages using robots.txt, you should have a thorough understanding of the syntax. Any mistake can keep search engine spiders away from pages you want indexed.
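
A quick safeguard is to test your rules against a list of pages you definitely want indexed before uploading the file. The sketch below does this with Python's standard urllib.robotparser. The function name blocked_urls, the sample rules and the example.com URLs are assumptions for illustration, and real crawlers may interpret some rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, agent="*"):
    """Return the URLs from `urls` that robots_txt forbids for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


rules = "User-agent: *\nDisallow: /blog/"
must_index = [
    "http://www.example.com/",
    "http://www.example.com/blog/hello-world.html",
]

# Any URL printed here is blocked even though you want it indexed.
print(blocked_urls(rules, must_index))
```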

6. URL paths are case sensitive – URL paths are often case sensitive, so be consistent with your site’s capitalization. Many robots and web servers are case sensitive, so a Disallow path written with one capitalization will not match root-level folders whose names use another, such as private or PRIVATE.
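
To catch this kind of slip, you could compare the paths in your rules with the real paths on your site and flag any pair that differs only in capitalization. The following is a small sketch with a hypothetical helper, case_mismatches, and made-up paths.

```python
def case_mismatches(rule_paths, site_paths):
    """Pair each rule path with site paths that differ from it only in case."""
    pairs = []
    for rule in rule_paths:
        for real in site_paths:
            if rule != real and rule.lower() == real.lower():
                pairs.append((rule, real))
    return pairs


# Hypothetical rule paths and folder names, for illustration only.
print(case_mismatches(["/Private/", "/images/"], ["/private/", "/images/"]))
```

Each pair it returns is a rule that probably does not block the folder you meant.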

7. Misspelled robot/user agent names – Bots will ignore misspelled User-agent names. Check your raw server logs to find the User-agent names you need to block, or see UserAgentString.com for a list of user agent names.
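
A rough way to spot a typo is to compare each name against a list of bots you know, using Python's standard difflib module. In this sketch the function suggest_agent, the 0.8 similarity cutoff and the short KNOWN_AGENTS list are all assumptions for illustration. Build your own list from your server logs.

```python
import difflib

# Illustrative list only; it is far from complete.
KNOWN_AGENTS = ["googlebot", "bingbot", "slurp", "msnbot"]


def suggest_agent(name, known=KNOWN_AGENTS):
    """Return a likely intended name if `name` looks misspelled, else None."""
    lowered = name.lower()
    if lowered == "*" or lowered in known:
        return None  # wildcard or exact match: nothing to fix
    matches = difflib.get_close_matches(lowered, known, n=1, cutoff=0.8)
    return matches[0] if matches else None


print(suggest_agent("googelbot"))
```

A name that is neither known nor close to a known one also returns None, so treat this as a hint rather than proof that the name is right.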

8. Don’t list all the files on one line – A common mistake is putting all the files under one Disallow line.
For example:

User-agent: *
Disallow: /private/ /images/ /javascript/

This syntax is wrong, and robots will not understand it. The correct syntax is given below.

User-agent: *
Disallow: /private/
Disallow: /images/
Disallow: /javascript/
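
If you already have a file written the wrong way, a few lines of Python can rewrite it. This is a minimal sketch. The function name split_disallow is made up, and it assumes the line has no trailing comment.

```python
def split_disallow(line):
    """Expand 'Disallow: /a/ /b/' into one Disallow line per path."""
    field, sep, value = line.partition(":")
    if not sep or field.strip().lower() != "disallow":
        return [line]  # not a Disallow directive; leave untouched
    paths = value.split()
    if len(paths) <= 1:
        return [line]  # already one path (or none) on the line
    return ["Disallow: " + path for path in paths]


for fixed in split_disallow("Disallow: /private/ /images/ /javascript/"):
    print(fixed)
```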

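9. No Allow command in the original robots.txt standard – The original Robots Exclusion Protocol has only one command, Disallow:, and no command called Allow:. Some major crawlers, including Googlebot, accept Allow: as an extension, but not every bot does. So if you want bots to visit a page, the safest approach is simply not to list it.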

10. Missing the colon – Leaving out the colon in a Disallow or User-agent entry. Here is an example of an entry with a missing colon.

#This is a wrong entry
User-agent: googlebot
Disallow /

#The correct entry
User-agent: googlebot
Disallow: /
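
A missing colon is easy to overlook by eye, so a simple check can help. The sketch below scans robots.txt text and reports any line that starts with a directive name but contains no colon. The function name missing_colons and the short DIRECTIVES tuple are assumptions for this example.

```python
DIRECTIVES = ("user-agent", "disallow")


def missing_colons(robots_txt):
    """Return (line_number, line) for directives written without a colon."""
    problems = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" in line:
            continue
        first_word = line.split()[0].lower()
        if first_word in DIRECTIVES:
            problems.append((number, raw))
    return problems


sample = "User-agent: googlebot\nDisallow /"
print(missing_colons(sample))
```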

Please leave a comment if you know of any other common mistakes to avoid when creating a robots.txt file. Below are a few useful robots.txt resources and tools.

http://www.mcanerin.com/en/search-engine/robots-txt.asp
http://webtools.live2support.com/se_robots.php
http://googlewebmastercentral.blogspot.com/2008/03/speaking-language-of-robots.html

source : thomsonchemmanoor.com


