Back to the basics: The robots.txt file
By Benj Arriola - Posted on Sat Feb 10, 2007
What is a robots.txt file?
Robots.txt is a text file that resides at the root URL of your website. It holds the rules for the pages you wish to exclude from search engine results. These rules give specific instructions on which files or folders robots (also called bots, crawlers, or spiders) should not go over and index.
Making your own robots.txt file is not difficult. Simply list the pages you do not want search engine robots to index in a text file named robots.txt, using a format similar to the one below:
User-Agent: *
Disallow: /path/

User-Agent: *
Disallow: /path/file.html
The asterisk (*) is a wildcard: it tells all types of robots to follow the disallow rule below it. If you wish to exclude only a specific bot, you need to know its name. For instance, to disallow only Google from crawling a certain page, you can do something like:
User-Agent: Googlebot
Disallow: /path/file.html
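If you want to sanity-check a bot-specific rule like this one before uploading it, one option is Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the helper name `is_allowed` and the bot name "SomeOtherBot" are made up for this example, and the parser only reports whether a bot is permitted to crawl a path, not whether a search engine will actually index it.

```python
from urllib.robotparser import RobotFileParser

# The bot-specific rule from the example above.
RULES = """\
User-Agent: Googlebot
Disallow: /path/file.html
"""


def is_allowed(rules, user_agent, path):
    """Return True if `user_agent` may crawl `path` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "SomeOtherBot" is a hypothetical name used only for illustration.
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, is_allowed(RULES, agent, "/path/file.html"))
```

Under these rules only Googlebot is turned away from /path/file.html; a bot that is not named in the file, and has no wildcard rule to fall back on, is allowed through.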
If you decide to exclude a folder name without a file name, this will exclude all files within that folder. For example:
User-Agent: *
Disallow: /path/
If you have pages at /path/file1.html and /path/file2.html, both will be excluded from the search engine results.
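The folder rule can be checked the same way. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `blocked_paths` and the extra page /about.html are assumptions added for illustration and do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# The folder rule from the example above.
FOLDER_RULES = """\
User-Agent: *
Disallow: /path/
"""


def blocked_paths(rules, paths, user_agent="*"):
    """Return the subset of `paths` that `user_agent` may not crawl."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    # /about.html is a hypothetical page outside the disallowed folder.
    pages = ["/path/file1.html", "/path/file2.html", "/about.html"]
    print(blocked_paths(FOLDER_RULES, pages))
```

Both files inside /path/ are reported as blocked, while the page outside the folder is not, because the rule matches any URL path that begins with /path/.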
Not having a robots.txt does not mean your site will not be indexed, but if search engines request your robots.txt and the request forwards to a 404 error page, it may cause some problems. Looking at it the other way, having a robots.txt does not directly help increase your ranking either. It may help indirectly if you use it to keep search engines from crawling pages that give your site duplicate content.
Now why should you disallow a page? Isn’t it nice to have search engines crawl all your pages? Well, sometimes you may have your own reasons to exclude a page or pages from the search results. One reason can be to avoid duplicate content. It’s really up to you what your reasons are.