Busby SEO Challenge SEM/SEO Tips

Search Engine Marketing/Search Engine Optimization Tips

Back to the basics: The robots.txt file

By Benj Arriola - Posted on Sat Feb 10, 2007

What is a robots.txt file?

Robots.txt is a text file that resides at the root URL of your website. It holds the rules for the pages you wish to keep out of the search engine results. Specifically, it lists the files or folders that robots (also called bots, crawlers, or spiders) should not crawl, which normally keeps those pages from getting indexed.

Making your own robots.txt file is not difficult. List every page or folder you do not want search engine robots to crawl in a plain text file named robots.txt, using a format similar to the one below:

User-Agent: *
Disallow: /path/
Disallow: /path/file.html
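
If the list of pages is long, the same kind of file could be generated with a short script instead of typed by hand. The Python sketch below is only an illustration: build_robots_txt is a made-up helper name, and the paths are the placeholder paths from the example above.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt text with one group: a User-Agent line
    followed by one Disallow line per path."""
    lines = [f"User-Agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


# Placeholder paths taken from the example in the article.
robots_txt = build_robots_txt(["/path/", "/path/file.html"])
print(robots_txt)
```

The printed text would then be saved as robots.txt and uploaded to the root of the site.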

The asterisk (*) is a wildcard: it tells all types of robots to follow the Disallow rules listed under it. If you wish to exclude only a specific bot, you need to know its user-agent name. For instance, to keep only Google from crawling a certain page, you can do something like:

User-Agent: Googlebot
Disallow: /path/file.html
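
One way to sanity-check a rule like this before uploading it is to run it through a parser. The sketch below uses Python's standard urllib.robotparser module. The host www.example.com and the bot name SomeOtherBot are placeholders, and is_allowed is a hypothetical helper name. This parser only approximates how any particular search engine reads the file, so treat the result as a rough check.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-only rule from the article.
RULES = """\
User-Agent: Googlebot
Disallow: /path/file.html
"""


def is_allowed(rules, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


page = "http://www.example.com/path/file.html"  # placeholder URL
googlebot_ok = is_allowed(RULES, "Googlebot", page)
otherbot_ok = is_allowed(RULES, "SomeOtherBot", page)
print("Googlebot allowed:", googlebot_ok)
print("SomeOtherBot allowed:", otherbot_ok)
```

With this rule, only the bot named in the User-Agent line is kept away from the page. A bot with a different name finds no group that applies to it and is not restricted.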

If you disallow a folder without naming a file, every file within that folder is excluded. For example:

User-Agent: *
Disallow: /path/

If you have pages at /path/file1.html and /path/file2.html, both will be excluded from the search engine results.
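
The folder rule works as a prefix match on the URL path, which can be checked with Python's standard urllib.robotparser module. This is a rough sketch: www.example.com and /other/page.html are placeholders added for the demonstration, and real crawlers may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# The folder rule from the article.
FOLDER_RULES = """\
User-Agent: *
Disallow: /path/
"""

parser = RobotFileParser()
parser.parse(FOLDER_RULES.splitlines())

# Placeholder URLs: two pages inside the folder, one outside it.
urls = [
    "http://www.example.com/path/file1.html",
    "http://www.example.com/path/file2.html",
    "http://www.example.com/other/page.html",
]
results = {url: parser.can_fetch("AnyBot", url) for url in urls}
for url, allowed in results.items():
    print(url, "allowed" if allowed else "disallowed")
```

Both files under /path/ come back as disallowed, while the page outside the folder stays crawlable.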

Having no robots.txt does not mean your site will not be indexed, but when search engines request your robots.txt and get an error 404 page instead, it might cause some problems. Looking at it the other way, having a robots.txt does not directly help increase your ranking either. It may help, though, if it stops search engines from crawling pages that give your site duplicate content.

Now why should you disallow a page? Isn’t it nice to have search engines crawl all your pages? Well, sometimes you may have your own reasons to exclude a page or pages from the search results. One common reason is to avoid duplicate content. It’s really up to you what your reasons are. :)

One Response to “Back to the basics: The robots.txt file”

  1. Ituloy AngSulong News » How to do and not to do robots.txt Says:

    [...] or maybe even the non-SEO web designer or web developer knows this already. But I just posted in my SEO tips how to do robots.txt. But aside from showing you how to do a robots.txt, my robots.txt shows you how you should not do a [...]
