The Robots.txt File

Many search engines not only remember what is on your website but also keep a cached, or offline, copy of it. For websites whose pages are updated daily, this may not be something you want, especially if you run a retail store and do not want anyone to be able to look at yesterday's prices. That is where the robots.txt file comes in. Create this simple text file, and you can specify individual pages or entire sections that a search engine (or robot) should not index.

Clue: The robots.txt file goes in the root directory of your website. Well-behaved robots look for this file in your root directory before scanning your website. Obeying it is voluntary, so it will not keep out robots that choose to ignore it.

Creating the Robots.txt File

All you have to do is create a text file with the correct commands. A robots.txt file in its simplest form looks like this:

    User-agent: *
    Disallow:

The '*' sign is a wildcard that applies the record to all search engines, but you can also give specific commands to each search engine. The next line uses the Disallow command to specify what the search engine is not allowed to index. Because nothing follows "Disallow:", we place no limits on what a search engine can index on the NetworkClue website.

Other options

    Disallow: /folder

A Disallow line like this one blocks a specific folder from being indexed. To block a single file, put that file's path after Disallow in the same way.

Clue: Even if you do not plan to limit what a search engine can index, it is a good idea to have a robots.txt file that says so. In essence, it gives the search engine 'permission' to index your site.

Robots.txt commands in the HTML Code

You can also give robots instructions inside the HTML code using Meta Tags. See our HTML Reference Guide for more information.

To read more about robots.txt files, browse to: http://www.robotstxt.org.

Article last reviewed: 08/17/2003
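To see how a robot might apply the rules described in this article, here is a minimal sketch. It assumes Python's standard-library urllib.robotparser as a stand-in for a well-behaved robot; real search engines use their own parsers and may support extra commands. The site address, the robot name "ExampleBot" and the file /old-prices.html are hypothetical placeholders. The /folder rule is the one from the article.

```python
from urllib.robotparser import RobotFileParser

# The simplest robots.txt: "User-agent: *" applies the record to every robot,
# and an empty "Disallow:" value places no limits on what may be indexed.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Blocks the folder from the article plus one file.
# /old-prices.html is a hypothetical placeholder file name.
BLOCK_SOME = "User-agent: *\nDisallow: /folder\nDisallow: /old-prices.html\n"


def can_index(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a robot named `agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    # parse() reads the rules from a list of lines and marks them as loaded,
    # so can_fetch() can be called without downloading anything.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "http://www.example.com"  # hypothetical site
    for path in ("/", "/folder/page.html", "/old-prices.html", "/products.html"):
        print(path,
              can_index(ALLOW_ALL, "ExampleBot", site + path),
              can_index(BLOCK_SOME, "ExampleBot", site + path))
```

Disallow values are matched as path prefixes. As a result, "Disallow: /folder" written without a trailing slash also blocks a page such as /folder-archive.html. Writing "Disallow: /folder/" limits the rule to the folder's contents.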