NetworkClue.com

The Robots.txt File

Many search engines not only index what is on your website but also keep a cached, or offline, copy of it. If your pages are updated daily, that may not be something you want, especially if you run a retail store and do not want anyone looking at yesterday's prices.

That is where the robots.txt file comes in. Create this simple text file and you can specify pages or entire sections that a search engine (or robot) should not index.

Clue: The robots.txt file goes in the root directory of your website. Well-behaved robots look for this file in your root directory before scanning your website, but compliance is voluntary, so do not rely on it to hide private content.
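
Because the file always lives at the root, its address can be worked out from any page on the site. The following is a minimal Python sketch of that idea; the function name robots_url and the example page path are made up for illustration and do not come from the NetworkClue site.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap the path for /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page path, used only as an example.
print(robots_url("http://www.networkclue.com/some/section/page.html"))
# prints http://www.networkclue.com/robots.txt
```

A robots.txt placed in a subfolder would never be found this way, which is why the root directory matters.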

Creating the Robots.txt File

All you have to do is create a text file with the correct commands. A robots.txt file in its simplest form looks like this:

User-agent: *
Disallow:

As you can see, the '*' sign is a wildcard that applies the rules to all search engines, but you can give specific commands to each search engine as well. The next line uses the Disallow command to specify what the search engine is not allowed to index. Here the value is left empty, so we do not give the search engine any limits as to what it can index on the NetworkClue website.
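
One way to check how a rule set will be read is Python's standard urllib.robotparser module. The sketch below is only an illustration: the robot name ExampleBot, the helper can_fetch, and the page addresses are assumptions, not anything used by a real search engine or by this site.

```python
from urllib.robotparser import RobotFileParser

# The simplest form: every robot, no limits.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Separate rules for one (hypothetical) robot, no limits for the rest.
PER_ROBOT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""


def can_fetch(rules: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


page = "http://www.networkclue.com/index.html"
print(can_fetch(ALLOW_ALL, "AnyBot", page))      # prints True
print(can_fetch(PER_ROBOT, "ExampleBot", page))  # prints False
print(can_fetch(PER_ROBOT, "OtherBot", page))    # prints True
```

An empty Disallow value means nothing is blocked, while "Disallow: /" blocks the whole site for the robot named above it.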

Other options

Disallow: /folder/
Disallow: /file.html

The first line shows how to block a specific folder from being indexed. The next line shows how to block a specific file.
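
Rules like these are matched against the start of the URL path, so each path is written from the site root with a leading '/'. The Python sketch below, again using the standard urllib.robotparser module, shows the effect; the helper allowed, the robot name AnyBot, and the page paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /folder/
Disallow: /file.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path: str) -> bool:
    """Report whether a robot may fetch the given path on the site."""
    return parser.can_fetch("AnyBot", "http://www.networkclue.com" + path)


print(allowed("/folder/page.html"))      # prints False (inside the blocked folder)
print(allowed("/file.html"))             # prints False (the blocked file)
print(allowed("/index.html"))            # prints True
print(allowed("/folder-old/page.html"))  # prints True (does not start with /folder/)
```

The trailing slash matters: a rule written as /folder would also match /folder-old/page.html, because matching is by prefix.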

Clue: Even if you are not planning on limiting what a search engine can index, it is a good idea to have a robots.txt file that says so. In essence, it gives the search engine 'permission' to index your site.

Robots.txt commands in the HTML Code

You can also give robots instructions inside the HTML code using Meta Tags. See our HTML Reference Guide for more information.
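
As a rough illustration, the usual form of such a tag is a meta element named "robots" with a comma-separated content value such as "noindex, nofollow". The tag text here is an assumption based on common usage, not taken from the HTML Reference Guide. The Python sketch below reads those values out of a page with the standard html.parser module; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, nofollow" into separate lowercase directives.
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def robots_directives(html: str) -> list[str]:
    """Return the robots meta directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
print(robots_directives(PAGE))  # prints ['noindex', 'nofollow']
```

Unlike robots.txt, which covers the whole site from one file, a tag like this applies only to the page that contains it.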

To read more about robots.txt files, browse to: http://www.robotstxt.org.

Article last reviewed: 08/17/2003



Created by: Digital Foundation, inc.

Copyright © 2002-2005 Digital Foundation, inc.   www.networkclue.com

All content of the NetworkClue website is copyrighted. Articles, notes, outlines, and all other materials may not be stored on the Internet or sold or placed by themselves or with other material in any electronic or printed format in whole or part. However materials may be referenced by links to the site.
