
The robots.txt file is not a typical Google filter, but it can have a dramatic impact on how search engines see your website. Most search engines try to respect the wishes of website owners. They want to know which pages you want them to see and which you want them to leave alone.
To communicate with the search engine spiders that visit your site, you need to speak their language. The industry standard is called the Robots Exclusion Protocol. According to robotstxt.org, the authority on the robots.txt file:
The Robots Exclusion Protocol is a method that allows Web site administrators to indicate to visiting robots which parts of their site should not be visited by the robot.
In a nutshell, when a Robot visits a Web site, say http://www.foobar.com/, it first checks for http://www.foobar.com/robots.txt. If it can find this document, it will analyse its contents for records like:
User-agent: *
Disallow: /
to see if it is allowed to retrieve the document.
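To see how a robot might interpret such a record, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt text is passed in as a string rather than fetched, and the helper name is_allowed and the "Googlebot" user agent string are assumptions chosen for illustration. Real crawlers may apply additional rules of their own.

```python
from urllib.robotparser import RobotFileParser

# The record quoted above: every robot ("*") is barred from the whole site ("/").
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper for illustration; it parses the text locally
    instead of downloading http://www.foobar.com/robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # With "Disallow: /" no page on the site may be retrieved.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.foobar.com/index.html"))
```

With the record above, the check reports that the page may not be retrieved, which is exactly what "Disallow: /" asks for.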
You can also use a Robots Meta tag:
The Robots META tag allows HTML authors to indicate to visiting robots if a document may be indexed, or used to harvest more links. No server administrator action is required.
Note that currently only a few robots implement this.
In this simple example:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
a robot should neither index this document, nor analyse it for links.
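As a rough illustration of how a robot could read a Robots META tag with NOINDEX and NOFOLLOW values, the sketch below uses Python's standard html.parser module. The class name RobotsMetaParser, the helper robots_directives, and the sample page are assumptions made for this example, not part of any search engine's actual crawler.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches too.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of lowercase robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical sample page carrying the tag form documented by robotstxt.org.
PAGE = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head><body></body></html>'

if __name__ == "__main__":
    found = robots_directives(PAGE)
    print("may index:", "noindex" not in found)
    print("may follow links:", "nofollow" not in found)
```

A page without the tag yields an empty set, which a robot would normally treat as permission to index the page and follow its links.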
Because Google and the other search engines consult your robots.txt file before adding your pages to their index, make sure that you are not using it incorrectly. Improper use, such as a stray Disallow rule, can result in Google not adding your website, or sections of it, to its index.
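One way to catch this kind of mistake is to test the pages you care about against your rules before publishing them. The sketch below again assumes Python's standard urllib.robotparser; the helper blocked_urls, the sample rules, and the URLs are hypothetical. It shows a common slip: "Disallow: /products" was meant to hide an old archive, but because rules match by path prefix it also blocks the live product pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the author only wanted to hide /products-old/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products
"""

# Hypothetical pages the site owner wants indexed, plus the archive to hide.
IMPORTANT_URLS = [
    "http://www.foobar.com/",
    "http://www.foobar.com/products/widget.html",
    "http://www.foobar.com/products-old/archive.html",
]


def blocked_urls(robots_txt: str, user_agent: str, urls: list) -> list:
    """Return the URLs that user_agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    for url in blocked_urls(ROBOTS_TXT, "Googlebot", IMPORTANT_URLS):
        print("blocked:", url)
```

Changing the rule to "Disallow: /products-old/" would leave the live product pages open to robots while still excluding the archive.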
Evan Carmichael