Barry Schwartz links to a discussion of how search-engine robots choose which record in robots.txt to obey. I have always wondered how people manage to make such varied and fanciful mistakes in such a simple file with such a clear, unambiguous format. Part of the blame, of course, lies with the many extensions that every major search engine adds to the standard under robot names known mostly in narrow circles: Google, Yahoo, MSN, Yandex. But in that case the questions about robots.txt would mostly concern the extensions themselves. Back to the priorities.
As you know, the records in robots.txt are separated by blank lines, and each record is an instruction to one or more robots. Suppose the exclusion file has the following contents:

User-agent: *
Disallow: /dir/file

User-agent: Yandex
Disallow: /Reports

User-agent: Googlebot
Disallow: /users
Allow: /best-page.html

The question is: which directives will Google's robot follow in this case, and what will be forbidden to it? You might think the robot will stumble on the section for all robots first and take that section's rules into account. That assumption is wrong. When parsing the file, a robot works roughly by the following algorithm (a code sketch of this selection logic follows below):

1. Fetch the whole file.
2. Extract the correctly formed sections from it.
3. Look for "its own" section, i.e. one addressed to it by name.
4. If its own section is found, follow that section's directives.
5. If its own section is not found, look for the section for all robots (User-agent: *).
6. If the section for all robots is found, follow that section's directives.
7. If no general section is found either, the robot assumes it may index everything without exception.

Several conclusions follow from this:

The order of the sections in the file does not matter.
If a robot finds "its own" section, it follows only that section's directives and ignores all the rest; in our example Googlebot is therefore perfectly entitled to index /dir/file.
The absence of a general section means that any robot not mentioned in a section of its own is allowed to index the entire site.
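A minimal Python sketch of the selection logic described above. The record grouping, the pick_section() helper, and the robot names are illustrative assumptions, not any search engine's actual implementation; real crawlers also apply many extensions on top of this.

def parse_sections(robots_txt):
    """Split robots.txt into {user-agent: [(field, value), ...]} sections."""
    sections = {}
    current_agents = []
    expecting_agents = True
    for raw in robots_txt.splitlines():
        line = raw.split('#', 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        field, _, value = line.partition(':')
        field, value = field.strip().lower(), value.strip()
        if field == 'user-agent':
            if not expecting_agents:
                current_agents = []            # a new record starts here
            expecting_agents = True
            current_agents.append(value.lower())
        else:
            expecting_agents = False
            for agent in current_agents:
                sections.setdefault(agent, []).append((field, value))
    return sections

def pick_section(sections, robot_name):
    """Return the rules a robot would obey: its own section if present,
    otherwise the '*' section, otherwise nothing (index everything)."""
    own = sections.get(robot_name.lower())
    if own is not None:
        return own
    return sections.get('*', [])

robots_txt = """\
User-agent: *
Disallow: /dir/file

User-agent: Yandex
Disallow: /Reports

User-agent: Googlebot
Disallow: /users
Allow: /best-page.html
"""

sections = parse_sections(robots_txt)
print(pick_section(sections, 'Googlebot'))    # only the /users and /best-page.html rules
print(pick_section(sections, 'SomeOtherBot')) # falls back to the '*' section

Running this shows the point of the example: Googlebot gets only its own section, so the general Disallow: /dir/file rule simply never applies to it, while a robot without a named section falls back to the rules for all robots.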