CNSearch 1.5.1
Structure
The structure and semantics of '/robots.txt' are as follows:
The file must contain one or more records separated by one or more blank lines (lines are terminated by CR, CR/NL, or NL). Each record consists of lines of the form "<field>:<optional_space><value><optional_space>".
The <field> name is case-insensitive.
Comments can be included in the usual UNIX way: the '#' character starts a comment, and the end of the line ends it.
A record should start with one or more 'User-Agent' lines followed by one or more 'Disallow' lines. Unrecognized lines are ignored.
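The parsing rules above can be sketched in Python. This is a minimal illustration, not CNSearch's own parser: the function name parse_robots is hypothetical, and the sketch assumes that a whitespace-only line ends a record, a comment-only line is skipped without ending one, and a record lacking either a 'User-Agent' or a 'Disallow' line is discarded.

```python
import re


def parse_robots(text):
    """Split a robots.txt body into a list of (user_agents, disallows) pairs.

    Hypothetical sketch. Assumptions: a whitespace-only line ends a record,
    comment-only lines are skipped, and records missing a User-Agent or a
    Disallow line are dropped.
    """
    records = []
    agents, disallows = [], []

    def flush():
        # Keep the finished record only if it has both required line types.
        if agents and disallows:
            records.append((list(agents), list(disallows)))
        agents.clear()
        disallows.clear()

    # Lines may be terminated by CR, CR/NL or NL.
    for raw in re.split(r"\r\n|\r|\n", text):
        if not raw.strip():
            flush()  # a blank line separates records
            continue
        line = raw.split("#", 1)[0]  # drop a UNIX-style comment
        if ":" not in line:
            continue  # comment-only or unrecognized line
        field, value = line.split(":", 1)
        field = field.strip().lower()  # field names are case-insensitive
        value = value.strip()  # optional spaces around the value
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow":
            disallows.append(value)
        # Any other field is ignored.
    flush()
    return records
```

For example, a file with a '*' record and a 'CNSearch' record separated by a blank line yields two (user_agents, disallows) pairs, while a trailing 'User-Agent' line with no 'Disallow' line is dropped.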
User-Agent:
- The value of this field must be the name of the search robot whose access rights are set in this record;
- Although the standard allows several robot names to be specified, CNSearch recognizes only one, because the method of separating robot names described in the standard is not implemented in the system;
- Upper- and lower-case letters are treated as equal;
- If the value of this field is '*', the access rights set in the record apply to any search robot that requests the '/robots.txt' file.
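A sketch of how a robot might pick the record that applies to it, given records as (user_agents, disallows) pairs. The function name select_disallows is hypothetical. Each 'User-Agent' value is compared as a single name, ignoring case, in line with the rules above; giving an exact-name record priority over a '*' record is an assumption taken from the usual convention of the robots exclusion standard, not something this manual states. A return value of None means that no record applies.

```python
def select_disallows(records, robot_name):
    """Return the Disallow values that apply to robot_name, or None.

    records: list of (user_agents, disallows) pairs.
    Assumption: a record naming the robot exactly (ignoring case) wins over
    a '*' record; None means no record applies to this robot.
    """
    wanted = robot_name.strip().lower()
    default = None
    for agents, disallows in records:
        for agent in agents:
            # The whole value is treated as one robot name, ignoring case;
            # it is not split into several names.
            name = agent.strip().lower()
            if name == wanted:
                return disallows
            if name == "*" and default is None:
                default = disallows  # remember the first '*' record
    return default
```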
Disallow:
- The value of this field must be a partial URL that should not be indexed; it may be a full or a partial path, and any URL whose path begins with this value is excluded. For example, 'Disallow: /help' denies access to both '/help.html' and '/help/index.html', while 'Disallow: /help/' denies access only to '/help/index.html'.
- Any record must contain at least one 'User-Agent' line and one 'Disallow' line.
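The prefix matching shown in the 'Disallow' examples can be sketched as follows. The function name is_allowed is hypothetical, and treating an empty 'Disallow' value as blocking nothing is an assumption taken from the robots exclusion standard rather than from this manual.

```python
def is_allowed(path, disallows):
    """Return False if path starts with any non-empty Disallow value."""
    for prefix in disallows:
        # Assumption (from the robots exclusion standard): an empty
        # Disallow value blocks nothing.
        if prefix and path.startswith(prefix):
            return False
    return True
```

With this check, '/help' blocks both '/help.html' and '/help/index.html', while '/help/' blocks only '/help/index.html', matching the examples above.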
If '/robots.txt' is missing, empty, or does not conform to the structure and semantics described above, search robots act according to their own settings.