CNSearch 1.5.1
Parameters Index
To optimize indexing, you can use specific parameters:
- URL
- Extensions
- Type
- Path
- CharSet
- MaxFiles
- MinWords
- Statistic
- Exclude
- ExcludeVar
- AddOption
- StopWordsFile
- Language
- AFrom
- ATo
- StartWord
- Sleep
- ShowURL
- ShowEmail
- ShowFTP
- Compress
- MetaDescription
- MetaRobots
- UseRobotsTxt
- ConnectCount
URL url
In HTTP indexing mode, the address of the starting page, beginning with 'http://...'. In local indexing mode, the path to a copy of the site on a local disk.
For example:
For HTTP:
URL http://www.novgorod.ru/frisbee/
For a local disk (Windows):
URL c:/pub/home/frisbee/
For a local disk (Unix):
URL /pub/home/frisbee/
Extensions ext1,ext2,ext3
The parameter defines a list of file extensions to be indexed. It can be used in local disk mode only and is ignored in HTTP indexing mode. Extensions are separated by "," (comma).
For example:
Extensions htm,html,shtml,shtm
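In local disk mode, the Extensions list works as a file filter. The Python sketch below shows one way such a filter could behave. It is not CNSearch's own code: the helper names are hypothetical, and the case-insensitive comparison is an assumption.

```python
import os


def parse_extensions(value: str) -> set:
    """Split an Extensions value such as "htm,html" into a set of lowercase extensions."""
    return {ext.strip().lower() for ext in value.split(",") if ext.strip()}


def should_index(path: str, extensions: set) -> bool:
    """Return True if the file's extension (without the dot) is in the list.

    Assumption: the comparison is case-insensitive.
    """
    _, ext = os.path.splitext(path)
    return ext[1:].lower() in extensions


exts = parse_extensions("htm,html,shtml,shtm")
print(should_index("c:/pub/home/frisbee/index.html", exts))  # True
print(should_index("c:/pub/home/frisbee/logo.gif", exts))    # False
```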
Type typ
The parameter sets the type of the search index:
- Normal;
- Abridged - a smaller index file that does not support displaying the fragment of text containing the highlighted search words (see Search module).
Default value: Normal.
For example:
Type Strict
Path path
The parameter defines the path to the directory containing the index and log files.
For example:
Path c:\www\site.com
or
Path /home/www/site.com
CharSet cset
The parameter defines how the character encoding of a document is identified. The following methods are available:
- ByMetaTag - identifies the character set by means of the META tag (default);
- ByHTTPHeader - identifies the character set by the HTTP header; if it cannot be identified from the HTTP header, the system attempts to determine it from the META tag. If both attempts fail, the system assumes that the document uses the Windows-1251 character set;
- win-1251 - does not identify the character set; all documents are assumed to be in Windows-1251;
- koi8-r - does not identify the character set; all documents are assumed to be in KOI8-R.
For example:
CharSet ByHTTPHeader
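The detection order described above (HTTP header first, then the META tag, then Windows-1251) can be sketched in Python as follows. This is an illustration, not CNSearch's implementation: the function names are hypothetical, the regular expressions are simplified, and the Windows-1251 fallback for ByMetaTag is an assumption, since the text states that fallback only for ByHTTPHeader.

```python
import re
from typing import Optional

# Simplified patterns for "charset=..." in a Content-Type header or a META tag.
_HEADER_RE = re.compile(r'charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)
_META_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset from a Content-Type HTTP header, if present."""
    if not content_type:
        return None
    match = _HEADER_RE.search(content_type)
    return match.group(1).lower() if match else None


def charset_from_meta(html: bytes) -> Optional[str]:
    """Extract the charset from the first META tag that declares one."""
    match = _META_RE.search(html)
    return match.group(1).decode("ascii").lower() if match else None


def detect_charset(mode: str, content_type: Optional[str], html: bytes) -> str:
    """Choose a character set according to the CharSet mode."""
    if mode == "win-1251":
        return "windows-1251"
    if mode == "koi8-r":
        return "koi8-r"
    if mode == "ByHTTPHeader":
        from_header = charset_from_header(content_type)
        if from_header:
            return from_header
    # ByMetaTag, or ByHTTPHeader without a usable header.
    # Assumption: ByMetaTag also falls back to Windows-1251.
    return charset_from_meta(html) or "windows-1251"
```

The raw bytes are scanned rather than decoded text, because the encoding is exactly what is not yet known at this point.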
MaxFiles num
The parameter sets the maximum number of files to be indexed (10000 by default). Be careful: many web servers contain a large number of looped links.
For example:
MaxFiles 50
MinWords num
The parameter defines the minimum number of words in an indexed document. Documents with fewer words are not added to the search index. This improves the quality of search results by filtering out short, insignificant documents. Default value is 1.
For example:
MinWords 30
Statistic stat
The parameter defines how the report generated at the end of the indexing process is saved to stats.log. Available options:
- No - do not save report.
- Append - append to existing file (by default).
- Overwrite - replace existing file.
For example:
Statistic Append
Exclude excl1,excl2,excl3
The parameter defines a list of words to be excluded from indexing. In addition, addresses containing at least one of the excluded words are not added to the indexing queue. Words are separated by "," (comma).
For example:
Exclude editpost.php?,reply.php?,admin/
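Because the example values are URL fragments, the address check amounts to a substring test. A minimal Python sketch, assuming plain case-sensitive substring matching (the helper names are hypothetical):

```python
def parse_list(value: str) -> list:
    """Split a comma-separated parameter value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_excluded(url: str, excluded: list) -> bool:
    """Return True if the URL contains at least one excluded word (substring match)."""
    return any(word in url for word in excluded)


excluded = parse_list("editpost.php?,reply.php?,admin/")
print(is_excluded("http://www.example.com/forum/reply.php?t=5", excluded))  # True
print(is_excluded("http://www.example.com/forum/view.php?t=5", excluded))   # False
```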
ExcludeVar var1,var2,var3
The parameter defines a list of variables to be removed from site URLs. Variables are separated by "," (comma).
For example:
ExcludeVar PHPSESSID,order
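Removing variables such as PHPSESSID means that URLs differing only in a session ID collapse into one address, so the same page is not indexed repeatedly. The Python sketch below illustrates the idea with the standard urllib.parse module. It is not CNSearch's code; the function name and example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_vars(url: str, excluded: set) -> str:
    """Remove the excluded query-string variables from a URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    # urlunsplit omits the "?" entirely when the remaining query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_vars("http://www.example.com/forum.php?id=7&PHPSESSID=abc123&order=asc",
                 {"PHPSESSID", "order"}))
# prints http://www.example.com/forum.php?id=7
```

Note that urlencode re-encodes the remaining values, so their escaping may differ slightly from the original URL.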
AddOption opt
The parameter sets the indexing method and can be used in HTTP indexing mode only. The following values are available:
- Page - only the page specified in URL is indexed;
- SubPages - all pages whose URL contains the address of the starting page are indexed;
- Server - the whole server is indexed.
- Server - the whole server is indexed.
For example:
AddOption SubPages
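The three AddOption values can be read as a scope test applied to each newly discovered link. The following Python sketch is a hypothetical illustration, not CNSearch's logic. In particular, treating "contains the address of the starting page" as a prefix match is an assumption.

```python
from urllib.parse import urlsplit


def in_scope(url: str, start: str, option: str) -> bool:
    """Decide whether a discovered URL should be queued for the given AddOption."""
    if option == "Page":
        # Only the starting page itself.
        return url == start
    if option == "SubPages":
        # Assumption: "contains the starting address" is treated as a prefix match.
        return url.startswith(start)
    if option == "Server":
        # Any page on the same host as the starting page.
        return urlsplit(url).netloc.lower() == urlsplit(start).netloc.lower()
    raise ValueError(f"unknown AddOption value: {option!r}")


start = "http://www.novgorod.ru/frisbee/"
print(in_scope("http://www.novgorod.ru/frisbee/rules.html", start, "SubPages"))  # True
print(in_scope("http://www.novgorod.ru/news/", start, "SubPages"))               # False
print(in_scope("http://www.novgorod.ru/news/", start, "Server"))                 # True
```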
StopWordsFile file
The parameter defines the name of the file containing stop-words (see Stop-words).
For example:
StopWordsFile stop.txt
Language lang
The parameter defines the language. If it is specified, the 'Accept-Language' field is included in the HTTP request header. On some sites, this can change the content of the returned documents.
For example:
Language ru
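With Language set, each HTTP request carries an extra header. The Python sketch below builds such a request with the standard urllib.request.Request class without sending it. The function name is hypothetical, and this is not how CNSearch itself is implemented.

```python
from typing import Optional
from urllib.request import Request


def build_request(url: str, language: Optional[str] = None) -> Request:
    """Create a request, adding Accept-Language only when a language is configured."""
    headers = {}
    if language:
        headers["Accept-Language"] = language
    return Request(url, headers=headers)


req = build_request("http://www.novgorod.ru/frisbee/", "ru")
print(req.header_items())  # [('Accept-language', 'ru')]
```

urllib normalizes header names with str.capitalize(), which is why the stored name is 'Accept-language'. HTTP header names are case-insensitive, so the server sees the same field.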
AFrom path
The parameter defines a substring which will be replaced in URL by the string specified in the parameter ATo.
For example:
AFrom /home/dir/mysite/
ATo http://search.codenet.ru/
ATo url
The parameter defines the string that replaces the AFrom substring in the URL; it is used together with AFrom.
For example:
AFrom http://127.0.0.1/
ATo http://www.codenet.ru/
or
AFrom c:/documents/www/www.codenet.ru/
ATo http://www.codenet.ru/
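The AFrom/ATo pair is a substring substitution applied to each indexed location, so that search results point to public addresses rather than to local paths or a test server. A minimal Python sketch; the function name is hypothetical, and replacing only the first occurrence is an assumption:

```python
def map_location(location: str, a_from: str, a_to: str) -> str:
    """Replace the AFrom substring with ATo; leave other locations unchanged.

    Assumption: only the first occurrence is replaced.
    """
    if a_from and a_from in location:
        return location.replace(a_from, a_to, 1)
    return location


print(map_location("c:/documents/www/www.codenet.ru/news/index.html",
                   "c:/documents/www/www.codenet.ru/", "http://www.codenet.ru/"))
# prints http://www.codenet.ru/news/index.html
```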
StartWord word
The parameter defines a word from which indexing starts. The page description is composed of the words following this word, which makes it possible to exclude menus and similar elements from the description.
For example:
StartWord about
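The effect of StartWord on the description can be sketched in Python. The function and its max_words limit are hypothetical. The case-insensitive whole-word match and the use of the full text when the word is absent are also assumptions.

```python
def describe(text: str, start_word: str, max_words: int = 20) -> str:
    """Build a description from the words that follow start_word.

    Assumptions: the match is case-insensitive and whole-word, and the
    whole text is used when start_word does not occur.
    """
    words = text.split()
    lowered = [word.lower() for word in words]
    if start_word.lower() in lowered:
        words = words[lowered.index(start_word.lower()) + 1:]
    return " ".join(words[:max_words])


menu_and_body = "Home News Forum about CNSearch is a site search engine"
print(describe(menu_and_body, "about"))  # CNSearch is a site search engine
```

Here the menu words "Home News Forum" precede the start word and are left out of the description.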
Sleep seconds
The parameter defines the pause, in seconds, between indexing successive pages of the site.
For example:
Sleep 5
ShowURL yesno
Displays page addresses during indexing. Default value is "yes".
For example:
ShowURL no
ShowEmail yesno
Displays e-mail addresses (mailto links) found during indexing. Default value is "no".
For example:
ShowEmail no
ShowFTP yesno
Displays FTP addresses found during indexing. Default value is "no".
For example:
ShowFTP no
Compress yesno
Requests compressed responses from the server (if the server supports this feature). Default value is "yes". Incorrectly compressed pages can cause indexing to fail.
For example:
Compress no
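A client that requests compression typically sends an 'Accept-Encoding: gzip' header and must then decompress the body. The Python sketch below shows the decoding side with the standard gzip module. It handles only gzip, which is an assumption, and it is not CNSearch's code. It also illustrates why a badly compressed page can break indexing: decompression raises an exception instead of returning text.

```python
import gzip
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Return the page body, decompressing it if the server sent it gzip-encoded.

    A malformed gzip stream raises gzip.BadGzipFile (a subclass of OSError),
    and truncated data raises EOFError.
    """
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


compressed = gzip.compress(b"<html>ok</html>")
print(decode_body(compressed, "gzip"))  # b'<html>ok</html>'
```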
MetaDescription yesno
The parameter defines how the page description is obtained. The description can be displayed in the search results by means of the special symbol %E. Available values are "Yes" and "No"; the default is "No". If "Yes" is set, the system attempts to take the description from the '<META name="description...' tag. If the tag is not found, or the value is "No", the description is composed of the first words of the document.
For example:
MetaDescription Yes
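The MetaDescription logic (META tag first when enabled, otherwise the opening words) can be sketched with Python's standard html.parser module. The class, the function, and the max_words limit are hypothetical. The sketch only mirrors the rule described above and does not reproduce CNSearch's parser.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaDescriptionParser(HTMLParser):
    """Collect the content of the first <meta name="description"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


def page_description(html: str, text: str, use_meta: bool, max_words: int = 20) -> str:
    """Use the META description when enabled and present, else the first words."""
    if use_meta:
        parser = MetaDescriptionParser()
        parser.feed(html)
        parser.close()
        if parser.description:
            return parser.description.strip()
    return " ".join(text.split()[:max_words])


page = ('<html><head><META NAME="Description" CONTENT="Frisbee club news">'
        '</head><body>Welcome to the club</body></html>')
print(page_description(page, "Welcome to the club", use_meta=True))  # Frisbee club news
```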
MetaRobots yesno
If the parameter is set to "No", the 'META name="robots"...' tag is ignored; otherwise, the tag is checked for the NOINDEX, NOFOLLOW, and NONE directives. More details can be found in the section The use of "Robots" META-tags. Default value is "Yes".
For example:
MetaRobots No
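The check for NOINDEX, NOFOLLOW, and NONE can be sketched in Python as a function returning whether a page may be indexed and whether its links may be followed. The function name is hypothetical. Treating NONE as NOINDEX plus NOFOLLOW follows the usual robots META convention and is assumed here, not taken from CNSearch's source.

```python
from typing import Tuple


def robots_policy(content: str, meta_robots: bool = True) -> Tuple[bool, bool]:
    """Return (may_index, may_follow) for the content of a robots META tag.

    NONE is treated as NOINDEX plus NOFOLLOW.
    """
    if not meta_robots:
        # MetaRobots No: the tag is ignored entirely.
        return True, True
    directives = {token.strip().upper() for token in content.split(",")}
    may_index = not ({"NOINDEX", "NONE"} & directives)
    may_follow = not ({"NOFOLLOW", "NONE"} & directives)
    return may_index, may_follow


print(robots_policy("noindex, follow"))  # (False, True)
print(robots_policy("NONE"))             # (False, False)
```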
UseRobotsTxt yesno
If the parameter is set to "Yes", the indexing rules are taken from the 'robots.txt' file stored in the web server's root directory. Default value is "No". More information on working with 'robots.txt' is available in the section Search robots. The robot's name is "CNSearch".
For example:
UseRobotsTxt yes
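How rules addressed to the "CNSearch" robot apply can be checked with Python's standard urllib.robotparser module. The robots.txt content below is hypothetical, and the sketch only illustrates standard robots.txt matching, not CNSearch's own parser. In a live crawler, RobotFileParser.set_url() and read() would fetch the file from the server root instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser


def robots_checker(robots_txt: str) -> RobotFileParser:
    """Build a checker from robots.txt text (normally fetched from the site root)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt: CNSearch may crawl everything except /admin/,
# while all other robots are shut out.
rules = """User-agent: CNSearch
Disallow: /admin/

User-agent: *
Disallow: /
"""
checker = robots_checker(rules)
print(checker.can_fetch("CNSearch", "http://www.example.com/admin/index.html"))  # False
print(checker.can_fetch("CNSearch", "http://www.example.com/news.html"))         # True
```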
ConnectCount num
The parameter sets the number of remote file requests; the default value is 5.
For example:
ConnectCount 10