User Manual / CNSearch 1.5.1
4.2 Search Module
4.2.1 cnsearch.conf
The search module configuration file (cnsearch.conf by default) must be stored in the same directory as the search executable, 'search.exe' ('search.cgi' on Unix). It is a plain-text file designed for fast processing. Cnsearch.conf consists of two parts:
- Configuration - the search module settings;
- Templates of the search results pages.
The structure of the configuration file looks as follows:
::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password
::CONFIG content-type = text/html
::HTMLTOP
<HTML>
<HEAD>
<TITLE>This is the top part of the HTML document</TITLE>
</HEAD>
<BODY>
::HTMLRESULT
<P>This is the description of a found page. Ten such descriptions will be displayed.
::HTMLNOTFOUND
<P>This text will be displayed if no search results are found
::HTMLBOTTOM
This is the bottom part of the HTML document
</BODY>
</HTML>
Single-line comments may be used in the configuration file. Each comment starts with the "#" symbol.
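The layout described above can be illustrated with a short Python sketch that splits such a file into its settings and its template sections. This is not the parser used by CNSearch; the function name parse_conf is hypothetical, and the sketch assumes that parameter names are case-insensitive and that any line beginning with "#" is a comment.

```python
def parse_conf(text):
    """Split cnsearch.conf-style text into settings and template sections."""
    config = {}
    templates = {}
    section = None  # name of the template section currently being collected
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # single-line comment (assumed to apply anywhere in the file)
        if stripped.startswith("::CONFIG"):
            key, _, value = stripped[len("::CONFIG"):].partition("=")
            # Assumption: names are case-insensitive, surrounding spaces are ignored
            config[key.strip().lower()] = value.strip()
        elif stripped.startswith("::"):
            section = stripped[2:]  # e.g. HTMLTOP, HTMLRESULT
            templates[section] = []
        elif section is not None:
            templates[section].append(line)
    return config, {name: "\n".join(lines) for name, lines in templates.items()}
```

Calling parse_conf on the structure shown above would return a dictionary with the keys regcode, stats and content-type, plus one text block for each of HTMLTOP, HTMLRESULT, HTMLNOTFOUND and HTMLBOTTOM.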
4.2.1.1 Configuration Settings
The configuration part of cnsearch.conf contains the following parameters:
Path
This parameter sets the path to the search index. Use it if you do not intend to store the search index in the 'cgi-bin' directory or if you plan to use several search indexes.
For example:
::CONFIG path=/home/www/search/en/
For MS Windows:
::CONFIG path=d:\www\search\en\
Content-Type
This parameter defines the Content-Type field of the response header. The default value is "text/html". Search results can also be generated as XML.
For example:
::CONFIG content-type = text/xml
SearchType
The parameter sets search logic:
- And - pages containing all words of the search query will be found;
- Or - pages containing at least one word of the search query will be found;
- Combined - "And" results are displayed first, followed by "Or" results marked with the note "non strict match".
"And" logic is the fastest and is recommended when the search index size exceeds 100 MB.
"Combined" logic is recommended for small sites with fewer than 50 pages in total.
For example:
::CONFIG SearchType = Combined
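The difference between the three kinds of search logic can be shown with a toy Python sketch. It works on a small in-memory dictionary of page texts rather than a real search index, and the function name search and its arguments are hypothetical; it only mirrors the behaviour described above.

```python
def search(pages, query, search_type="And", note="[non strict match]"):
    """Return (url, note) pairs for a toy collection {url: page text}."""
    words = query.lower().split()
    if not words:
        return []

    def contains(text, word):
        return word in text.lower().split()

    # Pages containing all query words / at least one query word
    strict = [u for u, t in pages.items() if all(contains(t, w) for w in words)]
    loose = [u for u, t in pages.items() if any(contains(t, w) for w in words)]

    if search_type == "And":
        return [(u, "") for u in strict]
    if search_type == "Or":
        return [(u, "") for u in loose]
    if search_type == "Combined":
        # Strict matches first, then the remaining matches with a note
        rest = [u for u in loose if u not in strict]
        return [(u, "") for u in strict] + [(u, note) for u in rest]
    raise ValueError("unknown search type: " + search_type)
```

With "Combined" logic every page that "Or" would find is still returned, but the pages that satisfy "And" come first and the others carry the note.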
Stats
This parameter sets the password for access to the statistics interface (see Statistics).
For example:
::CONFIG stats = secret
RegCode
This parameter sets the product registration code (see the official site for details).
For example:
::CONFIG regcode = JF7KF-KFJEP-4KSFT-K49GN-FJ40F
StopWords
This parameter specifies the label that introduces the stop words found in the query when they are displayed in search results (provided that the %P symbol is used in the template).
For example:
::CONFIG StopWords =, Ignored Words :
MaxRelevance
This parameter sets the maximum relevance of pages displayed in search results. Pages with a relevance value greater than MaxRelevance are ignored. This improves search quality by discarding pages with suspiciously high relevance. As a rule, such pages contain little text or contain keywords that are repeated too often.
For example:
::CONFIG MaxRelevance = 4000
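The cut-off can be pictured with the following Python sketch. The function name apply_max_relevance, the record layout and the sample URLs are hypothetical; only the rule itself (pages with relevance greater than MaxRelevance are ignored) comes from the description above.

```python
def apply_max_relevance(results, max_relevance):
    """Keep only results whose relevance does not exceed max_relevance."""
    return [page for page in results if page["relevance"] <= max_relevance]


# Hypothetical sample data for illustration
found = [
    {"url": "http://www.mysite.com/about.html", "relevance": 350},
    {"url": "http://www.mysite.com/keywords.html", "relevance": 9200},
    {"url": "http://www.mysite.com/news.html", "relevance": 4000},
]
kept = apply_max_relevance(found, 4000)
```

In this sketch a page whose relevance is exactly equal to the limit is kept, because only values greater than MaxRelevance are discarded.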
NonStrictMatch
This parameter specifies the label shown next to search results that do not strictly match the search query (provided that the %S symbol is used in the template). It is used only with "Combined" search logic.
For example:
::CONFIG NonStrictMatch = [non strict match]
4.2.1.2 Templates Setting
The template part contains the HTML code that generates the search results page. Use the following special symbols within this code; they are replaced by the corresponding text when the HTML document is generated:
- %Q - Query text;
- %G - Query text (urlencoded);
- %O - Quantity of found pages;
- %N - Number of a found page;
- %U - URL of a found page;
- %T - Name of a found page;
- %S - Match note (the NonStrictMatch text, displayed only for results that do not match strictly; otherwise nothing is displayed);
- %R - Relevance of a found page;
- %E - Description of a found page;
- %D - Date of the last update of a found page;
- %C - Character coding of a found page;
- %F - Name of a search script;
- %I - Number of the site in the search index;
- %P - Stop words found in the query;
- %W - Search phrase description;
- %L - Enable sorting by relevance;
- %A - Enable sorting by date of the document update;
- %B - Navigation links (< << 1 2 3 4 5 6 >> >)
For example:
-- cnsearch.conf ----------------------------------------
# This is a cnsearch configuration file
::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password
::CONFIG content-type = text/html
::CONFIG NonStrictMatch = [non strict match]
::CONFIG StopWords =, Ignored Words :
::CONFIG SearchType = Combined
::HTMLTOP
<HTML>
<HEAD>
<TITLE>Search results - %Q</TITLE>
</HEAD>
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type=text name=q size=40 maxlength=64 value="%Q">
<input type=submit value="Search">
</td></form></tr></table>
Documents found: %O
<B>%O</B><font color=gray>%W<B>%P</B></font><br>
<br>
<div align=right>
Sort by: <a href="%A">date</a> | <a href="%L">relevancy</a>
</div>
::HTMLRESULT
<HR>
<UL>
<LI>%N. <a href="%U" target=_new>%T</A> <small>
<font color=red>%S</font> [Relevancy: %R]</small>
<UL>
<LI>%E
<LI>%D
<LI>%C
<LI><a href="%U" target=_new>%u</A>
</UL>
</UL>
::HTMLNOTFOUND
<P><font color=red>%Q not found</font>
::HTMLBOTTOM
%B
</BODY>
</HTML>
-- end cnsearch.conf ------------------------------------
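The replacement of special symbols in a template can be sketched in Python as follows. This is only an illustration of the idea, not the code used by the search module; the function name render and the sample values are hypothetical. Symbols for which no value is supplied are left in the output unchanged.

```python
import re


def render(template, values):
    """Replace each %X symbol (X is an upper-case letter) with values["X"]."""
    def substitute(match):
        symbol = match.group(1)
        # Leave symbols without a supplied value untouched
        return str(values[symbol]) if symbol in values else match.group(0)

    return re.sub(r"%([A-Z])", substitute, template)


# Hypothetical values for one found page
row = '<LI>%N. <a href="%U" target=_new>%T</A> [Relevancy: %R]'
line = render(row, {"N": 1, "U": "http://www.mysite.com/", "T": "Home page", "R": 250})
```

The HTMLRESULT section would be rendered once per found page in this way, while HTMLTOP and HTMLBOTTOM would be rendered once per results page.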
4.2.1.3 The Use of Different Templates
The system supports multiple templates, which makes it possible to create different versions of the search interface and to use different indexes for searching. To use several templates, set the 'template' parameter in the source code of the search form. If 'template' is not set, the standard 'cnsearch.conf' template is used.
A template may have any name composed only of Latin letters (upper or lower case) and Arabic numerals; the 'conf' extension is not added to the name.
Correct variant:
<input type="hidden" name="template" value="black">
Incorrect variants:
<input type=hidden name="template" value='../black'>
<input type=hidden name="template" value='red.htm'>
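The naming rule can be expressed as a small Python check. The function name template_file is hypothetical, and the mapping of a template name to a file called name.conf is an assumption drawn from the en.conf example in this section; the search module itself may resolve names differently.

```python
import re

# Only Latin letters and Arabic numerals are allowed in a template name
_TEMPLATE_NAME = re.compile(r"[A-Za-z0-9]+")


def template_file(name=None):
    """Return the configuration file name for a 'template' form value."""
    if not name:
        return "cnsearch.conf"  # standard template when 'template' is not set
    if not _TEMPLATE_NAME.fullmatch(name):
        raise ValueError("invalid template name: " + repr(name))
    return name + ".conf"
```

Under this rule the value "black" is accepted, while "../black" and "red.htm" are rejected because they contain characters other than letters and digits.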
Below is an example of a template that allows a user to:
- Select the required index for a search. This is achieved by setting the Path parameter in the corresponding template (see Configuration Settings).
The path to the index files is defined in the template as follows:
::CONFIG path=/home/www/search/en
- Select the configuration file to be used for the search (by means of the 'template' parameter). This example offers a choice between the en.conf, es.conf, and ru.conf templates, which are shown as a list in the search form.
Example:
-- en.conf ---------------------------------------------
::CONFIG path=/home/www/search/en
::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password
::CONFIG content-type = text/html
::CONFIG NonStrictMatch = [non strict match]
::CONFIG StopWords =, Ignored Words :
::CONFIG SearchType = Combined
::HTMLTOP
<HTML>
<HEAD>
<TITLE>Search results - %Q</TITLE>
</HEAD>
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type=text name=q size=40 maxlength=64 value="%Q">
<input type=submit value="Search">
<select name=template>
<option value="en">English
<option value="es">Spanish
<option value="ru">Russian
</select>
</td></form></tr></table>
Documents found: %O
<B>%O</B><font color=gray>%W<B>%P</B></font><br>
<br>
<div align=right>
Sort by: <a href="%A">date</a> | <a href="%L">relevancy</a>
</div>
::HTMLRESULT
<HR>
<UL>
<LI>%N. <a href="%U" target=_new>%T</A> <small>
<font color=red>%S</font> [Relevancy: %R]</small>
<UL>
<LI>%E
<LI>%D
<LI>%C
<LI><a href="%U" target=_new>%u</A>
</UL>
</UL>
::HTMLNOTFOUND
<P><font color=red>%Q not found</font>
::HTMLBOTTOM
%B
</BODY>
</HTML>
-- end of en.conf ---------------------------------------
4.2.1.4 Searching Through Selected Sites
Starting with version 1.3, the system supports searching through selected sites. Each site is assigned a sequence number at the indexing stage, starting with '0'. For example:
[job localhost]
[Index]
URL http://www.mysite.com/
Statistic Append
CharSet ByHTTPHeader
MaxFiles 10000
StopWordsFile stopwords.txt
Exclude search/,mail/,.zip,.gif,.jpg
[Index]
URL http://www.second.com/
Statistic Append
CharSet ByHTTPHeader
[Index]
URL http://www.test.com/
Statistic Append
CharSet ByHTTPHeader
Site numbers are assigned as follows:
0 - http://www.mysite.com/
1 - http://www.second.com/
2 - http://www.test.com/
Note that after re-indexing, the same number may be assigned to two different sites. For instance, after re-indexing with the following configuration file:
[job addon]
[Index]
URL http://www.newsite.com/
Statistic Append
CharSet ByHTTPHeader
MaxFiles 10000
StopWordsFile stopwords.txt
Exclude search/,mail/,.zip,.gif,.jpg
the site http://www.newsite.com/ is also assigned number "0", so the numbering becomes:
0 - http://www.mysite.com/
0 - http://www.newsite.com/
1 - http://www.second.com/
2 - http://www.test.com/
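The numbering shown above can be reproduced with a short Python sketch. It assumes that numbering restarts from 0 for every indexing configuration file, which is what the example suggests; the function name site_numbers is hypothetical and the indexer itself is not modelled.

```python
from collections import defaultdict


def site_numbers(runs):
    """Map each site number to the URLs that received it.

    runs is a list of indexing runs, each a list of URLs in the order
    of their [Index] sections. Assumption: numbering restarts at 0 per run.
    """
    numbers = defaultdict(list)
    for urls in runs:
        for number, url in enumerate(urls):
            numbers[number].append(url)
    return dict(numbers)


assigned = site_numbers([
    ["http://www.mysite.com/", "http://www.second.com/", "http://www.test.com/"],
    ["http://www.newsite.com/"],
])
```

A search restricted to site number 0 would then cover both http://www.mysite.com/ and http://www.newsite.com/.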
To search through selected sites, use the "d" parameter. If the parameter is not set (the default), the search is performed across all sites.
For example:
-- cnsearch.conf ----------------------------------------
::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password
::HTMLTOP
<HTML>
<HEAD>
<TITLE>Search results - %Q</TITLE>
</HEAD>
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type=text name=q size=40 maxlength=64 value="%Q">
<input type=submit value="Search">
<br>
<select name=d>
<option value="0">www.mysite.com, www.newsite.com
<option value="1">www.second.com
<option value="2">www.test.com
</select>
</td></form></tr></table>
Documents found: %O
<B>%O</B><font color=gray>%W<B>%P</B></font><br>
<br>
<div align=right>
Sort by: <a href="%A">date</a> | <a href="%L">relevancy</a>
</div>
::HTMLRESULT
<HR>
<UL>
<LI>%N. <a href="%U" target=_new>%T</A> <small>
<font color=red>%S</font> [Relevancy: %R]</small>
<UL>
<LI>%E
<LI>%D
<LI>%C
<LI><a href="%U" target=_new>%u</A>
</UL>
</UL>
::HTMLNOTFOUND
<P><font color=red>%Q not found</font>
::HTMLBOTTOM
%B
</BODY>
</HTML>
-- end cnsearch.conf ------------------------------------
4.2.1.5 Grouping of Search Results by Sites
When searching across a large number of sites, search results are often dominated by pages from a single site. For example, for the search phrase "news", all the pages of a news site ending with " // Local news" will be found, and results from other sites will be pushed back by hundreds or even thousands of positions.
To prevent this, large search engines such as Google, Yandex, and Rambler display only one result from each site. Starting with version 1.5, this option is available in CNSearch as well.
To enable grouping by sites, add a hidden field named 'group' to the search form:
-- cnsearch.conf ----------------------------------------
....
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type="text" name="q" size="40" maxlength="64" value="%Q">
<input type="hidden" name="group" value="1">
<input type="submit" value="Search">
</td></form></tr></table>
....
-- end cnsearch.conf ------------------------------------
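The effect of grouping can be pictured with the following Python sketch, which keeps only the first (highest-ranked) result from each site. It illustrates the idea only; the function name group_by_site and the record layout are hypothetical, and the search module may choose the displayed result differently.

```python
def group_by_site(results):
    """Keep the first result for each site number, preserving result order."""
    seen = set()
    grouped = []
    for page in results:
        if page["site"] not in seen:
            seen.add(page["site"])
            grouped.append(page)
    return grouped


# Hypothetical ranked results: three pages from site 0, one from site 1
ranked = [
    {"site": 0, "url": "http://www.mysite.com/news/1.html"},
    {"site": 0, "url": "http://www.mysite.com/news/2.html"},
    {"site": 1, "url": "http://www.second.com/news.html"},
    {"site": 0, "url": "http://www.mysite.com/news/3.html"},
]
top = group_by_site(ranked)
```

In this example the result from the second site moves from the third position to the second, because the extra pages of the first site are no longer listed.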
To let users search in more detail within one site from the search results, a "more from the site" link can be provided. It is implemented by means of the special symbol %I:
-- cnsearch.conf ----------------------------------------
....
::HTMLRESULT
....
<LI>%N. <a href="%U" target=_new>%T</A> <small>
<font color=red>%S</font> [Relevancy: %R]</small>
[ <a href="%F?d=%I&q=%G">more from the site</a> ]
<UL>
....
-- end cnsearch.conf ------------------------------------
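The link produced by the template line above has the form script?d=number&q=query. The following Python sketch builds such a link; the function name site_search_url is hypothetical, and the exact encoding that the search module uses for %G (for instance, "+" or "%20" for spaces) is not stated in this manual, so the standard library encoding is used here as an assumption.

```python
from urllib.parse import urlencode


def site_search_url(script, query, site_number):
    """Build a link that repeats the query within one site only."""
    # urlencode percent-encodes the query text and joins the parameters with "&"
    return script + "?" + urlencode({"d": site_number, "q": query})


link = site_search_url("search.cgi", "local news", 0)
```

Here "search.cgi" stands for the value of %F, "local news" for the query text, and 0 for the value of %I of the found page.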