CNSearch

The search engine for web-sites

User Manual / CNSearch 1.5.1

4.2 Search Module

4.2.1 cnsearch.conf

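The search module configuration file (cnsearch.conf by default) should be stored in the same directory as the file 'search.exe' ('search.cgi' on Unix). It is a text file specially optimized for fast processing. Cnsearch.conf consists of two parts: a configuration part made up of ::CONFIG lines, and a template part made up of the ::HTMLTOP, ::HTMLRESULT, ::HTMLNOTFOUND and ::HTMLBOTTOM sections.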

The structure of the configuration file looks as follows:


::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password
::CONFIG content-type = text/html

::HTMLTOP
<HTML>
<HEAD>
<TITLE>This is the top part of the HTML document</TITLE>
</HEAD>
<BODY>

::HTMLRESULT
<P>This is the description of a found page.
10 such descriptions will be displayed.

::HTMLNOTFOUND
<P>This text will be displayed if
no search results are found


::HTMLBOTTOM
This is the bottom part of the HTML document
</BODY>
</HTML>

Single-line comments may be used in the configuration file. Each comment starts with the symbol "#".

4.2.1.1 Configuration Settings

The configuration part of cnsearch.conf contains the following parameters:

Path

The parameter sets the path to the search index. Use it if you do not intend to store the search index in the 'cgi-bin' directory or if you plan to use several search indexes.

For example:

::CONFIG path=/home/www/search/en/

For MS Windows:

::CONFIG path=d:\www\search\en\

Content-Type

The parameter defines the Content-Type field of the HTTP header. The default value is "text/html". Search results can also be generated as an XML document.

For example:

::CONFIG content-type = text/xml

SearchType

The parameter sets the search logic:

"And" logic is the fastest and is recommended when the search index size exceeds 100 MB.

"Combined" logic is recommended for small sites with fewer than 50 pages in total.

For example:

::CONFIG SearchType = Combined

Stats

The parameter sets the password for access to the statistics interface (see Statistics).

For example:

::CONFIG stats = secret

RegCode

The parameter sets the product registration code (see the detailed information at the official site).

For example:

::CONFIG regcode = JF7KF-KFJEP-4KSFT-K49GN-FJ40F

StopWords

The parameter specifies the label that denotes stop words in the search results (provided that the %P symbol, which stands for the stop words found, is used in the template).

For example:

::CONFIG StopWords =, Ignored Words : 

MaxRelevance

The parameter sets the maximum relevance of pages displayed in the search results. Pages with a relevancy value greater than MaxRelevance are ignored. This parameter improves search quality by "throwing out" pages with suspiciously high relevancy. As a rule, such pages contain little text or repeat keywords too often.

For example:

::CONFIG MaxRelevance = 4000
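The effect of this parameter can be pictured with a short Python sketch. It is not the CNSearch implementation: the function name drop_suspicious, the result dictionaries and the sample URLs and relevance values are all made up for illustration.

```python
def drop_suspicious(results, max_relevance):
    """Keep only results whose relevance does not exceed max_relevance."""
    return [r for r in results if r["relevance"] <= max_relevance]


# Hypothetical results; the second one has a suspiciously high relevance.
results = [
    {"url": "http://www.mysite.com/a.html", "relevance": 1200},
    {"url": "http://www.mysite.com/spam.html", "relevance": 9500},
]
kept = drop_suspicious(results, 4000)
```

With MaxRelevance set to 4000, only the first page remains in kept; a page whose relevance is exactly 4000 would also be kept, since only values greater than the limit are ignored.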

NonStrictMatch

The parameter specifies the label that denotes how a search result matches the search request (provided that the %S symbol is used in the template). It is used only with "Combined" search logic.

For example:

::CONFIG NonStrictMatch = [non strict match]

4.2.1.2 Template Settings

The template part contains the HTML code used to generate the HTML document with the search results. Special symbols can be used within this code; they are replaced with the corresponding text when the HTML document is generated.

For example:

-- cnsearch.conf ----------------------------------------
# This is a cnsearch configuration file
                   
::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password
::CONFIG content-type = text/html
::CONFIG NonStrictMatch = [non strict match]
::CONFIG StopWords =, Ignored Words : 
::CONFIG SearchType = Combined

::HTMLTOP
<HTML>
<HEAD>
<TITLE>Search results - %Q</TITLE>
</HEAD>
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type=text name=q size=40 maxlength=64 value="%Q">
<input type=submit value="Search">
</td></form></tr></table>
Documents found: %O
   <B>%O</B><font color=gray>%W<B>%P</B></font><br>
<br>
<div align=right>
Sort by: <a href="%A">date</a> | <a href="%L">relevancy</a>
</div>


::HTMLRESULT
<HR>
<UL>
<LI>%N. <a href="%U" target=_new>%T</A> <small>
        <font color=red>%S</font> [Relevancy: %R]</small>
<UL>
<LI>%E
<LI>%D
<LI>%C
<LI><a href="%U" target=_new>%u</A>
</UL>
</UL>


::HTMLNOTFOUND
<P><font color=red>%Q not found</font>


::HTMLBOTTOM
%B
</BODY>
</HTML>
-- end cnsearch.conf ------------------------------------
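The replacement of special symbols can be sketched in Python as follows. This is an illustration only, not the code used by the search module: the function name fill_template is hypothetical, and the meanings given to %N, %U, %T and %R (result number, URL, title, relevancy) are inferred from the sample template above rather than taken from a symbol reference.

```python
import re


def fill_template(template, values):
    """Replace each %X symbol that has a value; leave unknown symbols untouched."""
    def substitute(match):
        # match.group(1) is the letter after "%"; lookup is case-sensitive,
        # so %U and %u are treated as different symbols.
        return values.get(match.group(1), match.group(0))
    return re.sub(r"%([A-Za-z])", substitute, template)


line = '<LI>%N. <a href="%U" target=_new>%T</A> [Relevancy: %R]'
filled = fill_template(
    line,
    {"N": "1", "U": "http://www.mysite.com/", "T": "Home", "R": "350"},
)
```

The substitution is done in a single pass, so text inserted for one symbol is never scanned again for further symbols.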

4.2.1.3 The Use of Different Templates

The system allows several templates to be used, both to create different variants of the search interface and to search different indexes. To use several templates, set the 'template' parameter in the source code of the search form. If 'template' is not set, the standard 'cnsearch.conf' template is used.

Any name can be chosen for a template. A template name should consist only of Latin letters (upper or lower case) and Arabic digits; it is not necessary to add the 'conf' extension.

Correct variant:

<input type="hidden" name="template" value="black">

Incorrect variants:

<input type=hidden name="template" value='../black'>
<input type=hidden name="template" value='red.htm'>
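The naming rule can be checked before a value is placed in a search form. The following Python sketch is an illustration under the rule stated above (Latin letters and digits only); the function name is_valid_template_name is hypothetical and is not part of CNSearch.

```python
import re

# Latin letters (either case) and Arabic digits only, at least one character.
_TEMPLATE_NAME = re.compile(r"[A-Za-z0-9]+")


def is_valid_template_name(name):
    """Return True if name contains only Latin letters and digits."""
    return _TEMPLATE_NAME.fullmatch(name) is not None
```

With this check, "black" is accepted, while "../black" and "red.htm" are rejected because they contain characters other than letters and digits.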

Below is an example of a template that lets a user choose between the English, Spanish and Russian templates in the search form:

-- en.conf ---------------------------------------------
::CONFIG path=/home/www/search/en
::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password
::CONFIG content-type = text/html
::CONFIG NonStrictMatch = [non strict match]
::CONFIG StopWords =, Ignored Words : 
::CONFIG SearchType = Combined

::HTMLTOP
<HTML>
<HEAD>
<TITLE>Search results - %Q</TITLE>
</HEAD>
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type=text name=q size=40 maxlength=64 value="%Q">
<input type=submit value="Search">
<select name=template>
<option value="en">English
<option value="es">Spanish
<option value="ru">Russian
</select>
</td></form></tr></table>
Documents found: %O
   <B>%O</B><font color=gray>%W<B>%P</B></font><br>
<br>
<div align=right>
Sort by: <a href="%A">date</a> | <a href="%L">relevancy</a>
</div>


::HTMLRESULT
<HR>
<UL>
<LI>%N. <a href="%U" target=_new>%T</A> <small>
        <font color=red>%S</font> [Relevancy: %R]</small>
<UL>
<LI>%E
<LI>%D
<LI>%C
<LI><a href="%U" target=_new>%u</A>
</UL>
</UL>


::HTMLNOTFOUND
<P><font color=red>%Q not found</font>


::HTMLBOTTOM
%B
</BODY>
</HTML>
-- end of en.conf ---------------------------------------

4.2.1.4 Searching Through Selected Sites

Starting with version 1.3, the system supports searching through selected sites. Each site is assigned a sequence number at the indexing stage, starting with '0'. For example:

[job localhost]
[Index]
URL             http://www.mysite.com/
Statistic       Append
CharSet         ByHTTPHeader
MaxFiles        10000
StopWordsFile   stopwords.txt
Exclude         search/,mail/,.zip,.gif,.jpg
[Index]
URL             http://www.second.com/
Statistic       Append
CharSet         ByHTTPHeader
[Index]
URL             http://www.test.com/
Statistic       Append
CharSet         ByHTTPHeader

Site numbers are assigned as follows:

0 - http://www.mysite.com/
1 - http://www.second.com/
2 - http://www.test.com/

Note that after re-indexing, the same number may be assigned to two different sites. For instance, upon re-indexing by means of the following configuration file:

[job addon]
[Index]
URL             http://www.newsite.com/
Statistic       Append
CharSet         ByHTTPHeader
MaxFiles        10000
StopWordsFile   stopwords.txt
Exclude         search/,mail/,.zip,.gif,.jpg

the site http://www.newsite.com/ will be assigned number "0", so the numbering becomes:

0 - http://www.mysite.com/
0 - http://www.newsite.com/
1 - http://www.second.com/
2 - http://www.test.com/
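The numbering shown above can be reproduced with a small Python model. It is only a sketch of the behaviour described in this section, assuming that every indexing job numbers its own sites from 0; the function name number_sites is hypothetical.

```python
def number_sites(jobs):
    """Return (number, url) pairs; numbering restarts at 0 for every job."""
    numbered = []
    for urls in jobs:
        # enumerate() yields (0, first_url), (1, second_url), ...
        numbered.extend(enumerate(urls))
    # sorted() is stable, so sites with equal numbers keep their job order
    return sorted(numbered, key=lambda pair: pair[0])


jobs = [
    ["http://www.mysite.com/", "http://www.second.com/", "http://www.test.com/"],
    ["http://www.newsite.com/"],  # the later "addon" job
]
numbering = number_sites(jobs)
```

In this model both http://www.mysite.com/ and http://www.newsite.com/ end up with number 0, which is why a search restricted to site 0 covers both of them.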

To search through selected sites, use the "d" parameter. If the parameter is not set (the default), the search is performed on all sites.

For example:

-- cnsearch.conf ----------------------------------------
::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password

::HTMLTOP
<HTML>
<HEAD>
<TITLE>Search results - %Q</TITLE>
</HEAD>
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type=text name=q size=40 maxlength=64 value="%Q">
<input type=submit value="Search">
<br>
<select name=d>
<option value="0">www.mysite.com, www.newsite.com
<option value="1">www.second.com
<option value="2">www.test.com
</select>
</td></form></tr></table>
Documents found: %O
   <B>%O</B><font color=gray>%W<B>%P</B></font><br>
<br>
<div align=right>
Sort by: <a href="%A">date</a> | <a href="%L">relevancy</a>
</div>


::HTMLRESULT
<HR>
<UL>
<LI>%N. <a href="%U" target=_new>%T</A> <small>
        <font color=red>%S</font> [Relevancy: %R]</small>
<UL>
<LI>%E
<LI>%D
<LI>%C
<LI><a href="%U" target=_new>%u</A>
</UL>
</UL>


::HTMLNOTFOUND
<P><font color=red>%Q not found</font>


::HTMLBOTTOM
%B
</BODY>
</HTML>
-- end cnsearch.conf ------------------------------------
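The request produced by such a form can also be built by hand, for example when linking to a site-restricted search from another page. The Python sketch below is an illustration: the function name search_url and the script location /cgi-bin/search.cgi are assumptions, while the parameter names "q" and "d" are the ones used in the form above.

```python
from urllib.parse import urlencode


def search_url(script_url, query, site=None):
    """Build a search URL; add the "d" parameter only for a selected site."""
    params = {"q": query}
    if site is not None:
        params["d"] = site
    return script_url + "?" + urlencode(params)


# Search site number 1 only; "/cgi-bin/search.cgi" is a hypothetical location.
url = search_url("/cgi-bin/search.cgi", "local news", 1)
```

Omitting the site argument leaves "d" out of the URL, so the search covers all sites.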

4.2.1.5 Grouping of Search Results by Sites

When searching through a large number of sites, the search results are often cluttered with pages from a single site. For example, for the search phrase "news", all the pages of a news site whose titles end with " // Local news" will be found, and the results from other sites will be pushed back by hundreds or even thousands of positions.

To prevent this, large search engines such as Google, Yandex and Rambler display only one result from each site. Starting from version 1.5, this option is implemented in CNSearch as well.

To enable grouping by sites, add a hidden field named "group" to the search request form:

-- cnsearch.conf ----------------------------------------
....
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type="text" name="q" size="40" maxlength="64" value="%Q">
<input type="hidden" name="group" value="1">
<input type="submit" value="Search">
</td></form></tr></table>
....
-- end cnsearch.conf ------------------------------------
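The idea behind grouping, one result per site, can be sketched in Python as follows. This is not the CNSearch implementation: the function name group_by_site and the result dictionaries are hypothetical, and the results are assumed to be sorted by rank already.

```python
def group_by_site(results):
    """Keep the first (highest-ranked) result for each site number."""
    seen = set()
    grouped = []
    for result in results:  # assumed to be in ranking order
        if result["site"] not in seen:
            seen.add(result["site"])
            grouped.append(result)
    return grouped


# Hypothetical ranked results: site 0 would otherwise fill the top of the list.
ranked = [
    {"site": 0, "title": "Storm warning // Local news"},
    {"site": 0, "title": "Road works // Local news"},
    {"site": 1, "title": "Company news"},
]
top_per_site = group_by_site(ranked)
```

Here top_per_site contains one entry for site 0 and one for site 1, so the result from the second site is no longer pushed down the list.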

To let users search in more detail within one site from the search results, add a "more from the site" link. It can be implemented by means of the special symbol %I:

-- cnsearch.conf ----------------------------------------
....
::HTMLRESULT
....
<LI>%N. <a href="%U" target=_new>%T</A> <small>
        <font color=red>%S</font> [Relevancy: %R]</small>
        [ <a href="%F?d=%I&q=%G">more from the site</a> ]
<UL>
....
-- end cnsearch.conf ------------------------------------
