CNStats - the Best Solution for the Site Statistics Problems
This article covers the ways of recording site statistics.
Site statistics, as used here, means the accumulated data about visitors together with a tool for analyzing site attendance.
Your web-site visitors fall into two large categories: users and search bots.
We'd like lots of interested people to visit our web-site.
Users are people who visit your site with a browser. As a rule, users download whole web pages, view pictures, and run JavaScript.
These are your most valuable visitors, so you should have detailed information about them.
We'd like our site to be easy to find through search engines (to appear among the top results for our key words). Therefore, it is very important for us to track robot activity on our web-site. We want to carry out SEO (Search Engine Optimization), that is, to optimize the site for search bots.
Search bots (robots or crawlers) are applications that perform tasks for search engines and directories. Bots visit sites in order to update the search index: they download your web-site pages, and that is why your site can be found in Google, for instance.
The main distinguishing feature of bots is that, as a rule, they don't download pictures, since pictures are not required for the search index.
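The distinction above can be turned into a rough heuristic. The following is a minimal sketch, not the way CNStats itself classifies visitors: it assumes each request is available as a (client, path) pair and treats any client that never requested a picture as a probable bot. The function name and the list of picture extensions are illustrative assumptions.

```python
from collections import defaultdict

# Assumed set of picture extensions; adjust to the files your site serves.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def split_users_and_bots(requests):
    """Split clients into probable users and probable bots.

    requests: iterable of (client_id, path) tuples.
    A client that fetched at least one picture is counted as a user;
    a client that fetched pages only is counted as a probable bot.
    """
    fetched_image = defaultdict(bool)
    for client_id, path in requests:
        # Ignore any query string before checking the extension.
        clean_path = path.split("?", 1)[0].lower()
        is_image = clean_path.endswith(IMAGE_EXTENSIONS)
        fetched_image[client_id] = fetched_image[client_id] or is_image
    users = {client for client, saw in fetched_image.items() if saw}
    bots = {client for client, saw in fetched_image.items() if not saw}
    return users, bots
```

This is only a heuristic: a person browsing with pictures switched off would be counted as a bot, so in practice it is usually combined with other signals such as the User-Agent header.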
There are three ways of the attendance statistics accumulation:
- Web-server log-files;
- Data accumulation in the local database (CNStats);
- Data accumulation on an off-site statistics server.
Data Accumulation on an Off-site Statistics Server
Two key words matter here: "counter" and "off-site". "Off-site" means that all the information is stored on a remote server (which raises security concerns), and that statistics accuracy depends on the reliability of the communication channels and of the server software. "Counter" means that you place on your pages HTML code that displays a picture located on the remote server. Since robots do not download pictures, they are automatically excluded from the list of visitors.
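To make the "counter" idea concrete, here is a minimal sketch that builds the kind of HTML a site owner pastes into a page. The counter URL, the "site" parameter, and the function name are hypothetical; a real statistics service supplies its own code.

```python
from html import escape
from urllib.parse import urlencode


def counter_snippet(counter_url, site_id):
    """Return an <img> tag that requests a picture from a remote counter.

    counter_url and the "site" query parameter are illustrative
    assumptions, not the format of any particular service.
    """
    query = urlencode({"site": site_id})
    src = escape(f"{counter_url}?{query}", quote=True)
    return f'<img src="{src}" width="1" height="1" alt="">'
```

Every browser that renders the page requests the picture, and that request is what the remote server counts. A client that never loads pictures never triggers it.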
Thus, a remote statistics server is useful only in the following situations:
- You take part in a rating of sites on similar subjects (to get visitors who browse the ratings);
- You cannot install your own statistics accumulation and analysis system.
Note: Some servers try to replace the picture with other kinds of inclusions (for instance, PHP code added to your own code). This is a good trend, but remember that the server is still off-site: the slightest failure of the remote server can bring down your own server.
Web-server Log-files
The key concept is the following: log-files are never superfluous. Generally speaking, they are the only correct way to store statistics data over a long period of time (a year or more). However, a log-file is not site statistics in itself; it is just source data. These files require an application to process them. There are two types of such applications:
- Programs running on the web-server where the site is located;
- Programs that require the log-file to be downloaded to a Windows computer for local analysis.
A general disadvantage of these applications is that they cannot monitor the site on-line.
Downloading the files is also rather cumbersome.
There is also a suitable option: set up log rotation on the server according to the desired storage period and current contents, and then use a free log analyzer on the server side. It is quite enough if the free analyzer works efficiently, lets you set the time period for analysis, and supports search by condition.
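The two analyzer features named above, a time period and a search by condition, can be sketched in a few lines. This is an illustration only, not a replacement for a real analyzer. It assumes the server writes logs in the widely used Common Log Format; the function names and the sample lines are made up.

```python
import re
from datetime import datetime, timezone

# Assumed layout: host ident user [time] "METHOD path protocol" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


def search(lines, start, end, condition=None):
    """Yield records with start <= time < end that satisfy the condition."""
    for line in lines:
        record = parse_line(line)
        if record is None or not (start <= record["time"] < end):
            continue
        if condition is None or condition(record):
            yield record


# Usage: requests from October 2023 that ended in a 404.
sample_lines = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [11/Oct/2023:09:00:00 +0000] "GET /missing.html HTTP/1.1" 404 128',
    '10.0.0.3 - - [20/Nov/2023:09:00:00 +0000] "GET /index.html HTTP/1.1" 200 2326',
]
october_start = datetime(2023, 10, 1, tzinfo=timezone.utc)
november_start = datetime(2023, 11, 1, tzinfo=timezone.utc)
not_found = list(search(sample_lines, october_start, november_start,
                        lambda record: record["status"] == 404))
```

Because the functions read lines one at a time, they can be run directly over a rotated log file on the server without downloading it.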
Note: you shouldn't store all data forever. In practice, logs just take up space on your disks. They hold a lot of useless information, for instance, data on picture downloads. A storage period of 30-60 days is quite enough in 99.999% of cases.
Thus, log-files should be used when it is necessary to store data on all requests for the site's whole period of operation.
Data Accumulation in the Local Database
This is the only way that allows counting both robots and users, as well as monitoring and analyzing their actions at the very moment they take place. Any attendance data stored in the database is instantly accessible.
There is a nuance, though: database performance and the added complexity of the system. However, it is enough to set up the system correctly just once. As for database performance: if your web-site already works with this database, then the statistics will simply function as a part of the site.
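The approach can be illustrated with a minimal sketch. It is not CNStats code: it uses SQLite from the Python standard library purely so that the example is self-contained, and the table layout and function names are assumptions. The site records every request, from users and robots alike, and can query the data immediately.

```python
import sqlite3
import time


def open_stats(path=":memory:"):
    """Open the statistics database and create the hits table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        " ts REAL NOT NULL, host TEXT NOT NULL,"
        " path TEXT NOT NULL, user_agent TEXT NOT NULL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS hits_ts ON hits (ts)")
    return db


def record_hit(db, host, path, user_agent, ts=None):
    """Store one request; called by the site for every page it serves."""
    db.execute(
        "INSERT INTO hits (ts, host, path, user_agent) VALUES (?, ?, ?, ?)",
        (time.time() if ts is None else ts, host, path, user_agent),
    )
    db.commit()


def hosts_online(db, window_seconds=300, now=None):
    """Return (host, request_count) for hosts seen within the last window."""
    now = time.time() if now is None else now
    rows = db.execute(
        "SELECT host, COUNT(*) FROM hits WHERE ts >= ? "
        "GROUP BY host ORDER BY host",
        (now - window_seconds,),
    )
    return rows.fetchall()
```

Since the hit is written by the site's own code rather than triggered by a picture, robots are recorded too, and a query such as hosts_online reflects activity from the last few minutes.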
Thus, data accumulation in the local database is very convenient for the following sites:
- Commercial sites, where on-line monitoring of visitors is important;
- Newly created sites;
- Small and medium-sized sites (about 10 000 unique hosts per day) that use a database for their core function.
We have considered here only the ways of accumulating statistics data; the functions of statistics applications will be examined in the next article.
Finally, we'd like to mention the commercial side of your site. In any case, you spend money on the site, so the site statistics should earn you profit. Here are some questions that are more useful than any explanations:
External counters, off-site statistics servers: Who profits when you use external counters? Whom does the picture advertise? Whose citation index do you raise? Whom do you pay, and what do you get in return?
Server log-files: Why do you store gigabytes of logs? What do you gain by taking up server space? How often do you have to search logs more than a month old? Is it convenient? Are log processing applications efficient enough? Do you always respond to visitors' actions immediately?
Data accumulation in the local database: Is your database critically overloaded, or does it sit idle? Do you need on-line monitoring of visitors? Is analyzing robot activity on the site important to you?