Logging visitors
All web servers have a log utility of some sort. In Apache the access log is a plain text file. You can read and interpret it by reading it but it is not an efficient way of knowing about visitors. On the other hand there are utilities you can use to various interpret Apache log files and produce nice graphs and reports with statistics. They are just fine. The problem is that they are always external things and in most cases contain too much information. What we need is a simple log that contains the following information:
- Date and time of the visit
- The IP number of the visitor
- The URL he opened
- The address of the referring page
Notes before adding the module
Please note that this module runs one database insert query and at two read queries on every browser request. If you are running a utterly busy site (dozens of hits per second) this may have an effect on the performance. On a normal siten you should not encounter any problems. If you are worried you can decide not to use the module or make a few optimizations.
From the date above it is easy to create filtered views by day, week, month, country, visitor domain or URL, for instance. We will start by creating the database table:
CREATE TABLE cms_log ( id int(11) NOT NULL AUTO_INCREMENT, visit_time datetime DEFAULT NULL, ip varchar(255) DEFAULT NULL, reverse_dns varchar(255) DEFAULT NULL, country char(3) DEFAULT NULL, url varchar(255) DEFAULT NULL, referer varchar(255) DEFAULT NULL, PRIMARY KEY (id), KEY country (country), KEY visit_time (visit_time) )
We get the visit_time from the system clock, ip,url and referer from the $_SERVER array. Country information can be in most cases be obtained from the ip-to-country database. Create the table for it:
CREATE TABLE iptoc ( IP_FROM bigint(16), IP_TO bigint(16), country_code2 char(2), country_code3 char(3), country_name varchar(50), KEY IP_FROM (IP_FROM), KEY IP_TO (IP_TO), KEY country_code3 (country_code3) )
Importing the ip-to-country database
First download the zipped CSV file from here and unzip to a directory of your choice. This is a flat database that contains an awful amount of redundant information. We could optimize things and convert text fields to foreign keys and point to child tables. To get things running quickly we will, however, use the data as it is. So let's create an import script for the data. Here is a ready-made import script that should do the trick:
<?php // change the values to match your system $user = "user"; $pass = "pass"; $host = "localhost"; $dbase = "cms"; $table = "iptoc"; $csv = "ip-to-country.csv"; $link = mysql_connect($host,$user,$pass) or die("Could not connect: ".mysql_error()); $db = mysql_select_db($dbase) or die(mysql_error()); $row = 1; // read the csv file $handle = fopen ($csv,"r"); // delete old data @mysql_query("DELETE FROM iptoc WHERE 1"); while ($data = fgetcsv ($handle, 1000, ",")) { // insert into the table $query= "INSERT INTO iptoc " . "(ip_from, ip_to, " . "country_code2, country_code3, country_name) " . "VALUES('". $data[0]."', '".$data[1]."', '".$data[2]."', '". $data[3]."', '".mysql_real_escape_string($data[4])."')"; $result = mysql_query($query) or die("Invalid query: " . mysql_error().__LINE__.__FILE__); // next row $row++; } fclose ($handle); // close csv file echo "done - delete file '$csv' from your server"; ?>
The logger module
The logger module is pretty simple. It calculates the number of rows in the database, adds the user data and removes rows from the table if ncessary. Please note that this an invisible module and will only produce output when a severe error occurs.
<?php /** * Log visitor information. Output only generated when an error occurs. */ class logvisitors_module extends CmsModule{ // the log table. this MUST exist in th db var $table = 'cms_log'; // ip-to-country database. this MUST exist var $iptoc_table = 'iptoc'; // do not record links from inside the site var $omit_internal_referer = true; // the maximum number of rows in log var $max_log_rows = 1300; // how many to delete after max exceeded var $delete_log_rows = 300; function init(){ // call the parent function to make sure we have DB connection parent::init(); if ($this->vars['table'] != '') $this->table = $this->vars['table']; if ($this->vars['max_log_rows'] != '') $this->max_log_rows = $this->vars['max_log_rows']; if ($this->vars['delete_log_rows'] != '') $this->delete_log_rows = $this->vars['delete_log_rows']; } function fetch(){ // get log row count $logrowresultset = @mysql_query("select count(*) from $this->table"); $row=mysql_fetch_row($logrowresultset); $logrowcount = $row[0]; // try to get proxy information (Apache only) $forwarded_for = $_SERVER['HTTP_X_FORWARDED_FOR']; // if this is not empty and not 'unknown' use this instead of IP if ( ($forwarded_for != '') && ($forwarded_for !='unknown') ){ $ip = $forwarded_for; }else{ // no fowrarding detected, use IP $ip = $_SERVER['REMOTE_ADDR']; } $url = $_SERVER['REQUEST_URI']; $referer= $_SERVER["HTTP_REFERER"]; if ($this->omit_internal_referer) { if (strpos($referer, $_SERVER["SERVER_NAME"])!== false) $redir = ''; } $ip = $_SERVER['REMOTE_ADDR']; $result = @mysql_query("SELECT country_code3 FROM $this->iptoc_table ". "WHERE IP_FROM <= inet_aton('$ip') ". "AND IP_TO >= inet_aton('$ip') "); if (@mysql_num_rows($result) >=1) { $row = mysql_fetch_row($result); $country = $row[0]; } else { // No IP detected, weird stuff $country = "N/A"; } $curtime = date("Y-m-d H:i:s"); $query = "insert into $this->table " . "(visit_time,ip,reverse_dns,country,url,referer) values(". "'$curtime','$ip','','$country','$url','$referer' )"; // now insert the row to the table $result = @mysql_query($query); if (mysql_error() != '') { $errorstr .= "n".mysql_error(); } // every now and then clean up the table if ($logrowcount > $this->max_log_rows){ @mysql_query("DELETE FROM $this->table ORDER BY id " . "LIMIT $this->delete_log_rows"); @mysql_query("OPTIMIZE TABLE $_log_table"); } // if error in insert phase return it in comments if ($errorstr != '') { $retvalue = "<!--$errorstr-->n"; } else { $retvalue = ''; } // finally return an empty string or error msg return $retvalue; } } ?>
All right, that's it. Save this template to modules/nonvisual/logvisitors_module.php Now you need to add the module to some nice place in the main template, why not above the footer.
{cmsmodule name="logvisitors" path="modules/nonvisual"}
<div id="footer">
If everything has been done the visitors will be added to the log and the last thing you need to do is write a log viewer of some sort. We will do that at the admin end, of course.
The log viewer
The viewer will be added a the admin end and we use the DataBrowser class to get the it up an running as easily as possible. The code is not too difficult to understand: It is possible to use a reverse dns lookup the way shown here but it is not recommended due to possible delays in browsing. Write a small batch script instead and install it to your cron job to resolve the unsolved IPs every 15 minutes or so.
<?php require_once "DataBrowser.class.php"; class visitors_list_module extends CmsModule { var $dataBrowser = false; var $table = "cms_log"; var $style = "full"; // summary or full var $date_format = "%d/%m %H:%i"; var $pagenumber = ""; function init(){ parent::init(); // check parameters if ($this->vars['style']=='summary') $this->style = 'summary'; if ($this->vars['date_format'] !="") $this->date_format = $this->varst['date_format']; } // return either whois information or the ip just an example, do not use function cb_who($ip){ $who = gethostbyaddr($ip); if ($who != $ip) return $who; else return ("[No reverse DNS]"); } function cb_url($url){ if (strlen($url)<40) return $url; return substr($url,0,37)."..."; } function fetch(){ $dataBrowser = new DataBrowser($this); $dataBrowser->setSmarty(new Smarty()); $dataBrowser->setPagination(20); $dataBrowser->setMainTable('cms_log'); if (isset($_GET['pagenumber'])){ $this->pagenumber = $_GET['pagenumber']; $dataBrowser->setPageNumber($_GET['pagenumber']); } $EDIT_FORM = "edit_news"; // do we have a 'date' parameter in the URL? $date = $_GET['date']; // if it is -1 we have selected (show all in the dropdown) if ($date != "" && $date != -1){ $date =$_GET['date']; $condition = "WHERE DATE(visit_time)='$date' "; }else { $date = date("Y-n-j"); $condition = "WHERE 1 "; } // just for testing - this can be removed later if ($_GET['style'] == "summary"){ $this->style="summary"; } // create query depending on what we want to see if ($this->style=='summary'){ // summary - group by IP $dataBrowser->setTitle("Visitors daily - Summary by IP"); $query ="SELECT id,". "DATE_FORMAT(visit_time,'$this->date_format') as time,". "ip, reverse_dns,country, COUNT(*),referer ". "FROM cms_log $condition " . "GROUP BY ip ORDER BY ip,visit_time DESC"; } else { // show each click $title = "Visitors daily"; if ($this->pagenumber != "") $title .= " - page ".$this->pagenumber; $dataBrowser->setTitle( $title); $query ="SELECT id, ". "DATE_FORMAT(visit_time,'$this->date_format') as time,". "ip, reverse_dns,country, url, referer ". "FROM cms_log $condition ORDER BY visit_time DESC"; } $dataBrowser->setQuery($query); // no links for editing/deleting/inserting $dataBrowser->setDeleteLinkUrl(""); $dataBrowser->setInsertLinkUrl(""); // add columns to grid // (0=hide,1=grid,2=inline display form,"cb_*"=callback function); $dataBrowser->addColumn("id",10,"text",0); $dataBrowser->addColumn("Time",50,"text",1); $dataBrowser->addColumn("IP",30,"text",1); $dataBrowser->addColumn("Host",200,"text",1); $dataBrowser->addColumn("Cou",30,"text",1); if ($this->style=='summary') { $dataBrowser->addColumn("Clicks","50","text",1); $dataBrowser->addColumn("Referer","100%","text",2); } else{ $dataBrowser->addColumn("URL","100","text",1,"cb_url"); $dataBrowser->addColumn("Referer","100%","text",2); } // show the fields view-labeled "2" between the rows // in case we have an id if (isset($this->GET['id'])) $dataBrowser->setActiveRecord($_GET['id']); // add hidden fields from GET. They are needed for various // links like dropdowns and edit links $dataBrowser->addHiddenFields("act,style"); // add date selection box for date // 'hidden fields' above will be included $dataBrowser->addSelectionForm("date", "Visitors on: ", "SELECT DISTINCT DATE_FORMAT(visit_time,'%Y-%m-%d') " . "FROM $this->table",$date,$_SERVER['PHP_SELF'],"Show all"); return $dataBrowser->fetchTemplate("DataBrowser.tpl"); } } ?>
Save the above as admin/modules/logs/visitors_list_module.php. As it is using the standard DataBrowser template we only need to have the following line added in the main template:
{* visitors log *}
{elseif $smarty.get.act eq "visitors_list"}
{cmsmodule name="visitors_list" path="logs"}
We can also add a couple of additional menu items in the menu.html:<a class="simpleitem" href="?act=visitors_list" title="">View visitor log</a>
<a class="simpleitem" href="?act=visitors_list&style=summary"
title="">Visitor summary</a>
And that's it. It is up to you to add new functionality if you need any.
Addendum: hints for optimizations
There are several ways to increase performance (if you are worried and running a high-traffic site):
- Save the log to a file and transfer it to the database using a cron script at a silent time.
- Do not use a database at all, parse the data from the file in the admin module
- Add a simple lottery that only checks the number of lines once every ten requests
- Do not use the module at all
- Parse the Apache log file using
awkandgrep(yuck!)
All that said I want to point out that I have been using this logger for years without any problems and nobody complaining.
The validity of this site may vary while it is being
developed.
Feel free to test it, though :)