Script day – simple log graphing tool

Wednesday, November 7th, 2007

I've written similar versions of this script over the years to analyze all kinds of logs, but here's one for posterity:

This script is useful when you have a log and want to analyze load over time: transactions per second or whatnot (the version below does this for Apache httpd logs, but it can easily be modified to analyze anything). For Apache (and most other HTTP servers) there are many readily available log analysis packages that do a much better job than a simple script can, but you might not have such software pre-configured, it might not be able to filter what you need, or you might just want to analyze something else. In those cases this script comes in handy.

The script reads time stamped log events, one event per line, and extracts the time stamp from each line. It then prints a simple vertical graph (i.e. time runs along the Y axis) of load over time, at whatever resolution you ask for. Its output looks something like this:


Oct 30 14:40:00 2007 |#############                                    | 3.8 x/sec
Oct 30 14:50:00 2007 |##########################################       | 6.3 x/sec
Oct 30 15:00:00 2007 |###########################################      | 6.5 x/sec
Oct 30 15:10:00 2007 |#############################################    | 6.6 x/sec
Oct 30 15:20:00 2007 |###############################                  | 5.4 x/sec

(The output is obviously formatted for terminals with monospaced fonts, so it won't look as good here.)

As you can see, the script prints a summary line for each time unit (at the requested resolution), with a bar depicting the number of requests and a value giving the average number of requests per second (or whatever you want; it is easy to modify).
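To make the bar and rate arithmetic concrete, here is a minimal sketch of rendering a single row, assuming the bucket's count, the largest count in the graph and the resolution are already known. The sub name render_row and its $width parameter are hypothetical, and unlike the script it formats the rate with sprintf, which rounds to one decimal instead of truncating.

```perl
use strict;
use warnings;

# Render one graph row (hypothetical helper): a bar scaled so that the
# busiest bucket ($max events) fills $width columns, followed by the average
# number of events per second over the $res-second bucket.
sub render_row {
    my ($label, $count, $max, $res, $width) = @_;
    # multiply before dividing so the busiest bucket gets exactly $width columns
    my $bar  = $max > 0 ? int($count * $width / $max) : 0;
    my $rate = $count / $res;    # events per second in this bucket
    return sprintf('%s |%s%s| %.1f x/sec',
                   $label, '#' x $bar, ' ' x ($width - $bar), $rate);
}

print render_row('Oct 30 15:10:00 2007', 3960, 3960, 600, 10), "\n";
```

With 3960 events in a 600 second bucket the rate is 3960 / 600 = 6.6 x/sec, and since that bucket is also the busiest one it gets the full ten-column bar.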

Running it is very simple: the script reads log lines from standard input, and you normally want to feed it the output of some filter. For example, let's say I want to check the request load on an Apache httpd server resource accessible under /some/resource, with a 10 minute resolution. I would run:
grep '/some/resource' /var/log/httpd/access_log | ./graphlog.pl 600
This uses grep to filter the Apache access log so that only log events that access /some/resource are passed to the graphing script (called graphlog.pl here). The graphing script is told to collect log events into batches of 10 minutes (600 seconds) for display. The result looks like the output shown above.
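The 600 on the command line is the bucket size. As a minimal sketch of the grouping the script performs (the sub name bucket_counts is hypothetical), each time stamp is divided by the resolution and truncated, so all events inside the same 600 second window share one counter. Note that windows are aligned to multiples of the resolution counted from the UNIX epoch, not to the first event in the log.

```perl
use strict;
use warnings;

# Count events per bucket of $res seconds, as the script does: the key is
# int(time / res), and key * res is the start time of that bucket.
sub bucket_counts {
    my ($res, @times) = @_;
    my %counters;
    $counters{ int($_ / $res) }++ for @times;
    return %counters;
}

# four events: two in the first 10 minute window, one exactly at the start
# of the next window, and one in the window after that
my %c = bucket_counts(600, 1194444000, 1194444599, 1194444600, 1194445250);
for my $key (sort { $a <=> $b } keys %c) {
    printf "bucket starting at %d: %d events\n", $key * 600, $c{$key};
}
```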

Anyway, here's the script, with a lot of comments to let you know what's going on. It's written in Perl and uses the -l switch, which tells perl to add a newline after each print command. This saves me a tiny bit of work, as I only print the graph, with one print command per line.
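Strictly speaking, -l with no argument sets the output record separator $\ to the current input record separator (a newline by default); it only chomps input lines when combined with -n or -p, which this script doesn't use. A minimal sketch of getting the same effect for a limited scope without the switch, using a hypothetical print_rows helper:

```perl
use strict;
use warnings;

# perl's -l switch (given without a number) sets the output record separator
# $\ to "\n", so each print ends with a newline. This hypothetical helper
# does the same thing locally, without the command-line switch.
sub print_rows {
    my ($fh, @rows) = @_;
    local $\ = "\n";            # what -l sets for the whole program
    print {$fh} $_ for @rows;   # each print gets "\n" appended
}

print_rows(\*STDOUT, 'first row', 'second row');
```

Because $\ is localized, prints outside print_rows are not affected, whereas -l changes every print in the program.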


#!/usr/bin/perl -l


# some "safety" features for perl which I recommend to always use - "strict" tells the perl
# compiler to fail the compiling on common errors such as typos in variable names, and
# "warnings" complains at run time about suspicious things like using undefined values
use strict;
use warnings;


# I'm using these two time formatting procedures from the POSIX support package
use POSIX qw(mktime ctime);


# we're parsing log lines like this:
# 90.149.88.157 - - [07/Nov/2007:14:00:16 +0200] "GET /some/resource HTTP/1.1" 200 4936 "-" "Mozilla/5.0 (bla bla bla)"
# I'm not filtering it - that is the job of something on the outside


# helper list of known month abbreviations so I can understand english formatted dates
my %months = ( 'Jan' => 0, 'Feb' => 1, 'Mar' => 2, 'Apr' => 3, 'May' => 4, 'Jun' => 5, 'Jul' => 6, 'Aug' => 7, 'Sep' => 8, 'Oct' => 9, 'Nov' => 10, 'Dec' => 11 );


# the method that converts formatted time stamps to useful UNIX time stamps
# (the time zone offset is dropped, so the time is read in the local time zone)
sub dateToTime($) {
    my ($date) = @_;


    # clean up the input to only leave the date and time in the required formatting, dropping everything
    # before the timestamp
    $date =~ s,^[^\d:/]+,,g;
    # and after the time stamp
    $date =~ s,[^\w\d:/]+.*,,g;
    # sanity check - if the date isn't parseable (or the month is unknown) return an undefined value
    return undef unless $date =~ m|(\d{1,2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})| and exists $months{$2};
    # use mktime() to convert the time fields to a unix time stamp
    return mktime($6, $5, $4, $1, $months{$2}, ($3 - 1900));
}


# read the required resolution in seconds from the command line
my $res = shift @ARGV;
die "usage: $0 <resolution in seconds>\n" unless defined $res and $res =~ /^\d+$/ and $res > 0;
# initialize some local variables
my %counters;
my $max = 0;


# parse each line from stdin
while (<>) {
    # the above is perl magic for "read one line each iteration of the loop" - from the
    # files named on the command line, or from standard input if there are none
    # figure out the log event's time stamp, skipping lines that don't have a valid one
    next unless m|\[(\d+[^\]]+\d+)\]|;
    my $time = dateToTime($1);
    next unless defined $time;
    # count the log event
    my $cur = ++$counters{int($time / $res)};
    # check if the new counter value is the current largest counter value -
    # this is used for normalizing the graph
    $max = $cur if $max < $cur;
}


# nothing was counted, so there is nothing to graph
exit unless $max;


# normalize on width 80 (just because, change it to whatever is good for your terminal)
my $width = 80;


# go over each counter, in progressive temporal order and graph the value
foreach my $time (sort { $a <=> $b } keys %counters) {
    chomp(my $strtime = ctime($time * $res)); # format the time for display
    # figure out the bar size (in whole columns) for this event; the busiest period gets the full width
    my $size = int($counters{$time} * $width / $max);
    # print the bar size with a "something per second" value
    print "$strtime |" . ("#" x $size) . (" " x ($width - $size)) . "| " . (int($counters{$time} / $res * 10) / 10) . " x/sec";
}
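One caveat: dateToTime() throws away the +0200 offset and lets mktime() interpret the time in the local time zone of the machine running the script, which is fine as long as the log was written in that same time zone. If it wasn't, here is a minimal sketch of an offset-aware conversion using timegm() from the core Time::Local module. The name date_to_utc is hypothetical; it expects the bracketed time stamp with the offset included, which is what the loop above captures in $1, so it could stand in for dateToTime($1) there.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %months = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4, Jun => 5,
              Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Convert an Apache time stamp such as "07/Nov/2007:14:00:16 +0200" to a UNIX
# time stamp, honouring the explicit UTC offset (hypothetical helper).
sub date_to_utc {
    my ($date) = @_;
    return undef unless $date =~
        m{(\d{1,2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s+([+-])(\d{2})(\d{2})};
    my ($mday, $mon, $year, $h, $m, $s, $sign, $oh, $om) = ($1, $2, $3, $4, $5, $6, $7, $8, $9);
    return undef unless exists $months{$mon};
    # read the wall-clock time as if it were UTC ...
    my $t = timegm($s, $m, $h, $mday, $months{$mon}, $year - 1900);
    # ... then remove the offset: 14:00 at +0200 is 12:00 UTC
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '+' ? 1 : -1);
    return $t - $offset;
}

print date_to_utc('07/Nov/2007:14:00:16 +0200'), "\n";
```

The ctime() call in the graphing loop would then display each bucket in the local time zone of whoever runs the script, regardless of where the log came from.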

This entry was posted on Wednesday, November 7th, 2007 at 17:37 and is filed under Projects, Script Day, Software.

Things n' Stuff Thoughts about the universe in general