First release of Logtilla, a web access log analyzer in Erlang

I have written Logtilla, a small Erlang framework for parsing and analyzing web access logs, hosted on GitHub. It parses logs in the Common Log Format or in Apache’s Combined Log Format. Because the parsing is done by a C port program, Logtilla is very efficient: it parses and analyzes 15,000 entries per second on my 4-year-old laptop.
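
To make the two formats concrete: a Common Log Format line holds the client host, the identd identity, the authenticated user, a bracketed timestamp, the quoted request line, the status code, and the response size in bytes (`-` when no content was returned). The Combined Log Format appends the quoted Referer and User-Agent headers. Logtilla does this parsing in its C port program. The following is only a hypothetical pure-Erlang sketch using the standard re module; the module name clf_sketch and the map keys it returns are illustrative assumptions, not the fields of Logtilla’s LogEntry record.

```erlang
-module(clf_sketch).
-export([parse_line/1]).

%% Hypothetical sketch, not Logtilla's parser (Logtilla uses a C port program).
%% Common Log Format:   host ident authuser [date] "request" status bytes
%% Combined Log Format: the same fields, then "referer" "user-agent"
clf_re() ->
    "^(\\S+) (\\S+) (\\S+) \\[([^\\]]*)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-)".

combined_re() ->
    clf_re() ++ " \"([^\"]*)\" \"([^\"]*)\"".

parse_line(Line) ->
    Opts = [{capture, all_but_first, list}],
    %% Try the longer Combined format first, then fall back to plain CLF.
    case re:run(Line, combined_re() ++ "$", Opts) of
        {match, [H, I, U, D, R, S, B, Referer, UserAgent]} ->
            Entry = common_fields(H, I, U, D, R, S, B),
            {ok, Entry#{referer => Referer, user_agent => UserAgent}};
        nomatch ->
            case re:run(Line, clf_re() ++ "$", Opts) of
                {match, [H, I, U, D, R, S, B]} ->
                    {ok, common_fields(H, I, U, D, R, S, B)};
                nomatch ->
                    {error, {bad_line, Line}}
            end
    end.

common_fields(Host, Ident, User, Date, Request, Status, Bytes) ->
    #{host => Host, ident => Ident, authuser => User, date => Date,
      request => Request, status => list_to_integer(Status),
      length => parse_length(Bytes)}.

%% "-" means no content was returned, so there is no length.
parse_length("-") -> undefined;
parse_length(Digits) -> list_to_integer(Digits).
```

The `undefined` length plays the same role as the asn1_NOVALUE value tested in the example module further down.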

Installation

To build it, clone the Logtilla Git repository, then initialize the build system, configure, and build:

autoreconf -vi
./configure
make

Building requires Autoconf, Automake, any recent version of Erlang/OTP, and the asn1c ASN.1-to-C compiler, which Logtilla uses. I have tested that both the released version 0.9.21 and the version in the asn1c SVN repository work with Logtilla.

Overview of Logtilla

Logtilla consists essentially of a single behaviour module: gen_log_analyzer, which defines the following callbacks:

  • init/1: Initialize the state.
    init(Args::any()) ->
        {'ok', State::any()}
        | 'ignore'
        | {'stop', Reason::any()}.
    
  • handle_log_entry/2: Handle a parsed log entry. The LogEntry record type is defined in header file WebAccessLog.hrl.
    handle_log_entry(LogEntry::#'LogEntry'(), State::any()) ->
        {'ok', NewState::any()}
        | {'error', Reason::any(), NewState::any()}.
    
  • handle_call/3: Handle an application-specific call. This callback is similar to the gen_server:handle_call/3 callback.
    handle_call(Msg::any(), {From::pid(), Tag::any()}, State::any()) ->
        {'reply', Reply::any(), NewState::any()}
        | {'reply', Reply::any(), NewState::any(), Timeout::timeout()}
        | {'noreply', NewState::any()}
        | {'noreply', NewState::any(), Timeout::timeout()}
        | {'stop', Reason::any(), Reply::any(), NewState::any()}.
    
  • handle_cast/2: Handle an application-specific cast. This callback is similar to the gen_server:handle_cast/2 callback.
    handle_cast(Msg::any(), State::any()) ->
        {'noreply', NewState::any()}
        | {'noreply', NewState::any(), Timeout::timeout()}
        | {'stop', Reason::any(), NewState::any()}.
    
  • terminate/2: Cleanup on termination. This callback is similar to the gen_server:terminate/2 callback.
    terminate(Reason::any(), State::any()) ->
        no_return().
    
  • code_change/3: Update the state after a module upgrade. This callback is similar to the gen_server:code_change/3 callback.
    code_change({'down', OldVsn::any()} | OldVsn::any(), State::any(), Extra::any()) ->
        {'ok', NewState::any()}.

The most important callbacks to implement are init/1 and handle_log_entry/2.
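
For reference, a behaviour module can declare the callbacks listed above with -callback attributes. The compiler then generates behaviour_info/1 automatically and warns when a module declaring -behaviour(...) misses one of them. This is a sketch under assumptions: the module name gen_log_analyzer_callbacks is hypothetical, Logtilla’s own gen_log_analyzer may declare its callbacks differently (for example with an explicit behaviour_info/1 function), and LogEntry is typed as tuple() because the #'LogEntry'{} record is defined in WebAccessLog.hrl.

```erlang
-module(gen_log_analyzer_callbacks).
%% Hypothetical module: the gen_log_analyzer callbacks listed above, written
%% as -callback attributes. Logtilla's real module may declare them differently.

-callback init(Args :: any()) ->
    {ok, State :: any()} | ignore | {stop, Reason :: any()}.

%% LogEntry is a #'LogEntry'{} record from WebAccessLog.hrl; it is typed as
%% tuple() here so that this sketch compiles without that header.
-callback handle_log_entry(LogEntry :: tuple(), State :: any()) ->
    {ok, NewState :: any()} | {error, Reason :: any(), NewState :: any()}.

-callback handle_call(Msg :: any(), From :: {pid(), Tag :: any()},
                      State :: any()) ->
    {reply, Reply :: any(), NewState :: any()}
    | {reply, Reply :: any(), NewState :: any(), timeout()}
    | {noreply, NewState :: any()}
    | {noreply, NewState :: any(), timeout()}
    | {stop, Reason :: any(), Reply :: any(), NewState :: any()}.

-callback handle_cast(Msg :: any(), State :: any()) ->
    {noreply, NewState :: any()}
    | {noreply, NewState :: any(), timeout()}
    | {stop, Reason :: any(), NewState :: any()}.

-callback terminate(Reason :: any(), State :: any()) -> no_return().

-callback code_change(OldVsn :: (any() | {down, any()}), State :: any(),
                      Extra :: any()) ->
    {ok, NewState :: any()}.
```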

Running example

Logtilla contains a basic example module, log/logtilla_test. It counts how many parsed log entries include a response length and how many don’t. This module has no practical purpose, but it is useful to illustrate the behaviour callbacks. The module’s most important parts are:

-module(logtilla_test).
-export([init/1, handle_log_entry/2, handle_call/3, handle_cast/2,
         terminate/2, code_change/3]).

% Implement Logtilla's gen_log_analyzer behaviour:
-behaviour(gen_log_analyzer).
% Include Logtilla's header for the definition of the LogEntry record:
-include("WebAccessLog.hrl").

% Define and initialize the state:
-record(state, {count_without_length=0, count_with_length=0}).
init([]) ->
  State = #state{},
  {ok, State}.

% Analyze the log entry and update the state:
handle_log_entry(LogEntry, State) ->
  case LogEntry#'LogEntry'.length of
    asn1_NOVALUE ->
      {ok, State#state{
        count_without_length=State#state.count_without_length+1}};
    _Length ->
      {ok, State#state{
        count_with_length=State#state.count_with_length+1}}
  end.

% Implement an application-specific call to return the stats:
handle_call(get_stats, _From, State) ->
  {reply, {State#state.count_without_length, State#state.count_with_length},
   State}.

% The remaining callbacks only need trivial implementations here:
handle_cast(_Msg, State) ->
  {noreply, State}.

terminate(_Reason, _State) ->
  ok.

code_change(_OldVsn, State, _Extra) ->
  {ok, State}.

To run this example on a file named /var/log/apache2/access.log:

$ cd src
$ PATH=../c_src:$PATH erl
> {ok, Pid} = gen_log_analyzer:start_link(logtilla_test, [], []).
> ok = gen_log_analyzer:parse(Pid, "/var/log/apache2/access.log").
> gen_log_analyzer:call(Pid, get_stats).

The last call returns a tuple containing the number of entries without a length and the number of entries with a length.

You must add the c_src directory to the PATH: that is where the logtilla_parser program is generated, and gen_log_analyzer executes this program as a port program to parse the files, so it must be found in the PATH.
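
Because logtilla_parser must be found in the PATH, code that drives Logtilla outside the shell may want to check for it before starting an analyzer. Here is a sketch of such a wrapper around the three shell steps above. The module logtilla_run and its return values are hypothetical; gen_log_analyzer:start_link/3, parse/2 and call/2 are used exactly as in the shell session, and os:find_executable/1 is the standard OTP function that searches the directories listed in PATH.

```erlang
-module(logtilla_run).
-export([stats/1]).

%% Hypothetical helper, not part of Logtilla: runs the logtilla_test example
%% on File, after checking that the logtilla_parser port program is in PATH.
stats(File) ->
    case os:find_executable("logtilla_parser") of
        false ->
            {error, {parser_not_found, "logtilla_parser"}};
        _ParserPath ->
            {ok, Pid} = gen_log_analyzer:start_link(logtilla_test, [], []),
            ok = gen_log_analyzer:parse(Pid, File),
            %% {CountWithoutLength, CountWithLength}, as in the shell session.
            {ok, gen_log_analyzer:call(Pid, get_stats)}
    end.
```

With the PATH set as described, logtilla_run:stats("/var/log/apache2/access.log") would return {ok, {WithoutLength, WithLength}}. The analyzer process is left running and linked to the caller; stopping it is outside the scope of this sketch.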

I will soon write more blog posts on Logtilla’s internals (the most interesting part) and on future work.