A simple logwatch filter for bogofilter

I recently set up my own email server and use bogofilter for spam classification. I also use logwatch to monitor the server. In a previous email setup I used spamassassin, and I liked how logwatch would show me how many emails spamassassin had classified as spam or ham. I wanted the same for bogofilter, but logwatch doesn’t support it out of the box, at least not the version I was using.

However, adding custom log filters is quite easy. There might be a better or more correct way to do it but the following worked for me.

The log file

I’m running Debian on my server and bogofilter logs information about classifying emails as well as training spam and ham to /var/log/mail.log. Here are some examples (date and host name omitted).

Classification:

bogofilter[10295]: X-Bogosity: Ham, spamicity=0.0000, version=1.2.4  
bogofilter[10453]: X-Bogosity: Unsure, spamicity=0.9837, version=1.2.4  
bogofilter[10617]: X-Bogosity: Spam, spamicity=1.0000, version=1.2.4  

Training:

bogofilter[23768]: register-n, 1550 words, 1 messages  
bogofilter[11794]: register-s, 874 words, 1 messages  

register-n means that the message was learned as ham (non-spam); register-s means that it was learned as spam.
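
Before writing a filter, it can help to check what the log actually contains. The following is only a minimal sketch: count_verdicts is a hypothetical helper name, and it assumes the X-Bogosity line format shown above.

```shell
#!/bin/bash
# count_verdicts (hypothetical helper): read mail log lines on standard input
# and print one "<verdict> <count>" line per X-Bogosity verdict found.
count_verdicts() {
  awk '
    # Locate the header wherever it sits on the line, so the date and
    # host name prefix of a real log line does not matter.
    match($0, /X-Bogosity: [A-Za-z]+/) {
      # "X-Bogosity: " is 12 characters long; the verdict follows it.
      verdict = substr($0, RSTART + 12, RLENGTH - 12)
      count[verdict]++
    }
    END {
      for (v in count) print v, count[v]
    }
  ' | sort
}

# Example with the three classification lines from above.
count_verdicts <<'EOF'
bogofilter[10295]: X-Bogosity: Ham, spamicity=0.0000, version=1.2.4
bogofilter[10453]: X-Bogosity: Unsure, spamicity=0.9837, version=1.2.4
bogofilter[10617]: X-Bogosity: Spam, spamicity=1.0000, version=1.2.4
EOF
```

On a real server the input would be the log itself, for example count_verdicts < /var/log/mail.log.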

Logwatch script and configuration

Custom logwatch filters can be added to /etc/logwatch/conf/services/ and /etc/logwatch/scripts/services/ (paths might be different in other distributions). By default logwatch uses all filters, so nothing else needs to be set up besides a configuration file for our new filter and the filter script itself.

Since this is a very simple script with no options, the configuration file just provides the name of the filter and which log file to analyze:

# /etc/logwatch/conf/services/bogofilter.conf  
Title = "Bogofilter"  
LogFile = maillog  

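maillog is not a file name but the name of a logfile group that logwatch defines elsewhere (on Debian the stock definitions appear to live under /usr/share/logwatch/default.conf/logfiles/). I found out about this by looking at existing email-related filters.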

logwatch will provide the log file via standard input to our filter. Most built-in filters are written in perl, but since I don’t know perl and since the language doesn’t seem to matter, I wrote a shell script utilizing awk instead:

#!/bin/bash  
# /etc/logwatch/scripts/services/bogofilter  

awk '  
  !/bogofilter/ { next; }  

  /X-Bogosity: Ham,/ {  
    Ham += 1  
  }  
  /X-Bogosity: Spam,/ {  
    Spam += 1  
  }  
  /X-Bogosity: Unsure,/ {  
    Unsure += 1  
  }  

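  # Training lines: add up the "N messages" count of each register-s line  
  /register-s/ {  
    # match() sets RSTART and RLENGTH to the position and length of the match  
    if (match($0, /[0-9][0-9]* messages/) != 0) {  
      split(substr($0, RSTART, RLENGTH), parts, " ")  
      SpamLearned += parts[1]  
    }  
  }  

  # Same for messages learned as ham  
  /register-n/ {  
    if (match($0, /[0-9][0-9]* messages/) != 0) {  
      split(substr($0, RSTART, RLENGTH), parts, " ")  
      HamLearned += parts[1]  
    }  
  }  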

  END {  
    if (Ham + Spam + Unsure > 0) {  
      printf "Total Messages: %d\n", Ham + Spam + Unsure  
      printf "  Ham: %d\n", Ham  
      printf "  Spam: %d\n", Spam  
      printf "  Unsure: %d\n\n", Unsure  
    }  

    printf "Learned as spam: %d\n", SpamLearned  
    printf "Learned as ham: %d\n", HamLearned  
  }  
'  

This simply counts the ham, spam, and unsure classifications as well as the learned messages, and prints a summary that looks like this:

 Total Messages: 32  
   Ham: 21  
   Spam: 2  
   Unsure: 9  

 Learned as spam: 2  
 Learned as ham: 0  
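
Because the filter only reads standard input, it can be tried without running logwatch at all. A small sketch, where try_filter is a hypothetical helper and the sample lines are the log excerpts from above:

```shell
#!/bin/bash
# try_filter (hypothetical helper): run a filter script with a few sample
# log lines on standard input, mimicking how logwatch supplies the log.
# Usage: try_filter /path/to/filter-script
try_filter() {
  local script="$1"
  # Run through bash explicitly so the script need not be executable.
  bash "$script" <<'EOF'
bogofilter[10295]: X-Bogosity: Ham, spamicity=0.0000, version=1.2.4
bogofilter[10453]: X-Bogosity: Unsure, spamicity=0.9837, version=1.2.4
bogofilter[10617]: X-Bogosity: Spam, spamicity=1.0000, version=1.2.4
bogofilter[23768]: register-n, 1550 words, 1 messages
bogofilter[11794]: register-s, 874 words, 1 messages
EOF
}
```

Calling try_filter /etc/logwatch/scripts/services/bogofilter should then report three messages in total and one message learned each as spam and as ham. For a run through logwatch itself, recent versions should accept something like logwatch --service bogofilter --range today --output stdout; treat those options as an assumption and check logwatch --help on your version.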

Built-in scripts are more complex since they often support options and print different output depending on the configured level of detail. But this script was all I needed.
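
To illustrate the detail-level idea, here is a sketch of how a shell filter could adapt its output. It assumes that logwatch exports the configured level to the script in the LOGWATCH_DETAIL_LEVEL environment variable, which is what the built-in scripts appear to read. The summarize function name and the threshold of 5 are arbitrary choices for this example, not something the script above does.

```shell
#!/bin/bash
# summarize (hypothetical): print only the total at low detail and the
# per-verdict breakdown at detail level 5 or higher.
# Assumption: logwatch exports LOGWATCH_DETAIL_LEVEL; fall back to 0 if unset.
summarize() {
  awk -v detail="${LOGWATCH_DETAIL_LEVEL:-0}" '
    !/bogofilter/ { next }
    /X-Bogosity: Ham,/    { ham++ }
    /X-Bogosity: Spam,/   { spam++ }
    /X-Bogosity: Unsure,/ { unsure++ }
    END {
      printf "Total Messages: %d\n", ham + spam + unsure
      # Adding 0 forces a numeric comparison.
      if (detail + 0 >= 5) {
        printf "  Ham: %d\n", ham
        printf "  Spam: %d\n", spam
        printf "  Unsure: %d\n", unsure
      }
    }
  '
}

# Example: the classification lines from above at a high detail level.
LOGWATCH_DETAIL_LEVEL=10 summarize <<'EOF'
bogofilter[10295]: X-Bogosity: Ham, spamicity=0.0000, version=1.2.4
bogofilter[10453]: X-Bogosity: Unsure, spamicity=0.9837, version=1.2.4
bogofilter[10617]: X-Bogosity: Spam, spamicity=1.0000, version=1.2.4
EOF
```

With the variable unset or below the threshold, only the total line is printed.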