Felix Kling

Logwatch filter for Bogofilter

I recently set up my own email server and use bogofilter for spam classification. I also use logwatch to monitor the server. Previously, I was using spamassassin for another email setup, and I liked how logwatch would show me how many emails spamassassin classified as spam or ham. I wanted the same for bogofilter, but logwatch doesn’t support it out of the box, at least not the version I was using.

However, adding custom log filters is quite easy. There might be a better or more correct way to do it but the following worked for me.

The log file

I’m running Debian on my server and bogofilter logs information about classifying emails as well as training spam and ham to /var/log/mail.log.
Here are some examples (date and host name omitted).

Classification:

bogofilter[10295]: X-Bogosity: Ham, spamicity=0.0000, version=1.2.4
bogofilter[10453]: X-Bogosity: Unsure, spamicity=0.9837, version=1.2.4
bogofilter[10617]: X-Bogosity: Spam, spamicity=1.0000, version=1.2.4

Training:

bogofilter[23768]: register-n, 1550 words, 1 messages
bogofilter[11794]: register-s, 874 words, 1 messages

register-n means that the message was learned as ham; register-s means that it was learned as spam.
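As a quick sanity check of the format, the verdict lines can be counted with plain grep. The following is only a sketch: sample_log and count_classified are made-up helper names, and the input is the example lines from above rather than a real mail log.

```shell
#!/bin/bash

# Sample lines copied from the examples above (date and host name omitted).
sample_log() {
  cat <<'EOF'
bogofilter[10295]: X-Bogosity: Ham, spamicity=0.0000, version=1.2.4
bogofilter[10453]: X-Bogosity: Unsure, spamicity=0.9837, version=1.2.4
bogofilter[10617]: X-Bogosity: Spam, spamicity=1.0000, version=1.2.4
bogofilter[23768]: register-n, 1550 words, 1 messages
bogofilter[11794]: register-s, 874 words, 1 messages
EOF
}

# Hypothetical helper: count the lines on stdin that carry a given verdict.
count_classified() {
  grep -c "X-Bogosity: $1,"
}

# Print one count per verdict for the sample lines.
for verdict in Ham Spam Unsure; do
  printf '%s: %d\n' "$verdict" "$(sample_log | count_classified "$verdict")"
done
```

On a real server, the same helper could presumably be fed from the actual log instead, for example with `grep bogofilter /var/log/mail.log | count_classified Spam`.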

Logwatch script and configuration

Custom logwatch filters are added in two places: a configuration file in /etc/logwatch/conf/services/ and a filter script in /etc/logwatch/scripts/services/ (paths might be different in other distributions). By default, logwatch uses all filters, so nothing else needs to be set up besides these two files.

Since this is a very simple script with no options, the configuration file just provides the name of the filter and which log file to analyze:

# /etc/logwatch/conf/services/bogofilter.conf
Title = "Bogofilter"
LogFile = maillog

maillog is not a file name but the name of a log file group that logwatch defines elsewhere in its configuration. I found out about it by looking at the existing email-related filters.

logwatch provides the log file to our filter via standard input. Most built-in filters are written in Perl, but since I don’t know Perl and since the language doesn’t seem to matter, I wrote a shell script utilizing awk instead:

#!/bin/bash
# /etc/logwatch/scripts/services/bogofilter

awk '
  !/bogofilter/ { next; }

  /X-Bogosity: Ham,/ {
    Ham += 1
  }
  /X-Bogosity: Spam,/ {
    Spam += 1
  }
  /X-Bogosity: Unsure,/ {
    Unsure += 1
  }

  /register-s/ {
    if (match($0, /[0-9][0-9]* messages/) != 0) {
      split(substr($0, RSTART, RLENGTH), parts, " ")
      SpamLearned += parts[1]
    }
  }

  /register-n/ {
    if (match($0, /[0-9][0-9]* messages/) != 0) {
      split(substr($0, RSTART, RLENGTH), parts, " ")
      HamLearned += parts[1]
    }
  }

  END {
    if (Ham + Spam + Unsure > 0) {
      printf "Total Messages: %d\n", Ham + Spam + Unsure
      printf "  Ham: %d\n", Ham
      printf "  Spam: %d\n", Spam
      printf "  Unsure: %d\n\n", Unsure
    }

    printf "Learned as spam: %d\n", SpamLearned
    printf "Learned as ham: %d\n", HamLearned
  }
'

This simply counts the messages classified as ham, spam, or unsure, as well as the messages learned as spam or ham, and prints a summary that looks like this:

 Total Messages: 32
   Ham: 21
   Spam: 2
   Unsure: 9

 Learned as spam: 2
 Learned as ham: 0
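The least obvious part of the script is probably how the number of learned messages is pulled out of a training line. The sketch below isolates that step: extract_message_count is a made-up helper name, and the input line is an invented example with 3 messages so that the extracted number is easy to spot.

```shell
#!/bin/bash

# Hypothetical helper: print the message count of each training line on stdin.
extract_message_count() {
  awk '
    # match() sets RSTART and RLENGTH to the position and length of the match.
    match($0, /[0-9][0-9]* messages/) != 0 {
      # substr() cuts out e.g. "3 messages"; split() separates number and word.
      split(substr($0, RSTART, RLENGTH), parts, " ")
      print parts[1]
    }
  '
}

# Invented sample line; prints 3
echo 'bogofilter[11794]: register-s, 874 words, 3 messages' | extract_message_count
```

The word count earlier in the line (874 here) is not picked up because the pattern requires the digits to be followed directly by " messages".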

Built-in scripts are more complex since they often support options and print different output depending on the configured level of detail. But this script was all I needed.
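Because the filter only reads standard input, it can be tried out without waiting for the next logwatch report. The following is a sketch under assumptions: run_filter is a made-up helper name, and the two paths are the Debian locations mentioned in this post and may differ on other systems.

```shell
#!/bin/bash

# Hypothetical helper: feed a log file to a filter script on standard input,
# which is how the filter receives its data from logwatch.
run_filter() {
  local filter="$1" logfile="$2"
  bash "$filter" < "$logfile"
}

# Paths as used in this post; adjust them for other distributions.
filter=/etc/logwatch/scripts/services/bogofilter
logfile=/var/log/mail.log

# Only run when both files are readable, so the snippet is harmless elsewhere.
if [ -r "$filter" ] && [ -r "$logfile" ]; then
  run_filter "$filter" "$logfile"
fi
```

Note that this counts every bogofilter line in the file, whereas logwatch only passes on the lines that fall into the date range of its report, so the numbers will usually be higher. To see the filter as part of a real report, logwatch's --service option should allow restricting a run to the new service.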