Logwatch filter for Bogofilter
I recently set up my own email server and use bogofilter for spam classification. I also use logwatch to monitor the server. On a previous email setup I used spamassassin, and I liked how logwatch would show me how many emails spamassassin classified as spam or ham. I wanted the same for bogofilter, but logwatch doesn’t support it out of the box, at least not the version I was using.
However, adding custom log filters is quite easy. There might be a better or more correct way to do it, but the following worked for me.
The log file
I’m running Debian on my server, and bogofilter logs information about classifying emails as well as training spam and ham to /var/log/mail.log.
Here are some examples (date and host name omitted).
Classification:
bogofilter[10295]: X-Bogosity: Ham, spamicity=0.0000, version=1.2.4
bogofilter[10453]: X-Bogosity: Unsure, spamicity=0.9837, version=1.2.4
bogofilter[10617]: X-Bogosity: Spam, spamicity=1.0000, version=1.2.4
Training:
bogofilter[23768]: register-n, 1550 words, 1 messages
bogofilter[11794]: register-s, 874 words, 1 messages
register-n means that the message was learned as ham; register-s means that it was learned as spam.
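As a quick sanity check of these formats, the sample lines above can be counted with plain grep. This is only a sketch: the sample_log variable is a stand-in for illustration (without date and host name, like the examples), and on a real server the input would be /var/log/mail.log instead.

```shell
#!/bin/bash
# Sample lines copied from the examples above (date and host name omitted).
sample_log='bogofilter[10295]: X-Bogosity: Ham, spamicity=0.0000, version=1.2.4
bogofilter[10453]: X-Bogosity: Unsure, spamicity=0.9837, version=1.2.4
bogofilter[10617]: X-Bogosity: Spam, spamicity=1.0000, version=1.2.4
bogofilter[23768]: register-n, 1550 words, 1 messages
bogofilter[11794]: register-s, 874 words, 1 messages'

# grep -c prints the number of matching lines.
ham=$(printf '%s\n' "$sample_log" | grep -c 'X-Bogosity: Ham,')
spam=$(printf '%s\n' "$sample_log" | grep -c 'X-Bogosity: Spam,')
unsure=$(printf '%s\n' "$sample_log" | grep -c 'X-Bogosity: Unsure,')
# Training lines are either register-s (spam) or register-n (ham).
training=$(printf '%s\n' "$sample_log" | grep -c 'register-[sn],')

echo "ham=$ham spam=$spam unsure=$unsure training_lines=$training"
```

Note that this counts training lines, not trained messages; a single line can report more than one message, which is why the filter script reads the number in front of "messages" instead.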
Logwatch script and configuration
Custom logwatch filters can be added to /etc/logwatch/conf/services/ and /etc/logwatch/scripts/services/ (paths might be different in other distributions). By default logwatch uses all filters, so nothing else needs to be set up besides a configuration file for our new filter and the filter script itself.
Since this is a very simple script with no options, the configuration file just provides the name of the filter and which log file to analyze:
# /etc/logwatch/conf/services/bogofilter.conf
Title = "Bogofilter"
LogFile = maillog
maillog is a log file group that is defined elsewhere in the logwatch configuration. I found out about it by looking at existing email-related filters.
logwatch provides the log file to our filter via standard input. Most built-in filters are written in Perl, but since I don’t know Perl and the language doesn’t seem to matter, I wrote a shell script utilizing awk instead:
#!/bin/bash
# /etc/logwatch/scripts/services/bogofilter
# Reads the mail log on standard input and prints a bogofilter summary.
awk '
  # Skip lines that were not logged by bogofilter.
  !/bogofilter/ { next }

  # Classification results.
  /X-Bogosity: Ham,/    { Ham += 1 }
  /X-Bogosity: Spam,/   { Spam += 1 }
  /X-Bogosity: Unsure,/ { Unsure += 1 }

  # Training: add up the number in front of "messages".
  /register-s/ {
    if (match($0, /[0-9][0-9]* messages/) != 0) {
      split(substr($0, RSTART, RLENGTH), parts, " ")
      SpamLearned += parts[1]
    }
  }
  /register-n/ {
    if (match($0, /[0-9][0-9]* messages/) != 0) {
      split(substr($0, RSTART, RLENGTH), parts, " ")
      HamLearned += parts[1]
    }
  }

  END {
    # Only print the classification block if anything was classified.
    if (Ham + Spam + Unsure > 0) {
      printf "Total Messages: %d\n", Ham + Spam + Unsure
      printf " Ham: %d\n", Ham
      printf " Spam: %d\n", Spam
      printf " Unsure: %d\n\n", Unsure
    }
    printf "Learned as spam: %d\n", SpamLearned
    printf "Learned as ham: %d\n", HamLearned
  }
'
This simply counts the messages classified as spam, ham, or unsure, as well as the messages learned as spam or ham, and prints a summary that looks like this:
Total Messages: 32
 Ham: 21
 Spam: 2
 Unsure: 9

Learned as spam: 2
Learned as ham: 0
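The least obvious part of the script is probably how the message count is pulled out of a training line. Here is a minimal sketch of that step on a single sample line, assuming the log format shown earlier (the line and learned variable names are just for illustration):

```shell
#!/bin/bash
# One training line from the examples (date and host name omitted).
line='bogofilter[11794]: register-s, 874 words, 1 messages'

learned=$(printf '%s\n' "$line" | awk '
  # match() sets RSTART and RLENGTH to the position and length of the
  # first text matching the pattern, here the digits before " messages".
  match($0, /[0-9][0-9]* messages/) {
    # Split the matched text on spaces and keep the number.
    split(substr($0, RSTART, RLENGTH), parts, " ")
    print parts[1]
  }
')

echo "messages learned: $learned"
```

The word count (874) and the process ID are not picked up because the pattern requires the digits to be followed directly by " messages".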
Built-in scripts are more complex since they often support options and print different output depending on the configured level of detail. But this script was all I needed.
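As a rough idea of how a detail level could be honored: as far as I know, logwatch passes the configured level to filters in the LOGWATCH_DETAIL_LEVEL environment variable. The following sketch is not part of my script. It prints only the total at low detail and adds the breakdown at higher detail. The summarize function and the threshold of 5 are arbitrary choices for illustration.

```shell
#!/bin/bash
# Hypothetical helper: summarize classifications read from standard input.
# $1 is the detail level (a number).
summarize() {
  awk -v detail="$1" '
    /X-Bogosity: Ham,/    { ham++ }
    /X-Bogosity: Spam,/   { spam++ }
    /X-Bogosity: Unsure,/ { unsure++ }
    END {
      printf "Total Messages: %d\n", ham + spam + unsure
      # The threshold of 5 is an assumption made for this sketch.
      if (detail >= 5) {
        printf " Ham: %d\n Spam: %d\n Unsure: %d\n", ham, spam, unsure
      }
    }
  '
}

# Sample lines from the examples (date and host name omitted).
sample='bogofilter[10295]: X-Bogosity: Ham, spamicity=0.0000, version=1.2.4
bogofilter[10453]: X-Bogosity: Unsure, spamicity=0.9837, version=1.2.4
bogofilter[10617]: X-Bogosity: Spam, spamicity=1.0000, version=1.2.4'

# When run by logwatch, the level would come from the environment:
#   summarize "${LOGWATCH_DETAIL_LEVEL:-0}"
low=$(printf '%s\n' "$sample" | summarize 0)
high=$(printf '%s\n' "$sample" | summarize 10)

printf '%s\n---\n%s\n' "$low" "$high"
```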