Spam is one of the most common annoyances for the average user. How do I block it? What did I do wrong? If you’re a cPanel user, then you’re in luck, because there’s a way to set up intelligent (human) filtering, so the spam you receive will steadily decline the more effort you put into training it.
The Problem
The problem with continuous training is that it doesn’t exist by default, so for the time being we have to script it. If you keep marking mail as spam but keep getting it anyway, that’s because you’re not training on false positives and false negatives. A false positive (in this context) is a piece of legit mail that got marked as spam. A false negative is a piece of spam that was treated as ham (legit).
The Solution
Create Spam and Ham training folders
In your email client, create 2 folders underneath the INBOX folder:
train-ham
train-spam
Enable SpamAssassin and Disable SpamBox
Following this guide, go to cPanel > Apache SpamAssassin
Make sure the button says “Disable Apache SpamAssassin” to indicate that it’s currently enabled.
Make sure Auto-delete Spam is disabled. Otherwise every message SpamAssassin marks as spam gets deleted, false positives included, before you have a chance to retrain on it.
Make sure Enable Spam Box is showing on the button to indicate that SpamBox is currently disabled.
Install the script and configure training directories
Once you’ve done all that, download this script to your user’s home directory as sa-train.sh and give it execute permissions (for example, with chmod +x ~/sa-train.sh).
Now you’ll need a few things:
- The email username you want to train
- The domain at which the email address exists
- The exact folder name(s) you created in your email client
- Your cPanel username
Configure the folders in the configuration files
Create 2 files in the .spamassassin folder called:
username_domain_salearn-spam.conf
username_domain_salearn-ham.conf
These files should contain 2 entries for each “folder” you have set up for training: one ending in “new” to catch unread emails, and one ending in “cur” to catch existing (read) emails that were moved to those folders. You won’t see the “cur” and “new” folders unless you look at the actual filesystem. The file for the email itself moves from “new” to “cur” when it’s read.
Let’s say that our email is example@somecooldomain.com and our cPanel username is somecool. Let’s also say we’ve already created the folders in an email client. The actual folder names on disk are prefixed with a period (‘.’). Here’s what each file’s contents would look like:
/home/somecool/.spamassassin/example_somecooldomain.com_salearn-spam.conf
/home/somecool/mail/somecooldomain.com/example/.train-spam/new
/home/somecool/mail/somecooldomain.com/example/.train-spam/cur
/home/somecool/.spamassassin/example_somecooldomain.com_salearn-ham.conf
/home/somecool/mail/somecooldomain.com/example/.train-ham/new
/home/somecool/mail/somecooldomain.com/example/.train-ham/cur
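If you’d rather not type those paths by hand, here is a minimal sketch that generates both files. The function name write_train_conf is hypothetical (it is not part of sa-train.sh), and the sketch assumes one directory per line and the train-spam / train-ham folder names used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of sa-train.sh: writes the two training
# config files described above, one Maildir directory per line (assumed).
# Usage: write_train_conf HOME_DIR EMAIL_USER DOMAIN
write_train_conf() {
    local home_dir="$1" email_user="$2" domain="$3"
    local conf_dir="$home_dir/.spamassassin"
    local mail_dir="$home_dir/mail/$domain/$email_user"
    local kind

    mkdir -p "$conf_dir"
    for kind in spam ham; do
        # "new" holds unread messages, "cur" holds messages already read.
        printf '%s\n' \
            "$mail_dir/.train-$kind/new" \
            "$mail_dir/.train-$kind/cur" \
            > "$conf_dir/${email_user}_${domain}_salearn-$kind.conf"
    done
}
```

For the example account, that would be called as write_train_conf /home/somecool example somecooldomain.com. Double-check the resulting files against the real folder names on disk before relying on them.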
Turning it on
So you have the configuration files set, and it’s all ready. How do I turn it on?
Easy: set a nightly cron job (or even an hourly one if you wish) to run this script directly. The command to enter in cPanel’s cron interface would be (for example):
bash /home/somecool/sa-train.sh
A manual crontab string would look like:
0 2 * * * bash /home/somecool/sa-train.sh
This would run at 2 AM every day to re-train both ham and spam folders.
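To give a rough idea of what each run amounts to, here is a minimal sketch of a training loop. It is not the actual sa-train.sh. It assumes each config file lists one directory per line, and the function name train_from_conf is made up for illustration. The --spam and --ham options are standard sa-learn flags.

```shell
#!/usr/bin/env bash
# Sketch only, NOT the original sa-train.sh.
# Assumption: each config file lists one Maildir directory per line.
SALEARN="${SALEARN:-sa-learn}"

# train_from_conf KIND CONF_FILE   (KIND is "spam" or "ham")
train_from_conf() {
    local kind="$1" conf="$2" dir
    while IFS= read -r dir || [ -n "$dir" ]; do
        # Skip blank lines and directories that do not exist yet.
        [ -n "$dir" ] && [ -d "$dir" ] || continue
        "$SALEARN" "--$kind" "$dir"
    done < "$conf"
}

# Example calls (paths taken from the example account above):
# train_from_conf spam /home/somecool/.spamassassin/example_somecooldomain.com_salearn-spam.conf
# train_from_conf ham  /home/somecool/.spamassassin/example_somecooldomain.com_salearn-ham.conf
```

Skipping missing directories keeps the cron job quiet if a training folder hasn’t been created yet.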
Conclusion
After you’ve set all this up, all you need to do is drag and drop false positive messages (legit email marked as spam) to train-ham, and false negatives (spam emails that don’t have ***SPAM*** in the subject, appearing in the Inbox) to the train-spam folder. Just keep emailing as usual, and over a few days to a week, you should see a steady decline in the amount of spammy email misidentified as legit, and legit email misidentified as spammy.
Sorry, I know this is an old tutorial. But I was wondering, should this script delete mail from the train-spam folder and move mail from the train-ham folder to the inbox folder when it executes?
If not, is there a way to implement this?
Hi Darren,
You could modify the script to do so, perhaps after the $SALEARN commands, provided you know the exact paths. It doesn’t do any of that currently, so you could set folder retention on the train-spam folder in your client, and manually drag and drop false positive messages in train-ham back to your INBOX folder for the time being. I’d imagine that you could rm and mv the files from each respective folder after ~/.spamassassin/user_prefs has been updated though. Do let me know if you update the script, and I can update this one with it.
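A minimal sketch of that cleanup, assuming standard Maildir layout (new and cur subfolders) and that it runs only after training has finished. Both function names are hypothetical and not part of the current script. Test it on a copy of a mailbox first, since it deletes and moves mail.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical post-training cleanup, not part of the script.

# purge_trained_spam TRAIN_SPAM_DIR
# Deletes already-learned spam from the folder's new/ and cur/ directories.
purge_trained_spam() {
    local train_dir="$1" sub
    for sub in new cur; do
        [ -d "$train_dir/$sub" ] || continue
        find "$train_dir/$sub" -maxdepth 1 -type f -exec rm -f -- {} +
    done
}

# restore_trained_ham TRAIN_HAM_DIR INBOX_DIR
# Moves already-learned ham back to the inbox, keeping each message in the
# same Maildir subfolder (new stays in new, cur stays in cur).
restore_trained_ham() {
    local train_dir="$1" inbox_dir="$2" sub
    for sub in new cur; do
        [ -d "$train_dir/$sub" ] && [ -d "$inbox_dir/$sub" ] || continue
        find "$train_dir/$sub" -maxdepth 1 -type f \
            -exec mv -- {} "$inbox_dir/$sub/" \;
    done
}

# Example calls (paths taken from the example account above):
# purge_trained_spam  /home/somecool/mail/somecooldomain.com/example/.train-spam
# restore_trained_ham /home/somecool/mail/somecooldomain.com/example/.train-ham \
#                     /home/somecool/mail/somecooldomain.com/example
```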