Spam is one of the most common annoyances for the average user. How do I block it? What did I do wrong? If you’re a cPanel user, then you’re in luck, because there’s a way to set up intelligent (human-trained) filtering, so the amount of spam you receive drops steadily the more effort you put into training it.
The problem with continuous training is that it doesn’t exist by default, so for the time being we have to script it. If you keep marking messages as spam but keep getting them anyway, that’s because you’re not training on false positives and false negatives. A false positive (in this context) is a piece of legitimate mail that was marked as spam. A false negative is a piece of spam that was treated as ham (legitimate mail).
Create Spam and Ham training folders
In your email client, create two folders underneath the INBOX folder: one for spam training and one for ham training. This guide calls them train-spam and train-ham.
Enable SpamAssassin and Disable SpamBox
Following this guide, go to cPanel > Apache SpamAssassin
Make sure the button says “Disable Apache SpamAssassin” to indicate that it’s currently enabled.
Make sure Auto-delete Spam is disabled. Otherwise every message that SpamAssassin marks as spam is deleted outright, false positives included, and you lose the chance to retrain on them.
Make sure the button reads “Enable Spam Box” to indicate that Spam Box is currently disabled.
Install the script and configure training directories
Once you’ve done all that, download this script to your user’s home directory as sa-train.sh and give it execute permissions:
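The download step depends on where you obtained the script, but setting the permission is the same either way. A minimal sketch, assuming the file has already been saved as sa-train.sh in your home directory:

```shell
# Assumption: sa-train.sh has already been saved to your home directory.
script="$HOME/sa-train.sh"

if [ -f "$script" ]; then
    chmod u+x "$script"   # owner-executable is enough for a per-user cron job
    ls -l "$script"       # the mode column should now show an x for the owner
else
    echo "sa-train.sh not found in $HOME; download it first" >&2
fi
```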
Now you’ll need a few things:
- The email username you want to train
- The domain at which the email address exists
- The exact folder name(s) you created in your email client
- Your cPanel username
Configure the folders in the configuration files
Create 2 files in the .spamassassin folder called:
Each of these files should contain two entries for every folder you have set up for training: one for the “new” subdirectory, which holds unread emails, and one for the “cur” subdirectory, which holds existing (read) emails that were moved into the folder. You won’t see “cur” and “new” unless you look at the actual filesystem. The file for an email moves from “new” to “cur” once it’s read.
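To make the role of these files concrete, here is a rough sketch of how a training script could consume such a list. It is not the script from this guide. The function name train_from_list and the SA_LEARN override are made up for illustration, and the list file is whatever path you pass in. The only real tool involved is sa-learn, with its --spam and --ham options.

```shell
# Hypothetical helper, not the sa-train.sh script itself.
# SA_LEARN can be overridden (for example SA_LEARN=echo) to do a dry run.
SA_LEARN="${SA_LEARN:-sa-learn}"

# train_from_list <spam|ham> <file listing one Maildir directory per line>
train_from_list() {
    kind="$1"
    list="$2"
    [ -r "$list" ] || { echo "cannot read $list" >&2; return 1; }

    # Read the list on fd 3 so the learner's own stdin is left alone.
    while IFS= read -r dir <&3 || [ -n "$dir" ]; do
        [ -n "$dir" ] || continue   # skip blank lines
        [ -d "$dir" ] || continue   # skip folders that do not exist yet
        "$SA_LEARN" "--$kind" "$dir"
    done 3< "$list"
}
```

Calling train_from_list spam with the spam list and train_from_list ham with the ham list would feed every message in the listed “new” and “cur” directories to the Bayes database.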
Let’s say that our email is firstname.lastname@example.org and our cPanel username is somecool. Let us also say we created the folders in an email client already. The actual folder names on disk would be prefixed with a ‘.’ period. Here’s what the file’s contents would look like:
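As an illustration, the snippet below derives the “new” and “cur” paths for the example values above. It assumes the usual cPanel Maildir layout of /home/<cPanel user>/mail/<domain>/<mailbox>/, which you should verify on your own server, and it uses train-spam and train-ham, the folder names used in this guide. The function name training_paths is made up for this sketch.

```shell
# Print the "new" and "cur" Maildir paths for one training folder.
# Assumed layout: /home/<cpanel user>/mail/<domain>/<mailbox>/.<folder>/{new,cur}
training_paths() {
    base="/home/$1/mail/$2/$3/.$4"
    printf '%s\n' "$base/new" "$base/cur"
}

# Values from the example: cPanel user, domain, mailbox, folder name.
training_paths somecool example.org firstname.lastname train-spam
training_paths somecool example.org firstname.lastname train-ham
```

Under that assumption, the first call prints /home/somecool/mail/example.org/firstname.lastname/.train-spam/new followed by the matching cur path, and the second call prints the same pair for .train-ham. The spam pair would go in the spam training file and the ham pair in the ham training file.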
Turning it on
So you have the configuration files set, and it’s all ready. How do I turn it on?
Easy: set a nightly cron job (or even an hourly one if you wish) to run this script directly. The cron command to enter in cPanel would be (for example):
A manual crontab string would look like:
0 2 * * * bash /home/somecool/sa-train.sh
This would run at 2 AM every day to retrain on both the ham and spam folders.
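If you prefer the hourly schedule and manage the crontab from a shell, something like the following could work. It is only a sketch: the log file path is a made-up example, and merge_cron_job is a hypothetical helper that drops any older sa-train.sh line before appending the new one, so running it twice does not create duplicates.

```shell
# Hourly job; the log path is an assumption, pick any file you can write to.
job='0 * * * * bash /home/somecool/sa-train.sh >> /home/somecool/sa-train.log 2>&1'

# Read an existing crontab on stdin, drop any older sa-train.sh line,
# then append the job passed as the first argument.
merge_cron_job() {
    grep -vF 'sa-train.sh' || true   # grep exits non-zero when nothing is left
    printf '%s\n' "$1"
}

# To apply it to the real crontab, uncomment the next line:
# crontab -l 2>/dev/null | merge_cron_job "$job" | crontab -
```

The redirect at the end of the job keeps the script’s output in a log file instead of having cron mail it to you after every run.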
After you’ve set all this up, all you need to do is drag and drop false positives (legit email marked as spam) into train-ham, and false negatives (spam that arrives in the Inbox without ***SPAM*** in the subject) into train-spam. Just keep emailing as usual, and over a few days to a week you should see a steady decline in both the amount of spam misidentified as legit and the amount of legit email misidentified as spam.
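To check that training is actually accumulating, you can look at the message counters SpamAssassin keeps in its Bayes database. The sketch below relies on sa-learn --dump magic, which prints those counters. The bayes_counts wrapper and the SA_LEARN override are made up for illustration. Keep in mind that, with default settings, SpamAssassin does not use Bayes scoring until it has learned at least 200 spam and 200 ham messages, which is part of why results take a while to show.

```shell
# Hypothetical wrapper; SA_LEARN can point at another command for testing.
SA_LEARN="${SA_LEARN:-sa-learn}"

# Show only the learned-spam (nspam) and learned-ham (nham) counter lines.
bayes_counts() {
    "$SA_LEARN" --dump magic | grep -E 'nspam|nham'
}
```

Run bayes_counts as the cPanel user whose mail you are training. If the numbers on the nspam and nham lines grow after each scheduled run, the training folders are being picked up.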