This is a guide to using SpamAssassin's sa-learn effectively to train tokens and improve spam flagging accuracy.
When using sa-learn, you must be diligent and persistent in classifying both the "HAM" and "SPAM" emails for any accounts with sa-learn enabled!
Given a typical selection of your incoming mail classified as spam or ham (non-spam), this tool will feed each mail to SpamAssassin, allowing it to 'learn' what signs are likely to mean spam, and which are likely to mean ham. Simply run this command once for each of your mail folders, and it will learn from the mail therein.
Note that csh-style globbing in the mail folder names is supported; in other words, listing a folder name as * will scan every folder that matches. See Mail::SpamAssassin::ArchiveIterator for more details.
SpamAssassin remembers which mail messages it has learnt already, and will not re-learn those messages unless you use the --forget option. Messages learnt as spam will have SpamAssassin markup removed, on the fly.
If you make a mistake and scan a mail as ham when it is spam, or vice versa, simply rerun this command with the correct classification, and the mistake will be corrected. SpamAssassin will automatically 'forget' the previous indications.
Users of spamd who wish to perform training remotely, over a network, should investigate the spamc -L switch.
Official sa-learn documentation can be found here: sa-learn doc
This will only work for email accounts using IMAP.
sa-learn commands must be run by the 'mail account owner' to function properly. If you are logged in as the 'root' user, use the following syntax to run them as that user:
sudo -H -u <CPANELUSER> bash -c '/usr/local/cpanel/3rdparty/bin/sa-learn <COMMANDS>'
sa-learn -p /home/USER/.spamassassin/user_prefs --spam /home/USER/mail/DOMAIN.TLD/ACCOUNT/.Junk/{cur,new}
sa-learn: on WHM v76 servers, the full path is /usr/local/cpanel/3rdparty/bin/sa-learn
Learned tokens from 214 message(s) (1009 message(s) examined)
sa-learn -p /home/USER/.spamassassin/user_prefs --ham /home/USER/mail/DOMAIN.TLD/ACCOUNT/.Seen/{cur,new}
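The two training commands above can be wrapped in a small script so that both folders are always trained together. This is only a sketch: the function names maildir_for and train_account are made up for illustration, the Junk and Seen folder names are the ones used in this guide, and the cur and new directories are listed explicitly because {cur,new} brace expansion is a bash feature that plain sh does not provide.

```shell
#!/bin/sh
# Hypothetical helper: train one mailbox in both directions (spam and ham).
# SA_LEARN defaults to the cPanel path mentioned in this guide; override it if yours differs.
SA_LEARN=${SA_LEARN:-/usr/local/cpanel/3rdparty/bin/sa-learn}

# Print the Maildir base path for a cPanel user, domain and mail account.
maildir_for() {
    printf '/home/%s/mail/%s/%s\n' "$1" "$2" "$3"
}

# Learn .Junk as spam and .Seen as ham for one account.
train_account() {
    user=$1
    domain=$2
    account=$3
    base=$(maildir_for "$user" "$domain" "$account")
    prefs="/home/$user/.spamassassin/user_prefs"
    "$SA_LEARN" -p "$prefs" --spam "$base/.Junk/cur" "$base/.Junk/new"
    "$SA_LEARN" -p "$prefs" --ham "$base/.Seen/cur" "$base/.Seen/new"
}

# Example call (run as the mail account owner, not as root):
#   train_account USER DOMAIN.TLD ACCOUNT
```

Nothing runs until train_account is called, so the file can be sourced and tried against a single account first.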
sa-learn --dump magic
# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       1242          0  non-token data: nspam
0.000          0       3872          0  non-token data: nham
0.000          0     155784          0  non-token data: ntokens
0.000          0 1404770116          0  non-token data: oldest atime
0.000          0 1411483933          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1410340539          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0     137169          0  non-token data: last expire reduction count
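The nspam and nham rows in that dump are the number of spam and ham messages learned so far, and they are the quickest way to see whether training is making progress. As far as I know, SpamAssassin does not apply Bayes scoring by default until it has learned at least 200 of each (the bayes_min_spam_num and bayes_min_ham_num settings), so it is worth checking both. The sketch below pulls the two counters out of the dump; bayes_counts and bayes_ready are hypothetical helper names, and the 200 default is my assumption about your configuration.

```shell
#!/bin/sh
# Read `sa-learn --dump magic` output on stdin and print the learned message counts.
# In each row, field 3 is the value and the last field is the counter name.
bayes_counts() {
    awk '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham = $3 }
         END { printf "nspam=%d nham=%d\n", spam, ham }'
}

# Succeed only if both counters have reached the minimum.
# The default of 200 is an assumption; pass another number to override it.
bayes_ready() {
    min=${1:-200}
    awk -v min="$min" '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham = $3 }
         END { exit !(spam + 0 >= min + 0 && ham + 0 >= min + 0) }'
}

# Usage (run as the mail account owner):
#   sa-learn --dump magic | bayes_counts
#   sa-learn --dump magic | bayes_ready && echo "enough mail learned"
```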
You can run these commands manually whenever you'd like, especially if you like control, but doing so can quickly become a chore. A lot of people prefer cron jobs for this. I'd recommend running the cron job only once per day, during non-peak hours.
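A once-a-day schedule like that could look roughly like the following. This is a sketch under assumptions: cron_line is a hypothetical helper, and /home/USER/bin/sa-train.sh stands for whatever script holds your sa-learn commands. The crontab should be installed as the mail account owner so the job runs as that user.

```shell
#!/bin/sh
# Print a crontab line that runs a command once per day.
# Arguments: minute, hour, command. The remaining three fields stay as "*"
# (every day of the month, every month, every day of the week).
cron_line() {
    printf '%d %d * * * %s\n' "$1" "$2" "$3"
}

# Example: schedule the (hypothetical) training script for 03:30 every night.
# Append the line to the current user's crontab like this:
#   ( crontab -l 2>/dev/null; cron_line 30 3 /home/USER/bin/sa-train.sh ) | crontab -
```

Pick an hour when the server is quiet, since learning from a large folder can take a while.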
Also, remember that training works both ways: if non-spam ends up in the spam folder, move it into "Seen" so that sa-learn can learn from the false positive.
If you try this, feel free to comment and let us know your results!