KNOWNHOST WIKI

Differences

This shows you the differences between two versions of the page.

email:exim:training-spamassassins-accuracy-with-sa-learn [2019/11/20 15:17]
Derrick B.
email:exim:training-spamassassins-accuracy-with-sa-learn [2020/06/16 16:01] (current)
Karson N.
Line 1: Line 1:
 ====== Training SpamAssassin's accuracy with sa-learn ======
-{{howhard>5}}
+
 This guide explains how to use SpamAssassin's ''sa-learn'' effectively to train the tokens it relies on for spam-flagging accuracy.
 
-<WRAP center round tip 60%>
+{{howhard>5}}
+<WRAP center round tip 50%>
 When using ''sa-learn'', you must be diligent and persistent in classifying both the "HAM" and "SPAM" emails for any account with ''sa-learn'' enabled!
 </WRAP>
  
 +\\
 ===== What is 'sa-learn'? =====
 
 Given a typical selection of your incoming mail classified as spam or ham (non-spam), this tool will feed each mail to SpamAssassin, allowing it to 'learn' what signs are likely to mean spam, and which are likely to mean ham. Simply run this command once for each of your mail folders, and it will ''learn'' from the mail therein. Note that csh-style globbing in the mail folder names is supported; in other words, listing a folder name as ''*'' will scan every folder that matches. See ''Mail::SpamAssassin::ArchiveIterator'' for more details. SpamAssassin remembers which mail messages it has learnt already, and will not re-learn those messages again, unless you use the **%%--%%forget** option. Messages learnt as spam will have SpamAssassin markup removed, on the fly. If you make a mistake and scan a mail as ham when it is spam, or vice versa, simply rerun this command with the correct classification, and the mistake will be corrected. SpamAssassin will automatically 'forget' the previous indications. Users of ''spamd'' who wish to perform training remotely, over a network, should investigate the ''spamc -L'' switch.
 
-//Official sa-learn documentation can be found here: [[http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html|sa-learn doc]]//
+//Official sa-learn documentation can be found here: ((http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html))[[http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html|sa-learn doc]]//
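
As a rough illustration of "run this command once for each of your mail folders", the sketch below only prints the ''sa-learn'' commands for one spam folder and one ham folder; it does not run them. The ''%%--%%spam'' and ''%%--%%ham'' options are real ''sa-learn'' options, but the ''MAILDIR'' path, the on-disk folder names ''.Junk'' and ''.Seen'', and the helper ''learn_cmd'' are assumptions made for this example and will differ per server.

```shell
#!/usr/bin/env bash
# Sketch only: print (do not execute) the sa-learn commands for two folders.
# MAILDIR is a hypothetical Maildir location; adjust it to the real account.
MAILDIR="${MAILDIR:-$HOME/mail/example.com/user}"

# Print the sa-learn command for a classification ("spam" or "ham") and a folder.
learn_cmd() {
    local kind="$1" folder="$2"
    case "$kind" in
        spam|ham)
            printf 'sa-learn --%s %s\n' "$kind" "$folder"
            ;;
        *)
            printf 'unknown classification: %s\n' "$kind" >&2
            return 1
            ;;
    esac
}

# Assumed folder names: "Junk" holds untrained spam, "Seen" holds ham.
learn_cmd spam "$MAILDIR/.Junk/cur"
learn_cmd ham  "$MAILDIR/.Seen/cur"
```

Printing the commands first makes it easy to confirm that the spam folder is paired with ''%%--%%spam'' and the ham folder with ''%%--%%ham'' before anything is learned.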
  
 +\\
 ===== Getting started =====
 <WRAP center round important 100%>
Line 18: Line 21:
 </WRAP>
  
 +\\
 ==== Create the spam/ham folders ====
   * In your preferred mail client, open the email account you are configuring this for.
   * Inside your Inbox, create at least two new folders: one for the untrained spam, and one (or more) for your ham (non-spam). For this article, we'll be using "Junk" as our untrained spam folder, and "Seen" as our ham folder.
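
On a server that stores mail in Maildir format, each folder created in the mail client usually appears on disk as a directory containing ''cur'' and ''new'' subdirectories. The sketch below counts the messages in such a folder, which is one way to see how much mail is waiting to be trained. The ''MAILDIR'' path, the ''.Junk'' and ''.Seen'' directory names, and the helper ''count_messages'' are assumptions for this example.

```shell
#!/usr/bin/env bash
# Sketch only: count the messages stored in a Maildir folder.
# MAILDIR is a hypothetical location; adjust it to the real account.
MAILDIR="${MAILDIR:-$HOME/mail/example.com/user}"

# Count regular files under the folder's cur/ and new/ subdirectories.
# A missing folder is reported as 0 rather than as an error.
count_messages() {
    local folder="$1"
    find "$folder/cur" "$folder/new" -type f 2>/dev/null | wc -l | tr -d ' '
}

printf 'Junk (untrained spam): %s\n' "$(count_messages "$MAILDIR/.Junk")"
printf 'Seen (ham): %s\n' "$(count_messages "$MAILDIR/.Seen")"
```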
  
 +\\
 ==== Being diligent and pro-active ====
   * Build a routine into how you check email. After you receive and read new email, move it to one of the two folders. If the email is good mail, move it to "Seen". If it's spam that SpamAssassin didn't already catch, move it to "Junk".
   * This is the most difficult part of training properly, but it provides the best results. It will take some time, but the more tokens SpamAssassin is able to collect, the more accurate it will become!
  
 +\\
 ==== Running the sa-learn command ====
 <WRAP center round important 100%>
-Executing sa-learn commands must be performed by the 'mail account owner' to function properly. When attempting to run these commands as the 'root' user, use the following syntax to run them as the user: 
+Executing sa-learn commands must be performed by the 'mail account owner' to function properly. When attempting to run these commands as the 'root' user, use the following syntax to run them as the user:
 
 sudo -H -u <CPANELUSER> bash -c '/usr/local/cpanel/3rdparty/bin/sa-learn <COMMANDS>'
Line 42: Line 48:
   * Checking learned tokens.
     * Run: ''sa-learn %%--%%dump magic''
-    * <code># sa-learn --dump magic
-0.000 0 3 0 non-token data: bayes db version
-0.000 0 1242 0 non-token data: nspam
-0.000 0 3872 0 non-token data: nham
-0.000 0 155784 0 non-token data: ntokens
-0.000 0 1404770116 0 non-token data: oldest atime
-0.000 0 1411483933 0 non-token data: newest atime
-0.000 0 0 0 non-token data: last journal sync atime
-0.000 0 1410340539 0 non-token data: last expiry atime
-0.000 0 5529600 0 non-token data: last expire atime delta
-0.000 0 137169 0 non-token data: last expire reduction count</code>
+<code>
+  # sa-learn --dump magic
+  0.000 0 3 0 non-token data: bayes db version
+  0.000 0 1242 0 non-token data: nspam
+  0.000 0 3872 0 non-token data: nham
+  0.000 0 155784 0 non-token data: ntokens
+  0.000 0 1404770116 0 non-token data: oldest atime
+  0.000 0 1411483933 0 non-token data: newest atime
+  0.000 0 0 0 non-token data: last journal sync atime
+  0.000 0 1410340539 0 non-token data: last expiry atime
+  0.000 0 5529600 0 non-token data: last expire atime delta
+  0.000 0 137169 0 non-token data: last expire reduction count
+</code>
+
     * nspam - Number of spam messages examined.
     * nham - Number of (non-spam) messages examined.
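
To pull the ''nspam'' and ''nham'' counters out of that output without reading it by eye, a small ''awk'' filter is enough. This is a minimal sketch: the helper ''magic_value'' is a name invented for this example, and the sample text is a shortened copy of the output shown above rather than live data.

```shell
#!/usr/bin/env bash
# Sketch only: extract one counter (nspam, nham, ntokens) from the
# output of `sa-learn --dump magic`, read on standard input.
# The counter name is the last field of its line; the value is the third field.
magic_value() {
    local name="$1"
    awk -v name="$name" '$NF == name { print $3 }'
}

# Shortened sample of the output shown above, used instead of a live database.
sample='0.000 0 3 0 non-token data: bayes db version
0.000 0 1242 0 non-token data: nspam
0.000 0 3872 0 non-token data: nham
0.000 0 155784 0 non-token data: ntokens'

nspam="$(printf '%s\n' "$sample" | magic_value nspam)"
nham="$(printf '%s\n' "$sample" | magic_value nham)"
printf 'spam learned: %s, ham learned: %s\n' "$nspam" "$nham"
# prints: spam learned: 1242, ham learned: 3872
```

Against a real database the same filter can be fed directly, for example ''sa-learn %%--%%dump magic | magic_value nspam''. These two numbers matter because, with default settings, SpamAssassin does not apply Bayes scoring until at least 200 spam and 200 ham messages have been learned.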
email/exim/training-spamassassins-accuracy-with-sa-learn.1574284634.txt.gz · Last modified: 2019/11/20 15:17 by Derrick B.