SpamAssassin is a mail filter that identifies spam. You can read more about it at the SpamAssassin home page.
The way that SpamAssassin works on Mathnet changed recently. SpamAssassin no longer uses filesystem-based configuration files or Bayes databases. All such data is now stored in a MySQL database.
SpamAssassin is run as part of the CSG Maintained Spam Filter. If you are using the CSG Maintained Spam Filter, SpamAssassin is already being run on your incoming mail. If you are not using the CSG Maintained Spam Filter, you can invoke SpamAssassin from a mail filter such as procmail.
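If you call SpamAssassin yourself, the basic idea can be sketched in shell: pipe a message through the filter and check for the X-Spam-Flag header that SpamAssassin adds to messages it judges to be spam. This is only a rough sketch. The names SA_FILTER and is_spam are made up for illustration, and the assumption that a spamassassin command is on your PATH may not match the Mathnet setup.

```shell
#!/bin/sh
# Sketch only: SA_FILTER and is_spam are illustrative names, not part of
# SpamAssassin. SA_FILTER must read a message on stdin and write the tagged
# message to stdout. Defaulting to "spamassassin" on the PATH is an assumption.
SA_FILTER="${SA_FILTER:-spamassassin}"

# is_spam: read one message on stdin. Return 0 if the filtered copy carries
# an "X-Spam-Flag: YES" header, non-zero otherwise.
is_spam() {
    $SA_FILTER |
        # Keep only the header block, which ends at the first empty line.
        sed '/^$/q' |
        grep -qi '^X-Spam-Flag:[[:space:]]*YES'
}
```

For example, "is_spam < message.txt && echo spam" prints "spam" only when the flag header is present. A procmail recipe would make the same kind of header check after filtering.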
SpamAssassin accepts the following configuration commands.
SpamAssassin user preferences are stored in a database. To update your preferences, use the SpamAssassin user preferences form.
By default, SpamAssassin implements an auto-learning Bayesian filter. Messages that score extremely high are "learned" as spam, and messages that score extremely low are "learned" as non-spam. Over time, SpamAssassin should become a better filter automatically. More detailed information about Bayesian filtering as implemented in SpamAssassin can be found in the sa-learn man page.
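The auto-learning decision can be pictured with a small shell sketch. It is a simplification, not SpamAssassin's actual code: the real filter applies additional conditions before learning a message. The function name autolearn_decision is made up, and the thresholds 0.1 and 12.0 are the stock SpamAssassin defaults, which are assumed here and may differ from the values used on Mathnet.

```shell
#!/bin/sh
# Sketch only: autolearn_decision is an illustrative name.
# Usage: autolearn_decision SCORE [NONSPAM_THRESHOLD] [SPAM_THRESHOLD]
# The defaults 0.1 and 12.0 are assumed stock values, not Mathnet settings.
autolearn_decision() {
    awk -v s="$1" -v lo="${2:-0.1}" -v hi="${3:-12.0}" 'BEGIN {
        # Adding 0 forces a numeric comparison.
        if (s + 0 >= hi + 0)     print "spam"      # extremely high score
        else if (s + 0 < lo + 0) print "nonspam"   # extremely low score
        else                     print "none"      # in between: not learned
    }'
}
```

Messages that fall between the two thresholds are left alone, which is why feeding sa-learn your own sorted mail still helps.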
You can help SpamAssassin learn the difference between spam and non-spam by saving messages to mailboxes and running the following commands:
/usr/local/spamassassin/bin/sa-learn --mbox --spam SPAM_MAILBOX
/usr/local/spamassassin/bin/sa-learn --mbox --nonspam NON-SPAM_MAILBOX
Here SPAM_MAILBOX is a normal UNIX mailbox format file that contains messages that you have identified as spam, and NON-SPAM_MAILBOX is a normal UNIX mailbox format file that contains messages that you have identified as not being spam. The sa-learn program parses the email messages that you feed to it and stores tokens and probabilities in a database. After you have fed messages to sa-learn, it has no further use for them. You can delete them or not as you wish. More detailed information about sa-learn can be found in the sa-learn man page.
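If you run these commands regularly, the two invocations above can be wrapped in a small shell function that skips a mailbox when it is missing or empty. This is only a sketch: the names SA_LEARN and learn_mailbox are made up for illustration, and the function passes the same options shown above to sa-learn.

```shell
#!/bin/sh
# Sketch only: SA_LEARN and learn_mailbox are illustrative names.
# The default path is the sa-learn location given in the commands above.
SA_LEARN="${SA_LEARN:-/usr/local/spamassassin/bin/sa-learn}"

# learn_mailbox KIND MAILBOX
#   KIND is "spam" or "nonspam", matching the two sa-learn options above.
learn_mailbox() {
    kind=$1
    mbox=$2
    case "$kind" in
        spam|nonspam) ;;
        *) echo "usage: learn_mailbox spam|nonspam MAILBOX" >&2
           return 2 ;;
    esac
    # Nothing to learn from a missing or empty mailbox file.
    if [ ! -s "$mbox" ]; then
        echo "skipping $mbox: missing or empty" >&2
        return 1
    fi
    "$SA_LEARN" --mbox "--$kind" "$mbox"
}
```

For example, "learn_mailbox spam SPAM_MAILBOX" followed by "learn_mailbox nonspam NON-SPAM_MAILBOX" runs the same two commands as above.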
As of Nov 15, 2006, the old file-based Bayes databases are no longer supported. If you would like to import your old data into the new Bayes database, do:
Note that two different versions of sa-learn are used. The first command uses the old version to export token data from the old system. The second command uses the new version of sa-learn to import the old data into the new system. The last command uses the old sa-learn to delete the old data.
At some point in the future, the old version of sa-learn will be removed.
Some interesting background information regarding spam and the Bayesian filtering algorithm used in SpamAssassin can be found in the following papers by Paul Graham:
and in the Freshmeat article on spam filters.