Use sa-learn to teach spamassassin what is spam

For a couple of years now I have been using SpamAssassin with great success. Every so often I teach SpamAssassin what spam is with the ‘sa-learn --spam mboxdata’ command. It had been a while since I last did this, so I figured it was time once again to teach SpamAssassin a thing or two about spam.

Locally I use only Apple computers, with Mail.app as my mail client, but this will work equally well from Windows machines. My servers all run RHEL4.

For a couple of weeks I collected any spam that got through to me in a folder I call SPAM. When I thought there was enough spam in the folder (in this case 487 messages), I exported the SPAM folder to an mbox-format file. On a Mac this is easily done using a hint from MacOSXHints. I am not sure about Windows, but a quick Google search should turn up some options.

Next, FTP the mbox file to the server. Then simply run:

sa-learn --mbox --spam mboxdata

Replace ‘mboxdata’ with the name of the file you uploaded. You should see a message that looks similar to this:

Learned from 473 message(s) (487 message(s) examined).

I am posting this because the first time around I forgot to add the --mbox flag and simply got a message like this:

Learned from 0 message(s) (0 message(s) examined).
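A quick way to catch the missing --mbox mistake is to count the messages in the exported file before handing it to sa-learn, then compare that number with the "examined" count. The sketch below assumes Python 3 and uses the standard-library mailbox module; the helper names are hypothetical, and mboxdata is just the placeholder file name used above.

```python
import mailbox


def count_mbox_messages(path):
    """Return how many messages Python's mbox parser finds in the file."""
    box = mailbox.mbox(path, create=False)  # fail instead of creating an empty file
    try:
        return len(box)
    finally:
        box.close()


def sa_learn_command(path):
    """Build the sa-learn argument list; --mbox tells sa-learn the file is an mbox."""
    return ["sa-learn", "--mbox", "--spam", path]
```

For example, count_mbox_messages("mboxdata") should be close to the "examined" figure sa-learn reports (the two parsers may differ slightly); if sa-learn reports 0 examined while the count is in the hundreds, the --mbox flag is the first thing to check.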

Have fun with it. Let me know if you see anything I missed.

Makewhatis gives ‘zcat: stdout: Broken pipe’ errors

This morning I noticed, for the first time in my nightly cron job output, that makewhatis on my Red Hat Enterprise Linux (RHEL) box was giving ‘zcat: stdout: Broken pipe’ errors.

A quick search on the now-defunct EV1Servers forums showed that skeeter1jd had the solution:

Edit /etc/cron.weekly/00-makewhatis.cron:

change this line:

makewhatis -w

to:

makewhatis -w -u
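If you manage several boxes, the same one-line change can be scripted. This is a minimal Python sketch, not part of the original fix: the function name is hypothetical, and it only touches lines that consist of exactly makewhatis -w (optionally indented), leaving lines that already carry -u alone.

```python
import re

# A line that is just "makewhatis -w" (optional indentation/trailing blanks).
_MAKEWHATIS_W = re.compile(r"^([ \t]*makewhatis[ \t]+-w)[ \t]*$", re.MULTILINE)


def add_update_flag(cron_text):
    """Return cron_text with 'makewhatis -w' lines changed to 'makewhatis -w -u'."""
    return _MAKEWHATIS_W.sub(r"\1 -u", cron_text)
```

You would read /etc/cron.weekly/00-makewhatis.cron as root, pass its contents through add_update_flag, and write the result back.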

What is makewhatis? From man makewhatis:

makewhatis reads all the manual pages contained in the given sections of manpath or the preformatted pages contained in the given sections of catpath. For each page, it writes a line in the whatis database; each line consists of the name of the page and a short description, separated by a dash.  The description is extracted using the content of the NAME section of the manual page.
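To make the man page description concrete, here is a simplified Python sketch of how a single whatis line can be derived from a page's NAME section. It assumes the usual man page convention of writing the separator as \- and produces a simplified "name (section) - description" line; real makewhatis output uses its own spacing and is not byte-for-byte identical. The function name is hypothetical.

```python
def whatis_entry(name_section, section):
    """Turn a NAME line such as 'ls \\- list directory contents'
    into a simplified whatis line: 'ls (1) - list directory contents'."""
    # In man page source the separating dash is usually written as "\-".
    text = name_section.replace("\\-", "-")
    names, sep, description = text.partition(" - ")
    if not sep:
        raise ValueError("NAME section has no ' - ' separator")
    return f"{names.strip()} ({section}) - {description.strip()}"
```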

Blank emails, or ‘Failed to link message body between queues’

Occasionally I get a server that starts sending out emails with no message body (blank emails). Today I noticed that I also get a ‘Failed to link message body between queues’ error in my daily (nightly, really) LogWatch. After a little research I re-discovered the fix for the blank message bodies… I hope.

The solution: rebuild the Bayes database. This is easy to do and takes a couple of minutes at most on a modern machine. On this particular computer I am running Ensim 4.x on RHEL, with all up2date updates installed. From the CLI (command line interface), do the following. Note that you must be root:

$ su
# sa-learn --sync (that is two hyphens, not one)
synced Bayes databases from journal in 0 seconds: 4 unique entries (4 total entries)
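If you want to run the sync from a script (for example, a cron job), a hedged Python sketch could look like this. It assumes sa-learn is on the PATH and prints the same summary line shown above (on stdout or stderr); the helper names are hypothetical.

```python
import os
import re
import subprocess

# Matches the summary line shown in the post.
_SYNC_RE = re.compile(
    r"synced Bayes databases from journal in (\d+) seconds: "
    r"(\d+) unique entries \((\d+) total entries\)"
)


def parse_sync_output(text):
    """Return (seconds, unique_entries, total_entries), or None if absent."""
    m = _SYNC_RE.search(text)
    return tuple(int(g) for g in m.groups()) if m else None


def run_sync():
    """Run 'sa-learn --sync' (two ASCII hyphens) as root and parse the result."""
    if os.geteuid() != 0:
        raise PermissionError("sa-learn --sync must be run as root here")
    result = subprocess.run(
        ["sa-learn", "--sync"], capture_output=True, text=True, check=True
    )
    return parse_sync_output(result.stdout + result.stderr)
```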

Note that rebuilding the Bayes database has also been known to fix stuck spamd processes. So if you see one or more spamd processes stuck in ‘top’, this might fix that too.

MySQL export/import and charset problems

Today I discovered I had messed up a database upgrade in MySQL. I had a rather large database, which I exported from my main server and imported onto my testing server. Both machines had the same version of MySQL. I ran some update scripts on the testing server DB, verified the data, then exported the updated database and re-imported it into the main server's MySQL database.

Only after a few days did I notice that the data I had re-imported had all its accented characters broken. Somewhere along the way, the Latin1 data was not exported as UTF8 the way it is supposed to be in MySQL 4.1.12 or greater!

The big issue here is that I could not simply run the upgrade again, as I would lose all changes to the database in the past week.

After some research I found the following fix using iconv (iconv: convert the encoding of given files from one encoding to another):

Step 1
Re-export the data from the database normally, using
mysqldump --add-drop-table databasename > mydump.sql

Step 2
Use iconv to convert the file from UTF-8 to ISO-8859-1:
iconv -f UTF-8 -t ISO-8859-1 mydump.sql -o newdump.sql

Step 3
Re-import newdump.sql into MySQL with
mysql databasename < newdump.sql
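Why does converting "from UTF-8 to ISO-8859-1" repair the data? Assuming the common failure mode, double encoding (UTF-8 bytes read as Latin1 and then encoded as UTF-8 a second time), the Python sketch below reproduces the damage and the iconv fix on a single word. The function names are hypothetical illustrations, not part of MySQL or iconv.

```python
def double_encode(text):
    """Simulate the damage: UTF-8 bytes misread as Latin-1, then re-encoded as UTF-8."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def repair(data):
    """Mirror 'iconv -f UTF-8 -t ISO-8859-1': decode as UTF-8, write one
    Latin-1 byte per character. Those bytes are the original UTF-8 text."""
    return data.decode("utf-8").encode("latin-1")


broken = double_encode("café")
print(broken.decode("utf-8"))          # cafÃ©  (what the pages showed)
print(repair(broken).decode("utf-8"))  # café
```

Under this assumption every character in the damaged text fits in one ISO-8859-1 byte, so the conversion simply writes the original UTF-8 bytes back out, which is why the dump then re-imports correctly.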

Now, if everything went normally, you should have a working database again. Or, if Murphy intervened as he did with me, you might get this error:
iconv: illegal input sequence at position 49823

It took me a while to find the solution, but after much head-banging I finally found the following fix, which worked like a charm, I might add!

Step 2 (revised)
iconv -f UTF-8 -t ISO-8859-1//TRANSLIT mydump.sql -o newdump.sql

This will get the conversion done using transliteration, which means that when a character cannot be represented in the target character set, it can be approximated through one or several similarly looking characters (source: www.gnu.org).
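To illustrate what transliteration means, the Python sketch below encodes text to ISO-8859-1 and, for characters with no ISO-8859-1 equivalent, substitutes a look-alike from a small hypothetical table, falling back to "?". A plain .encode("latin-1") on the same text would raise UnicodeEncodeError, roughly Python's counterpart of the iconv failure above. One plausible trigger (an assumption; the post does not say) is a character such as a curly quote or the euro sign. glibc's real //TRANSLIT tables are far larger than this one.

```python
# Hypothetical fallback table, loosely in the spirit of glibc's //TRANSLIT.
_FALLBACKS = {
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en dash and em dash
    "\u2026": "...",               # horizontal ellipsis
    "\u20ac": "EUR",               # euro sign
}


def to_latin1_translit(text, unknown="?"):
    """Encode text as ISO-8859-1, approximating characters it cannot represent."""
    out = []
    for ch in text:
        try:
            out.append(ch.encode("latin-1"))
        except UnicodeEncodeError:
            # No Latin-1 byte for this character: use a look-alike or "?".
            out.append(_FALLBACKS.get(ch, unknown).encode("latin-1"))
    return b"".join(out)
```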

Limiting sendmail's Max Recipients

There have been a lot of attempts lately to send spam through online forms on people's servers. This is done by crafting an email with a carefully placed Bcc: header, which is then remotely submitted to an unsuspecting form on your server.

All the email this person's script sends through your form will now look like it came from your server. And hey, sendmail won’t know what is going on, because it thinks you, a trusty ole soul, are sending this bulk email.
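The attack works because a form that copies user input straight into a mail header lets an attacker embed a line break followed by a new header such as Bcc:. A basic defense, sketched here in Python under the assumption that your form handler builds the message itself, is to refuse any header value containing a carriage return or line feed. The function names and the webmaster@example.com address are hypothetical.

```python
from email.message import EmailMessage


def is_safe_header_value(value):
    """False if the value contains CR or LF, the characters header injection relies on."""
    return "\r" not in value and "\n" not in value


def build_form_mail(sender, body, to="webmaster@example.com"):
    """Build the form's outgoing message, rejecting injected header lines."""
    if not is_safe_header_value(sender):
        raise ValueError("line break in sender field: possible header injection")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = "Form submission"
    msg.set_content(body)
    return msg
```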

Apart from making your forms secure, you can deter spammers by limiting how many recipients sendmail will send each email to. Currently, if a spammer puts 100 email addresses in the fake Bcc: field, sendmail will gladly forward that email to all 100 recipients, all within a matter of minutes.

To limit this number, simply follow these instructions:

Step 1 - Back up your working copy of sendmail.cf before starting.

cp /etc/mail/sendmail.cf /etc/mail/sendmail.cf.working

Step 2 - Modify the existing copy of sendmail.cf

pico -w /etc/mail/sendmail.cf

Find the following line:

#O MaxRecipientsPerMessage=0

and change it to look like this:

O MaxRecipientsPerMessage=15

Save your file, and restart sendmail:

/sbin/service sendmail restart
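Steps 1 and 2 can also be scripted. The Python sketch below assumes the option appears exactly once in sendmail.cf, either commented (#O MaxRecipientsPerMessage=0) or not; the function names are hypothetical, and you still restart sendmail yourself afterwards.

```python
import re
import shutil

# Matches the option whether it is commented out or already set.
_OPTION_RE = re.compile(r"^#?O MaxRecipientsPerMessage=\d*[ \t]*$", re.MULTILINE)


def set_max_recipients(cf_text, limit=15):
    """Return sendmail.cf text with MaxRecipientsPerMessage uncommented and set."""
    new_text, count = _OPTION_RE.subn(f"O MaxRecipientsPerMessage={limit}", cf_text)
    if count != 1:
        raise ValueError(f"expected one MaxRecipientsPerMessage line, found {count}")
    return new_text


def patch_file(path="/etc/mail/sendmail.cf", limit=15):
    """Back up the file to <path>.working (Step 1), then rewrite it (Step 2)."""
    shutil.copy2(path, path + ".working")
    with open(path) as f:
        text = f.read()
    new_text = set_max_recipients(text, limit)  # raises before anything is written
    with open(path, "w") as f:
        f.write(new_text)
```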

Sendmail will now accept at most 15 recipients per message, whether they are listed in the To:, Cc: or Bcc: fields.
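To see how a message measures up against the limit, the Python sketch below counts the addresses in the To:, Cc: and Bcc: headers of a raw message. Treat it as an approximation: sendmail applies MaxRecipientsPerMessage to the envelope recipients of an SMTP transaction rather than by reading headers. The helper names and the default limit of 15 (matching the setting above) are illustrative.

```python
from email import message_from_string
from email.utils import getaddresses


def count_recipients(raw_message):
    """Count addresses across the To:, Cc: and Bcc: headers of a raw message."""
    msg = message_from_string(raw_message)
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields.extend(msg.get_all(name, []))
    return len([addr for _, addr in getaddresses(fields) if addr])


def within_limit(raw_message, limit=15):
    """True if the header recipients fit under the configured limit."""
    return count_recipients(raw_message) <= limit
```

A form-generated spam with 100 addresses in Bcc: plus one in To: would count as 101 here, far over the limit of 15.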

Happy mailing!
