Debugging MailScanner: cleaning messages

Last week I did a short post on testing MailScanner to be sure it was scanning properly. An alert reader pointed out that I missed the easy solution, which was to simply use 'MailScanner --lint' to check if MailScanner was running properly.

Since I wrote that article, I found that my MailScanner was again hogging the CPU. Last time it was an issue with f-prot interaction, this time it is a problem apparently with MailScanner.

Here is what I did to debug this issue, and here is the post that helped me learn more about MailScanner.

First thing was to check top:

7078 root      25   0 28936  23m 1960 R   99  0.6   3:19.60 MailScanner

When I did a strace -p 7078 I saw nothing. Literally. So that was not much help. Next up, see what ps has for me:

ps -ax | grep MailScanner
[07:40:59 root@secure site1]# ps ax | grep MailScanner
 7077 ?        Ss     0:00 MailScanner: master waiting for children, sleeping
 7078 ?        R     14:43 MailScanner: cleaning messages
 7085 ?        S      0:00 MailScanner: waiting for messages
 7123 ?        S      0:00 MailScanner: waiting for messages
 7127 ?        S      0:00 MailScanner: waiting for messages
 7170 ?        S      0:00 MailScanner: waiting for messages

There is my stuck process, and apparently it is stuck cleaning messages. But which messages? The MailScanner working directory has more than one email in it, and to be picky I would prefer not to delete a valid email. Next step to be thorough is to do an strace on MailScanner itself and see what it is up to. You will see a lot of output, and then finally you should see a list of PID's coupled with the email that is stuck:

strace -f /usr/sbin/MailScanner
[pid 24167] open("/home/virtual/FILESYSTEMTEMPLATE/services/sendmail/mqueue/qfmBJAmQ7J011131", O_RDWR|O_LARGEFILE) = 7
[pid 24167] ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfddf558) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 24167] _llseek(7, 0, [0], SEEK_CUR) = 0
[pid 24167] fstat64(7, {st_mode=S_IFREG|0600, st_size=1182, ...}) = 0
[pid 24167] fcntl64(7, F_SETFD, FD_CLOEXEC) = 0
[pid 24167] fcntl64(7, F_SETLK64, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}, 0x99a50a8) = -1 EAGAIN (Resource temporarily unavailable)
[pid 24167] close(7)                    = 0
[pid 24167] open("/home/virtual/FILESYSTEMTEMPLATE/services/sendmail/mqueue/qfmBJE3lHG021993", O_RDWR|O_LARGEFILE) = 7

Note the different PID on these files that seem to cause MailScanner to loop indefinitely: [pid 24167]. Now we can run a trace on that PID:

strace -p 24167
open("/home/virtual/FILESYSTEMTEMPLATE/services/sendmail/mqueue/qfmBJAmQ7J011131", O_RDWR|O_LARGEFILE) = 7
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfddf558) = -1 ENOTTY (Inappropriate ioctl for device)
_llseek(7, 0, [0], SEEK_CUR)            = 0
fstat64(7, {st_mode=S_IFREG|0600, st_size=1182, ...}) = 0
fcntl64(7, F_SETFD, FD_CLOEXEC)         = 0
fcntl64(7, F_SETLK64, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}, 0x99a50a8) = -1 EAGAIN (Resource temporarily unavailable)
close(7)                                = 0
open("/home/virtual/FILESYSTEMTEMPLATE/services/sendmail/mqueue/qfmBJE3lHG021993", O_RDWR|O_LARGEFILE) = 7

Now we will get more looping text, but this time it will show the emails that MailScanner is stuck on:

/home/virtual/FILESYSTEMTEMPLATE/services/sendmail/mqueue/qfmBJAmQ7J011131
/home/virtual/FILESYSTEMTEMPLATE/services/sendmail/mqueue/qfmBJE3lHG021993

While MailScanner is still running, you can go to the MailScanner working directory and see a list of folders named after the MailScanner PID's:

cd /home/virtual/FILESYSTEMTEMPLATE/services/mailscanner/MailScanner.work
[08:53:51 root@secure MailScanner.work]# ls -la
total 28
drwx------ 7 root root 4096 Dec 19 08:13 .
drwxr-xr-x 4 root root 4096 Mar 20  2008 ..
drwx------ 2 root root 4096 Dec 19 08:39 24167
drwx------ 2 root root 4096 Dec 19 08:46 26134
drwx------ 2 root root 4096 Dec 19 08:29 26135
drwx------ 2 root root 4096 Dec 19 08:46 26136
drwx------ 2 root root 4096 Dec 19 08:47 26137

Go into the problem PID's directory and you will see a list of files that make up the email that is stuck. This might help you determine why the email is stuck. In my case, there was a virus attached to an email, which, like the this post pointed out, was also causing MailScanner to loop.

Yes, you can easily just go to /home/virtual/FILESYSTEMTEMPLATE/services/sendmail/mqueue/ and delete all the emails in the queue and restart MailScanner, but I wanted to be sure that one of the emails was really the problem.

This little how-to does not fix the real issue of MailScanner getting stuck in an endless loop, but merely solves the current issue of MailScanner hogging the CPU. We can fully expect the issue to rear its ugly little head again at some point. Hopefully by that time, we will have solved the root issue.

Read more…

Comments