NRAO Home  >  Green Bank  |  Wiki Topic:    GB > Computing > SpamMitigationstuff (r1.1 vs. r1.15)
 <<O>>  Difference Topic SpamMitigationstuff (r1.15 - 28 Jun 2006 - WolfgangBaudler)

 <<O>>  Difference Topic SpamMitigationstuff (r1.14 - 18 Nov 2005 - ChrisClark)
Added:
>
>

This page is now obsolete. See the GOLD BOOK.



 <<O>>  Difference Topic SpamMitigationstuff (r1.13 - 28 Sep 2005 - ChrisClark)
Changed:
<
<

The following awk script generates a tidy summary.

# Tool to print a summary of a unix mbox. Assumptions on
# the start and end of messages in the mbox are based on
# Appendix D of RFC822
#
BEGIN { last_line = "something";total_messages=0;}
{
 if(NR==1)last_line="";                 # First line is a message
 if($0 == "") last_line=$0;             # Is this the blank line preceding a new message?
 if($1=="From"  && last_line=="")
  {
   from[total_messages]=$2;             # We have the start of a new message
   print_subject=1;                     # So the next subject line is wanted
  }
 if($1=="Subject:" && print_subject==1)
  {
   $1="";
   subject[total_messages]=substr($0,1,50);          # Snag the subject line
   print_subject=0;                     # Ignore any more until the next message
   total_messages+=1;
  }
}
END {
   print total_messages" quarantined messages:\n";
   for(iter=(total_messages-1);iter>=0;iter-=1)
    {
     printf("%-25s",substr(from[iter],1,25));
     print " " subject[iter];
    }
  }

A copy of this script can be found in /home/nraosoft/apps/quarantine/spam-digest. Copy it to /opt/services/quarantine/spam-digest on the delivery host.

>
>

An awk script that generates a tidy summary is in /home/nraosoft/apps/quarantine/spam-digest. Copy it to /opt/services/quarantine/spam-digest on the delivery host.

Added:
>
>

# Modified 09/14/05 to not send digest if quarantine is empty #

Changed:
<
<

test -a /netapp/users/$name/.QUARANTINE_DIGEST

>
>

[[ -a /netapp/users/$name/.QUARANTINE_DIGEST && -s /users/$name/webmail/QUARANTINE ]]

Deleted:
<
<

echo quarantine flag found for $name creating digest

Added:
>
>

Changed:
<
<

Pat's method would require a per-user crontab entry and per-user logrotate configs. I had a look at the docs and wildcarding is supported, so the following will rotate ALL the users' quarantines.

>
>

Rotation of the spam files is now a little more complex. It is controlled by 2 logrotate entries and a simple cronjob.

Changed:
<
<



>
>

The sequence goes:

Changed:
<
<

  /users/*/webmail/QUARANTINE {
    rotate 4
    weekly
    nocompress
    missingok
    create
  }

>
>

  1. rotate QUARANTINE daily
  2. run cronjob that appends the newly rotated QUARANTINE.1 to OLDQUARANTINE daily
  3. delete QUARANTINE.1 daily
  4. rotate OLDQUARANTINE weekly
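
Steps 2 and 3 of the sequence above could be sketched roughly as follows. This is not the site's logrotate-cronjob, which is not reproduced on this page; the function name archive_rotated and its argument are hypothetical, and the sketch assumes OLDQUARANTINE already exists with the right ownership so that appending to it leaves that ownership alone.

```shell
#!/bin/sh
# Hypothetical sketch of the daily archive step (not the real logrotate-cronjob).
# archive_rotated USERS_ROOT
#   For every USERS_ROOT/<user>/webmail/QUARANTINE.1, append it to
#   OLDQUARANTINE in the same folder, then delete it.
archive_rotated() {
    users_root=$1
    for rotated in "$users_root"/*/webmail/QUARANTINE.1
    do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$rotated" ] || continue
        folder=$(dirname "$rotated")
        # Delete the rotated file only if the append succeeded.
        cat "$rotated" >> "$folder/OLDQUARANTINE" && rm -f "$rotated"
    done
}
```

A cron.daily wrapper would then call something like archive_rotated /users after logrotate has produced QUARANTINE.1.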
Changed:
<
<

The create directive is required so that the quarantine is created with the right ownership/permissions. Procmail will create a missing folder but doesn't set the ownership/permissions correctly.

>
>

Find the files in /home/nraosoft/apps/quarantine/quarantine ..../quarantine-weekly and ..../logrotate-cronjob

Changed:
<
<

Find this file in /home/nraosoft/apps/quarantine/logrotate.conf. Copy it to /etc/logrotate.d/quarantine. It will get run weekly and keep the last 4 weeks' worth of quarantine files.

>
>

Copy quarantine and quarantine-weekly to /etc/logrotate.d/ and logrotate-cronjob to /etc/cron.daily.

Changed:
<
<

  1. Create the QUARANTINE folder (Simply touching it into existence is fine)
  2. Ensure ownership and permissions of webmail and QUARANTINE are right
  3. Touch .QUARANTINE_SPAM
>
>

  1. Create the QUARANTINE and OLDQUARANTINE folders (Simply touching them into existence is fine)
  2. Ensure ownership and permissions of webmail and *QUARANTINE are right
  3. Touch .QUARANTINE_SPAM (optionally .QUARANTINE_FOREIGN)
Added:
>
>

09/28/05 Modified log rotation to run daily with a weekly archive.


 <<O>>  Difference Topic SpamMitigationstuff (r1.12 - 16 Sep 2005 - ChrisClark)
Added:
>
>

  1. echo webmail/QUARANTINE >> .mailboxlist (again check ownership/permissions) to subscribe the folder.
Added:
>
>

09/16/05 Added tweadon, mholstin. Added line about subscribing the quarantine folder.


 <<O>>  Difference Topic SpamMitigationstuff (r1.11 - 15 Sep 2005 - ChrisClark)
Added:
>
>

09/15/05 If a user accesses their quarantine through an MUA other than webmail, and they have filtering within that MUA, then simply opening the quarantine folder can cause it to be emptied to wherever their filters say. This generates a helpdesk ticket pointing out the inaccuracy of the digest email.


 <<O>>  Difference Topic SpamMitigationstuff (r1.10 - 14 Sep 2005 - ChrisClark)
Changed:
<
<

09/13/05 add nradziwi 09/14/05 added koneil & bmckean. Also modified the digest cronjob to not send an email if a user's quarantine folder is empty.

>
>

09/13/05 add nradziwi
09/14/05 added koneil, degan & bmckean. Also modified the digest cronjob to not send an email if a user's quarantine folder is empty.


 <<O>>  Difference Topic SpamMitigationstuff (r1.9 - 14 Sep 2005 - ChrisClark)
Changed:
<
<

>
>

09/14/05 added koneil & bmckean. Also modified the digest cronjob to not send an email if a user's quarantine folder is empty.


 <<O>>  Difference Topic SpamMitigationstuff (r1.8 - 13 Sep 2005 - ChrisClark)
Added:
>
>

09/13/05 add nradziwi


 <<O>>  Difference Topic SpamMitigationstuff (r1.7 - 12 Sep 2005 - ChrisClark)
Added:
>
>


Update 09/12/05

Green Bank now has 8 alpha testers. So far so good. Of the 8, 6 are squirrelmail users already. One is a vm user and one is on thunderbird.


 <<O>>  Difference Topic SpamMitigationstuff (r1.6 - 12 Sep 2005 - ChrisClark)

 <<O>>  Difference Topic SpamMitigationstuff (r1.5 - 12 Sep 2005 - ChrisClark)
Added:
>
>

Update:

Rather than have the digest generation keyed off the .QUARANTINE_SPAM file, add another, separate file for requesting daily digests: .QUARANTINE_DIGEST. Should we add other central quarantine rules, this will save some tortuous logic later on. It also more closely follows the puremessage paradigm of allowing a user to have the digest or not.
Changed:
<
<

test -a /netapp/users/$name/.QUARANTINE_SPAM

>
>

test -a /netapp/users/$name/.QUARANTINE_DIGEST

Added:
>
>


How to get a user opted in.

  1. Ensure a webmail folder exists. If the user is already using an MUA that understands mbox and can access the user's unix filespace, use a link or whatever is most appropriate.
  2. Create the QUARANTINE folder (Simply touching it into existence is fine)
  3. Ensure ownership and permissions of webmail and QUARANTINE are right
  4. Touch .QUARANTINE_SPAM
  5. If user desires the daily digest also touch .QUARANTINE_DIGEST into existence.
  6. Check ownership/permissions on the above 2 'trigger' files.
  7. Educate user on squirrelmail basics. How to access QUARANTINE, how to 'deliver' a false positive. Actually run them through it.
  8. Ensure user is aware of location of documentation on squirrelmail and spam quarantining setup.
  9. Relax in the knowledge of a job well done
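
The file-handling steps in the list above (everything except the user education) could be scripted along these lines. This is a sketch under stated assumptions, not an existing tool: the function name opt_in and its arguments are hypothetical, it must run with enough privilege to chown, and it only sets ownership because the page does not state which permission modes count as "right".

```shell
#!/bin/sh
# Hypothetical helper: opt_in HOME_DIR OWNER [digest]
#   Creates webmail/QUARANTINE and the .QUARANTINE_SPAM trigger file under
#   HOME_DIR, optionally .QUARANTINE_DIGEST, and hands them all to OWNER.
opt_in() {
    home=$1
    owner=$2
    want_digest=${3:-no}

    mkdir -p "$home/webmail" || return 1
    touch "$home/webmail/QUARANTINE" "$home/.QUARANTINE_SPAM" || return 1

    # Collect everything we created so ownership is fixed in one place.
    set -- "$home/webmail" "$home/webmail/QUARANTINE" "$home/.QUARANTINE_SPAM"
    if [ "$want_digest" = "digest" ]; then
        touch "$home/.QUARANTINE_DIGEST" || return 1
        set -- "$@" "$home/.QUARANTINE_DIGEST"
    fi

    # Permission modes are site policy and are deliberately left untouched.
    chown "$owner" "$@"
}
```

For example, opt_in /users/someuser someuser digest would opt a (made-up) user in with the daily digest; drop the third argument for quarantine only.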

 <<O>>  Difference Topic SpamMitigationstuff (r1.4 - 12 Sep 2005 - ChrisClark)
Added:
>
>

Others that spring to mind are:

  • .DELETE_SPAM
  • .DELETE_VIRUS
  • .QUARANTINE_FOREIGN (Foreign char sets)
  • .DELETE_FOREIGN

 <<O>>  Difference Topic SpamMitigationstuff (r1.3 - 11 Sep 2005 - ChrisClark)
Changed:
<
<

Basic idea is to have a central procmail rule that diverts email tagged by spamassassin to a user's webmail directory. Users can then access the quarantine using squirrelmail or any other MUA that understands and can access the mbox file in which such messages are stored.

>
>

Basic idea is to have a central procmail rule that diverts email tagged by spamassassin to a user's webmail directory. Users can then access the quarantine using squirrelmail or any other MUA that understands and can access the mbox file in which such messages are stored. As part of the system, users get a daily digest email of what is currently in their quarantine.

Changed:
<
<

The central procmail rule checks for the existence of this file and only proceeds if it does exist. The meat of the file is below:

>
>

The Procmailrc

The first rule in the procmailrc file effectively whitelists the daily digest emails to prevent them being sent to the quarantine.

The second rule checks for the existence of $HOME/.QUARANTINE_SPAM file and only proceeds if it does exist.

Changed:
<
<

# Test

>
>

# Whitelist the digests!
:0H:
* ^Subject: .*(NRAO Daily Quarantine Digest)
$DEFAULT

# quarantine spam

Changed:
<
<

The matching rule above is pretty conservative, matching anything that scores 5 or more or matches one of the tests for known spam relays etc.

>
>

The matching rule above is pretty conservative, matching anything that scores 5 or more or matches one of the tests for known spam relays etc. I have not set it to do any logging as it produces reams of output that can be gleaned from /var/log/maillog anyway.

Added:
>
>

A suitable procmailrc is in /home/nraosoft/apps/quarantine/promailrc. Just copy it to /etc/procmailrc on the mail delivery host.

Changed:
<
<

if(NR==1)next; # Ignore first message, it's a dummy

>
>

if(NR==1)last_line=""; # First line is a message

Changed:
<
<

A copy of this script can be found in /home/nraosoft/apps/quarantine/spam-digest

>
>

A copy of this script can be found in /home/nraosoft/apps/quarantine/spam-digest. Copy it to /opt/services/quarantine/spam-digest on the delivery host.

Added:
>
>

Changed:
<
<

  if -f /users/$i/.QUARANTINE_SPAM || /users/$i/.QUARANTINE_VIRUS then
      awk -f digest-script /users/$i/webmail/QUARANTINE | mail -s "NRAO Daily Quarantine Digest" $i

>
>

name=`basename $i`

  if test -a /netapp/users/$name/.QUARANTINE_SPAM
  then
      echo quarantine flag found for $name creating digest
      awk -f /opt/services/quarantine/spam-digest /users/$name/webmail/QUARANTINE | /bin/mail -s "NRAO Daily Quarantine Digest" $name
  fi

Added:
>
>

This script will need to be customised for each site: the "test -a" line needs to be looking at the real /users, NOT an automounted area.

Changed:
<
<

>
>

Again, a copy of this is in /home/nraosoft/apps/quarantine/digest-cronjob. Copy to /opt/services/quarantine/digest-cronjob on the delivery host and make crontab entry: " 0 7 * * 1-5 /opt/services/quarantine/digest-cronjob" to mail out digests at 07:00 Mon-Fri or customise to your preference.
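
For orientation, a digest cronjob of this kind might look roughly like the sketch below. It is not the contents of digest-cronjob; the function name send_digests, its arguments and the MAILER variable are hypothetical. It keys off .QUARANTINE_DIGEST and skips empty quarantines, as the later revisions of this page describe, and it uses the portable -e test in place of "test -a".

```shell
#!/bin/sh
# Hypothetical sketch of a digest cronjob (not the real digest-cronjob).
# send_digests FLAG_ROOT MBOX_ROOT DIGEST_AWK
#   FLAG_ROOT  real (non-automounted) home area holding the trigger files
#   MBOX_ROOT  area through which the webmail folders are reachable
#   DIGEST_AWK path to the awk summary script
# MAILER is an assumed override for the mail command, handy for dry runs.
send_digests() {
    flag_root=$1
    mbox_root=$2
    digest_awk=$3
    mailer=${MAILER:-/bin/mail}

    for dir in "$flag_root"/*
    do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        quarantine="$mbox_root/$name/webmail/QUARANTINE"
        # -e: user asked for digests; -s: quarantine exists and is non-empty.
        if [ -e "$dir/.QUARANTINE_DIGEST" ] && [ -s "$quarantine" ]
        then
            awk -f "$digest_awk" "$quarantine" |
                "$mailer" -s "NRAO Daily Quarantine Digest" "$name"
        fi
    done
}
```

With the paths used on this page the call would be send_digests /netapp/users /users /opt/services/quarantine/spam-digest.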

Changed:
<
<

Pat's method would require a per-user crontab entry and per-user logrotate configs. Can we find a way to do this that requires only a single logrotate config?

>
>

Pat's method would require a per-user crontab entry and per-user logrotate configs. I had a look at the docs and wildcarding is supported, so the following will rotate ALL the users' quarantines.

Deleted:
<
<

Seems that wildcarding should work so something like:

Added:
>
>

create

Changed:
<
<

should work. NEED TO TEST THIS!

>
>

The create directive is required so that the quarantine is created with the right ownership/permissions. Procmail will create a missing folder but doesn't set the ownership/permissions correctly.

Find this file in /home/nraosoft/apps/quarantine/logrotate.conf. Copy it to /etc/logrotate.d/quarantine. It will get run weekly and keep the last 4 weeks' worth of quarantine files.

Changed:
<
<

  • Digest contents
    Should it list only messages that arrived since the last digest was sent or everything currently in the quarantine? I favour the latter as it is much simpler to do. Listing the contents in reverse order of arrival probably makes more sense than the current method and is dead easy to do.
  • Digest sanitation
    Should the digest script try and sanitize the subject lines? This may well be an n-complex problem anyway, but there is a possibility that the digest itself may get quarantined if too many subject lines are too spammy. Currently the script chops the subject line off at 50 characters; could it be truncated further?
>
>

  • Digest contents
    Should it list only messages that arrived since the last digest was sent or everything currently in the quarantine? I favour the latter as it is much simpler to do. Listing the contents in reverse order of arrival makes more sense to me than oldest to newest.
  • Digest sanitation
    Should the digest script try and sanitize the subject lines? Some users may be offended by the contents. To be honest I don't think we can really do much about it.
Changed:
<
<

  • Personal procmailrc files
    Need to check that the existence of a central procmail does not stop a user's procmail from being run as well. I'd be gobsmacked if a ~/.procmailrc wasn't run as well. Also which order? I suspect global then personal.
  • procmail logging
    Let's not! If a user wants it they can do it themselves.
>
>

  • Personal procmailrc files
    If a message is shunted into quarantine by the global procmail then the user's procmail is not run. Otherwise it is.

 <<O>>  Difference Topic SpamMitigationstuff (r1.2 - 11 Sep 2005 - ChrisClark)
Changed:
<
<

for(iter=0;iter<=total_messages-1;iter+=1)

>
>

for(iter=(total_messages-1);iter>=0;iter-=1)

Changed:
<
<

printf("%-30s",substr(from[iter],1,30));

>
>

printf("%-25s",substr(from[iter],1,25));

Added:
>
>

A copy of this script can be found in /home/nraosoft/apps/quarantine/spam-digest

Changed:
<
<

>
>

  • Quarantine folder name
    Just call it QUARANTINE and have done.

 <<O>>  Difference Topic SpamMitigationstuff (r1.1 - 11 Sep 2005 - ChrisClark)
Added:
>
>

%META:TOPICINFO{author="ChrisClark" date="1126408500" format="1.0" version="1.1"}% %META:TOPICPARENT{name="WebHome"}%

Spam Quarantining Stuff

Basic idea is to have a central procmail rule that diverts email tagged by spamassassin to a user's webmail directory. Users can then access the quarantine using squirrelmail or any other MUA that understands and can access the mbox file in which such messages are stored.

The scheme will be opt-in for existing users and opt-out for new users. Whether the scheme is active for a user will depend on the existence of a file in their home directory called ".QUARANTINE_SPAM". The existence of this file indicates that the user is opted in. Conversely, its absence indicates the user is opted out.

The central procmail rule checks for the existence of this file and only proceeds if it does exist. The meat of the file is below:

# Test
:0:
* ? test  -f $HOME/.QUARANTINE_SPAM
* ^X-MailScanner-(SpamScore: sssss|SpamCheck:.(ORDB|INfinite-Monkeys|SpamAssassin| SBL+XBL))
$HOME/webmail/QUARANTINE


The matching rule above is pretty conservative, matching anything that scores 5 or more or matches one of the tests for known spam relays etc.

The scheme can easily be extended. For instance, a user could have messages tagged as virus sent to the quarantine as well, by a second rule in the global procmail that only runs if another file, .QUARANTINE_VIRUS, exists in the user's home directory.

:0:
* ? test  -f $HOME/.QUARANTINE_VIRUS
* ^Subject: .*\{VIRUS?\}
$HOME/webmail/QUARANTINE

Daily Digest

To save users having to manually check their quarantine we need a method to send them an email with a list of what is currently in their quarantine. The following awk script generates a tidy summary.

# Tool to print a summary of a unix mbox. Assumptions on
# the start and end of messages in the mbox are based on
# Appendix D of RFC822
#
BEGIN { last_line = "something";total_messages=0;}
{
 if(NR==1)next;                         # Ignore first message, it's a dummy
 if($0 == "") last_line=$0;             # Is this the blank line preceding a new message?
 if($1=="From"  && last_line=="")
  {
   from[total_messages]=$2;             # We have the start of a new message
   print_subject=1;                     # So the next subject line is wanted
  }
 if($1=="Subject:" && print_subject==1)
  {
   $1="";
   subject[total_messages]=substr($0,1,50);          # Snag the subject line
   print_subject=0;                     # Ignore any more until the next message
   total_messages+=1;
  }
}
END {
   print total_messages" quarantined messages:\n";
   for(iter=0;iter<=total_messages-1;iter+=1)
    {
     printf("%-30s",substr(from[iter],1,30));
     print " " subject[iter];
    }
  }

Sample output:

61 quarantined messages:

Tehran@techsoftamerica.com      {SPAM?} FW: Shy Lady in prrrevet action
yenalykyf@info.com.tr           {SPAM?} Busty amateur on table
hwang@news.com                  {SPAM?} Girl in nude pantyhose
name@hsuchi.net                 {SPAM?} Babe Hardcore Pussy Fucked & Cum Covered
pmk@sesmail.com                 {SPAM?} Dirty Bitch Suck & Messy Facial Cumshot
ZEPKTI@radiance-ind.com         {SPAM?} Re: [IMPORTANT] Notice to Home Owner [531
wqfgnujhqdlbwf@yahoo.com        {SPAM?} Pay Less For Branded Watches 4Dv1
TAJRAXYWSBPH@alti-byg.dk        {SPAM?} valium
Lisha@marshjewelers.com         Your limited time savings code, don't delay!
bernard@radiomexico.com         {SPAM?} Asian Babe Blwojob Hardcroe scrutiny
dulfer@earthlink.net            {SPAM?} helen Clark it's happeend derivate

A single cronjob to fire off something like the following will take care of sending all opted in people a digest:

for i in /users/*
do
  name=`basename $i`                     # $i is the full path, so take the username from it
  if test -f $i/.QUARANTINE_SPAM || test -f $i/.QUARANTINE_VIRUS
  then
      awk -f digest-script $i/webmail/QUARANTINE | mail -s "NRAO Daily Quarantine Digest" $name
  fi
done

Rotation of quarantine

Pat's method would require a per-user crontab entry and per-user logrotate configs. Can we find a way to do this that requires only a single logrotate config?

Seems that wildcarding should work so something like:


  /users/*/webmail/QUARANTINE {
    rotate 4
    weekly
    nocompress
    missingok
  }

should work. NEED TO TEST THIS!
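
One low-risk way to test the wildcard is to write the same stanza against a scratch directory tree and hand it to logrotate in debug mode. The sketch below only generates the config file; the function name write_quarantine_conf and the scratch paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write_quarantine_conf CONF_FILE USERS_ROOT
#   Writes the wildcard stanza above to CONF_FILE, rooted at USERS_ROOT
#   so it can be pointed at a scratch tree instead of the live /users.
write_quarantine_conf() {
    cat > "$1" <<EOF
$2/*/webmail/QUARANTINE {
    rotate 4
    weekly
    nocompress
    missingok
}
EOF
}
```

After something like write_quarantine_conf /tmp/quarantine.conf /tmp/fakeusers, running logrotate -d /tmp/quarantine.conf reports which files the glob matches and what would be rotated, without changing anything.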

Design questions

  • Digest frequency
    Daily. More frequently is annoying, less often is also bad imho.
  • Digest contents
    Should it list only messages that arrived since the last digest was sent or everything currently in the quarantine? I favour the latter as it is much simpler to do. Listing the contents in reverse order of arrival probably makes more sense than the current method and is dead easy to do.
  • Digest sanitation
    Should the digest script try and sanitize the subject lines? This may well be an n-complex problem anyway, but there is a possibility that the digest itself may get quarantined if too many subject lines are too spammy. Currently the script chops the subject line off at 50 characters; could it be truncated further?
  • Logrotate frequency
    Weekly. Doing it more often, especially if we use the compress option in logrotate, makes accessing quarantine folders older than a day or two a royal pain. Think about getting back from vacation! This may in fact be an argument for not compressing with logrotate.
  • Personal procmailrc files
    Need to check that the existence of a central procmail does not stop a user's procmail from being run as well. I'd be gobsmacked if a ~/.procmailrc wasn't run as well. Also which order? I suspect global then personal.
  • procmail logging
    Let's not! If a user wants it they can do it themselves.

-- ChrisClark - 11 Sep 2005


Revision r1.1 - 11 Sep 2005 - 03:15 GMT - ChrisClark
Revision r1.15 - 28 Jun 2006 - 18:04 GMT - WolfgangBaudler
Content copyright © 1999-2007 by the contributing authors.
All material on this collaboration platform is the property of the contributing authors.