[SURBL-Discuss] Re: SURBL scoring (fwd)

Raymond Dijkxhoorn raymond at prolocation.net
Wed Jul 14 18:08:46 CEST 2004


Hi!

There has been some discussion of SURBL scoring on the MailScanner list.
This might be of interest to some of you:

---------- Forwarded message ----------
Date: Tue, 13 Jul 2004 17:07:14 -0400
From: John Lundin <lundin at CAVTEL.NET>
Reply-To: MailScanner mailing list <MAILSCANNER at jiscmail.ac.uk>
To: MAILSCANNER at jiscmail.ac.uk
Subject: Re: SURBL scoring

On Mon, 12 Jul 2004, Raymond Dijkxhoorn wrote:
>> If the tests aren't very independent, should I reduce the scores
>> when using more than one test?  We delete mail that scores over 12,
>> with these cumulative scores a false positive could result in lost
>> mail. Should I worry about that?
>
> They are completely independent. Think of them as 3 regular RBL checks:
> if you have an open proxy, it's most likely listed in all 3 as well. If
> it's listed in 3 and it scores 12, you are about as positive as you can
> be that it's spam...

(cough) Well, since no one else spoke up... IMO, you should worry.
And the problem is about to get worse; there's a new list in beta.

A few days after adding WS to spamcop_uri, I had a friend's letter
wind up in my spam folder. He was building a new computer and had sent
me a parts list for comment. One of his possible suppliers turned out
to be in SC and WS. (You can guess what one of my comments was.)

o Do you really want to lose every message containing the hot URI?
   And any followup that quotes it?

o They wouldn't be completely independent. Similar sets of spammers,
   same URI being matched against in the message.

Personally, I do worry about forcing high-scoring spam status based
on any single content feature. I scored the URI_RBL checks fairly low
(3.0), and added a few meta-rules to soften the impact of multiple hits.
This was a guess by eyeball. I haven't gotten around to playing with the
math, but have started to keep statistics to base new scores on.

FWIW, I maintain MS on one old spam-ridden site. About 95% of its
inbound mail currently scores as spam. 83% of that spam hits at least
one URI_RBL rule. 31% of spam (37% of the spam that hits any URI_RBL
rule) hits all four of AB, OB, SC and WS, and 53% (63%) hits three or
more! Of the "non-spam", 1.4% still has at least one URI_RBL hit.
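
The parenthesized figures above are shares of the spam that hits at least
one URI_RBL rule, not of all spam. A quick sketch of that arithmetic,
using only the rounded percentages quoted here (the function and variable
names are made up for illustration), so results can differ from the quoted
figures by a point of rounding:

```python
# Rounded percentages quoted in the message (all relative to spam).
SPAM_WITH_ANY_URI_RBL = 83.0   # % of spam hitting at least one URI_RBL rule
SPAM_WITH_ALL_FOUR = 31.0      # % of spam hitting AB, OB, SC and WS
SPAM_WITH_THREE_PLUS = 53.0    # % of spam hitting three or more lists


def share_of_uri_rbl_spam(pct_of_spam, pct_any=SPAM_WITH_ANY_URI_RBL):
    """Convert a percentage of all spam into a percentage of the spam
    that hit at least one URI_RBL rule."""
    return 100.0 * pct_of_spam / pct_any


all_four_share = share_of_uri_rbl_spam(SPAM_WITH_ALL_FOUR)
three_plus_share = share_of_uri_rbl_spam(SPAM_WITH_THREE_PLUS)

print(f"all four lists: {all_four_share:.1f}% of URI_RBL-hitting spam")
print(f"three or more:  {three_plus_share:.1f}% of URI_RBL-hitting spam")
```

Either way, most URI_RBL-hitting spam trips several lists at once, which
is why stacked scores matter.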

What I added to spamcop_uri.cf (first pass):

meta OB_SC_URI_RBL (SPAMCOP_URI_RBL && OB_URI_RBL)
describe OB_SC_URI_RBL  Compensate if both spamcop and OB trigger
score OB_SC_URI_RBL     -1.5

meta AB_SC_URI_RBL (SPAMCOP_URI_RBL && AB_URI_RBL)
describe AB_SC_URI_RBL  Compensate if both AB and SC trigger
score AB_SC_URI_RBL     -1.5

meta OB_WS_URI_RBL (OB_URI_RBL && WS_URI_RBL)
describe OB_WS_URI_RBL  Compensate if both WS and OB trigger
score OB_WS_URI_RBL     -1.0
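
To see what these compensating rules do to the total, here is a small
sketch of the arithmetic. It assumes each of the four URI_RBL rules scores
3.0 (the figure mentioned above; the per-rule scores are not shown here),
and the helper name net_score is made up for illustration:

```python
# Assumed base scores: 3.0 for each list rule (an assumption, see above).
BASE_SCORES = {
    "AB_URI_RBL": 3.0,
    "OB_URI_RBL": 3.0,
    "SPAMCOP_URI_RBL": 3.0,
    "WS_URI_RBL": 3.0,
}

# The three meta-rules from the config: (rules that must all hit, score).
META_RULES = [
    ({"SPAMCOP_URI_RBL", "OB_URI_RBL"}, -1.5),  # OB_SC_URI_RBL
    ({"SPAMCOP_URI_RBL", "AB_URI_RBL"}, -1.5),  # AB_SC_URI_RBL
    ({"OB_URI_RBL", "WS_URI_RBL"}, -1.0),       # OB_WS_URI_RBL
]


def net_score(hits):
    """Sum base scores for the rules that hit, then add every meta-rule
    adjustment whose required rules all hit."""
    hits = set(hits)
    total = sum(BASE_SCORES[rule] for rule in hits)
    total += sum(adj for required, adj in META_RULES if required <= hits)
    return total


print(net_score(BASE_SCORES))                        # all four lists hit
print(net_score({"SPAMCOP_URI_RBL", "OB_URI_RBL"}))  # SC + OB
print(net_score({"SPAMCOP_URI_RBL", "WS_URI_RBL"}))  # SC + WS
```

Under that assumption, a message listed on all four drops from 12.0 to
8.0. A message listed only on SC and WS still gets the full 6.0, since
this first pass has no meta-rule for that pair.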

I'd be interested to know what other people do to fix this.

--
   lundin at cavtel.net
  "By the time they had diminished from 50 to 8,
the other dwarves began to suspect 'Hungry' ..."

-------------------------- MailScanner list ----------------------
To leave, send    leave mailscanner    to jiscmail at jiscmail.ac.uk
Before posting, please see the Most Asked Questions at
http://www.mailscanner.biz/maq/     and the archives at
http://www.jiscmail.ac.uk/lists/mailscanner.html

