[SURBL-Discuss] Proposal for moving forward with JP list

John Lundin lundin at cavtel.net
Thu Sep 23 19:52:12 CEST 2004


On Thu, Sep 23, 2004 at 06:56:04AM -0700, Jeff Chan wrote:
> On Thursday, September 23, 2004, 6:21:13 AM, John Lundin wrote:
> > The other is more about how people use scores. As we do a better job
> > of spotting and reducing FPs, the SpamAssassin scores will go up. This
> > is good, right?  Well, maybe. There are six URIRLs in SpamAssassin 3.0
> > already. And as scored, a -single- feature in the text of the message
> > can trigger a spam score of 9.9 (without bayes) or 12.4 (with). Now.
> > This scares me, since some systems discard spam above a certain score.
> 
> Are the scores cumulative like that?  I thought I heard they
> are either/or, perhaps in the context of multi and urirhssub.

Oooh, yeah. And they usually do go off in multiples.

Some percentages from a small ISP, last two months inbound mail:

Detected 4.616% as not spam (including FFP's):
99.144%  (no URI_RBL found) 
 0.488%  WS_URI_RBL
 0.226%  OB_URI_RBL
 0.051%  OB_URI_RBL WS_URI_RBL
 0.037%  SPAMCOP_URI_RBL
 0.017%  OB_URI_RBL SPAMCOP_URI_RBL
 0.012%  SPAMCOP_URI_RBL WS_URI_RBL
 0.012%  OB_URI_RBL SPAMCOP_URI_RBL WS_URI_RBL
 0.005%  AB_URI_RBL OB_URI_RBL SPAMCOP_URI_RBL
 0.005%  AB_URI_RBL OB_URI_RBL
 0.002%  AB_URI_RBL

Detected 95.384% as spam:
34.538%  AB_URI_RBL OB_URI_RBL SPAMCOP_URI_RBL WS_URI_RBL
14.623%  (no URI_RBL found) 
14.359%  OB_URI_RBL WS_URI_RBL
10.442%  OB_URI_RBL SPAMCOP_URI_RBL WS_URI_RBL
 7.551%  WS_URI_RBL
 3.153%  AB_URI_RBL SPAMCOP_URI_RBL WS_URI_RBL
 3.031%  AB_URI_RBL OB_URI_RBL SPAMCOP_URI_RBL
 3.006%  OB_URI_RBL
 2.681%  AB_URI_RBL OB_URI_RBL WS_URI_RBL
 1.936%  SPAMCOP_URI_RBL WS_URI_RBL
 1.648%  OB_URI_RBL SPAMCOP_URI_RBL
 1.105%  AB_URI_RBL WS_URI_RBL
 1.055%  AB_URI_RBL OB_URI_RBL
 0.340%  SPAMCOP_URI_RBL
 0.340%  AB_URI_RBL SPAMCOP_URI_RBL
 0.172%  AB_URI_RBL
 0.010%  AB_URI_RBL PH_URI_RBL SPAMCOP_URI_RBL WS_URI_RBL
 0.005%  PH_URI_RBL WS_URI_RBL
 0.004%  OB_URI_RBL PH_URI_RBL WS_URI_RBL
 0.001%  AB_URI_RBL PH_URI_RBL WS_URI_RBL
 0.001%  PH_URI_RBL SPAMCOP_URI_RBL WS_URI_RBL
 0.000%  PH_URI_RBL

Over a third of all inbound spam hit all four URIRLs.
Less than half of that number hit no URIRLs.
But even fewer, only 11.069%, hit just one URIRL.
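
For anyone who wants to check the arithmetic, here is a rough Python sketch
that groups the spam rows of the table above by how many rules fired. The
numbers are copied from the table; the names SPAM_ROWS and
share_by_list_count are made up for this example and are not part of any
SpamAssassin tooling.

```python
from collections import defaultdict

# Spam rows copied from the table above: (percent of spam, rules that fired).
# An empty string stands for "(no URI_RBL found)".
SPAM_ROWS = [
    (34.538, "AB_URI_RBL OB_URI_RBL SPAMCOP_URI_RBL WS_URI_RBL"),
    (14.623, ""),
    (14.359, "OB_URI_RBL WS_URI_RBL"),
    (10.442, "OB_URI_RBL SPAMCOP_URI_RBL WS_URI_RBL"),
    (7.551, "WS_URI_RBL"),
    (3.153, "AB_URI_RBL SPAMCOP_URI_RBL WS_URI_RBL"),
    (3.031, "AB_URI_RBL OB_URI_RBL SPAMCOP_URI_RBL"),
    (3.006, "OB_URI_RBL"),
    (2.681, "AB_URI_RBL OB_URI_RBL WS_URI_RBL"),
    (1.936, "SPAMCOP_URI_RBL WS_URI_RBL"),
    (1.648, "OB_URI_RBL SPAMCOP_URI_RBL"),
    (1.105, "AB_URI_RBL WS_URI_RBL"),
    (1.055, "AB_URI_RBL OB_URI_RBL"),
    (0.340, "SPAMCOP_URI_RBL"),
    (0.340, "AB_URI_RBL SPAMCOP_URI_RBL"),
    (0.172, "AB_URI_RBL"),
    (0.010, "AB_URI_RBL PH_URI_RBL SPAMCOP_URI_RBL WS_URI_RBL"),
    (0.005, "PH_URI_RBL WS_URI_RBL"),
    (0.004, "OB_URI_RBL PH_URI_RBL WS_URI_RBL"),
    (0.001, "AB_URI_RBL PH_URI_RBL WS_URI_RBL"),
    (0.001, "PH_URI_RBL SPAMCOP_URI_RBL WS_URI_RBL"),
    (0.000, "PH_URI_RBL"),
]


def share_by_list_count(rows):
    """Sum the percentage of messages for each number of rules that fired."""
    totals = defaultdict(float)
    for pct, rules in rows:
        totals[len(rules.split())] += pct
    # Round to the table's three decimals to hide float noise.
    return {n: round(total, 3) for n, total in sorted(totals.items())}


shares = share_by_list_count(SPAM_ROWS)
for n, pct in shares.items():
    print(f"{n} rule(s) fired: {pct:.3f}% of spam")
```

The one-rule bucket is where the 11.069% figure comes from. Note that the
four-rule bucket also picks up the 0.010% row that includes PH_URI_RBL, so
it comes out slightly above the 34.538% line in the table.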

Under SA 2.6, I compensated by adding second-order meta rules with
negative scores, but as the number of URIRLs goes up that becomes
unwieldy fast.
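
To put a number on "unwieldy", here is a small Python sketch that counts how
many compensating rules would be needed. It assumes "second-order" means one
rule per pair of lists, and also shows the count if every overlap of two or
more lists got its own rule. The function names are invented for this
example, and it only counts rules; it does not generate any SpamAssassin
configuration.

```python
from math import comb  # available in Python 3.8 and later


def pairwise_meta_rules(n_lists):
    """Number of two-list combinations: one compensating rule per pair."""
    return comb(n_lists, 2)


def all_overlap_meta_rules(n_lists):
    """Number of combinations of two or more lists (every possible overlap)."""
    return 2 ** n_lists - n_lists - 1


for n in range(2, 8):
    print(f"{n} lists: {pairwise_meta_rules(n)} pairs, "
          f"{all_overlap_meta_rules(n)} overlaps of two or more")
```

With four lists that is 6 pairs; with the six lists in SpamAssassin 3.0 it
is 15 pairs, or 57 rules if every overlap is handled separately.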

> > If we assume that JP gets the same confidence that SC has, that
> > inflates the score to 13.8 or 16.6. That's a lot of certainty to
> > invest in one lone URI. Especially given that evil URIs do [...]
> 
> JP should score about the same as OB since they have similar
> spam detection and FP rates.  SC has a lower FP rate (good)
> and somewhat lower hit rates (less good) than JP or OB.  The
> lower FP rate rightly counts more, so SC scores higher.

That would drop it to 11.9 or 15.6. :-)

I worry most about quoting and notification scenarios.

-- 
  lundin at cavtel.net
"ASCII stupid question, get a stupid ANSI."

