[SURBL-Discuss] Fwd: Re: Why such a low score?

Jeff Chan jeffc at surbl.org
Wed Sep 29 17:17:27 CEST 2004


This is a forwarded message
From: Matt Kettler <mkettler at evi-inc.com>
To: Chris Santerre <csanterre at MerchantsOverseas.com>, users at spamassassin.apache.org
Date: Wednesday, September 29, 2004, 8:13:27 AM
Subject: Why such a low score?

===8<==============Original message text===============
At 10:55 AM 9/29/2004, Chris Santerre wrote:
>What was the reason WS got such a low score in SA 3.0??? .5 is a joke! Hell
>BigEvil was scored a 3 and no one complained, and it is the same data!! I
>don't understand. Did the mass check not go well?

Upon closer inspection, the WS mass-check went pretty well, but WS had the 
greatest number of nonspam hits of all the SURBL lists. It also hit the 
most spam, but the OB list hit nearly as much spam, and almost no nonspam.

Since the GA treats FPs as 100 times worse than FNs, the GA is going to 
heavily bias the score of any overlapping spam hits toward the rule that has 
the fewest nonspam hits. I suspect that in the spam cases, most of the WS hits 
also hit either OB or SC, which have better FP ratios, and the scores 
assigned reflect this.

Admittedly the amount of nonspam WS hit is small (under 0.5%), but that's 
about 7 times more nonspam than OB hit, and roughly 75 to 100 times more 
than SC hit.
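
The ratios can be checked directly against the HAM% column of the two STATISTICS tables quoted further down in this message. The following is only a small sketch of that arithmetic; the dictionary layout and the ham_ratio helper are illustrative names, not part of SpamAssassin.

```python
# HAM% values copied from the STATISTICS-set1.txt and STATISTICS-set3.txt
# tables quoted in this message (percent of nonspam messages hit).
ham_pct = {
    "set1": {"SC": 0.0046, "OB": 0.0654, "WS": 0.4756},
    "set3": {"SC": 0.0061, "OB": 0.0632, "WS": 0.4494},
}


def ham_ratio(corpus, rule, baseline):
    """Return how many times more nonspam `rule` hit than `baseline`."""
    return ham_pct[corpus][rule] / ham_pct[corpus][baseline]


for corpus in ham_pct:
    vs_ob = ham_ratio(corpus, "WS", "OB")
    vs_sc = ham_ratio(corpus, "WS", "SC")
    print(f"{corpus}: WS/OB = {vs_ob:.1f}x, WS/SC = {vs_sc:.1f}x")
```

On these figures WS hits roughly 7 times as much nonspam as OB in both sets, and about 103 times (set1) or 74 times (set3) as much as SC.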

Thus WS got a lowish score not for being a bad rule, but for not doing as 
well as its neighbors that catch the same spams.

From STATISTICS-set1.txt:
OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  10.497  15.8904   0.0008    1.000   0.98    2.01  URIBL_AB_SURBL
  18.019  27.2741   0.0046    1.000   0.97    3.90  URIBL_SC_SURBL
  49.029  74.1861   0.0654    0.999   0.74    2.00  URIBL_OB_SURBL
  51.999  78.4712   0.4756    0.994   0.45    0.54  URIBL_WS_SURBL
   0.010   0.0146   0.0012    0.927   0.39    0.84  URIBL_PH_SURBL

From STATISTICS-set3.txt:
OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
   7.022  14.4233   0.0061    1.000   0.95    4.26  URIBL_SC_SURBL
  30.471  62.5514   0.0632    0.999   0.74    3.21  URIBL_OB_SURBL
   2.950   6.0208   0.0385    0.994   0.73    0.42  URIBL_AB_SURBL
  33.807  68.9994   0.4494    0.994   0.47    1.46  URIBL_WS_SURBL
   0.019   0.0390   0.0008    0.981   0.44    2.00  URIBL_PH_SURBL

grep SURBL 50_scores.cf:
score URIBL_AB_SURBL 0 2.007 0 0.417
score URIBL_OB_SURBL 0 1.996 0 3.213
score URIBL_PH_SURBL 0 0.839 0 2.000
score URIBL_SC_SURBL 0 3.897 0 4.263
score URIBL_WS_SURBL 0 0.539 0 1.462
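
For anyone reading the grep output: as I understand the SpamAssassin configuration format, the four numbers on a score line are the scores for score sets 0 to 3 (no Bayes and no network tests, network only, Bayes only, Bayes plus network), which is why the second and fourth values match the SCORE columns of the set1 and set3 tables. A rough sketch of reading such a line follows; parse_score_line and the SCORE_SETS labels are made-up names for illustration, not SpamAssassin code.

```python
# Labels for the four score sets, in the order they appear on a score line.
# This ordering is my reading of the SpamAssassin docs; treat it as an assumption.
SCORE_SETS = (
    "set0: no Bayes, no network tests",
    "set1: no Bayes, network tests",
    "set2: Bayes, no network tests",
    "set3: Bayes and network tests",
)


def parse_score_line(line):
    """Split a 'score NAME v0 v1 v2 v3' line into (name, [four floats])."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    name = parts[1]
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        # A single value is assumed to apply to every score set.
        values = values * 4
    elif len(values) != 4:
        raise ValueError(f"expected 1 or 4 scores, got {len(values)}")
    return name, values


name, values = parse_score_line("score URIBL_WS_SURBL 0 0.539 0 1.462")
for label, value in zip(SCORE_SETS, values):
    print(f"{name}  {label}: {value}")
```

The zeros in the first and third positions fit the SURBL rules being network lookups: they cannot fire in the score sets where network tests are off.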



===8<===========End of original message text===========


