This is a forwarded message
From: Matt Kettler <mkettler(a)evi-inc.com>
To: Chris Santerre <csanterre(a)MerchantsOverseas.com>, users(a)spamassassin.apache.org
Date: Wednesday, September 29, 2004, 8:13:27 AM
Subject: Why such a low score?
===8<==============Original message text===============
At 10:55 AM 9/29/2004, Chris Santerre wrote:
>What was the reason WS got such a low score in SA 3.0??? .5 is a joke! Hell
>BigEvil was scored a 3 and no one complained, and it is the same data!! I
>don't understand. Did the mass check not go well?
Upon closer inspection, the WS mass-check went pretty well, but WS had the
greatest number of nonspam hits of all the SURBL lists. It also hit the
most spam, but the OB list hit nearly as much spam, and almost no nonspam.
Since the GA treats FPs as 100 times worse than FNs, it is going to
heavily bias the score of any overlapping spam hits toward the rule that has
the fewest nonspam hits. I suspect that in the spam cases, most of the WS hits
also hit either OB or SC, which have better FP ratios, and the scores
assigned reflect this.
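To make the weighting concrete, here is a toy sketch in Python. It is not SpamAssassin's GA: the corpus, the rule names (clean, noisy), the base scores, and the two candidate score assignments are all invented for illustration. It only shows how a cost that counts each FP as 100 FNs favors putting the score on the rule with fewer nonspam hits when two rules overlap on the same spam.

```python
THRESHOLD = 5.0   # assumed spam threshold for this toy example
FP_WEIGHT = 100   # one FP costs as much as 100 FNs, as described above

# Hypothetical corpus rows:
# (is_spam, score from other rules, hits clean rule, hits noisy rule, count)
CORPUS = [
    (True,  2.0, True,  True,  70),  # spam hit by both rules
    (True,  2.0, False, True,   5),  # spam hit only by the noisy rule
    (False, 2.0, False, True,   4),  # nonspam hit only by the noisy rule
]


def weighted_cost(clean_score, noisy_score):
    """Return FP_WEIGHT * (false positives) + (false negatives)."""
    false_pos = 0
    false_neg = 0
    for is_spam, base, hits_clean, hits_noisy, count in CORPUS:
        # A rule contributes its score only when it hit the message.
        total = base + clean_score * hits_clean + noisy_score * hits_noisy
        flagged = total >= THRESHOLD
        if flagged and not is_spam:
            false_pos += count
        elif is_spam and not flagged:
            false_neg += count
    return FP_WEIGHT * false_pos + false_neg


# Candidate 1: score lives on the clean rule. Candidate 2: on the noisy rule.
print(weighted_cost(clean_score=3.0, noisy_score=0.5))
print(weighted_cost(clean_score=0.5, noisy_score=3.0))
```

With these made-up numbers the first assignment misses the 5 noisy-only spams (cost 5), while the second catches them but flags 4 nonspam (cost 400), so an optimizer using this cost would settle on the first.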
Admittedly the amount of nonspam WS hit is small (under 0.5%), but that's about 7
times more nonspam than OB hit, and in set1 about 100 times more than SC hit.
Thus WS got a lowish score not for being a bad rule, but for not doing as
well as its neighbors that catch the same spams.
From STATISTICS-set1.txt:
OVERALL%    SPAM%    HAM%    S/O  RANK  SCORE  NAME
  10.497  15.8904  0.0008  1.000  0.98   2.01  URIBL_AB_SURBL
  18.019  27.2741  0.0046  1.000  0.97   3.90  URIBL_SC_SURBL
  49.029  74.1861  0.0654  0.999  0.74   2.00  URIBL_OB_SURBL
  51.999  78.4712  0.4756  0.994  0.45   0.54  URIBL_WS_SURBL
   0.010   0.0146  0.0012  0.927  0.39   0.84  URIBL_PH_SURBL
From STATISTICS-set3.txt:
OVERALL%    SPAM%    HAM%    S/O  RANK  SCORE  NAME
   7.022  14.4233  0.0061  1.000  0.95   4.26  URIBL_SC_SURBL
  30.471  62.5514  0.0632  0.999  0.74   3.21  URIBL_OB_SURBL
   2.950   6.0208  0.0385  0.994  0.73   0.42  URIBL_AB_SURBL
  33.807  68.9994  0.4494  0.994  0.47   1.46  URIBL_WS_SURBL
   0.019   0.0390  0.0008  0.981  0.44   2.00  URIBL_PH_SURBL
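The nonspam ratios mentioned earlier can be checked directly against the HAM% columns of the two tables. A short Python sketch, with the values copied from the tables and the function and dictionary names chosen only for this example:

```python
# HAM% values copied from the STATISTICS-set1.txt and STATISTICS-set3.txt tables.
HAM_PCT = {
    "set1": {"SC": 0.0046, "OB": 0.0654, "WS": 0.4756},
    "set3": {"SC": 0.0061, "OB": 0.0632, "WS": 0.4494},
}


def ham_ratio(score_set, rule, other):
    """How many times more nonspam `rule` hit than `other` in one score set."""
    pct = HAM_PCT[score_set]
    return pct[rule] / pct[other]


for score_set in HAM_PCT:
    for other in ("OB", "SC"):
        ratio = ham_ratio(score_set, "WS", other)
        print(f"{score_set}: WS hit {ratio:.1f}x the nonspam of {other}")
```

That works out to roughly 7x OB in both sets, and roughly 103x SC in set1 versus roughly 74x SC in set3.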
grep SURBL 50_scores.cf:
score URIBL_AB_SURBL 0 2.007 0 0.417
score URIBL_OB_SURBL 0 1.996 0 3.213
score URIBL_PH_SURBL 0 0.839 0 2.000
score URIBL_SC_SURBL 0 3.897 0 4.263
score URIBL_WS_SURBL 0 0.539 0 1.462
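For anyone reading the grep output: a score line with four values gives one score per SpamAssassin score set, in the order set0 (no Bayes, no network tests), set1 (no Bayes, network tests), set2 (Bayes, no network tests), set3 (Bayes and network tests). The SURBL rules are network tests, which is why the first and third columns are 0, and the second and fourth columns match the set1 and set3 statistics. A small Python sketch that splits the lines up this way (the parser is only an illustration, not SpamAssassin's own config reader):

```python
SCORE_SETS = (
    "set0 (no Bayes, no network tests)",
    "set1 (no Bayes, network tests)",
    "set2 (Bayes, no network tests)",
    "set3 (Bayes, network tests)",
)

# The five lines from the grep output.
LINES = """\
score URIBL_AB_SURBL 0 2.007 0 0.417
score URIBL_OB_SURBL 0 1.996 0 3.213
score URIBL_PH_SURBL 0 0.839 0 2.000
score URIBL_SC_SURBL 0 3.897 0 4.263
score URIBL_WS_SURBL 0 0.539 0 1.462
"""


def parse_scores(text):
    """Map each rule name to a tuple of its four per-score-set scores."""
    scores = {}
    for line in text.splitlines():
        fields = line.split()
        # Only handle the four-value form: "score NAME s0 s1 s2 s3".
        if len(fields) == 6 and fields[0] == "score":
            scores[fields[1]] = tuple(float(v) for v in fields[2:])
    return scores


scores = parse_scores(LINES)
for label, value in zip(SCORE_SETS, scores["URIBL_WS_SURBL"]):
    print(f"URIBL_WS_SURBL {label}: {value}")
```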
===8<===========End of original message text===========