Hi
Does anyone have recent statistics on the FPs for the different sublists in multi?
And if possible, does anybody have statistics on FPs that were on several of the sublists -at the same time-?
Something "like" the dns-hits in:
http://www.surbl.org/permuted-hits.out.txt
(which Google didn't find a link to; maybe adding a link inside one of the www.surbl.org pages would be nice...)
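For anyone with their own ham corpus, a permuted-hits style tally of combined FPs could be produced with something like this minimal Python sketch. It assumes you already have, for each ham message, the set of sublists that matched its URIs; that input format and the function name `count_combinations` are hypothetical and not part of any SURBL or SpamAssassin tool.

```python
from collections import Counter


def count_combinations(hits_per_message):
    """Tally each exact combination of sublists that fired on one message.

    hits_per_message: iterable of sets of sublist names (hypothetical input
    format, e.g. built from a mass-check log of ham messages).
    """
    counts = Counter()
    for lists in hits_per_message:
        if lists:  # messages with no SURBL hit are not FPs; skip them
            # Sort so {"ws", "ob"} and {"ob", "ws"} map to the same key.
            counts["+".join(sorted(lists))] += 1
    return counts


if __name__ == "__main__":
    # Made-up example data: sublists hit by five ham messages.
    ham = [{"ws"}, {"ws", "ob"}, set(), {"ws"}, {"jp", "ws", "ob"}]
    for combo, n in count_combinations(ham).most_common():
        print(combo, n)
```

Dividing each count by the total number of ham messages would give a per-combination FP rate, which is the kind of figure needed to pick per-sublist weights.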
I'm thinking of scoring each sublist separately to let the user decide on the FP safety. It is very easy for me to generate and seems easy to configure even for end users:
a*[sc] + b*[ws] + c*[ob] + d*[jp] + e*[ab] + f*[ph] >= 100
for example
d could be 100 --> a hit on jp alone is spam
b could be 99 --> needs at least one other entry
a,c could be 50 --> need more...
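A minimal Python sketch of the proposed weighted sum, where each [xx] term is 1 if the domain is on that sublist and 0 otherwise. It assumes the multi.surbl.org answer is 127.0.0.X with the sublist bits SC=2, WS=4, PH=8, OB=16, AB=32, JP=64 in the last octet; please verify those values against the SURBL site before relying on them. The weights for AB (e) and PH (f) were not given above, so the 50s used for them are placeholders, and all names in the code are hypothetical.

```python
# Assumed multi.surbl.org bit values in the last octet of a 127.0.0.X
# answer (check the SURBL documentation before relying on them).
SUBLIST_BITS = {"sc": 2, "ws": 4, "ph": 8, "ob": 16, "ab": 32, "jp": 64}

# Weights a..f from the proposal; e (ab) and f (ph) were not specified,
# so 50 is only a placeholder for them.
WEIGHTS = {"sc": 50, "ws": 99, "ob": 50, "jp": 100, "ab": 50, "ph": 50}
THRESHOLD = 100


def decode_multi(last_octet):
    """Return the set of sublists encoded in a multi answer's last octet."""
    return {name for name, bit in SUBLIST_BITS.items() if last_octet & bit}


def surbl_score(lists, weights=WEIGHTS):
    """Weighted sum: each listed sublist contributes its weight once."""
    return sum(weights.get(name, 0) for name in lists)


def is_spam(last_octet, weights=WEIGHTS, threshold=THRESHOLD):
    """Apply the threshold test to one multi answer."""
    return surbl_score(decode_multi(last_octet), weights) >= threshold


if __name__ == "__main__":
    # jp alone, ws alone, ws+ob, sc alone, sc+ob
    for octet in (64, 4, 4 | 16, 2, 2 | 16):
        lists = sorted(decode_multi(octet))
        print(octet, lists, surbl_score(lists), is_spam(octet))
```

With these example weights a JP hit alone reaches the threshold, while WS, SC or OB each need at least one other sublist to agree, matching the three cases listed above.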
Alain
On Friday, February 11, 2005, 2:47:33 PM, Alain Alain wrote:
> Does anyone have recent statistics on the FPs for the different sublists in multi?
FP rates vary depending on the ham corpus. In other words, your FP rates may not be the same as someone else's FP rates if your mail differs from theirs, which is almost certainly the case.
That said, here are some results Daniel Quinlan posted from the mass-checks on the SpamAssassin corpora around 26 January 2005:
Weekly mass-check results for SURBL:
OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
  217996   164295    53701  0.754   0.00   0.00  (all messages)
 100.000  75.3661  24.6339  0.754   0.00   0.00  (all messages as %)
  11.644  15.4490   0.0037  1.000   0.98   3.90  URIBL_SC_SURBL
  39.572  52.4976   0.0261  1.000   0.98   3.00  URIBL_JP_SURBL
  51.955  68.9236   0.0391  0.999   0.96   2.00  URIBL_OB_SURBL
   5.690   7.5492   0.0000  1.000   0.95   2.01  URIBL_AB_SURBL
  53.948  71.5238   0.1769  0.998   0.83   0.54  URIBL_WS_SURBL
   0.030   0.0396   0.0000  1.000   0.51   0.84  URIBL_PH_SURBL
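To read the table as rough FP counts: HAM% is the share of the 53701 ham messages that hit each rule, and S/O is SPAM% / (SPAM% + HAM%), the usual SpamAssassin hit-frequencies ratio (the figures above are consistent with it). A small Python sketch redoing that arithmetic from the table values; the function names are hypothetical, and because the published percentages are rounded, the derived ham counts are approximate.

```python
# (SPAM%, HAM%) per rule, copied from the mass-check table.
RESULTS = {
    "URIBL_SC_SURBL": (15.4490, 0.0037),
    "URIBL_JP_SURBL": (52.4976, 0.0261),
    "URIBL_OB_SURBL": (68.9236, 0.0391),
    "URIBL_AB_SURBL": (7.5492, 0.0000),
    "URIBL_WS_SURBL": (71.5238, 0.1769),
    "URIBL_PH_SURBL": (0.0396, 0.0000),
}
HAM_TOTAL = 53701  # ham messages in this mass-check


def s_over_o(spam_pct, ham_pct):
    """Spam hit rate as a fraction of (spam + ham) hit rates."""
    return spam_pct / (spam_pct + ham_pct)


def ham_hits(ham_pct, ham_total=HAM_TOTAL):
    """Approximate number of ham messages (FPs) behind a HAM% figure."""
    return round(ham_pct / 100 * ham_total)


if __name__ == "__main__":
    for name, (spam_pct, ham_pct) in RESULTS.items():
        print(f"{name:16} S/O={s_over_o(spam_pct, ham_pct):.3f} "
              f"ham hits~{ham_hits(ham_pct)}")
```

On this corpus that works out to roughly 2 ham hits for SC, 14 for JP, 21 for OB and 95 for WS, which is a more concrete way to compare the sublists than the percentages alone.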
SC and AB have much better real-world results than shown above, because they cover a much shorter time period than the test corpora do.
Also note that the JP data has now been removed from the WS data, and some old data was removed from WS. So the WS spam and ham hit rates have probably both decreased since this check was done. JP should be about the same.
> And if possible, does anybody have statistics on FPs that were on several of the sublists -at the same time-?
> Something "like" the dns-hits in:
> http://www.surbl.org/permuted-hits.out.txt
> (which Google didn't find a link to; maybe adding a link inside one of the www.surbl.org pages would be nice...)
> I'm thinking of scoring each sublist separately to let the user decide on the FP safety. It is very easy for me to generate and seems easy to configure even for end users:
> a*[sc] + b*[ws] + c*[ob] + d*[jp] + e*[ab] + f*[ph] >= 100
> for example
> d could be 100 --> a hit on jp alone is spam
> b could be 99 --> needs at least one other entry
> a,c could be 50 --> need more...
I don't think that is known yet. I had proposed setting up some test lists with combinations like this, but got no response. ;-)
If it *is* known I think we'd all like to hear about it. :-)

Jeff C.
--
"If it appears in hams, then don't list it."