-----Original Message-----
From: Jeff Chan [mailto:jeffc@surbl.org]
Sent: Thursday, December 09, 2004 2:06 AM
To: SURBL Discuss
Subject: Re: [SURBL-Discuss] FP rate
On Wednesday, December 8, 2004, 8:51:04 PM, RD RD wrote:
Hello List,
Is there an available % FP rate for SC and JP? I thought I should ask to help me decide whether I should keep dropping e-mails w/ blacklisted URIs.
FP rates vary a lot depending on whose spam and ham corpora are used, but here are some from SpamAssassin corpora against older versions of the SURBL lists. Most of the SURBL lists now have better FP rates due to cleanup we've been doing along the way:
OVERALL%     SPAM%     HAM%    S/O   RANK  SCORE  NAME
 2424443   2357143    67300  0.972   0.00   0.00  (all messages)
 100.000   97.2241   2.7759  0.972   0.00   0.00  (all messages as %)
   7.595    7.8122   0.0045  0.999   1.00   0.00  URIBL_SC_SURBL
  76.754   78.9448   0.0178  1.000   0.80   0.00  URIBL_OB_SURBL
  77.230   79.4340   0.0208  1.000   0.60   1.00  URIBL_PJ_SURBL
   0.985    1.0126   0.0045  0.996   0.50   0.00  URIBL_AB_SURBL
  82.119   84.4600   0.1367  0.998   0.40   0.00  URIBL_WS_SURBL
   0.021    0.0216   0.0045  0.829   0.00   0.00  URIBL_PH_SURBL
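For anyone wondering how the S/O column relates to the two hit-rate columns: as far as I can tell it is the share of a rule's hits that fall on spam, i.e. SPAM% / (SPAM% + HAM%). That formula is an assumption checked only against the figures in this table, not taken from the tool's source, and the function name below is made up for illustration. A minimal sketch:

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hits that land on spam, given per-corpus hit rates.

    Assumed formula: SPAM% / (SPAM% + HAM%). It matches the rule rows
    in the table above, but is not confirmed against the tool itself.
    """
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit nothing in either corpus")
    return spam_pct / total


# (SPAM%, HAM%) pairs copied from the table above
rows = {
    "URIBL_SC_SURBL": (7.8122, 0.0045),
    "URIBL_OB_SURBL": (78.9448, 0.0178),
    "URIBL_AB_SURBL": (1.0126, 0.0045),
    "URIBL_WS_SURBL": (84.4600, 0.1367),
}

for name, (spam_pct, ham_pct) in rows.items():
    print(f"{name}: {s_over_o(spam_pct, ham_pct):.3f}")
```

Note that the "(all messages)" rows appear to be a special case: 0.972 there is simply the spam count divided by the total message count (2357143 / 2424443).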
I still can't believe SC is beating WS in FP rate!!! Bah!!!! Must reduce FP rate more!! Must parse thru thousands of domains.....mind going numb.........
Ahhhh but look at the spam hit rates! Who's your daddy now?! :-)
--Chris
From: Jeff Chan [mailto:jeffc@surbl.org]
I should have mentioned that these data are from 8 September. The current rates are probably slightly+ different.
Jeff C. -- "If it appears in hams, then don't list it."
On Thu, Dec 09, 2004 at 10:43:01AM -0800, Jeff Chan wrote:
I should have mentioned that these data are from 8 September. The current rates are probably slightly+ different.
My latest results, btw:
OVERALL%     SPAM%     HAM%    S/O   RANK  SCORE  NAME
  119502    106956    12546  0.895   0.00   0.00  (all messages)
  70.083   78.3023   0.0080  1.000   1.00   0.00  URIBL_JP_SURBL
  66.425   74.2146   0.0159  1.000   0.99   0.00  URIBL_OB_SURBL
  71.986   80.4265   0.0319  1.000   0.99   0.00  URIBL_WS_SURBL
  22.178   24.7793   0.0000  1.000   0.97   0.00  URIBL_SC_SURBL
  15.251   17.0397   0.0000  1.000   0.96   0.00  URIBL_AB_SURBL
   0.018    0.0178   0.0239  0.426   0.46   0.00  URIBL_PH_SURBL
It's important to note that these are not live results.
BTW: the PH "false positives" are from mailing lists which mention identity theft, etc. sites, so that's fine.
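With a ham corpus this small, the HAM% figures correspond to only a handful of messages, which is worth keeping in mind when comparing lists. A rough sketch that turns the percentages back into approximate message counts, assuming HAM% is taken over the 12546 ham messages in the "(all messages)" row (the helper name is made up for illustration):

```python
HAM_TOTAL = 12546  # ham messages, from the "(all messages)" row above

# HAM% per rule, copied from the table above
ham_pct = {
    "URIBL_JP_SURBL": 0.0080,
    "URIBL_OB_SURBL": 0.0159,
    "URIBL_WS_SURBL": 0.0319,
    "URIBL_SC_SURBL": 0.0000,
    "URIBL_AB_SURBL": 0.0000,
    "URIBL_PH_SURBL": 0.0239,
}


def ham_hits(pct: float, total: int = HAM_TOTAL) -> int:
    """Approximate number of ham messages hit, rounded to a whole message."""
    return round(pct / 100 * total)


counts = {name: ham_hits(pct) for name, pct in ham_pct.items()}
for name, n in counts.items():
    print(f"{name}: ~{n} ham message(s)")
```

Under that assumption, the whole gap between JP and WS in this run comes down to a few messages, so a single extra ham hit moves a list's rate by about 0.008 percentage points.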
On Thursday, December 9, 2004, 11:12:35 AM, Theo Dinter wrote:
A couple things perhaps worth adding:
1. The SC and AB spam detection rates would likely be closer to the 70% range if the spam corpus were restricted to the same time periods as the SC and AB data of 3 and 7 days respectively.
2. Theo's ham corpus is a subset of the collective SpamAssassin ham corpus, so the FPs for different populations may be different. Relative differences between FP rates are meaningful within this corpus.
Jeff C. -- "If it appears in hams, then don't list it."
Hi!
These stats are pretty old; things have improved in the meantime, as Theo's stats show, for example.
Would be nice to rerun some large piles of data again.
Bye, Raymond.