On Thursday, December 9, 2004, 11:12:35 AM, Theo Dinter wrote:
On Thu, Dec 09, 2004 at 10:43:01AM -0800, Jeff Chan wrote:
I should have mentioned that these data are from 8 September; the current rates are probably at least slightly different.
My latest results, btw:
  OVERALL%    SPAM%     HAM%     S/O    RANK   SCORE  NAME
    119502   106956    12546   0.895    0.00    0.00  (all messages)
    70.083  78.3023   0.0080   1.000    1.00    0.00  URIBL_JP_SURBL
    66.425  74.2146   0.0159   1.000    0.99    0.00  URIBL_OB_SURBL
    71.986  80.4265   0.0319   1.000    0.99    0.00  URIBL_WS_SURBL
    22.178  24.7793   0.0000   1.000    0.97    0.00  URIBL_SC_SURBL
    15.251  17.0397   0.0000   1.000    0.96    0.00  URIBL_AB_SURBL
     0.018   0.0178   0.0239   0.426    0.46    0.00  URIBL_PH_SURBL
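The percentage and S/O columns can be reproduced from raw hit counts. Below is a minimal Python sketch, assuming S/O is computed as SPAM% / (SPAM% + HAM%) on the per-corpus percentages rather than on raw counts; this assumption matches the table, e.g. 0.0178 / (0.0178 + 0.0239) is about 0.426 for URIBL_PH_SURBL. The function name is hypothetical, and the hit counts of 19 spam and 3 ham for URIBL_PH_SURBL are back-calculated from the percentages, not figures reported in the results.

```python
def hit_frequency_row(spam_hits, ham_hits, n_spam, n_ham):
    """Return (OVERALL%, SPAM%, HAM%, S/O) for one rule.

    Hypothetical helper. Assumes S/O = spam% / (spam% + ham%), i.e. it is
    normalised per corpus instead of using raw hit counts.
    """
    spam_pct = 100.0 * spam_hits / n_spam
    ham_pct = 100.0 * ham_hits / n_ham
    overall_pct = 100.0 * (spam_hits + ham_hits) / (n_spam + n_ham)
    total = spam_pct + ham_pct
    s_o = spam_pct / total if total else 0.0  # rule that never fires -> 0
    return overall_pct, spam_pct, ham_pct, s_o


# Corpus sizes from the "(all messages)" row
N_SPAM, N_HAM = 106956, 12546

# Back-calculated (not reported) hit counts for URIBL_PH_SURBL
overall, spam, ham, so = hit_frequency_row(19, 3, N_SPAM, N_HAM)
print(f"{overall:.3f} {spam:.4f} {ham:.4f} {so:.3f}")
# prints: 0.018 0.0178 0.0239 0.426
```

Normalising per corpus keeps S/O from being skewed by the roughly 9:1 spam-to-ham imbalance (106956 vs 12546 messages); on raw counts, PH's 19:3 hits would instead give about 0.86.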
A couple of things perhaps worth adding:
1. The SC and AB spam detection rates would likely be closer to the 70% range if the spam corpus were restricted to the same time windows as the SC and AB data (3 and 7 days, respectively).
2. Theo's ham corpus is a subset of the collective SpamAssassin ham corpus, so false-positive (FP) rates on other mail populations may differ. Relative differences between FP rates are still meaningful within this corpus.
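To put the FP rates in scale, the HAM% column can be converted back to approximate message counts using the 12546-message ham total. The sketch below is a back-calculation only: the rounded counts are inferred from the table's percentages, not numbers reported in the results, and the helper name is hypothetical.

```python
N_HAM = 12546  # ham messages in the corpus ("(all messages)" row)

# HAM% column from the results table
ham_pct = {
    "URIBL_JP_SURBL": 0.0080,
    "URIBL_OB_SURBL": 0.0159,
    "URIBL_WS_SURBL": 0.0319,
    "URIBL_SC_SURBL": 0.0000,
    "URIBL_AB_SURBL": 0.0000,
    "URIBL_PH_SURBL": 0.0239,
}


def approx_ham_hits(pct, n_ham=N_HAM):
    """Hypothetical helper: percent of hams -> nearest whole message count."""
    return round(pct * n_ham / 100)


for rule, pct in ham_pct.items():
    print(rule, approx_ham_hits(pct))
```

This gives roughly 1, 2, 4, 0, 0 and 3 ham hits for JP, OB, WS, SC, AB and PH respectively, which is the scale at which the relative FP comparison within this corpus operates.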
Jeff C.
--
"If it appears in hams, then don't list it."