Ryan Thompson wrote to Raymond Dijkxhoorn and SURBL Discussion list:
Jeff Chan wrote to SURBL Discuss:
Aha, but I'm not too interested in their spams. I'm interested in their hams for use in FP detection. Hams probably don't change as rapidly as spams...
I'm running a mass-check against the entire 2003 public corpus (about 6000 messages total). I'll post the results once it's done and I've collected easy groupings.
OK. Here it is:
OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
   6048     1898     4150    0.314   0.00    0.00  (all messages)
100.000  31.3823  68.6177    0.314   0.00    0.00  (all messages as %)
  1.091   3.4773   0.0000    1.000   1.00    4.00  URIBL_OB_SURBL
  5.258   9.0622   3.5181    0.720   0.29    3.00  URIBL_WS_SURBL
  0.000   0.0000   0.0000    0.500   0.14    5.00  URIBL_AB_SURBL
  0.000   0.0000   0.0000    0.500   0.14    2.00  URIBL_PH_SURBL
  0.000   0.0000   0.0000    0.500   0.14    4.00  URIBL_SC_SURBL
  0.265   0.3688   0.2169    0.630   0.00    1.00  URIBL_PJ_SURBL
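For anyone reading the S/O column cold: the figures above are consistent with S/O = SPAM% / (SPAM% + HAM%), with 0.500 shown for rules that hit nothing. A rough Python sketch that recomputes the column from the SPAM% and HAM% values above follows. The function name and the 0.5 fallback are my own assumptions inferred from the table, not taken from the hit-frequencies source.

```python
def s_over_o(spam_pct, ham_pct):
    """Spam share of a rule's hits: SPAM% / (SPAM% + HAM%).

    Assumption: a rule with no hits at all is reported as 0.5,
    which matches the zero-hit rows in the table above.
    """
    total = spam_pct + ham_pct
    if total == 0:
        return 0.5
    return spam_pct / total


# (SPAM%, HAM%) pairs copied from the results table.
rows = {
    "URIBL_OB_SURBL": (3.4773, 0.0000),
    "URIBL_WS_SURBL": (9.0622, 3.5181),
    "URIBL_AB_SURBL": (0.0000, 0.0000),
    "URIBL_PH_SURBL": (0.0000, 0.0000),
    "URIBL_SC_SURBL": (0.0000, 0.0000),
    "URIBL_PJ_SURBL": (0.3688, 0.2169),
}

for name, (spam_pct, ham_pct) in rows.items():
    print(f"{s_over_o(spam_pct, ham_pct):.3f}  {name}")
```

An S/O of 1.000 means every hit was on spam, so only rules with a nonzero HAM% (WS and PJ here) have ham hits worth checking for FPs.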
I don't have time to go through the results right now, but feel free:
Ham that hit any URIBL rule: http://ry.ca/geturi/pc-ham-uribl.log (14K)
Full ham log: http://ry.ca/geturi/pc-ham.log (340K)
Full spam log: http://ry.ca/geturi/pc-spam.log (159K)
Really, the first one is the interesting one, but the full logs might be interesting if you want to do your own frequency comparisons.
What you want to do is go through pc-ham-uribl.log and look up each message it mentions in the SA public corpus to see whether it is an FP candidate.
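If it helps, here is a minimal Python sketch of that triage step: it groups the ham messages in a log by which URIBL rule they hit, so each group can be checked by hand. It assumes the usual mass-check line layout (a result flag, the score, the message path, then a comma-separated list of rules, with `#` starting a comment line); verify that against the actual log before relying on it. The function name and the sample lines are hypothetical and only there to make the sketch runnable.

```python
from collections import defaultdict


def group_by_rule(lines, prefix="URIBL_"):
    """Map each rule starting with `prefix` to the message paths that hit it.

    Assumed line layout: <flag> <score> <path> <RULE1,RULE2,...> [extra fields]
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) < 4:
            continue  # not enough fields to hold a rule list
        path, rules = fields[2], fields[3].split(",")
        for rule in rules:
            if rule.startswith(prefix):
                hits[rule].append(path)
    return dict(hits)


# Hypothetical sample lines, NOT taken from the real logs.
sample = [
    "# made-up example input",
    ". 3 /corpus/easy_ham/00001.example URIBL_WS_SURBL,HTML_MESSAGE time=0",
    ". 1 /corpus/easy_ham/00002.example HTML_MESSAGE time=0",
    ". 4 /corpus/hard_ham/00003.example URIBL_WS_SURBL,URIBL_PJ_SURBL time=0",
]

groups = group_by_rule(sample)
for rule, paths in sorted(groups.items()):
    print(rule, len(paths))
    for path in paths:
        print("   ", path)
```

To run it on the real thing, pass an open file instead of `sample`, e.g. `group_by_rule(open("pc-ham-uribl.log"))`, then open each listed path in the corpus and decide whether the listed domain really belongs in ham.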
- Ryan