Jeff Chan <jeffc@surbl.org> writes:
Would you please do a spam and ham corpora check with "sa.surbl.org" whenever you can? We'd really like to know any false positives to remove if that's possible to determine.
Pretty high FPs.
OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
   11189     1200     9989  0.107   0.00   0.00  (all messages)
 100.000  10.7248  89.2752  0.107   0.00   0.00  (all messages as %)
   6.095  56.2500   0.0701  0.999   1.00   1.00  URIBL_SC_SURBL
   6.855  59.7500   0.5006  0.992   0.98   1.00  URIBL_SBL
   9.545  72.8333   1.9421  0.974   0.95   0.01  T_URIBL_SA_SURBL
   0.116   0.5000   0.0701  0.877   0.58   0.01  T_URIBL_DSBL
The FP rate is higher than even SBL (which gets some collateral damage with URIBL due to the domain->NS->A method that focuses in on name servers). A few of my FPs are actually legitimate anti-spam domains.
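For anyone checking the numbers, here is a rough sketch of how the S/O column and the approximate ham hit counts follow from the SPAM% and HAM% columns in the output above. It assumes S/O is computed as spam% / (spam% + ham%), which reproduces the printed values; the function and variable names are made up for illustration and are not part of the SpamAssassin tools.

```python
# (rule name, SPAM%, HAM%) copied from the hit-frequencies output above
ROWS = [
    ("URIBL_SC_SURBL", 56.2500, 0.0701),
    ("URIBL_SBL", 59.7500, 0.5006),
    ("T_URIBL_SA_SURBL", 72.8333, 1.9421),
    ("T_URIBL_DSBL", 0.5000, 0.0701),
]
HAM_TOTAL = 9989  # ham messages in this corpus, from the "(all messages)" row


def s_over_o(spam_pct, ham_pct):
    """Assumed definition: share of a rule's hit rate that comes from spam."""
    return spam_pct / (spam_pct + ham_pct)


def ham_hits(ham_pct, ham_total=HAM_TOTAL):
    """Approximate number of ham messages (false positives) a rule hit."""
    return round(ham_pct * ham_total / 100)


for name, spam_pct, ham_pct in ROWS:
    print(f"{name:18s} S/O={s_over_o(spam_pct, ham_pct):.3f} "
          f"ham hits~{ham_hits(ham_pct)}")
```

By this reckoning T_URIBL_SA_SURBL hit roughly 194 ham messages in my corpus, against about 50 for URIBL_SBL and about 7 for URIBL_SC_SURBL.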
We'd really like to know any false positives to remove if that's possible to determine.
Since the list is not regenerated every 4 days, I'm not sure it's a good idea for SA corpus maintainers to submit false positives: the FP rate would then be lower for us, but not much lower for most people. In other words, it would introduce a large corpus bias.
We deliberately did not want to combine Bill's list and mine not so much due to not-invented-here syndrome but because their source data is so different, and because their size and time factors are pretty radically different at present. I gave some of the original reasons in the proposed announcement which I had not forwarded here yet, but will now.
You could continue to offer separate queries for people who are mirroring the zones. A lot of blacklists offer both separate and multiple queries.
Daniel