[SURBL-Discuss] Re: Please sanity check sa.surbl.org announcement
quinlan at pathname.com
Mon Apr 12 10:32:06 CEST 2004
Jeff Chan <jeffc at surbl.org> writes:
> Would you please do a spam and ham corpora check with
> "sa.surbl.org" whenever you can? We'd really like to know any
> false positives to remove if that's possible to determine.
Pretty high FPs: T_URIBL_SA_SURBL hits 1.94% of ham.
OVERALL%     SPAM%      HAM%    S/O  RANK  SCORE  NAME
   11189      1200      9989  0.107  0.00   0.00  (all messages)
 100.000   10.7248   89.2752  0.107  0.00   0.00  (all messages as %)
   6.095   56.2500    0.0701  0.999  1.00   1.00  URIBL_SC_SURBL
   6.855   59.7500    0.5006  0.992  0.98   1.00  URIBL_SBL
   9.545   72.8333    1.9421  0.974  0.95   0.01  T_URIBL_SA_SURBL
   0.116    0.5000    0.0701  0.877  0.58   0.01  T_URIBL_DSBL
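To put the percentages in absolute terms, here is a small sketch that recomputes the S/O and OVERALL% columns from the SPAM% and HAM% columns and the corpus sizes in the "(all messages)" row. It assumes S/O is spam% / (spam% + ham%), which matches every row above to three decimals; the function names are just for illustration.

```python
# Corpus sizes from the "(all messages)" row of the mass-check output.
SPAM_TOTAL = 1200
HAM_TOTAL = 9989

# (rule name, SPAM% hit rate, HAM% hit rate), copied from the table.
RULES = [
    ("URIBL_SC_SURBL", 56.2500, 0.0701),
    ("URIBL_SBL", 59.7500, 0.5006),
    ("T_URIBL_SA_SURBL", 72.8333, 1.9421),
    ("T_URIBL_DSBL", 0.5000, 0.0701),
]


def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Spam hit rate divided by the sum of the spam and ham hit rates."""
    return spam_pct / (spam_pct + ham_pct)


def hit_counts(spam_pct: float, ham_pct: float) -> tuple[int, int]:
    """Approximate number of spam and ham messages that hit the rule."""
    return round(spam_pct / 100 * SPAM_TOTAL), round(ham_pct / 100 * HAM_TOTAL)


def overall_pct(spam_pct: float, ham_pct: float) -> float:
    """Share of all messages (spam plus ham) that hit the rule."""
    spam_hits, ham_hits = hit_counts(spam_pct, ham_pct)
    return 100 * (spam_hits + ham_hits) / (SPAM_TOTAL + HAM_TOTAL)


if __name__ == "__main__":
    for name, spam_pct, ham_pct in RULES:
        spam_hits, ham_hits = hit_counts(spam_pct, ham_pct)
        print(f"{name:18} overall={overall_pct(spam_pct, ham_pct):6.3f} "
              f"S/O={s_over_o(spam_pct, ham_pct):.3f} "
              f"spam_hits={spam_hits} ham_hits={ham_hits}")
```

By this arithmetic, T_URIBL_SA_SURBL's 1.9421% ham rate is about 194 of the 9989 ham messages, against about 50 for URIBL_SBL and 7 for URIBL_SC_SURBL.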
The FP rate is higher than even SBL's (and SBL gets some collateral damage
with URIBL because the domain->NS->A method checks the addresses of a
domain's name servers rather than the domain itself). A few of my FPs are
actually legitimate anti-spam domains.
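A rough sketch of why the two lookup styles behave differently. Domain-based lists such as sa.surbl.org are queried with the domain name itself, while an IP-based list such as SBL is reached via domain -> NS -> A and queried with reversed IPv4 octets. The resolver functions and DNS data below are hypothetical stand-ins (no real lookups), and this only shows query-name construction, not SpamAssassin's actual code.

```python
from typing import Callable, Iterable

# A resolver maps a name to a list of answers (NS host names or IPv4 strings).
Resolver = Callable[[str], Iterable[str]]


def ip_dnsbl_query(ip: str, zone: str) -> str:
    """IP-based lists are queried with the IPv4 octets reversed."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def domain_dnsbl_query(domain: str, zone: str) -> str:
    """Domain-based lists are queried with the domain name itself."""
    return domain.rstrip(".").lower() + "." + zone


def ns_a_queries(domain: str, get_ns: Resolver, get_a: Resolver, zone: str) -> list[str]:
    """domain -> NS -> A: look up the name servers' addresses, not the domain."""
    queries: list[str] = []
    for ns in get_ns(domain):
        for ip in get_a(ns):
            query = ip_dnsbl_query(ip, zone)
            if query not in queries:  # several NS hosts may share an address
                queries.append(query)
    return queries


# Hypothetical DNS data: two unrelated domains using the same name server.
FAKE_NS = {
    "spammer.example": ["ns1.host.example", "ns2.host.example"],
    "innocent.example": ["ns1.host.example"],
}
FAKE_A = {
    "ns1.host.example": ["192.0.2.10"],
    "ns2.host.example": ["192.0.2.11"],
}


def fake_ns(name: str) -> list[str]:
    return FAKE_NS.get(name, [])


def fake_a(name: str) -> list[str]:
    return FAKE_A.get(name, [])


if __name__ == "__main__":
    for d in ("spammer.example", "innocent.example"):
        print(d, domain_dnsbl_query(d, "sa.surbl.org"))
        print("   ", ns_a_queries(d, fake_ns, fake_a, "sbl.spamhaus.org"))
```

In this made-up data, both domains produce the query 10.2.0.192.sbl.spamhaus.org. If that name server address is SBL-listed, innocent.example hits as well, which is the kind of collateral damage described above.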
> We'd really like to know any false positives to remove if that's
> possible to determine.
Since the list is not regenerated every 4 days, I'm not sure it's a good
idea for SA corpus maintainers to submit false positives since the FP
rate would then be lower for us, but not much lower for most people. In
other words, it would introduce a large corpus bias.
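A toy illustration of that bias, using entirely hypothetical domains and messages (each ham message reduced to the set of domains in its URIs). If only the domains that show up in our ham get delisted, our measured ham hit rate drops to zero while another site's rate does not move.

```python
# Hypothetical data only: each ham message is represented by the set of
# domains found in its URIs.


def ham_hit_rate(ham: list[set[str]], listed: set[str]) -> float:
    """Fraction of ham messages containing at least one listed domain."""
    hits = sum(1 for domains in ham if domains & listed)
    return hits / len(ham)


def remove_reported_fps(listed: set[str], reporter_ham: list[set[str]]) -> set[str]:
    """Drop every listed domain that the reporting corpus saw in ham."""
    seen = set().union(*reporter_ham)
    return listed - seen


LISTED = {"a.example", "b.example", "c.example", "d.example"}

# Our corpus happens to contain a.example and b.example in ham;
# another site's ham contains c.example and d.example instead.
OUR_HAM = [{"a.example"}, {"b.example"}, {"ok.example"}, {"ok.example"}]
THEIR_HAM = [{"c.example"}, {"ok.example"}, {"d.example"}, {"ok.example"}]

if __name__ == "__main__":
    cleaned = remove_reported_fps(LISTED, OUR_HAM)
    print("ours:  ", ham_hit_rate(OUR_HAM, LISTED), "->", ham_hit_rate(OUR_HAM, cleaned))
    print("theirs:", ham_hit_rate(THEIR_HAM, LISTED), "->", ham_hit_rate(THEIR_HAM, cleaned))
```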
> We deliberately did not want to combine Bill's list and mine not
> so much due to not-invented-here syndrome but because their
> source data is so different, and because their size and time
> factors are pretty radically different at present. I gave some
> of the original reasons in the proposed announcement which I had
> not forwarded here yet, but will now.
You could continue to offer separate queries for people who are
mirroring the zones. A lot of blacklists offer both separate and
combined zones.
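One common way for a combined zone to cover several lists with a single DNS query is to encode list membership as bits in the returned 127.0.0.x address. The sketch below decodes such an answer. The bit values and list names are made up for illustration and are not a published layout. A site that distrusts one list (for example because of its FP rate) can simply ignore that bit.

```python
import ipaddress

# Hypothetical bit assignments for a combined zone; made up for illustration.
LIST_BITS = {"sc": 2, "sa": 8}


def decode_combined(answer: str) -> list[str]:
    """Return the names of the lists whose bits are set in a 127.0.0.x answer."""
    addr = ipaddress.IPv4Address(answer)
    if addr not in ipaddress.IPv4Network("127.0.0.0/8"):
        raise ValueError(f"unexpected answer outside 127/8: {answer}")
    last_octet = int(addr) & 0xFF
    return sorted(name for name, bit in LIST_BITS.items() if last_octet & bit)


def wanted_hit(answer: str, wanted: set[str]) -> bool:
    """True only if one of the lists this site trusts is set in the answer."""
    return bool(set(decode_combined(answer)) & wanted)


if __name__ == "__main__":
    print(decode_combined("127.0.0.10"))           # both bits set
    print(wanted_hit("127.0.0.8", wanted={"sc"}))  # only the "sa" bit is set
```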
Daniel Quinlan                      anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/     and open source consulting