[SURBL-Discuss] Fwd: Re: Please sanity check sa.surbl.org announcement

Jeff Chan jeffc at surbl.org
Mon Apr 12 15:35:49 CEST 2004


This is a forwarded message
From: Daniel Quinlan <quinlan at pathname.com>
To: Jeff Chan <jeffc at surbl.org>
Date: Monday, April 12, 2004, 9:32:06 AM
Subject: Please sanity check sa.surbl.org announcement

===8<==============Original message text===============
Jeff Chan <jeffc at surbl.org> writes:

> Would you please do a spam and ham corpora check with
> "sa.surbl.org" whenever you can?  We'd really like to know any
> false positives to remove if that's possible to determine.

Pretty high FPs.

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  11189     1200     9989    0.107   0.00    0.00  (all messages)
100.000  10.7248  89.2752    0.107   0.00    0.00  (all messages as %)
  6.095  56.2500   0.0701    0.999   1.00    1.00  URIBL_SC_SURBL
  6.855  59.7500   0.5006    0.992   0.98    1.00  URIBL_SBL
  9.545  72.8333   1.9421    0.974   0.95    0.01  T_URIBL_SA_SURBL
  0.116   0.5000   0.0701    0.877   0.58    0.01  T_URIBL_DSBL

The FP rate is higher than even SBL (which gets some collateral damage
with URIBL due to the domain->NS->A method that focuses in on name
servers).  A few of my FPs are actually legitimate anti-spam domains.
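
For readers who want to check the table, here is a minimal sketch that turns the percentages back into approximate message counts and recomputes S/O. It assumes S/O is SPAM% / (SPAM% + HAM%), which reproduces the four rule rows above but is inferred from the numbers, not taken from the tool's documentation. The names ROWS, so_ratio, hit_counts and summarize are made up for this illustration.

```python
# Rows copied from the table above: (rule name, SPAM%, HAM%).
ROWS = [
    ("URIBL_SC_SURBL", 56.2500, 0.0701),
    ("URIBL_SBL", 59.7500, 0.5006),
    ("T_URIBL_SA_SURBL", 72.8333, 1.9421),
    ("T_URIBL_DSBL", 0.5000, 0.0701),
]

# Corpus sizes from the "(all messages)" row.
N_SPAM, N_HAM = 1200, 9989


def so_ratio(spam_pct, ham_pct):
    """Assumed S/O definition: share of a rule's hit rate that comes from spam."""
    return spam_pct / (spam_pct + ham_pct)


def hit_counts(spam_pct, ham_pct):
    """Approximate (spam hits, ham hits) recovered by rounding the percentages."""
    spam_hits = round(spam_pct * N_SPAM / 100)
    ham_hits = round(ham_pct * N_HAM / 100)
    return spam_hits, ham_hits


def summarize():
    """Map rule name -> (spam hits, ham hits, S/O rounded to 3 places)."""
    return {
        name: (*hit_counts(s, h), round(so_ratio(s, h), 3))
        for name, s, h in ROWS
    }


if __name__ == "__main__":
    for name, (spam_hits, ham_hits, so) in summarize().items():
        print(f"{name:18s} spam={spam_hits:4d} ham(FP)={ham_hits:4d} S/O={so:.3f}")
```

Read this way, T_URIBL_SA_SURBL hits roughly 194 of the 9989 ham messages, against about 50 for URIBL_SBL and about 7 for URIBL_SC_SURBL.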

> We'd really like to know any false positives to remove if that's
> possible to determine.

Since the list is not regenerated every 4 days, I'm not sure it's a good
idea for SA corpus maintainers to submit false positives since the FP
rate would then be lower for us, but not much lower for most people.  In
other words, it would introduce a large corpus bias.

> We deliberately did not want to combine Bill's list and mine not
> so much due to not-invented-here syndrome but because their
> source data is so different, and because their size and time
> factors are pretty radically different at present.  I gave some
> of the original reasons in the proposed announcement which I had
> not forwarded here yet, but will now.

You could continue to offer separate queries for people who are
mirroring the zones.  A lot of blacklists offer both separate and
multiple queries.

Daniel
 
-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting

===8<===========End of original message text===========

-- 
Jeff Chan
mailto:jeffc at surbl.org-nospam
http://www.surbl.org/


