[SURBL-Discuss] RFC: consensus list?

Jeff Chan jeffc at surbl.org
Sat Nov 13 09:14:24 CET 2004


On Friday, November 12, 2004, 7:38:45 AM, John Wilcock wrote:
> On Fri, 12 Nov 2004 04:31:37 -0800, Jeff Chan wrote:
>> We could probably experiment and try some different approaches
>> and see how they test out on corpora and live mail servers.

> Should be very easy to test with SpamAssassin for anyone with a decent
> corpus - just write some meta rules to simulate the intersections (or
> Ryan's suggested additive combinations). 

And I have another technique I can use here:  Take the lists
and permutations of lists, then see what percentage of the DNS
queries matching any blocklist each of them hits.  Recall
that we now have statistics about whitelist, blocklist and
unmatched DNS queries sampled from a DNS server.  That means
we can estimate spam detection rates by lists and permutations
of lists purely based on SURBL DNS hits.

This is not as good as proper corpus checks, since our
blocklist hits may include some FPs, but it does give some
indication of the general spam detection rates of the lists
or their permutations.  The best of those results could then
be checked against hand-checked corpora with some confidence
that we're at least checking the most promising ones.
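
A rough sketch of that estimate in Python, in case it helps anyone
follow along.  Everything here is an assumption for illustration: the
function name combo_hit_rates, the input format (one set of matched
list names per sampled DNS query, empty if the query matched no
blocklist), and the sample data are made up, not our real statistics.
A combination counts as a hit only when the queried domain is on every
list in it, i.e. an intersection.

```python
from itertools import combinations


def combo_hit_rates(query_hits, list_names, max_size=2):
    """Estimate what share of blocklist-matching queries each list
    combination would catch.

    query_hits: iterable of sets; each set names the lists that one
                sampled DNS query matched (empty set = no blocklist hit).
    list_names: the list names to combine (hypothetical labels).
    max_size:   largest combination size to try.
    """
    # Only queries that matched at least one blocklist form the baseline.
    blocklisted = [hits for hits in query_hits if hits]
    total = len(blocklisted)

    rates = {}
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(list_names), size):
            wanted = set(combo)
            # Intersection semantics: the query must be on all lists in combo.
            matched = sum(1 for hits in blocklisted if wanted <= hits)
            rates[combo] = matched / total if total else 0.0
    return rates


# Made-up sample: five sampled queries, one of which matched nothing.
sample = [{"ws", "sc"}, {"ws"}, {"sc", "ob"}, set(), {"ws", "sc", "ob"}]
rates = combo_hit_rates(sample, {"ws", "sc", "ob"})

# Show the most promising lists and combinations first.
for combo, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print("+".join(combo), f"{rate:.0%}")
```

The rates are relative to queries that hit any blocklist, so they
inherit whatever FPs those hits contain; they only rank candidates for
the corpus checks.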

Gonna code this up....

Jeff C.
--
"If it appears in hams, then don't list it."


