[SURBL-Discuss] Don't want to give ham lists for creating whitelists

Daniel Quinlan quinlan at pathname.com
Fri Apr 30 19:32:02 CEST 2004


(was: Want your ham lists for creating whitelists (Was: Re: [SURBL-Discuss] This ROCKS!))

Jeff Chan <jeffc at surbl.org> writes:

> Is there any way we can get the message body URI domains (or even raw
> messages) from some of the ham lists?

This is a really really really really really really bad idea.

> The intent is not to "cheat" the S/O scores, but to compile a good,
> hand checked whitelist of legitimate message body domains.  *I don't
> really care where they come from*, but we have this nice, untapped
> source of them just sitting there, and a glaring need for some
> legitimate domains to keep off the blocklists....

I'm very concerned that people who plan on submitting their corpus
results into the SpamAssassin mass-check process for 3.0 will end up
with no ham hits, whereas your average user will have some.  Cheating is
not the word I would use, but it would completely throw off our GA
process by making it look like SURBL can never issue a false positive,
screwing over non-developers.

Even if you only get results from people who are never going to submit
corpus results, you're going to end up with a very heavy whitelist bias
towards technical users, penalizing non-technical users, who will have
the rule scored higher than it should be.
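
To make the bias concrete, here is a toy sketch in Python.  Everything in
it is an assumption for illustration: the domains, the two small ham
corpora, and the helper ham_hit_rate() are made up and are not part of
SURBL or the SpamAssassin mass-check tools.  It only shows how a whitelist
built from the benchmark ham can drive the measured ham hit rate to zero
while other users' ham still hits.

```python
# Toy illustration only: every domain and message below is hypothetical.

def ham_hit_rate(ham_messages, blocklist, whitelist=frozenset()):
    """Fraction of ham messages with at least one listed, non-whitelisted domain."""
    effective = set(blocklist) - set(whitelist)
    hits = sum(1 for domains in ham_messages if effective & set(domains))
    return hits / len(ham_messages)

# Hypothetical blocklist that wrongly includes two legitimate domains.
blocklist = {"spam.example", "shop.example", "news.example"}

# Ham from mass-check submitters, one set of body domains per message.
benchmark_ham = [
    {"lists.example"},
    {"shop.example"},
    {"lists.example", "shop.example"},
    {"code.example"},
]

# Ham from users who are not part of the benchmark.
other_ham = [
    {"news.example"},
    {"shop.example"},
    {"family.example"},
    {"news.example", "family.example"},
]

# Whitelist built from every domain seen in the benchmark ham.
whitelist = set().union(*benchmark_ham)

before_benchmark = ham_hit_rate(benchmark_ham, blocklist)
after_benchmark = ham_hit_rate(benchmark_ham, blocklist, whitelist)
after_other = ham_hit_rate(other_ham, blocklist, whitelist)

print(before_benchmark, after_benchmark, after_other)
```

With these made-up inputs the benchmark corpus goes from some ham hits to
none after whitelisting, while the other users' ham still hits on the
domain the benchmark never saw.  Scores fitted to the benchmark numbers
would then be too high for those users.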
 
> I'm results oriented.  If there are some good hand-checked ham lists,
> I see no practical reason why we should not use them to generate
> message body domain whitelists.

See above.

It would be much better if you came up with other methods for generating
a whitelist besides using our benchmark data, which is tiny compared to
the world of ham.

One other note: just because a URL comes out of ham does *not* mean it
should be kept off the SURBL lists.  I expect spam URLs to be present in
some ham (mostly discussion of spam), so there should always be a small
false positive rate (and the rules should be appropriately scored).

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting

