Don't want to give ham lists for creating whitelists
quinlan at pathname.com
Fri Apr 30 19:32:02 CEST 2004
(was: Want your ham lists for creating whitelists (Was: Re: [SURBL-Discuss] This ROCKS!))
Jeff Chan <jeffc at surbl.org> writes:
> Is there any way we can get the message body URI domains (or even raw
> messages) from some of the ham lists?
This is a really really really really really really bad idea.
> The intent is not to "cheat" the S/O scores, but to compile a good,
> hand checked whitelist of legitimate message body domains. *I don't
> really care where they come from*, but we have this nice, untapped
> source of them just sitting there, and a glaring need for some
> legitimate domains to keep off the blocklists....
I'm very concerned that people who plan on submitting their corpus
results into the SpamAssassin mass-check process for 3.0 will end up
with no ham hits whereas your average user will have some. Cheating is
not the word I would use, but it would completely throw off our GA
process by making it look like SURBL cannot ever issue a false
positive, screwing over non-developers.
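A minimal sketch of that concern, using made-up data: the domains, the
blocklist contents, and the helper name ham_hit_rate are all
hypothetical, and this is not how mass-check or the GA is implemented.
It only shows that a whitelist built from the benchmark ham makes the
benchmark ham hit rate drop to zero while other users' ham still hits.

```python
# Each ham message is reduced to the set of URI domains found in its body.
# All domains below are invented for illustration.
benchmark_ham = [
    {"example.org", "lists.example.net"},
    {"shop.example.com"},
    {"example.org"},
]
other_users_ham = [
    {"news.example.info"},
    {"shop.example.com", "photos.example.biz"},
    {"example.org"},
]

# Hypothetical blocklist that wrongly contains some legitimate domains.
blocklist = {"shop.example.com", "news.example.info",
             "photos.example.biz", "spam.example"}


def ham_hit_rate(messages, blocklist, whitelist=frozenset()):
    """Fraction of ham messages containing at least one listed domain."""
    effective = blocklist - whitelist
    hits = sum(1 for domains in messages if domains & effective)
    return hits / len(messages)


# Whitelist every domain seen in the benchmark ham.
whitelist = set().union(*benchmark_ham)

before = ham_hit_rate(benchmark_ham, blocklist)
after_benchmark = ham_hit_rate(benchmark_ham, blocklist, whitelist)
after_others = ham_hit_rate(other_users_ham, blocklist, whitelist)

print(f"benchmark ham, no whitelist:   {before:.2f}")
print(f"benchmark ham, with whitelist: {after_benchmark:.2f}")
print(f"other users' ham, whitelisted: {after_others:.2f}")
```

Any score tuned against the whitelisted benchmark would see a rule with
no ham hits at all, even though the second corpus still has them.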
Even if you only get results from people who are never going to
submit corpus results, you're going to end up with a very heavy
whitelist bias towards technical users, penalizing non-technical users
who will have the rule scored too high compared to what it should be.
> I'm results oriented. If there are some good hand-checked ham lists,
> I see no practical reason why we should not use them to generate
> message body domain whitelists.
It would be much better if you came up with other methods for generating
a whitelist besides using our benchmark data, which is tiny compared to
the world of ham.
One other note: just because a URL comes out of ham does *not* mean it
should be kept out of SURBL. I expect spam URLs to be present in some
ham, mostly discussion of spam, so there should always be a small false
positive rate (and the rules should be appropriately scored).
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting