New subject: [SURBL-Discuss] Re: Don't want to give ham lists for creating whitelists

1 May 2004


      (was: Want your ham lists for creating whitelists (Was: Re: [SURBL-Discuss] This ROCKS!))
Jeff Chan <jeffc@surbl.org> writes:
...
> Is there any way we can get the message body URI domains (or even raw
> messages) from some of the ham lists?
This is a really really really really really really bad idea.
...
> The intent is not to "cheat" the S/O scores, but to compile a good,
> hand-checked whitelist of legitimate message body domains.  *I don't
> really care where they come from*, but we have this nice, untapped
> source of them just sitting there, and a glaring need for some
> legitimate domains to keep off the blocklists....
I'm very concerned that people who plan on submitting their corpus
results into the SpamAssassin mass-check process for 3.0 will end up
with no ham hits, whereas your average user will have some.  Cheating is
not the word I would use, but it would completely throw off our GA
process by making it look like SURBL can never issue a false positive,
screwing over non-developers.
Even if you only get results from people who are never going to
submit corpus results, you're going to end up with a very heavy
whitelist bias towards technical users, penalizing non-technical users,
for whom the rule will be scored higher than it should be.
...
> I'm results oriented.  If there are some good hand-checked ham lists,
> I see no practical reason why we should not use them to generate
> message body domain whitelists.
See above.
It would be much better if you came up with other methods for generating
a whitelist besides using our benchmark data, which is tiny compared to
the world of ham.
One other note: just because a URL comes out of ham does *not* mean it
should be kept off SURBL.  I expect spam URLs to be present in some
ham, mostly discussion of spam, so there should always be a small false
positive rate (and the rules should be scored appropriately).
Daniel
-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
