On Saturday, August 28, 2004, 12:48:08 PM, Raymond Dijkxhoorn wrote: (Fred writes:)
First off, I am looking for people who use SpamAssassin and have access to a corpus of ham e-mail. The score for the WS SURBL test in 3.0 is very low. I am really focused on improving this and hope to get the score to 2 or 3 by the release of SA 3.1. I want to track down false positives and get them whitelisted.
We also want to get a higher score for that list, especially since it's really effective. So if you can help weed out the 'bad' ones, that would be really appreciated.
FPs hurt effectiveness because they discourage people from using WS or SURBLs in the first place. It is therefore crucial to get the FPs out. It's easy to lose focus on accuracy when the guiding idea is "get as many spam domains as possible". Catching many spam domains is good, but the primary goal should be to reduce FPs. The low automatic scoring of WS is a useful indication of the FP issue.
IMO, the best way to stop FPs is to keep them off the list in the first place. Here are some ideas for that:
1. Older registered domains should require a large amount of evidence before they are added. Outblaze only lists domains that have registrations 90 days old or newer. That policy prevents many FPs, since professional spammers seem to change domains frequently. There is statistical evidence that a spam domain is only used for 3 days on average. Therefore a submission for an old, established, real company with a 1990s registration, for example, should be treated as highly suspect.
2. If a domain has legitimate uses, it should not be added to any list. Yes, that means a spam or two will be missed in a few borderline cases, but it's better to miss a few spams than to have the list used to block someone's possibly legitimate mail.
3. Legitimacy is something that's best determined by manual, human checking. Purely automated tools are probably not adequate. Therefore all list submissions should get careful checking by an experienced human.
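To make the three ideas above a bit more concrete, here is a rough Python sketch of how a submission check might look. It is only an illustration: the `Submission` fields, the `should_list` function, and the report-count thresholds are all made up for this example. The only number taken from the discussion is the 90-day cutoff from the Outblaze policy.

```python
from dataclasses import dataclass
from datetime import date

# Cutoff borrowed from the Outblaze policy mentioned above.
NEW_DOMAIN_MAX_AGE_DAYS = 90

# Hypothetical evidence thresholds, chosen only for illustration.
REPORTS_NEEDED_NEW = 2    # recently registered domain
REPORTS_NEEDED_OLD = 20   # older domain: demand much more evidence


@dataclass
class Submission:
    domain: str
    registered_on: date
    spam_reports: int          # independent spam sightings (assumed metric)
    has_legitimate_use: bool   # judged by a human reviewer (idea 2)
    human_reviewed: bool       # has a person checked it? (idea 3)


def should_list(sub: Submission, today: date) -> bool:
    """Return True only if the submission passes all three checks."""
    # Idea 3: nothing gets listed without a human check.
    if not sub.human_reviewed:
        return False
    # Idea 2: any legitimate use keeps the domain off the list.
    if sub.has_legitimate_use:
        return False
    # Idea 1: older registrations need far more evidence.
    age_days = (today - sub.registered_on).days
    if age_days <= NEW_DOMAIN_MAX_AGE_DAYS:
        needed = REPORTS_NEEDED_NEW
    else:
        needed = REPORTS_NEEDED_OLD
    return sub.spam_reports >= needed
```

With these made-up thresholds, a domain registered a few weeks ago with three reports would be listed after review, while a domain registered in 1995 with the same three reports would not.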
Can anyone think of other ideas? Perhaps we should make these into some rules for list inclusion.
Jeff C.