[forwarding my reply to Tony at Outblaze with his permission]
On Thursday, October 14, 2004, 8:34:41 PM, Tony RT wrote:
[jeffc@surbl.org - Thu Oct 14 14:00:25 2004]:
Thanks Tony.
May I suggest that you consider checking a domain before listing it? Just because a few customers consider it spam doesn't necessarily mean other customers don't want to get it. I ask because there seem to be some legitimate sites getting onto your lists that some customers may legitimately want to get. For example, none of the recent FPs have had to do with pills, mortgages, warez, etc.
Another recent example is browsehappy.com run by the Web Standards Project:
http://webstandards.org/act/campaign/happy/
who seem pretty unlikely to be professional or even casual spammers, no matter what users may report. Users are sometimes wrong, so data should be checked, IMO.
Jeff C.
Jeff, the browsehappy.com problem was reported to us by schampeon, and the domain was removed immediately.
The approach we take (listing a domain if it's new and appears in reported spam) does have FPs, I agree, but we haven't been able to find a good way to "check".
We do take a cursory look at all the domains blocked each day, and if anything obvious shows up we remove it. The problem is that detailed review by a human is not really practical given the volume of domains we block each day.
As you know and have seen, we are very responsive and remove FPs very quickly.
If you have any suggestions on how to improve the process, I'm all ears and will implement them as long as they don't consume too much human time (checking hundreds of domains one by one is just not practical).
Cheers, TB
Hi Tony, Thanks indeed for your responsiveness in removing FPs and for addressing our concerns and those of your users. Regarding checks that could be done on the incoming data, many of the suggestions in our draft policy for manual lists can be automated:
http://www.surbl.org/policy.html
and some of those may be useful for checking incoming suspected spam domains. I'd suggest using these to score new domains and flagging the ones whose ham score rises above a certain threshold.
For example, any domain in SBL can probably be blacklisted immediately. Any domain not in SBL probably starts to accumulate a ham score, though that alone is not conclusive. If you have access to the headers and the senders are in xbl.spamhaus.org, then the domain should probably be listed. Any sender IP not in XBL should probably get ham points. Any domain with few or zero NANAS hits may be hammy. Domains in DMOZ, Wikipedia, etc. should perhaps get ham points, since it's unlikely the human editors of those would add or allow spam domains.
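As an illustration only, the scoring idea in the previous paragraph might be sketched in Python as below. Everything here is an assumption, not SURBL or Outblaze practice: the weights, FEW_NANAS_HITS, REVIEW_THRESHOLD, and the names DomainEvidence, ham_score, triage and dnsbl_query_name are all hypothetical. The SBL, XBL, NANAS and directory results are assumed to have been looked up elsewhere and passed in; the only lookup-related piece shown is the usual way a DNSBL query name is built (reversed IPv4 octets followed by the list's zone).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tuning values; the suggestions give directions, not numbers.
FEW_NANAS_HITS = 2       # "few or zero NANAS hits"
REVIEW_THRESHOLD = 3.0   # ham score at or above this -> flag for a human


@dataclass
class DomainEvidence:
    """Pre-fetched evidence about one newly reported domain (hypothetical)."""
    domain: str
    in_sbl: bool                   # domain found in SBL
    sender_in_xbl: Optional[bool]  # None when message headers are unavailable
    nanas_hits: int                # NANAS reports mentioning the domain
    in_directory: bool             # e.g. listed in DMOZ or Wikipedia


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to check an IPv4 address against a DNSBL.

    Convention: reverse the octets and append the zone, so 127.0.0.2
    checked against xbl.spamhaus.org becomes 2.0.0.127.xbl.spamhaus.org.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def ham_score(ev: DomainEvidence) -> float:
    """Higher means more likely legitimate (hypothetical weights)."""
    score = 0.0
    if not ev.in_sbl:
        score += 1.0   # weak ham evidence, "though not 100%"
    if ev.sender_in_xbl is True:
        score -= 3.0   # sender IP is in XBL: strongly spammy
    elif ev.sender_in_xbl is False:
        score += 1.0   # clean sender IP earns ham points
    if ev.nanas_hits <= FEW_NANAS_HITS:
        score += 1.5   # rarely or never seen in NANAS
    if ev.in_directory:
        score += 2.0   # human-edited directories rarely admit spam domains
    return score


def triage(ev: DomainEvidence) -> str:
    """Return "list" to blacklist now, or "review" to flag for a human."""
    if ev.in_sbl:
        return "list"  # SBL hit: list immediately
    return "review" if ham_score(ev) >= REVIEW_THRESHOLD else "list"


if __name__ == "__main__":
    # Illustrative inputs only, not real lookup results.
    print(dnsbl_query_name("127.0.0.2", "xbl.spamhaus.org"))
    print(triage(DomainEvidence("browsehappy.com", False, False, 0, True)))
```

With these made-up weights and inputs, a domain that is not in SBL, was sent from a clean IP, has no NANAS hits and appears in a directory gets flagged for review, while one sent from an XBL-listed IP is listed outright. The intent of the design is that a human only looks at the flagged minority, which keeps the per-day checking load small.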
Obviously most of the spam domains we get are fully spammy. Perhaps some of these metrics can help flag ones that are less spammy and worthy of a little further checking?
Your feedback, comments, questions, etc. would be welcome, since we intend to use a policy like this for our own manual list, ws.surbl.org. We may adopt other parts of it for our automated lists as well.
Cheers,
Jeff C.
P.S. Do you mind if I publish this response on our SURBL discussion list?
--
"If it appears in hams, then don't list it."