On Tuesday, August 17, 2004, 8:25:31 AM, Andy Warner wrote:
The AbuseButler data seems to have had a fairly low FP rate in large part because it is based on weighted reporting. Only the most frequently reported domains make it onto the list. It isn't perfect and there have been some FPs (mainly on very popular brand name domains that are misreported and get past whitelisting). If other folks want to pass along their URI hits to help improve the volume ratings feel free to drop me a line. At the moment weighted SpamCop data is still the largest source of data, but private trap data volume is growing.
The data in sc.surbl.org is also weighted based on number of reports. (You and I came up with very similar solutions for handling the SpamCop data.)
But the WS list source data does not always have this "spam volume" data behind it. In some cases, the source data are just singular lists with no counts of how often they appeared in spam. So weighting is probably not available across all of the WS data.
I agree it's a useful concept, though. I think of it as a form of voting.
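To illustrate the voting idea, here is a rough sketch. It does not show how sc.surbl.org or AbuseButler actually compute their weights. It assumes a few things:

- Each spam report contributes one vote for the domain it contains.
- A source list that has no per-domain counts contributes a single vote per domain it lists.
- A domain is listed only if its total meets a threshold.

The function names, example domains and threshold value are all hypothetical.

```python
from collections import Counter


def tally_votes(counted_reports, unweighted_lists):
    """Combine per-report hits with flat lists into one vote count per domain.

    counted_reports: iterable of domains, one entry per spam report (weighted data).
    unweighted_lists: iterable of lists that carry no counts; each such list
    contributes at most one vote per domain (hypothetical treatment).
    """
    votes = Counter(counted_reports)
    for source in unweighted_lists:
        for domain in set(source):  # de-duplicate so a flat list votes once per domain
            votes[domain] += 1
    return votes


def select_listed(votes, threshold):
    """Return the domains whose vote total reaches the threshold, sorted by name."""
    return sorted(d for d, n in votes.items() if n >= threshold)


reports = ["spam-a.example", "spam-a.example", "spam-a.example",
           "spam-b.example", "brand.example"]
flat_lists = [["spam-b.example", "other.example"], ["spam-b.example"]]

votes = tally_votes(reports, flat_lists)
print(select_listed(votes, threshold=3))  # ['spam-a.example', 'spam-b.example']
```

In this sketch a single misreported brand domain (brand.example) stays below the threshold. A domain that appears only in flat lists can still reach it if enough independent sources agree.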
Jeff C.