On Friday, September 10, 2004, 7:33:10 AM, Chris Santerre wrote:
From: Jeff Chan [mailto:jeffc@surbl.org]
On Thursday, September 9, 2004, 5:34:05 PM, Jeff Chan wrote:
My first pass at cleaning the resolved IP data would be to take the top 70th percentile of IP addresses and only check domains' resolved IPs against those. It's not perfect, but it should cut down on the uncertainty.
I should add that this mostly applies to data where we have a constant feed of actual spam reports, such as from SpamCop. It does not apply as strongly to data sources where we only have a unitary list of domains, for example where each domain appears once over the whole list. Though even there it applies weakly: for example, a dozen domains that all resolve to the same network could probably be used to bias future domains appearing in that network towards list inclusion.
But when you have a stream of reports about the *same domain*, then you can get better statistics about that domain or its resolved IP. There's simply more data to work with in more meaningful ways.
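As a rough illustration of the percentile cut described above, here is a minimal Python sketch. It assumes the input is simply one resolved IP per spam report, and it reads "top 70th percentile" as "IPs whose report count is at or above the 70th-percentile count" (nearest-rank method). Both the function name and that interpretation are assumptions, not anything SURBL is known to run.

```python
import math
from collections import Counter

def top_percentile_ips(reported_ips, percentile=70):
    """Return the IPs whose report count is at or above the given
    percentile of all per-IP report counts (nearest-rank method).

    reported_ips: iterable with one resolved IP string per spam report
    (hypothetical input format).
    """
    counts = Counter(reported_ips)
    if not counts:
        return set()
    ordered = sorted(counts.values())
    # Nearest-rank percentile: position of the cutoff count in the sorted list.
    rank = max(1, math.ceil(len(ordered) * percentile / 100))
    threshold = ordered[rank - 1]
    return {ip for ip, n in counts.items() if n >= threshold}

# Example with documentation-range addresses (made-up data).
reports = (["192.0.2.1"] * 10 + ["192.0.2.2"] * 5
           + ["198.51.100.7"] + ["203.0.113.9"])
frequent = top_percentile_ips(reports)
```

With a unitary list where every IP appears once, every count ties at the threshold and nothing is filtered out, which matches the point that the method helps most with a steady stream of reports.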
Holy confusion! I can't tell where you are on this subject now Jeff :)
Are you saying that if we get really good data like what was in my original post, and we keep the data in the 90th percentile area, then we might possibly be able to list the IP hosts and have SURBL check against them? If so, I'm up for that.
Granted it would take a little more research than just a domain listing, but I think the benefits are very good. Especially if we keep it to only the high ranking IP offenders. I mean, we may add fewer than 50 IPs a year? Just the really nasty spammers.
If you're talking about adding resolved IP addresses to SURBLs, no we're not going to do that. :-(
What I'm talking about is an internal process where we keep track of resolved IP addresses and use that to add new domains to SURBLs sooner if they resolve to a similar IP range (probably /24s). We would use the resolved IP addresses to add domains to sc.surbl.org and possibly other lists sooner. Most would probably get added on the first report. :-)
http://www.surbl.org/faq.html#numbered
Jeff C.
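
The internal process described in this message (remember the networks that listed domains resolve into, then flag a newly reported domain whose resolved IP falls in one of those /24s) could be sketched roughly as follows. This uses only Python's standard `ipaddress` module; the class and method names are hypothetical, and DNS resolution and the actual list-inclusion policy are left out.

```python
import ipaddress

def network_of(ip, prefix=24):
    """Return the enclosing network (default /24) for an IP string."""
    # strict=False lets us pass a host address and get its network back.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)

class ResolvedIPTracker:
    """Hypothetical tracker of networks that already host listed domains."""

    def __init__(self, prefix=24):
        self.prefix = prefix
        self.known_networks = set()

    def record_listed(self, ip):
        """Remember the network of an IP that a listed domain resolved to."""
        self.known_networks.add(network_of(ip, self.prefix))

    def is_suspect(self, ip):
        """True if a new domain's resolved IP is in a known network."""
        return network_of(ip, self.prefix) in self.known_networks

# Example with documentation-range addresses (made-up data).
tracker = ResolvedIPTracker()
tracker.record_listed("192.0.2.55")
early_listing = tracker.is_suspect("192.0.2.200")  # same /24
unrelated = tracker.is_suspect("198.51.100.1")     # different network
```

Note that only the domain would be listed in this scheme; the IP data stays internal and merely speeds up the decision.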