>> My first pass at cleaning the resolved IP data would be to take
>> the top 70th percentile of IP addresses and only use those to
>> check domain resolved IPs to.  It's not perfect, but it should
>> cut down on the uncertainty.
>I should add that this mostly applies to data where we have a
>constant feed of actual spam reports such as from SpamCop.  It
>does not apply as strongly to data sources where we only have a
>unitary list of domains, for example where each domain appears
>once over the whole list.  Though even there, it applies weakly,
>for example a dozen domains that all resolve to the same network
>probably could be used to bias future domains appearing in the
>same network towards list inclusion.
>But when you have a stream of reports about the *same domain*,
>then you can get better statistics about that domain or its
>resolved IP.  There's simply more data to work with in more
>meaningful ways.

Holy confusion! I can't tell where you are on this subject now, Jeff :)

Are you saying that if we get really good data like what was in my
original post, and we keep only the data in the 90th percentile area, then
we might be able to list the IP hosts and have SURBL check against them?
If so, I'm up for that.

Granted, it would take a little more research than just a domain listing,
but I think the benefits are very good, especially if we keep it to only
the high-ranking IP offenders. I mean, we might add fewer than 50 IPs a
year? Just the really nasty spammers.
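
To make sure I'm picturing the same thing, here is a rough sketch of the
percentile cut as I understand it. It assumes we have one (domain,
resolved IP) pair per spam report, counts reports per IP, and keeps only
the IPs at or above the chosen percentile of those counts. The function
name, the input shape, and the nearest-rank cutoff are all my own
assumptions for illustration, not anything SURBL does today.

```python
from collections import Counter


def top_offender_ips(reports, percentile=90):
    """Return {ip: report_count} for IPs at or above the given percentile.

    `reports` is an iterable of (domain, resolved_ip) pairs, one pair per
    spam report (a hypothetical input shape chosen for this sketch).
    """
    # Count how many reports resolved to each IP address.
    counts = Counter(ip for _domain, ip in reports)
    if not counts:
        return {}

    ordered = sorted(counts.values())
    # Nearest-rank percentile using integer math: ceil(percentile * n / 100),
    # clamped to at least 1 so a tiny percentile still picks a valid rank.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    cutoff = ordered[rank - 1]

    # Keep only the IPs whose report count reaches the cutoff.
    return {ip: n for ip, n in counts.items() if n >= cutoff}
```

Passing percentile=70 instead would give the looser cut mentioned in the
quoted message above.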

Anyway, it's been a great discussion.

If we've learned anything in the last 24 hours, it's that the Patriots
defense needs some work against the run game ;)

--Chris (Go Tom Brady!)

