On Thursday, September 9, 2004, 2:28:07 PM, Ryan Thompson wrote:
However, for all we know *so far*, 219.254.32.111 could be an HA cluster of a few dozen machines, and, while there may be 200 pill spammers on that cluster, there may also be 20,000 legit sites.
With our current data, we can't make either determination. But, using forward zone data, we can do forward lookups, and track them in a database. Then, do forward lookups on SURBL data to get the IPs of spammers, and (algorithmically!) find correlations.
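A minimal sketch of the correlation step, assuming the forward lookups have already been done and stored as a plain domain-to-IPs mapping. The function names, the data layout, and the example domains and addresses are all hypothetical; nothing here is existing SURBL tooling.

```python
from collections import defaultdict


def group_domains_by_ip(domain_to_ips):
    """Invert {domain: [ip, ...]} into {ip: {domain, ...}}."""
    ip_to_domains = defaultdict(set)
    for domain, ips in domain_to_ips.items():
        for ip in ips:
            ip_to_domains[ip].add(domain)
    return ip_to_domains


def spam_share_by_ip(domain_to_ips, listed_domains):
    """For each IP hosting at least one listed domain, return
    (number of listed domains, number of all known domains) on that IP."""
    report = {}
    for ip, domains in group_domains_by_ip(domain_to_ips).items():
        listed = domains & listed_domains
        if listed:
            report[ip] = (len(listed), len(domains))
    return report


# Hypothetical forward-lookup results (documentation-range addresses).
resolved = {
    "pills-one.example": ["192.0.2.10"],
    "pills-two.example": ["192.0.2.10"],
    "bakery.example": ["192.0.2.10"],
    "blog.example": ["192.0.2.20"],
}
listed = {"pills-one.example", "pills-two.example"}

print(spam_share_by_ip(resolved, listed))
# {'192.0.2.10': (2, 3)}
```

The second number in each pair is only as good as the zone data behind it: any legitimate domain missing from the mapping makes an IP look spammier than it is.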
The programming effort to implement this would not be trivial, not to mention the processing power and bandwidth needed for the initial run. The datasets (.com!) are huge. After that, we would only have to periodically sample for new, removed, and changed domains, which would reduce the processing considerably.
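The periodic sampling could be sketched as a diff between two lookup snapshots. This is only an illustration under assumptions: each snapshot is taken to be a dict mapping a domain to a frozenset of its IPs, and `diff_snapshots` is a made-up name.

```python
def diff_snapshots(old, new):
    """Compare two {domain: frozenset_of_ips} snapshots.

    Returns (added, removed, changed), where changed maps a domain
    to its (old_ips, new_ips) pair.
    """
    added = {d: new[d] for d in new.keys() - old.keys()}
    removed = {d: old[d] for d in old.keys() - new.keys()}
    changed = {
        d: (old[d], new[d])
        for d in old.keys() & new.keys()
        if old[d] != new[d]
    }
    return added, removed, changed


# Hypothetical snapshots from two sampling runs.
last_week = {
    "bakery.example": frozenset({"192.0.2.10"}),
    "blog.example": frozenset({"192.0.2.20"}),
}
this_week = {
    "bakery.example": frozenset({"192.0.2.30"}),
    "shop.example": frozenset({"192.0.2.10"}),
}

added, removed, changed = diff_snapshots(last_week, this_week)
```

Only the domains in these three groups would need to be re-correlated; everything else carries over from the previous run.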
.com is so large and rapidly changing as to be practically unknowable. That's what I mean by "can't".
By the time you have all of .com fully cataloged, it will have changed significantly.
Really, the only ones who could collectively determine how spammy a particular virtual host IP is are the domain registrars, working together: pooling all their registration data, resolving every hostname, and building a database that maps each resolved IP back to all of the domain names hosted on it.
If you can't see all the good guy domains on a virtual hosting IP, then you can't see who else you would block.
Jeff C.