On Thursday, September 9, 2004, 2:49:39 PM, System Dan Mahoney wrote:
On Thu, 9 Sep 2004, Jeff Chan wrote:
On Thursday, September 9, 2004, 2:28:07 PM, Ryan Thompson wrote:
However, for all we know *so far*, 219.254.32.111 could be an HA cluster of a few dozen machines, and, while there may be 200 pill spammers on that cluster, there may also be 20,000 legit sites.
With our current data, we can't make either determination. But, using forward zone data, we can do forward lookups, and track them in a database. Then, do forward lookups on SURBL data to get the IPs of spammers, and (algorithmically!) find correlations.
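As a rough illustration only, the correlation step described above might look something like the following Python sketch. The function names (`resolve`, `build_ip_index`, `spam_ratio_by_ip`) are made up for this example, it only handles IPv4 A records via the standard library resolver, and it assumes the domain list and the SURBL-listed names are already available as plain lists. At .com scale this would need a real database and bulk resolution rather than in-memory dictionaries.

```python
import socket
from collections import defaultdict


def resolve(domain):
    """Forward-resolve a domain to a set of IPv4 addresses (empty set on failure)."""
    try:
        _name, _aliases, addrs = socket.gethostbyname_ex(domain)
        return set(addrs)
    except OSError:
        # Covers socket.gaierror / socket.herror: treat as "does not resolve".
        return set()


def build_ip_index(domains, resolver=resolve):
    """Build the IP -> {domains} mapping that reverse DNS cannot provide."""
    index = defaultdict(set)
    for domain in domains:
        for ip in resolver(domain):
            index[ip].add(domain)
    return index


def spam_ratio_by_ip(index, listed_domains):
    """For each IP, return (listed domains, total domains, listed / total)."""
    listed = set(listed_domains)
    report = {}
    for ip, hosted in index.items():
        hits = len(hosted & listed)
        report[ip] = (hits, len(hosted), hits / len(hosted))
    return report
```

The resolver is passed in as a parameter so the same logic can run against zone-file data or a cache instead of live lookups. The ratio is only meaningful if the index covers most of the domains on an IP, which is exactly the data we do not have today.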
The programming effort to implement this would not be trivial, not to mention the processing power and bandwidth needed for the initial run. The datasets (.com!) are huge. After that, we would only have to sample periodically for new, removed, and changed domains, which would cut the processing considerably.
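The periodic re-sampling could, in principle, be a plain diff between two snapshots. A minimal sketch, assuming each snapshot is a dictionary mapping a domain name to the set of IPs it resolved to (the name `diff_snapshots` and the snapshot layout are hypothetical):

```python
def diff_snapshots(old, new):
    """Compare two {domain: set_of_ips} snapshots.

    Returns three sets of domain names: added, removed, and changed
    (present in both snapshots but resolving to different IPs).
    """
    old_names, new_names = set(old), set(new)
    added = new_names - old_names
    removed = old_names - new_names
    changed = {d for d in old_names & new_names if set(old[d]) != set(new[d])}
    return added, removed, changed
```

Only the domains in those three sets would need to be re-resolved or re-indexed after the initial run.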
.com is so large and rapidly changing as to be practically unknowable. That's what I mean by "can't".
By the time you have all of .com fully cataloged, it will have changed significantly.
Really, the only ones who could collectively determine how spammy a particular virtual host IP is are the domain registrars. They would have to work together, pool all their registration data, resolve every hostname, and build a database mapping every resolved IP back to all of the domain names that use it.
That's not how DNS works.
-Dan
Exactly my point. It is not reverse DNS.
It would be a separate, extremely large database of all DNS information and all registration information. That would be the only way to know all the domains that use a given IP address, unless the hosting providers would give us all the information about their virtual hosting accounts, which seems unlikely.
Jeff C.