[SURBL-Discuss] Start an IP list to block?

Ryan Thompson ryan at sasknow.com
Thu Sep 9 23:28:07 CEST 2004


Chris Santerre wrote to SURBL Discussion list (E-mail):

> OK, this isn't the first time we've had this discussion, but Raymond
> and I felt this should be made public again. He ran through some tests on
> 1500+ domains and found the following data. It looks like they may be
> sending from zombies, and never from their hosts. The IPs are similar
> across the board.
>
> So is there a way to put the IP info to good use? Could SA or SURBL
> do a quick lookup of the URL's host and match the resulting IP against
> an IP list? This would allow us to simply list 1 IP instead of all
> these domains.
>
> (I'm well aware of virtual hosts! So only the filthiest of spammers
> would be put on this IP list. Then their ISP had better boot them, or
> anyone hosted on that box would feel the wrath of SURBL.)

I talked to Raymond about this, too... and, basically, here are my
big thoughts:

We need to find the correlation of IP addresses to hostnames. See
http://whois.sc/. I can, with some help, duplicate what they're doing
in a way that will help us fight spam.
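Correlating IP addresses to hostnames is essentially building a reverse index from forward (hostname to address) records. Here is a minimal sketch, assuming the forward lookups have already been collected as `(hostname, ip)` pairs. `build_reverse_index` is a hypothetical name and does not reflect how whois.sc actually works:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_reverse_index(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each IP address to the set of hostnames whose A records point at it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for hostname, ip in records:
        # Normalize case and any trailing root dot so duplicates collapse.
        index[ip].add(hostname.lower().rstrip("."))
    return dict(index)


if __name__ == "__main__":
    records = [
        ("pills1.example.invalid", "198.51.100.7"),
        ("Pills2.example.invalid.", "198.51.100.7"),
        ("shop.example.invalid", "192.0.2.10"),
    ]
    index = build_reverse_index(records)
    print(sorted(index["198.51.100.7"]))
```

With this index, "how many sites are hosted at this IP?" becomes a single dictionary lookup.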

Then, for 219.254.32.111, we could see that there are, say, 200 sites
hosted at that IP, and, after some hand checking, identify that all of
them belong to spammers.

However, for all we know *so far*, 219.254.32.111 could be an HA cluster
of a few dozen machines, and, while there may be 200 pill spammers on
that cluster, there may be 20,000 other legit sites.

With our current data, we can't make either determination. But using
forward zone data, we can do forward lookups and track the results in a
database. Then we can do forward lookups on the SURBL data to get the
spammers' IPs and (algorithmically!) find correlations.
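One simple way to start on the correlation is to count, for each IP, how many of its hosted domains are already SURBL-listed, and rank IPs by that share. This is a sketch of plain counting, not a proposed SURBL algorithm. It assumes a reverse index (IP to set of domains) and a set of listed domains; the `min_listed` threshold of 2 is an arbitrary illustrative value, and `rank_spam_ips` is a hypothetical name:

```python
from typing import Dict, List, Set, Tuple


def rank_spam_ips(
    index: Dict[str, Set[str]],
    listed: Set[str],
    min_listed: int = 2,
) -> List[Tuple[str, int, int, float]]:
    """Return (ip, listed_count, total_hosted, listed_share) for IPs hosting at
    least min_listed listed domains, highest listed share first."""
    rows = []
    for ip, hosted in index.items():
        hits = len(hosted & listed)
        if hits >= min_listed:
            rows.append((ip, hits, len(hosted), hits / len(hosted)))
    # Sort by share, breaking ties by the absolute number of listed domains.
    rows.sort(key=lambda r: (r[3], r[1]), reverse=True)
    return rows


if __name__ == "__main__":
    index = {
        "198.51.100.7": {"p1.example.invalid", "p2.example.invalid", "p3.example.invalid"},
        "192.0.2.10": {"p4.example.invalid", "p5.example.invalid",
                       "shop.example.invalid", "news.example.invalid"},
    }
    listed = {"p1.example.invalid", "p2.example.invalid", "p3.example.invalid",
              "p4.example.invalid", "p5.example.invalid"}
    for row in rank_spam_ips(index, listed):
        print(row)
```

An IP whose share is close to 1.0 across many domains is a stronger candidate for hand checking than one where listed domains are a small minority.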

Implementing this would take a non-trivial programming effort, not to
mention the processing power and bandwidth needed for the initial run.
The datasets (.com!) are huge. After that, we would only have to sample
periodically for new, removed, and changed domains, which would greatly
reduce the processing load.
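The periodic sampling step is essentially a diff between two snapshots, so only the delta has to be re-resolved. Here is a minimal sketch, assuming each snapshot maps a domain to the frozenset of IPs it resolved to. Zone files proper carry NS delegations rather than addresses, and that detail is ignored here. The names are hypothetical:

```python
from typing import Dict, FrozenSet, NamedTuple, Set


class SnapshotDiff(NamedTuple):
    added: Set[str]
    removed: Set[str]
    changed: Set[str]


def diff_snapshots(
    old: Dict[str, FrozenSet[str]],
    new: Dict[str, FrozenSet[str]],
) -> SnapshotDiff:
    """Compare two domain -> IP-set snapshots."""
    added = new.keys() - old.keys()      # dict key views support set operations
    removed = old.keys() - new.keys()
    changed = {d for d in old.keys() & new.keys() if old[d] != new[d]}
    return SnapshotDiff(added, removed, changed)


if __name__ == "__main__":
    old = {"a.example.invalid": frozenset({"192.0.2.1"}),
           "b.example.invalid": frozenset({"192.0.2.2"})}
    new = {"a.example.invalid": frozenset({"192.0.2.1"}),
           "b.example.invalid": frozenset({"192.0.2.9"}),
           "c.example.invalid": frozenset({"192.0.2.3"})}
    print(diff_snapshots(old, new))
```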

Still, there's no way I have time or money to do this alone, given my
current commitments. I *wish* I could spend my whole day fighting spam.
I'd need a fair amount of real help. It would be good to make it happen,
though, considering we could then *proactively* list domains (or IPs)
with a high degree of confidence and little or no collateral damage.
(We can *measure* collateral damage if we know which other domains are
hosted on a particular IP.) And there would be many other statistical
benefits we could gain.
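Measuring collateral damage then reduces to set arithmetic over the reverse index: for each IP proposed for listing, which hosted domains are *not* known spam? The sketch below assumes the same IP-to-domains index and a set of known-spam domains. The function names are hypothetical, and "known spam" stands in for whatever hand-checked data would actually be used:

```python
from typing import Dict, Iterable, Set


def collateral_damage(
    index: Dict[str, Set[str]],
    proposed_ips: Iterable[str],
    known_spam: Set[str],
) -> Dict[str, Set[str]]:
    """For each proposed IP, return the hosted domains NOT known to be spam,
    i.e. the sites an IP listing would also catch."""
    return {ip: index.get(ip, set()) - known_spam for ip in proposed_ips}


def collateral_ratio(
    index: Dict[str, Set[str]],
    proposed_ips: Iterable[str],
    known_spam: Set[str],
) -> float:
    """Fraction of all domains hosted on the proposed IPs that are not known spam."""
    hosted = set().union(*(index.get(ip, set()) for ip in proposed_ips))
    if not hosted:
        return 0.0
    return len(hosted - known_spam) / len(hosted)


if __name__ == "__main__":
    index = {"198.51.100.7": {"p1", "p2"}, "192.0.2.10": {"p3", "shop"}}
    spam = {"p1", "p2", "p3"}
    ips = ["198.51.100.7", "192.0.2.10"]
    print(collateral_damage(index, ips, spam))
    print(collateral_ratio(index, ips, spam))
```

A ratio of 0.0 would mean that listing those IPs hits only domains already believed to be spam.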

- Ryan

-- 
   Ryan Thompson <ryan at sasknow.com>

   SaskNow Technologies - http://www.sasknow.com
   901-1st Avenue North - Saskatoon, SK - S7K 1Y4

         Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
   Toll-Free: 877-727-5669     (877-SASKNOW)     North America

