[SURBL-Discuss] Re[2]: Start an IP list to block?

Pete McNeil madscientist at microneil.com
Fri Sep 10 18:00:16 CEST 2004


On Friday, September 10, 2004, 10:43:39 AM, Jeff wrote:

<snip/>

>> Holy confusion! I can't tell where you are on this subject now Jeff :)

<snip/>

JC> If you're talking about adding resolved IP addresses to SURBLs,
JC> no we're not going to do that.   :-(

JC> What I'm talking about is an internal process where we keep track
JC> of resolved IP addresses and use that to add new domains to
JC> SURBLs sooner if they resolve to a similar IP range (probably
JC> /24s).  We would use the resolved IP addresses to add domains
JC> to sc.surbl.org and possibly other lists sooner.  Most would
JC> probably get added on the first report.  :-)

I recommend a bit of caution on this point. My preliminary data on
using /24s to drive recursive domain additions suggests that the
approach is prone to false positives: the network surrounding a given
web host seems to be frequently populated with non-spam servers, at
least often enough that it's a challenge to generalize this way.
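
To make the /24 concern concrete, here is a minimal sketch, assuming
you already have a mapping of domain to resolved IPv4 address. The
helper names (slash24, group_by_slash24, candidates_from_seed) and the
sample data are hypothetical illustrations, not part of any SURBL
tool. It shows how the naive rule sweeps an unrelated neighbour into
the candidate set along with the real campaign sibling.

```python
import ipaddress
from collections import defaultdict


def slash24(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 network that contains an IPv4 address."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def group_by_slash24(resolved: dict[str, str]) -> dict:
    """Hypothetical helper: bucket domains by the /24 of their resolved IP."""
    groups = defaultdict(set)
    for domain, ip in resolved.items():
        groups[slash24(ip)].add(domain)
    return dict(groups)


def candidates_from_seed(seed: str, resolved: dict[str, str]) -> set[str]:
    """Naive rule: every other domain in the seed's /24 becomes a candidate.

    This is exactly the generalization that tends to pull in non-spam
    neighbours, so treat the output as a review queue, not a blocklist.
    """
    groups = group_by_slash24(resolved)
    return groups.get(slash24(resolved[seed]), set()) - {seed}


if __name__ == "__main__":
    # Illustrative data only (reserved documentation names and addresses).
    resolved = {
        "pills-a.example": "192.0.2.10",
        "pills-b.example": "192.0.2.11",
        "innocent-shop.example": "192.0.2.200",  # same /24, unrelated site
        "elsewhere.example": "198.51.100.5",
    }
    print(sorted(candidates_from_seed("pills-a.example", resolved)))
    # ['innocent-shop.example', 'pills-b.example']
```

The innocent-shop.example hit is the false positive described above:
sharing a /24 says little about who operates a given host.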

(I have also observed random changes to these IPs, which I believe
are an effort to thwart automated attacks. On occasion these domains
may point to random legitimate services; the only safe, simple way to
know is to look. That's such an aggressive, forward-thinking
countermeasure that I almost didn't believe it when I saw it, and I
probably wouldn't have caught it if I weren't looking for it.)

One of the reasons we are able to entertain this kind of analysis is
that we (humans) are heavily involved in a continuous refinement and
posting process, which allows us to provide tuning inputs that are
difficult to quantify for any autonomous AI. We're working in concert
with our tools, so our techniques don't easily translate out of that
environment, but they do often point in directions where more
automation is possible.

It's worth noting that spammers are accelerating their efforts to
blend their presence with legitimate services, equipment, etc. I
suppose this is a natural response to the kinds of automated
countermeasures that have been put in place. The upshot is that
automated schemes must become increasingly sophisticated
(intelligent) in order to maintain accuracy.

That said, there are some ways to leverage this data when it is
qualified properly. For example, a clean spamtrap (particularly one
spawned from dictionary attacks) can provide a ready stream of
messages from which you can derive domains through recursive IP
references.

It is common for Snake-Oil spammers to leverage a family of domains
at one time for a new campaign and to have them point to a single IP
or a small group of IPs. As a result, it is possible to carefully
select a domain (from a URI) in one of these messages and then use
the resolved IP of that domain to automatically derive the other
members of the family from the spamtrap data. You cannot reliably
derive this from open message data, however, since there is a
significant risk of tagging a legitimate virtual host; the spamtrap
data does not have this problem, _USUALLY_.
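
A minimal sketch of the family derivation, assuming you have already
extracted URI domains from clean-spamtrap messages and resolved each
one to a set of IPs. The function name derive_family and the data are
hypothetical; this is one simple way to express the idea, not our
actual process, and it deliberately matches exact IPs rather than
whole /24s.

```python
def derive_family(seed: str, trap_resolved: dict[str, set[str]]) -> set[str]:
    """Hypothetical helper: trap domains sharing at least one exact IP with seed.

    `trap_resolved` is assumed to map each domain seen in a *clean spamtrap*
    to the set of IPs it resolved to. Running this over open (non-trap)
    mail risks tagging legitimate virtual hosts that share the address.
    """
    seed_ips = trap_resolved.get(seed, set())
    if not seed_ips:
        return set()
    return {
        domain
        for domain, ips in trap_resolved.items()
        if domain != seed and ips & seed_ips
    }


if __name__ == "__main__":
    # Illustrative data only.
    trap_resolved = {
        "cheap-meds-1.example": {"203.0.113.5"},
        "cheap-meds-2.example": {"203.0.113.5", "203.0.113.6"},
        "cheap-meds-3.example": {"203.0.113.6"},
        "unrelated.example": {"198.51.100.9"},
    }
    print(sorted(derive_family("cheap-meds-1.example", trap_resolved)))
    # ['cheap-meds-2.example']
```

Note that cheap-meds-3.example is only reached by expanding again from
cheap-meds-2.example. Each such recursive step widens the net, which is
one reason every expansion still needs human review.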

This technique is also limited, however, and requires significant
review and monitoring. Spammers are already complicating this
mechanism (they're thinking ahead more). There are a growing number
of cases where randomstuff.example.com resolves to something different
than example.com, and in fact example.com is often pointed at some
random legitimate host, so you need to target the larger URI to
extract the filtering candidate. There is no simple way of knowing
the conditions, only complex ways.
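
The subdomain trick can be illustrated with a deliberately simplified
heuristic, assuming a resolver function you supply (here a dict lookup
stands in for DNS). The names naive_parent and filtering_candidate are
hypothetical, and naive_parent just keeps the last two labels; real
code would need the Public Suffix List. As noted above, the real
conditions are more complex than this.

```python
from typing import Callable, Optional

# Assumed resolver signature: hostname -> set of IPs, or None if unresolved.
Resolver = Callable[[str], Optional[set]]


def naive_parent(host: str) -> str:
    """Last two labels only. A simplifying assumption, not PSL-aware."""
    return ".".join(host.split(".")[-2:])


def filtering_candidate(host: str, resolve: Resolver) -> str:
    """Pick the full hostname when it resolves differently from its parent.

    If randomstuff.example.com and example.com point to different places,
    the bare domain may be aimed at an innocent host, so keep the full name.
    """
    parent = naive_parent(host)
    if host == parent:
        return host
    if resolve(host) != resolve(parent):
        return host
    return parent


if __name__ == "__main__":
    # Illustrative data only; a dict's .get stands in for a DNS lookup.
    table = {
        "randomstuff.example.com": {"203.0.113.7"},
        "example.com": {"192.0.2.1"},  # parent aimed elsewhere
    }
    print(filtering_candidate("randomstuff.example.com", table.get))
    # randomstuff.example.com
```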

I probably shouldn't go into much more detail about this because it
will become confusing. As I said, many of our techniques only work
within the infrastructure we have created. Where I can offer useful
insight, though, I will.

I recommend taking a look at this data yourself with the resources
you have: build prototypes of your mechanisms, then test the bitz out
of them until the statistics reveal themselves. As a matter of
practice, this is the only way to know what will work and how it can
be applied.
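
One simple way to let the statistics reveal themselves is to score a
prototype's candidate list against labelled domain sets. This is a
minimal sketch, assuming you have a set of known-spam domains (for
example from a spamtrap) and a set of known-ham domains; the names
Evaluation and evaluate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    spam_hits: int
    spam_total: int
    false_positives: int
    ham_total: int

    @property
    def detection_rate(self) -> float:
        return self.spam_hits / self.spam_total if self.spam_total else 0.0

    @property
    def false_positive_rate(self) -> float:
        return self.false_positives / self.ham_total if self.ham_total else 0.0


def evaluate(candidates: set[str], spam: set[str], ham: set[str]) -> Evaluation:
    """Count how many candidates are known spam and how many are known ham."""
    return Evaluation(
        spam_hits=len(candidates & spam),
        spam_total=len(spam),
        false_positives=len(candidates & ham),
        ham_total=len(ham),
    )


if __name__ == "__main__":
    # Illustrative data only.
    result = evaluate({"a.example", "b.example", "c.example"},
                      spam={"a.example", "b.example", "d.example", "e.example"},
                      ham={"c.example", "x.example"})
    print(result.detection_rate, result.false_positive_rate)  # 0.5 0.5
```

In practice the false-positive rate is the number to watch: a rule
like the naive /24 expansion can look productive on detection while
quietly pulling in legitimate hosts.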

Hope this helps,

_M
