[SURBL-Discuss] Re: Start an IP list to block?

Jeff Chan jeffc at surbl.org
Sat Sep 11 00:48:25 CEST 2004


On Friday, September 10, 2004, 10:40:39 AM, Pete McNeil wrote:
> On Friday, September 10, 2004, 1:13:38 PM, Jeff wrote:

JC>> Thanks for your comments.  By "recursive domain additions" do you
JC>> mean to initiate a proactive search of domains within a given
JC>> network?  What I'm proposing is not to actively try to search,
JC>> but simply to bias the inclusion of domains that are *actually
JC>> reported to us as being in spams*.

> What I mean by "recursive domain additions" (an internal name I use
> for this process) is something like this:

> 1. Spamtrap sources the addition of a domain (URI) to the blacklist.

> 2. A subset of domains in the blacklist are resolved to IPs and those
> IPs are added to an internal reference list.

> 3. Subsequent clean spamtrap sources are scanned for domain URIs that
> resolve to IPs on the reference list; if found, these new domains
> are added to the blacklist (or at least recommended as candidates).
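The three steps above might be sketched like this (a minimal illustration; the domain names, IPs, and the stubbed DNS lookup are all hypothetical, standing in for real spamtrap data and resolution):

```python
# Sketch of the "recursive domain additions" process described above.
# A stub replaces real DNS resolution so the example is self-contained.

def resolve(domain):
    # Stand-in for a real DNS lookup (e.g. socket.gethostbyname_ex).
    FAKE_DNS = {
        "spammer-one.example": {"203.0.113.10"},
        "spammer-two.example": {"203.0.113.10"},   # same host as above
        "innocent.example":    {"198.51.100.7"},
    }
    return FAKE_DNS.get(domain, set())

# Step 1: a spamtrap sources a domain into the blacklist.
blacklist = {"spammer-one.example"}

# Step 2: resolve blacklisted domains and record their IPs
# on an internal reference list.
reference_ips = set()
for domain in blacklist:
    reference_ips |= resolve(domain)

# Step 3: scan later spamtrap URIs; domains whose IPs overlap the
# reference list become blacklist candidates.
new_spamtrap_domains = ["spammer-two.example", "innocent.example"]
candidates = [d for d in new_spamtrap_domains
              if resolve(d) & reference_ips]
print(candidates)   # ['spammer-two.example']
```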

Aha, the space I was referring to was SpamCop reports, which
AFAIK are human.  SpamCop does get trap data, but I'm not exactly
sure what they do with it.

That said, some of the same techniques might apply to our use of
spamtrap data, provided hand-checking is also done.

Otherwise your description matches ours.

> So, this is not a proactive search really - rather the capture of one
> domain predisposes the candidate generator to capture additional
> domains that resolve to the same IP(s).

Got it.  That is similar to the principle I was proposing.  ;-)

> (Candidate generator = AI monitoring spamtrap data to extract URIs
> and recommend them as candidates for the blacklist.)
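The extraction step of such a candidate generator might look roughly like this (the regex and sample message body are my own simplifications, not the actual generator's logic):

```python
import re

# Pull domains out of URIs found in a spamtrap message body.
# A deliberately simple pattern: scheme, then a host of letters,
# digits, dots, and hyphens.
URI_DOMAIN = re.compile(r"https?://([a-z0-9.-]+)", re.IGNORECASE)

def extract_candidates(body):
    # Deduplicate and normalize case before recommending candidates.
    return sorted({m.lower() for m in URI_DOMAIN.findall(body)})

body = ("Buy now at http://spammer-one.example/buy "
        "or HTTP://Spammer-Two.Example/offer")
print(extract_candidates(body))
# ['spammer-one.example', 'spammer-two.example']
```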

> --- Sorry for the complexity here, I'm used to thinking in terms of
> our system and it is sometimes difficult to describe the concepts
> outside of that context.

We all get accustomed to thinking in terms of our own systems,
which sometimes is why explanations like this are needed to
clear things up.  I find it sometimes helps to try to step
back and describe an outsider's view of things.  I don't always
succeed or remember to do that.  ;-)

JC>> Hopefully my description of the difference makes some sense
JC>> and it can be seen why the potential for false inclusions
JC>> might be lower when the space is *actual spam reports*, and
JC>> not the space of all domains hosted in nearby networks.

> Clearly. *Actual spam reports* are analogous to clean spamtrap data -
> though I presume it may also include some non-spamtrap data submitted
> by users. You are definitely on the right track - that is, I think
> we're on the same page generally.

The SpamCop data I assume to be *human-sourced* reports.  That's
what I meant by "actual spam reports".  "Human spam reports"
would have been more descriptive.

> The caution is - even with very strong spamtraps there are errors in
> this process often enough to require some extra research before gating
> the new "candidates" into the blacklist, IME.

Our use of spamtraps (mostly feeding the WS and OB lists) is
carefully tested.  The WS entries are all supposed to be hand
checked, since we all agree that purely automatic methods let in
too many FPs.  Human checkers make mistakes too, though we're
trying to cut down on those errors, for example by suggesting
some requirements such as:

1.  Domain age.  Older domains should only be added with a lot
of evidence.  Most spammer domains are no more than a week or
two old, often less than a few days old.

2.  Only add domains that only appear in spams.  Don't add
domains that appear in hams.

The second seems the hardest to get across, even though it should
be pretty obvious.  The problem seems to be that people reason,
"yep, I've seen a spam with this domain, so I'm adding it."
Seeing a domain in spam is not by itself the right criterion; the
domain must also be absent from hams.
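As a rough illustration, the two requirements might be combined into a gating check like this (the thresholds, field names, and "a lot of evidence" bar are my own assumptions, not SURBL policy):

```python
from datetime import date, timedelta

def accept_candidate(registered, spam_hits, ham_hits, today=None):
    """Gate a candidate domain using the two suggested requirements."""
    today = today or date.today()
    age = today - registered
    # Requirement 2: never add a domain that also appears in hams.
    if ham_hits > 0:
        return False
    # Requirement 1: older domains need much more evidence.
    if age > timedelta(days=14):
        return spam_hits >= 100   # "a lot of evidence" (arbitrary bar)
    return spam_hits >= 3

today = date(2004, 9, 11)
# Week-old domain seen only in spam: accept.
print(accept_candidate(date(2004, 9, 4), spam_hits=5, ham_hits=0, today=today))
# Seen in even one ham: reject, regardless of spam hits.
print(accept_candidate(date(2004, 9, 4), spam_hits=50, ham_hits=1, today=today))
```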

Thanks for comparing notes!  :-)

Jeff C.


