On Friday, September 10, 2004, 10:40:39 AM, Pete McNeil wrote:
On Friday, September 10, 2004, 1:13:38 PM, Jeff wrote:
JC>> Thanks for your comments. By "recursive domain additions" do you
JC>> mean to initiate a proactive search of domains within a given
JC>> network? What I'm proposing is not to actively try to search,
JC>> but simply to bias the inclusion of domains that are *actually
JC>> reported to us as being in spams*.
What I mean by "recursive domain additions" (an internal name I use for this process) is something like this:
- Spamtrap sources the addition of a domain (URI) to the blacklist.
- A subset of domains in the blacklist is resolved to IPs, and those
  IPs are added to an internal reference list.
- Subsequent clean spamtrap sources are scanned for domain URIs that
  resolve to IPs on the reference list; if found, these new domains
  are added to the blacklist (or at least recommended as candidates).
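To make that concrete, here is a rough Python sketch of the loop as described above (the names, the socket-based resolution, and the single-pass structure are my own illustration, not anyone's actual implementation):

    import socket

    blacklist = {"spammer-domain.example"}   # hypothetical seed entry
    reference_ips = set()                    # IPs resolved from listed domains

    def resolve_ips(domain):
        """Resolve a domain to its IPv4 addresses; empty set on failure."""
        try:
            return {info[4][0]
                    for info in socket.getaddrinfo(domain, None, socket.AF_INET)}
        except socket.gaierror:
            return set()

    # Step 1: resolve (a subset of) blacklisted domains into the reference list.
    for domain in blacklist:
        reference_ips |= resolve_ips(domain)

    # Step 2: domains from clean spamtrap mail that land on referenced IPs
    # become candidates for the blacklist.
    def candidates_from_spamtrap(domains):
        return {d for d in domains
                if d not in blacklist and resolve_ips(d) & reference_ips}

In practice the resolution step would be batched and cached, and anything that second step returns would still need the hand-checking discussed below before being gated into the list.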
Aha, the space I was referring to was SpamCop reports, which AFAIK are human. SpamCop does get trap data, but I'm not exactly sure what they do with it.
That said, some of the same techniques might apply to our use of spamtrap data, provided hand-checking is also done.
Otherwise your description matches ours.
So this is not really a proactive search; rather, the capture of one domain predisposes the candidate generator to capture additional domains that resolve to the same IP(s).
Got it. That is similar to the principle I was proposing. ;-)
(Candidate generator = AI monitoring spamtrap data to extract URIs and recommend them as candidates for the blacklist.)
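For the extraction half of that, even a crude stand-in shows the idea; a real candidate generator would parse MIME parts and handle spammers' obfuscation tricks, so this regex is purely illustrative:

    import re

    # Naive hostname grab from http/https URIs; illustration only.
    URI_RE = re.compile(r'https?://([a-z0-9.-]+)', re.IGNORECASE)

    def extract_domains(message_body):
        """Pull candidate domains out of a spamtrap message body."""
        return {m.lower() for m in URI_RE.findall(message_body)}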
--- Sorry for the complexity here; I'm used to thinking in terms of our system, and it is sometimes difficult to describe the concepts outside of that context.
We all get accustomed to thinking in terms of our own systems, which is why explanations like this are sometimes needed to clear things up. I find it sometimes helps to step back and describe an outsider's view of things. I don't always succeed or remember to do that. ;-)
JC>> Hopefully my description of the difference makes some sense
JC>> and it can be seen why the potential for false inclusions
JC>> might be lower when the space is *actual spam reports*, and
JC>> not the space of all domains hosted in nearby networks.
Clearly. *actual spam reports* is analogous to clean spamtrap data - though I presume it may also include some non-spamtrap data submitted by users. You are definitely on the right track - that is, I think we're on the same page generally.
The SpamCop data I assume to be *human-sourced* reports. That's what I meant by "actual spam reports". "Human spam reports" would have been more descriptive.
The caution is: even with very strong spamtraps, errors occur in this process often enough to require some extra research before gating the new "candidates" into the blacklist, IME.
Our use of spamtraps (mostly feeding the WS and OB lists) is carefully tested. The WS entries are all supposed to be hand-checked, since we all agree that purely automatic methods let in too many FPs. Human checkers make mistakes too, though we're trying to cut down on those errors, for example by suggesting some requirements such as:
1. Domain age. Older domains should only be added with a lot of evidence. Most spammer domains are no more than a week or two old, often less than a few days old.
2. Only add domains that only appear in spams. Don't add domains that appear in hams.
The second seems the hardest to get across, even though it should be pretty obvious. The problem is that people say "yep, I've seen a spam with this domain, so I'm adding it". But seeing a domain in spam is not, by itself, the right criterion; the domain must also be absent from ham.
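Both requirements are easy to encode as a pre-check before a human reviewer ever sees a candidate. A minimal sketch, assuming we already have the domain's registration date (timezone-aware) and its spam/ham occurrence counts; the two-week threshold is illustrative, not policy:

    from datetime import datetime, timedelta, timezone

    MAX_AGE = timedelta(days=14)  # illustrative: most spammer domains are younger

    def passes_prechecks(creation_date, spam_count, ham_count):
        """Apply requirement 2 (never seen in ham) and requirement 1 (young)."""
        if ham_count > 0:
            return False  # appears in ham -> never add
        if datetime.now(timezone.utc) - creation_date > MAX_AGE:
            return False  # older domain -> defer pending stronger evidence
        return spam_count > 0  # must actually have been reported in spam

A domain that fails the age check isn't necessarily innocent; it just needs the extra human research mentioned above before it can be added.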
Thanks for comparing notes! :-)
Jeff C.