Re: [SURBL-Discuss] RFC: How to use new data source: URIs advertised through CBL-listed senders

19 Apr 2005


On Tuesday, April 19, 2005, 2:02:10 AM, John Wilcock wrote:
...
Jeff Chan wrote:
...
One of the goals of looking at URIs that appear on the CBL traps,
in messages that also trigger CBL inclusion, is to get new URIs
listed in SURBLs sooner.  One of the valid criticisms of SURBLs
is that there is too much delay between the time a URI is first
used and the time it gets listed in SURBLs.  This is a problem
with RBLs in general, and it means that the targeted senders (or
URIs) have a window of time before detection and list inclusion
during which they can send unhindered.
...
...
Our challenge therefore is to find ways to use those
while excluding the FPs.  Some solutions that have been proposed
so far are:
...
...
What strikes me most is the fundamental incompatibility between aiming 
to reduce the window of opportunity before a URI gets onto any lists, 
yet using inclusion on other lists as a way of confirming the validity 
of the data.

I agree that depending on inclusion in other lists can
sometimes make us dependent on those lists, so that we lag
behind them.  On the other hand, SBL inclusion does not
necessarily have that result.  SBL lists IP ranges belonging to
spammers.  If a spammer registers a brand new domain but points
its web, NS or MX service into SBL-listed space, then the domain
could in principle be listed immediately, by virtue of IP
matching and not the domain itself matching any other list.
IOW, matches like that permit immediate listing of completely
new domains that don't appear as domains in other lists.
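A minimal sketch of the IP-matching idea, under stated assumptions: the addresses a new domain's web, NS and MX records resolve to are passed in (the DNS lookup itself is left out), and the listed ranges are RFC 5737 documentation prefixes standing in for real SBL listings. The function name `listed_by_ip` is hypothetical.

```python
import ipaddress

# Hypothetical stand-ins for SBL-listed ranges (RFC 5737 documentation
# prefixes, not real listings).
SBL_RANGES = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/25")]


def listed_by_ip(resolved, ranges=SBL_RANGES):
    """Return the services (e.g. 'web', 'ns', 'mx') whose IPs fall in a listed range.

    resolved: dict mapping a service name to the IP strings it resolves to.
    """
    hits = []
    for service, addrs in resolved.items():
        for addr in addrs:
            ip = ipaddress.ip_address(addr)
            # IPv4-vs-IPv6 mismatches simply test False here.
            if any(ip in net for net in ranges):
                hits.append(service)
                break  # one matching address is enough for this service
    return hits


# A brand-new domain whose NS and MX point into listed space.
new_domain = {
    "web": ["203.0.113.10"],
    "ns": ["192.0.2.53"],
    "mx": ["198.51.100.25"],
}
print(listed_by_ip(new_domain))  # ['ns', 'mx']
```

The domain name itself never appears in the check, which is why a match like this could list a domain that no other list has seen yet.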
The inclusions based on other lists represent a separate
approach: an attempt to reach into the "noise" of low-hit-count
records to see if any useful data can be grabbed from it.  It's
generally not our primary use of the data.  We will use other
techniques, such as looking at the volume of hits per record,
to get new records, and we will do some tuning, etc.
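One way to picture the split between high-volume records and the low-hit-count "noise" is a simple count per URI domain. This is a sketch only: the cutoff `min_hits`, the whitelist, and the function name `split_by_volume` are hypothetical, and the domains use the reserved `.test` TLD.

```python
from collections import Counter


def split_by_volume(trap_hits, min_hits=5, whitelist=frozenset()):
    """Split trap-hit URI domains into high-volume candidates and low-count noise.

    trap_hits: iterable of domains seen in trap messages (one entry per hit).
    min_hits:  hypothetical cutoff; records below it are the "noise" that
               would need other correlation before being listed.
    """
    counts = Counter(d.lower() for d in trap_hits)
    candidates, noise = [], []
    for domain, n in sorted(counts.items()):
        if domain in whitelist:
            continue  # never list whitelisted domains, however often they appear
        if n >= min_hits:
            candidates.append(domain)
        else:
            noise.append(domain)
    return candidates, noise


hits = ["spam-example.test"] * 6 + ["Rare.test"] * 2 + ["big-isp.test"] * 9
print(split_by_volume(hits, whitelist={"big-isp.test"}))
# (['spam-example.test'], ['rare.test'])
```

Correlation with other lists, as described above, would then be applied only to the `noise` list.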
Suggestions for other methods of correlating the data to dig
deeper into the noise are welcome.
...
How about a multi-level system, where any (non-whitelisted) URI in the 
CBL data is immediately included on the first level, then gradually gets 
promoted to the higher levels once it is corroborated by further 
reports, inclusion in other lists, manual confirmation or whatever.
The last byte of the A record could be used to indicate the level.
The number of levels and the details of promotion/demotion strategies 
would obviously need to be worked out and refined over time.
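A sketch of how such a multi-level scheme might encode levels in the last byte of the A record. The level count (`MAX_LEVEL = 4`), the `127.0.0.N` answer format, the evidence names, and the promotion rule (one level per distinct kind of corroboration) are all hypothetical choices for illustration, not an agreed design.

```python
import ipaddress

# Hypothetical scheme: level N (1..MAX_LEVEL) is returned as 127.0.0.N.
MAX_LEVEL = 4

# Hypothetical names for the kinds of corroboration mentioned in the proposal.
EVIDENCE_KINDS = {"further_reports", "other_list", "manual"}


def level_for(evidence):
    """Every non-whitelisted URI starts at level 1; each distinct kind of
    corroboration promotes it one level, capped at MAX_LEVEL."""
    return min(MAX_LEVEL, 1 + len(set(evidence) & EVIDENCE_KINDS))


def encode_level(level):
    """Build the A record a list server might return for a given level."""
    if not 1 <= level <= MAX_LEVEL:
        raise ValueError(f"level must be between 1 and {MAX_LEVEL}")
    return f"127.0.0.{level}"


def decode_level(a_record):
    """Recover the level from the last byte of an A record answer."""
    ip = ipaddress.IPv4Address(a_record)
    if ip not in ipaddress.IPv4Network("127.0.0.0/8"):
        raise ValueError("answer is outside 127.0.0.0/8")
    return ip.packed[-1]  # indexing bytes yields an int


answer = encode_level(level_for({"other_list", "manual"}))
print(answer, decode_level(answer))  # 127.0.0.3 3
```

Demotion (for example, dropping a URI that later turns up in ham) is left out here; as noted above, those rules would need to be worked out over time.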
...
Logically the lower levels would have higher FP rates, but can be given 
lower SA scores (or equivalent weightings in other client apps).
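A rough illustration of level-dependent weighting in a scoring client. The per-level scores are hypothetical, and taking the highest level per message (rather than summing) is one possible design choice so that many low-level URIs do not stack up. The 5.0 threshold mirrors SpamAssassin's default `required_score`.

```python
# Hypothetical per-level weights: less-corroborated levels score less.
LEVEL_SCORES = {1: 0.5, 2: 1.5, 3: 3.0, 4: 4.5}


def uri_score(levels):
    """Score contributed by a message's listed URIs.

    Uses the highest-scoring level found; unknown levels score 0.
    """
    return max((LEVEL_SCORES.get(lvl, 0.0) for lvl in levels), default=0.0)


def is_spam(other_score, levels, required=5.0):
    """Add the URI score to the rest of the message's score and compare
    against a threshold (5.0 mirrors SpamAssassin's default)."""
    return other_score + uri_score(levels) >= required


print(is_spam(4.0, [1, 1, 1]))  # False: 4.0 + 0.5
print(is_spam(4.0, [1, 2]))     # True: 4.0 + 1.5
```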
...
John.

Right, but we should keep in mind that some SURBL-using
applications may not do weighted scoring at all; some may do
outright yes/no blocking.  I also prefer the more difficult
approach of trying to say either that a record belongs to
hard-core spammers or that it doesn't.  I'm not a big fan of
uncertain or grey results.  Especially for applications that
block outright, listings may be most useful when they're
either black or white.
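To see why grey levels sit awkwardly with yes/no clients: such a client has to pick a single cutoff, and every level below it is effectively discarded. A sketch, assuming the hypothetical `127.0.0.N` level encoding from the proposal and a hypothetical cutoff `block_level = 4`.

```python
import ipaddress
from typing import Iterable, Optional


def level_of(answer: Optional[str]) -> int:
    """Level encoded in a list answer such as '127.0.0.3'; None means unlisted."""
    if answer is None:
        return 0
    return ipaddress.IPv4Address(answer).packed[-1]


def should_block(answers: Iterable[Optional[str]], block_level: int = 4) -> bool:
    """Yes/no decision: reject only if some URI sits at or above block_level.

    Everything below block_level is treated as 'not listed', which is how
    a grey level looks to a client that cannot weigh it.
    """
    return any(level_of(a) >= block_level for a in answers)


print(should_block(["127.0.0.1", "127.0.0.2", None]))  # False
print(should_block([None, "127.0.0.4"]))               # True
```

Lowering `block_level` would let such a client use the grey levels, but then their higher FP rate turns directly into rejected mail rather than a small score.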
Jeff C.
--
"If it appears in hams, then don't list it."
