[SURBL-Discuss] RFC: How to use new data source: URIs advertised
through CBL-listed senders
jeffc at surbl.org
Tue Apr 19 12:54:28 CEST 2005
On Tuesday, April 19, 2005, 2:02:10 AM, John Wilcock wrote:
> Jeff Chan wrote:
>> One of the goals of looking at URIs appearing on the CBL traps in
>> messages also triggering CBL inclusion is to get listings of new
>> URIs into SURBLs sooner. One of the valid criticisms of SURBLs
>> is that there is too much delay between the time a URI is first
>> used and it gets listed in SURBLs. This is a problem with RBLs
>> in general, and it means that the targeted senders (or URIs) have
>> a window of time before detection and list inclusion where they
>> can send unhindered.
>> Our challenge therefore is to find ways to use those
>> while excluding the FPs. Some solutions that have been proposed
>> so far are:
> What strikes me most is the fundamental incompatibility between aiming
> to reduce the window of opportunity before a URI gets onto any lists,
> yet using inclusion on other lists as a way of confirming the validity
> of the data.
I agree that relying on inclusion in other lists can
sometimes mean that we will lag those lists. On the other
hand, things like SBL inclusion do not necessarily have that
result. SBL lists IP ranges belonging to spammers. If a spammer
registers a brand new domain but points web, NS or MX service
into SBL-listed space, then the domain could in principle be
listed immediately, by virtue of IP matching and not the domain
itself matching any other list. IOW matches like that permit
immediate listing of completely new domains that don't appear as
domains in other lists.
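
As a rough illustration of the IP-matching idea, here is a minimal sketch in Python. It assumes the domain's A, NS and MX hosts have already been resolved to IP addresses, and it uses documentation-reserved networks as stand-ins for listed space; the names LISTED_RANGES, in_listed_space and listable_by_ip are hypothetical and not part of any SURBL or SBL tooling.

```python
import ipaddress

# Hypothetical listed ranges (documentation-reserved networks, not real SBL data).
LISTED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

def in_listed_space(ip):
    """Return True if the IP falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in LISTED_RANGES)

def listable_by_ip(resolved):
    """resolved maps a record type ("A", "NS", "MX") to the IPs it resolves to.
    Returns the sorted record types that point into listed space."""
    return sorted(
        rtype
        for rtype, ips in resolved.items()
        if any(in_listed_space(ip) for ip in ips)
    )

# A brand new domain whose name appears on no list, but whose
# name server sits in listed space.
new_domain = {
    "A": ["203.0.113.7"],
    "NS": ["192.0.2.53"],
    "MX": ["198.51.100.200"],
}
print(listable_by_ip(new_domain))
```

A non-empty result would make the domain a candidate for listing even though the domain name itself has never been seen before.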
Inclusion based on other lists represents a separate
approach to try to reach into the "noise" of low-hit-count
records to see if any useful data can be grabbed from it. It's
generally not our primary use of the data. We will use other
techniques such as looking at the volume of hits per record to
get new records, do some tuning etc.
Suggestions of other methods of correlating the data to dig
deeper into the noise are welcomed.
> How about a multi-level system, where any (non-whitelisted) URI in the
> CBL data is immediately included on the first level, then gradually gets
> promoted to the higher levels once it is corroborated by further
> reports, inclusion in other lists, manual confirmation or whatever.
> The last byte of the A record could be used to indicate the level.
> The number of levels and the details of promotion/demotion strategies
> would obviously need to be worked out and refined over time.
> Logically the lower levels would have higher FP rates, but can be given
> lower SA scores (or equivalent weightings in other client apps).
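
The level-in-the-last-byte proposal quoted above could be consumed by a client roughly as in the following sketch. The level numbers, names and scores are assumptions made up for illustration; the proposal leaves the number of levels open, and nothing here reflects an actual SURBL encoding.

```python
import ipaddress

# Hypothetical mapping: last byte of the 127.0.0.x answer -> (level name, score).
LEVELS = {
    1: ("unconfirmed", 0.5),   # seen in the raw data only, highest FP risk
    2: ("corroborated", 2.0),  # backed by further reports or other lists
    3: ("confirmed", 4.0),     # manually confirmed
}

def last_byte(a_record):
    """Extract the last byte of an IPv4 A record as an integer."""
    return ipaddress.IPv4Address(a_record).packed[-1]

def score(a_record):
    """Weighted use: lower levels contribute lower scores; unknown values score 0."""
    level = LEVELS.get(last_byte(a_record))
    return level[1] if level else 0.0

def should_block(a_record, min_level=3):
    """Yes/no use: block only at or above a chosen level."""
    byte = last_byte(a_record)
    return byte in LEVELS and byte >= min_level

print(score("127.0.0.2"), should_block("127.0.0.2"))
```

A scoring client would call score(), while a client doing outright blocking could ignore everything below its chosen min_level.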
Right, but it probably should be kept in mind that some
SURBL-using applications may not be doing weight-type scoring.
Some may be doing outright yes/no blocking. I also prefer the
more difficult approach of trying to say a record belongs to hard
core spammers or it doesn't. I'm not a big fan of uncertain or
grey results. Especially given applications that do outright
blocking, listings may be most useful when they're either black
"If it appears in hams, then don't list it."