On Tuesday, April 19, 2005, 2:02:10 AM, John Wilcock wrote:
Jeff Chan wrote:
One of the goals of looking at URIs that appear on the CBL traps, in messages that also trigger CBL inclusion, is to get listings of new URIs into SURBLs sooner. One of the valid criticisms of SURBLs is that there is too much delay between the time a URI is first used and the time it gets listed in SURBLs. This is a problem with RBLs in general, and it means that the targeted senders (or URIs) have a window of time before detection and list inclusion where they can send unhindered.
...
Our challenge therefore is to find ways to use those while excluding the FPs. Some solutions that have been proposed so far are:
...
What strikes me most is the fundamental incompatibility between aiming to reduce the window of opportunity before a URI gets onto any lists and using inclusion on other lists as a way of confirming the validity of the data.
I agree that depending on inclusion in other lists can sometimes mean that we will lag those lists. On the other hand, things like SBL inclusion do not necessarily have that result. SBL lists IP ranges belonging to spammers. If a spammer registers a brand new domain but points web, NS or MX service into SBL-listed space, then the domain could in principle be listed immediately, by virtue of IP matching and not the domain itself matching any other list. IOW matches like that permit immediate listing of completely new domains that don't appear as domains in other lists.
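To make the IP-matching idea concrete, here is a rough sketch in Python. It is only an illustration, not how SURBL or SBL data is actually processed: the ranges are documentation networks standing in for SBL-listed space, the function names are made up, and the web/NS/MX addresses are assumed to have been resolved already.

```python
import ipaddress

# Hypothetical stand-ins for SBL-listed ranges (RFC 5737 documentation
# networks, not real SBL data).
LISTED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def listed_ips(resolved_ips, ranges=LISTED_RANGES):
    """Return the addresses from resolved_ips that fall inside a listed range."""
    hits = []
    for ip_text in resolved_ips:
        ip = ipaddress.ip_address(ip_text)
        if any(ip in net for net in ranges):
            hits.append(ip_text)
    return hits


def should_list_domain(records):
    """Decide whether a brand new domain is a listing candidate.

    records maps a record type ("A", "NS", "MX") to the IP addresses that
    the domain's web, name or mail service was already resolved to.
    A single address inside listed space is enough; the domain itself
    does not need to appear on any other list.
    """
    return any(listed_ips(ips) for ips in records.values())


# Example: web hosting outside listed space, name service inside it.
print(should_list_domain({"A": ["203.0.113.5"], "NS": ["192.0.2.10"]}))
```

The point of the sketch is that the match is on the IP space, so a domain registered minutes ago can qualify without any prior history of its own.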
The inclusions based on other lists represent a separate approach to try to reach into the "noise" of low-hit-count records to see if any useful data can be grabbed from it. It's generally not our primary use of the data. We will use other techniques such as looking at the volume of hits per record to get new records, do some tuning, etc.
Suggestions of other methods of correlating the data to dig deeper into the noise are welcomed.
How about a multi-level system, where any (non-whitelisted) URI in the CBL data is immediately included on the first level, then gradually gets promoted to the higher levels once it is corroborated by further reports, inclusion in other lists, manual confirmation or whatever? The last byte of the A record could be used to indicate the level. The number of levels and the details of promotion/demotion strategies would obviously need to be worked out and refined over time.
Logically, the lower levels would have higher FP rates, but they could be given lower SA scores (or equivalent weightings in other client apps).
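As a rough illustration of what a client might do with such a level, here is a sketch in Python. Everything specific in it is an assumption made up for the example: the 127.0.0.x reply form, the three levels, and the score values. None of it reflects an existing SURBL encoding or actual SA rule scores.

```python
import ipaddress

# Hypothetical mapping from level to score: lower levels have higher FP
# rates, so they contribute less.
LEVEL_SCORES = {1: 0.5, 2: 1.5, 3: 3.0}


def level_from_a_record(a_record):
    """Read the level from the last byte of a 127.0.0.x reply."""
    packed = ipaddress.IPv4Address(a_record).packed
    if packed[:3] != bytes([127, 0, 0]):
        raise ValueError("not a 127.0.0.x list reply: %s" % a_record)
    return packed[3]


def score_for(a_record):
    """Return the score for a reply, or 0.0 for a level we do not know."""
    return LEVEL_SCORES.get(level_from_a_record(a_record), 0.0)


# Example: a freshly added, uncorroborated URI versus a fully promoted one.
print(score_for("127.0.0.1"), score_for("127.0.0.3"))
```

Treating unknown levels as 0.0 means a client would simply ignore levels added later until its table is updated.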
John.
Right, but it probably should be kept in mind that some SURBL-using applications may not be doing weight-type scoring. Some may be doing outright yes/no blocking. I also prefer the more difficult approach of trying to say a record belongs to hard core spammers or it doesn't. I'm not a big fan of uncertain or grey results. Especially given applications that do outright blocking, listings may be most useful when they're either black or white.
Jeff C. -- "If it appears in hams, then don't list it."