[SURBL-Discuss] Re: New xs.surbl list
jeffc at surbl.org
Fri Apr 22 10:38:15 CEST 2005
[forwarded with Paul's permission. Please comment.]
From: Jeff Chan
To: Paul Shupak
Date: Friday, April 22, 2005, 12:06:46 AM
Subject: New xs.surbl list
On Thursday, April 21, 2005, 6:16:16 PM, Paul wrote:
> I don't know the current method used to decide when to add domains
> to your new list, and I definitely see *much* smaller levels of spam than
> many others on the various mailing lists.
It's the top 97th percentile of hits, which only gets about a
hundred more new records (domains and IPs) than we already have
in SURBLs. We can crank that up later when we get the FP issues
nailed down better, through improved processing techniques.
> However, my own experience, so far,
> is that absolutely no "zero-hour" spams have been caught,
Yes, that's because the new URI hit counts must overcome the
mass of earlier reports. There may be smarter ways to organize
this, but simply lowering the threshold of inclusion (e.g., going
to the 96th percentile) would get more on the list sooner.
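
To make the effect of the cutoff concrete, here is a minimal Python sketch
of a percentile-based inclusion rule. It is only an illustration: the
function name select_top, the nearest-rank percentile method, and the
sample hit counts are all assumptions, not the actual processing behind
the list.

```python
import math

def select_top(hit_counts, percentile):
    """Return the names whose hit count is at or above the given percentile.

    hit_counts: dict mapping a domain or IP to its number of reports.
    percentile: cutoff between 0 and 100, using the nearest-rank method
                (an assumption; the real method is not described here).
    """
    if not hit_counts:
        return set()
    ranked = sorted(hit_counts.values())
    # Nearest rank: the smallest count with at least `percentile` percent
    # of all counts at or below it.
    rank = max(1, math.ceil(percentile * len(ranked) / 100))
    cutoff = ranked[rank - 1]
    return {name for name, hits in hit_counts.items() if hits >= cutoff}

# Hypothetical data: 100 records with hit counts 1 through 100.
counts = {f"domain{i}.example": i for i in range(1, 101)}
conservative = select_top(counts, 97)   # higher cutoff, fewer records
aggressive = select_top(counts, 90)     # lower cutoff, more records
```

With a lower percentile the cutoff count drops, so records with fewer
reports get listed earlier, at the cost of a higher FP risk.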
> but very many SpamCop
> reports (about 1/2 hour later) do trigger, so I have to agree with the few who
> have suggested a much more aggressive decision about when to add.
Yes, I agree too. :-) When I announced the list for testing I
said we'd start conservative to get a feeling for the data.
> I'm making my own proposal here to avoid the political complications
> of which sources I will suggest using:
> Clearly, since the attempt is to catch domains, an RHS list is of no
> value for verification. I would propose a "point system" where you assign one
> point each for every hit of both the URI's own IP and each of its name servers'
> IPs for each of the following lists:
[ combined-HIB.dnsiplists.completewhois.com is a composite list
of bogus IP blocks, hijacked IPs, and blocks with invalid whois
data. See: http://www.completewhois.com/bogons/bogons_usage.htm ]
> and one point for each SpamCop report. A likely total score of 5 or 6 should
> probably trigger its inclusion - i.e. it would be possible to get on the list
> by having the original URI and two of the name servers on each list, despite
> not yet having any SpamCop reports (in the middle of the day, SpamCop slows
> down greatly - sometimes to a crawl, with one hour+ turn-around time between
> reporting and verification - and you get the data some time later!).
> Possibly adding some of the even more aggressive lists like FIVETEN
> and NOMOREFUNN at a half point (or other lower weight) could help also. You
> can always revisit the entries at a later time (i.e. if the "aggressive" points
> are used to add the entry, the timeout can be set low -- If a more conservative
> scheme later applies also, the timeout can be raised to its full value).
> I think this would give a much better chance of catching "zero-hour"
> spam. So far, I have 12 SpamCop reports that have hit XS (about 10 hours of
> use), but not a single original spam (out of ~110).
> Just an idea.
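
The point system proposed above could be sketched roughly as follows in
Python. This is a sketch under stated assumptions, not an implementation
anyone has agreed on: the threshold of 5 (the proposal says "5 or 6"), the
two timeout values, the Evidence structure, and the placeholder list names
list_a and list_b are all hypothetical. The DNS lookups themselves are left
out; the code takes already-collected hits as input.

```python
from dataclasses import dataclass, field

FULL_POINT = 1.0    # per (IP, list) hit on a regular IP list, and per SpamCop report
HALF_POINT = 0.5    # per (IP, list) hit on a more aggressive list
THRESHOLD = 5.0     # the proposal says "5 or 6"; 5 is picked here
FULL_TTL = 86400    # hypothetical full timeout, in seconds
SHORT_TTL = 3600    # hypothetical low timeout for aggressive additions

@dataclass
class Evidence:
    # IP address (the URI's own IP or a name server IP) -> names of lists it is on
    regular_hits: dict = field(default_factory=dict)
    aggressive_hits: dict = field(default_factory=dict)
    spamcop_reports: int = 0

def count_hits(hits):
    """Count one hit for every (IP, list) pair."""
    return sum(len(lists) for lists in hits.values())

def decide(ev):
    """Return (listed, ttl_seconds) for one URI domain."""
    regular = FULL_POINT * (count_hits(ev.regular_hits) + ev.spamcop_reports)
    aggressive = HALF_POINT * count_hits(ev.aggressive_hits)
    if regular >= THRESHOLD:
        return True, FULL_TTL       # conservative evidence alone is enough
    if regular + aggressive >= THRESHOLD:
        return True, SHORT_TTL      # needed aggressive points: keep timeout low
    return False, 0

# URI IP and two name servers, each on two lists, no SpamCop reports yet.
no_reports = Evidence(regular_hits={
    "192.0.2.10": {"list_a", "list_b"},
    "192.0.2.53": {"list_a", "list_b"},
    "192.0.2.54": {"list_a", "list_b"},
})
# Weaker evidence that only crosses the threshold with half points.
borderline = Evidence(
    regular_hits={"192.0.2.10": {"list_a"}, "192.0.2.53": {"list_a", "list_b"}},
    aggressive_hits={"192.0.2.10": {"FIVETEN", "NOMOREFUNN"}},
    spamcop_reports=1,
)
reports_only = Evidence(spamcop_reports=2)
```

Re-running decide() as more reports arrive gives the "revisit later"
behaviour: an entry first added with the low timeout moves to the full
timeout once the regular points alone reach the threshold.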
mailto:jeffc at surbl.org