[forwarded with Paul's permission. Please comment.]
From: Jeff Chan
To: Paul Shupak
Date: Friday, April 22, 2005, 12:06:46 AM
Subject: New xs.surbl list
On Thursday, April 21, 2005, 6:16:16 PM, Paul wrote:
Jeff,
I don't know the current method used to decide when to add domains
to your new list, and I definitely see *much* smaller levels of spam than many others on the various mailing lists.
It's the top 97th percentile of hits, which only gets about a hundred more new records (domains and IPs) than we already have in SURBLs. We can crank that up later when we get the FP issues nailed down better, through improved processing techniques.
However, my own experience, so far, is that absolutely no "zero-hour" spams have been caught,
Yes, that's because the new URI hit counts must overcome the mass of earlier reports. There may be smarter ways to organize this, but simply lowering the threshold of inclusion (e.g., going to the 98th percentile) would get more on the list sooner.
but very many SpamCop reports (about 1/2 hour later) do trigger. So I have to agree with the few who have suggested a much more aggressive decision about when to add.
Yes, I agree too. :-) When I announced the list for testing I said we'd start conservative to get a feeling for the data.
I'm making my own proposal here to avoid the political complications
of which sources I will suggest using:
Clearly, since the attempt is to catch domains, an RHS list is of no
value for verification. I would propose a "point system" where you assign one point for every hit of the URI's own IP, and of each of its name servers' IPs, on each of the following lists:
sbl.spamhaus.org
combined-HIB.dnsiplists.completewhois.com
[ combined-HIB.dnsiplists.completewhois.com is a composite list of bogus IP blocks, hijacked IPs, and blocks with invalid whois data. See: http://www.completewhois.com/bogons/bogons_usage.htm ]
and one point for each SpamCop report. A likely total score of 5 or 6 should probably trigger its inclusion - i.e. it would be possible to get on the list by having the original URI and two of the name servers on each list, despite not yet having any SpamCop reports (in the middle of the day, SpamCop slows down greatly - sometimes to a crawl, with one hour+ turn-around time between reporting and verification - and you get the data some time later!).
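
The point system above could be sketched roughly as follows in Python. This is only an illustration of the arithmetic, not how the xs list is actually built. The `listed` callable is hypothetical and stands in for whatever DNS lookup is used, and the threshold of 5 is taken from the "5 or 6" suggested above. The query-name helper uses the usual DNSBL convention of reversing the IPv4 octets and appending the zone.

```python
import ipaddress

# The two IP-based lists named in the proposal, one point per hit.
IP_LISTS = ("sbl.spamhaus.org", "combined-HIB.dnsiplists.completewhois.com")

# Assumed inclusion threshold; the proposal suggests "5 or 6".
THRESHOLD = 5


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def score(uri_ip, ns_ips, listed, spamcop_reports):
    """Total points for one URI.

    listed is a hypothetical callable (ip, zone) -> bool that reports
    whether the IP is on the given list; it is passed in so the scoring
    can be tried out without doing any DNS queries.
    """
    points = 0
    for ip in [uri_ip, *ns_ips]:
        for zone in IP_LISTS:
            if listed(ip, zone):
                points += 1
    # One point per SpamCop report.
    return points + spamcop_reports


def should_include(uri_ip, ns_ips, listed, spamcop_reports):
    """True when the total score reaches the assumed threshold."""
    return score(uri_ip, ns_ips, listed, spamcop_reports) >= THRESHOLD
```

With the URI's IP and two name servers on both lists and no SpamCop reports, this gives 3 x 2 = 6 points, which matches the case described above.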
Possibly adding some of the even more aggressive lists like FIVETEN
and NOMOREFUNN at a half point (or other lower weight) could help also. You can always revisit the entries at a later time (i.e. if the "aggressive" points are used to add the entry, the timeout can be set low; if a more conservative scheme later applies as well, the timeout can be raised to its full value).
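
A rough Python sketch of the weighted variant with timeouts, under stated assumptions: the list names are used only as labels (no zone names are implied), the threshold of 5 comes from the "5 or 6" mentioned earlier, and the two timeout values are placeholders chosen for illustration, since no actual values are given here.

```python
# Weight per list hit; the half-point weights follow the suggestion above.
WEIGHTS = {
    "sbl.spamhaus.org": 1.0,
    "combined-HIB.dnsiplists.completewhois.com": 1.0,
    "FIVETEN": 0.5,
    "NOMOREFUNN": 0.5,
}
AGGRESSIVE = {"FIVETEN", "NOMOREFUNN"}

THRESHOLD = 5.0           # assumed, from the "5 or 6" suggestion
FULL_TIMEOUT_HOURS = 72   # placeholder value, not from the message
SHORT_TIMEOUT_HOURS = 6   # placeholder value, not from the message


def decide(hits, spamcop_reports):
    """Return None if the entry should not be listed, else its timeout in hours.

    hits holds one list name per (IP, list) hit, e.g. the URI's IP on
    sbl.spamhaus.org contributes one "sbl.spamhaus.org" item.
    """
    conservative = spamcop_reports + sum(
        WEIGHTS[name] for name in hits if name not in AGGRESSIVE
    )
    total = conservative + sum(
        WEIGHTS[name] for name in hits if name in AGGRESSIVE
    )
    if total < THRESHOLD:
        return None
    # Entries that qualify only with aggressive points get the short timeout;
    # once the conservative score alone qualifies, the full timeout applies.
    if conservative >= THRESHOLD:
        return FULL_TIMEOUT_HOURS
    return SHORT_TIMEOUT_HOURS
```

Re-running `decide` as more SpamCop reports arrive is one way to "revisit" an entry: the same URI moves from the short timeout to the full one without any separate bookkeeping.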
I think this would give a much better chance of catching "zero-hour"
spam. So far, I have 12 SpamCop reports that have hit XS (about 10 hours of use), but not a single original spam (out of ~110).
Just an idea.
Bye,
paul
Good suggestions.
Jeff C.