[forwarded with Paul's permission. Please comment.]
From: Jeff Chan
To: Paul Shupak
Date: Friday, April 22, 2005, 12:06:46 AM
Subject: New xs.surbl list
On Thursday, April 21, 2005, 6:16:16 PM, Paul wrote:
Jeff,
I don't know the current method used to decide when to add domains to your new list, and I definitely see *much* smaller levels of spam than many others on the various mailing lists.
It's the top 97th percentile of hits, which only gets about a hundred more new records (domains and IPs) than we already have in SURBLs. We can crank that up later when we get the FP issues nailed down better, through improved processing techniques.
However, my own experience, so far, is that absolutely no "zero-hour" spams have been caught,
Yes, that's because the new URI hit counts must overcome the mass of earlier reports. There may be smarter ways to organize this, but simply lowering the threshold of inclusion (e.g., going to the 98th percentile) would get more on the list sooner.
but very many SpamCop reports (about 1/2 hour later) do trigger, so I have to agree with the few who have suggested a much more aggressive decision about when to add.
Yes, I agree too. :-) When I announced the list for testing I said we'd start conservative to get a feeling for the data.
I'm making my own proposal here to avoid the political complications of which sources I will suggest using:
Clearly, since the attempt is to catch domains, an RHS list is of no value for verification. I would propose a "point system" where you assign one point each for every hit of both the URI's own IP and each of its name servers' IPs for each of the following lists:
sbl.spamhaus.org
combined-HIB.dnsiplists.completewhois.com
[ combined-HIB.dnsiplists.completewhois.com is a composite list of bogus IP blocks, hijacked IPs, and blocks with invalid whois data. See: http://www.completewhois.com/bogons/bogons_usage.htm ]
and one point for each SpamCop report. A likely total score of 5 or 6 should probably trigger its inclusion - i.e. it would be possible to get on the list by having the original URI and two of the name servers on each list, despite not yet having any SpamCop reports (in the middle of the day, SpamCop slows down greatly - sometimes to a crawl, with one hour+ turn-around time between reporting and verification - and you get the data some time later!).
Possibly adding some of the even more aggressive lists like FIVETEN and NOMOREFUNN at a half point (or other lower weight) could help also. You can always revisit the entries at a later time (i.e. if the "aggressive" points are used to add the entry, the timeout can be set low -- if a more conservative scheme later applies also, the timeout can be raised to its full value).
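[Editor's sketch, not part of the original message.] The point system described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the threshold of 5 is the low end of the "5 or 6" suggested, the two timeout values are hypothetical placeholders, and the blocklist lookups themselves are assumed to have been done already, so `ip_hits` simply maps each IP (the URI's own IP and each name server IP) to the lists it was found on.

```python
# Lists named in the proposal. One point per hit on a conservative list,
# half a point per hit on an aggressive list, one point per SpamCop report.
CONSERVATIVE = {"sbl.spamhaus.org", "combined-HIB.dnsiplists.completewhois.com"}
AGGRESSIVE = {"FIVETEN", "NOMOREFUNN"}

THRESHOLD = 5.0           # assumption: low end of the suggested "5 or 6"
FULL_TIMEOUT_HOURS = 72   # hypothetical value, not from the thread
SHORT_TIMEOUT_HOURS = 6   # hypothetical value, not from the thread


def score(ip_hits, spamcop_reports):
    """Return (base, extra) points for one URI.

    ip_hits: dict mapping each IP (URI host IP and name server IPs)
             to the collection of list names that IP appears on.
    base:    conservative list hits plus SpamCop reports.
    extra:   half-weighted hits on the aggressive lists.
    """
    base = float(spamcop_reports)
    extra = 0.0
    for lists in ip_hits.values():
        base += len(CONSERVATIVE & set(lists))
        extra += 0.5 * len(AGGRESSIVE & set(lists))
    return base, extra


def decide(ip_hits, spamcop_reports, threshold=THRESHOLD):
    """Return (action, timeout_hours) for one URI."""
    base, extra = score(ip_hits, spamcop_reports)
    if base >= threshold:
        # The conservative evidence alone is enough: full timeout.
        return ("list", FULL_TIMEOUT_HOURS)
    if base + extra >= threshold:
        # Only listed thanks to aggressive points: short timeout,
        # to be raised later if conservative evidence accumulates.
        return ("list", SHORT_TIMEOUT_HOURS)
    return ("skip", None)


# The case from the message: the URI's IP and two name server IPs are
# each on both conservative lists, with no SpamCop reports yet.
example = {
    "192.0.2.1": CONSERVATIVE,
    "192.0.2.53": CONSERVATIVE,
    "192.0.2.54": CONSERVATIVE,
}
print(decide(example, 0))  # ('list', 72)
```

With three IPs on two lists each, the score is 6, so the entry is added before any SpamCop report arrives, which is the scenario the proposal is aiming at.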
I think this would give a much better chance of catching "zero-hour" spam. So far, I have 12 SpamCop reports that have hit XS (about 10 hours of use), but not a single original spam (out of ~110).
Just an idea.
Bye,
paul
Good suggestions.
Jeff C.
At 01:38 2005-04-22 -0700, Jeff Chan wrote:
Yes, I agree too. :-) When I announced the list for testing I said we'd start conservative to get a feeling for the data.
Could we maybe, just for testing, have two or more lists to test with different percentiles? 97.xs.surbl.org 98.xs.surbl.org etc...
Patrik
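[Editor's sketch, not part of the original message.] To make the idea of parallel test zones concrete, here is a rough illustration of how a client could build the DNS names to query against each zone. The zone names 97.xs.surbl.org and 98.xs.surbl.org are only the ones proposed above and may never have existed. Reversing the octets of a numeric host is the usual DNS blocklist convention and is assumed here. The function names are made up for this example, and no DNS lookup is actually performed.

```python
import ipaddress

# Zones as proposed in the message above; hypothetical test zones.
TEST_ZONES = ["97.xs.surbl.org", "98.xs.surbl.org"]


def query_name(host, zone):
    """Build the DNS name to look up for a host taken from a URI.

    Domain names are prepended to the zone as they are. IPv4 addresses
    are prepended with their octets reversed (assumed convention).
    """
    try:
        ip = ipaddress.IPv4Address(host)
    except ValueError:
        # Not a dotted-quad address, so treat it as a domain name.
        return f"{host.lower().rstrip('.')}.{zone}"
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"


def candidate_queries(host, zones=TEST_ZONES):
    """Return one query name per test zone, for side-by-side comparison."""
    return [query_name(host, zone) for zone in zones]


print(candidate_queries("example.com"))
print(candidate_queries("192.0.2.10"))
```

Querying every test zone for the same message would let the hit rates and false positives of the different percentiles be compared on identical mail.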
On Friday, April 22, 2005, 11:47:25 AM, Patrik Nilsson wrote:
Could we maybe, just for testing, have two or more lists to test with different percentiles? 97.xs.surbl.org 98.xs.surbl.org etc...
Patrik
Probably we'll try XS at the 98th percentile next, take out the SURBL hits, and try to list only domains that are less than a year old.
How does this sound to folks?
Jeff C. -- "If it appears in hams, then don't list it."
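[Editor's sketch, not part of the original message.] The three steps mentioned above (a hit cutoff, removing entries already in SURBLs, and keeping only domains less than a year old) could be combined roughly as below. The thread does not say how the percentile is computed, so the cutoff is represented simply as a minimum hit count supplied by the caller. The `Candidate` record, the function name, and the sample data are all invented for illustration, and where the registration dates would come from is left open.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Candidate:
    domain: str
    hits: int          # number of spam reports mentioning this domain
    registered: date   # registration date (source of this data is assumed)


def select_for_xs(candidates, already_in_surbl, min_hits, today,
                  max_age=timedelta(days=365)):
    """Return the domains that pass all three filters.

    min_hits stands in for the percentile cutoff; how that cutoff is
    derived from the hit distribution is not specified in the thread.
    """
    selected = []
    for c in candidates:
        if c.hits < min_hits:
            continue  # below the hit cutoff
        if c.domain in already_in_surbl:
            continue  # take out the SURBL hits
        if today - c.registered >= max_age:
            continue  # only list domains less than a year old
        selected.append(c.domain)
    return selected


# Invented sample data.
sample = [
    Candidate("new-spam.example", 40, date(2005, 3, 1)),
    Candidate("old-site.example", 55, date(2000, 1, 1)),
    Candidate("known.example", 60, date(2005, 2, 1)),
    Candidate("quiet.example", 2, date(2005, 4, 1)),
]
print(select_for_xs(sample, {"known.example"}, min_hits=10,
                    today=date(2005, 4, 23)))  # ['new-spam.example']
```

The age filter is the part aimed at false positives: long-established domains that show up in many reports are more likely to be legitimate sites mentioned in spam than fresh spammer registrations.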
Hi!
It might be useful info that if you already block with DSBL at the MTA level, XS is rather useless. We have been testing overnight: 400,000 spams passed, 2 were mentioned by XS, and both would have been high spam already without XS anyway.
So basically if you block with DSBL I don't see a point using this.
Bye, Raymond.
On Saturday, April 23, 2005, 2:59:29 AM, Raymond Dijkxhoorn wrote:
Hi!
It might be useful info that if you already block with DSBL at the MTA level, XS is rather useless. We have been testing overnight: 400,000 spams passed, 2 were mentioned by XS, and both would have been high spam already without XS anyway.
So basically if you block with DSBL I don't see a point using this.
The point is that DSBLs have delays in getting new IPs listed, but the same URIs may tend to get advertised from fresh zombies. Therefore if we get the URIs we will catch spams even before the fresh zombie IPs get listed.
The particular set of data currently in XS won't show many zero-hour spams because it's set so conservatively. It takes a lot of spams already seen to get included. What is more interesting to check at this conservative setting is how spammy the URIs it detects are. When we crank down the settings and catch more URIs sooner, then we should catch more zero-hour spams, including ones where the sender IPs don't show up on RBLs yet (because URIs likely change more slowly than sender IPs).
Jeff C. -- "If it appears in hams, then don't list it."
Hi!
So basically if you block with DSBL I don't see a point using this.
The point is that DSBLs have delays in getting new IPs listed, but the same URIs may tend to get advertised from fresh zombies. Therefore if we get the URIs we will catch spams even before the fresh zombie IPs get listed.
What delays? No more delays than XS also has with reloading the zonefiles. If there were noticeable delays I would have seen hits, wouldn't I?
I don't say XS is a bad idea, I just post what I have seen, it's a test ...
:)
My test results are almost below 0.... that's all.
The particular set of data currently in XS won't show many zero-hour spams because it's set so conservatively. It takes a lot of spams already seen to get included. What is more interesting to check at this conservative setting is how spammy the URIs it detects are. When we crank down the settings and catch more URIs sooner, then we should catch more zero-hour spams, including ones where the sender IPs don't show up on RBLs yet (because URIs likely change more slowly than sender IPs).
If that's raised I can test again, no problem at all, but the amount of FPs is also high, disney cough cough ... :)
Bye, Raymond.