Discuss

discuss@lists.surbl.org

1864 discussions

Re: [SURBL-Discuss] ANN: new surbl client (still beta)
by Jeff Chan 13 Feb '05

On Saturday, February 12, 2005, 2:41:36 AM, Alain Alain wrote:
>> >> - I've added a local skiplist with about the top half of the public
>> >> "whitelist", no need to query those.
>>
>> When you say half, that may be more than optimal (should be about
>> 5000 records). SpamAssassin is using the top 125, which worked
>> out to about the 50th percentile of all whitelist hits when we
>> first set this up. (Now that result is skewed *because*
>> SpamAssassin isn't checking those 125 any more, but their
>> snapshot of the 125 is still probably useful.)
>>
>> I'd say anything between 100 and 1000 would probably be a good
>> compromise between list size and coverage.

> The only disadvantage I see from a bigger local skiplist is some local
> CPU usage for every URI in an email. Most PCs have plenty of CPU
> power ;-) If this could become a problem, I can lower or optimise the
> local checking. Are there any other disadvantages?

One reason SpamAssassin didn't want to hard code too many domains into their local whitelist was in case we needed to withdraw any, i.e. because they started spamming. The time between code releases can be many months, and some people may never update, so they wanted to be sure to get only very hammy domains into that list. (While Yahoo and Microsoft probably aren't going to start spamming any time soon, that may be less certain about some of the less commonly seen domains.)

But I'm glad that you're trying to minimize the DNS queries.

Jeff C.
--
"If it appears in hams, then don't list it."

Re: [SURBL-Discuss] FP rate?
by Jeff Chan 12 Feb '05

On Saturday, February 12, 2005, 3:09:46 AM, Alain Alain wrote:
>> > I know that not all FP's are reported and there are
>> > probably no exact numbers, but it should give a good idea. Or am I
>> > wrong?
>>
>> The FP reports are probably too few overall to be meaningful in
>> terms of differentiating performance between lists. There just
>> aren't that many, maybe a few a day on average.
>>
> Yes, but I wasn't thinking of differentiating between the lists; there
> are other results for that. What I was thinking of was the number of FP's
> that exist on more than one list. This is very useful information
> when combining lists. If almost no FP's occur on more than one
> list (at the same time), requiring appearance on at least 2 lists
> would be a very safe rule.

Good point. Anecdotally, FPs don't tend to appear on multiple lists very often, at least the FPs we've seen reported. This is unmeasured, just a subjective opinion.

If we had some of the list data in combined form as I had proposed then we could test it better. I suppose I could just do it. ;-)

Jeff C.
--
"If it appears in hams, then don't list it."
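
The overlap question raised in this message could be measured along the following lines once FP reports are available in combined form. This is only a sketch: the `fp_reports` mapping, the function names, and the sample entries are all made up for illustration and are not real SURBL data.

```python
from collections import Counter

def fp_overlap(fp_reports):
    """Count FP domains by how many sublists listed them at the same time."""
    return Counter(len(lists) for lists in fp_reports.values())

def share_on_multiple(fp_reports):
    """Fraction of FP domains that appeared on two or more sublists."""
    if not fp_reports:
        return 0.0
    multi = sum(1 for lists in fp_reports.values() if len(lists) >= 2)
    return multi / len(fp_reports)

# Placeholder data only (hypothetical domains and listings).
sample_reports = {
    "a.example": {"ws"},
    "b.example": {"ws", "ob"},
    "c.example": {"jp"},
    "d.example": {"ob"},
}

overlap = fp_overlap(sample_reports)
multi_share = share_on_multiple(sample_reports)
print(dict(overlap), multi_share)
```

A low share of FPs on two or more lists would support the "at least 2 lists" rule as a conservative setting; with only a few reports a day, the result would need a long collection period to mean much.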

Re: [SURBL-Discuss] FP rate?
by Jeff Chan 12 Feb '05

On Friday, February 11, 2005, 5:29:29 PM, Alain Alain wrote:
>> That said, here are some results Daniel Quinlan posted from the
>> mass-checks on the SpamAssassin corpora around 26 January 2005:
>>
>> > Weekly mass-check results for SURBL:
>>
>> >OVERALL%     SPAM%      HAM%    S/O   RANK  SCORE  NAME
>> >  217996    164295     53701  0.754   0.00   0.00  (all messages)
>> > 100.000   75.3661   24.6339  0.754   0.00   0.00  (all messages as %)
>> >  11.644   15.4490    0.0037  1.000   0.98   3.90  URIBL_SC_SURBL
>> >  39.572   52.4976    0.0261  1.000   0.98   3.00  URIBL_JP_SURBL
>> >  51.955   68.9236    0.0391  0.999   0.96   2.00  URIBL_OB_SURBL
>> >   5.690    7.5492    0.0000  1.000   0.95   2.01  URIBL_AB_SURBL
>> >  53.948   71.5238    0.1769  0.998   0.83   0.54  URIBL_WS_SURBL
>> >   0.030    0.0396    0.0000  1.000   0.51   0.84  URIBL_PH_SURBL
>>
> Am I right with the following:
> JP has 0.0261% FP on 24.6339% of all msg --> 0.0065% of all msg
> (is less than 1 in 15,000)

That sounds right, but the particular proportions of spam versus ham may not be meaningful, i.e. they may not be representative of an actual mail stream. So the percentages are probably more usefully compared only to spam or ham and not to a combined total of messages.

Certainly the relative percentages within spam or ham are meaningful and mostly useful with the caveat that the spam detection rates are wrong for quickly moving data in SC and AB since the test corpora cover too much time for them. (This is more true for spam than ham since spam domains vary quickly with time, but ham domains are relatively steady.)

>> SC and AB have much better real world results than shown above
>> because their time period is much shorter than the test
>> corpora's.

> Yes, but maybe the FP's will grow faster ;-)

That tends not to be the case. The SpamCop data is filtered multiple times and is human-checked at the front end. The SC FP rates are consistently among the lowest, and the spam detection rates are very high for a very small list. In short it's an effective strategy.

>> Also note that the JP data is now removed from the WS data, and
>> some old data was removed from WS. So the WS spam and ham hit
>> rates have probably both decreased since this check was done.
>> JP should be about the same.

> That will show in the future. Is also a good thing.

Yes, it's fairer to the data sources.

>> > And if possible, has anybody statistics from FP's that were on
>> > several of the sublists -at the same time-?

> [snip]

>> I don't think that is known yet. I had proposed setting up some
>> test lists with combinations like this, but got no response. ;-)
>>
>> If it *is* known I think we'd all like to hear about it. :-)

> I think it could be known to the great people that check the FP
> reports. Normally they check against all sublists (I hope) and fix
> them all.

When we whitelist a domain, it's excluded from all SURBLs. The original data source is usually notified.

> I know that not all FP's are reported and there are
> probably no exact numbers, but it should give a good idea. Or am I
> wrong?

The FP reports are probably too few overall to be meaningful in terms of differentiating performance between lists. There just aren't that many, maybe a few a day on average.

Jeff C.
--
"If it appears in hams, then don't list it."
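
For anyone wanting to check the JP arithmetic discussed in this message, here is a small sketch using only the figures from the quoted mass-check table. The function and variable names are made up for illustration. As noted in the reply, the result depends on the corpus's spam/ham mix, so it describes this corpus and not a real mail stream.

```python
# Figures from the quoted mass-check table (around 26 January 2005).
TOTAL_MESSAGES = 217996
HAM_MESSAGES = 53701
JP_HAM_HIT_PCT = 0.0261  # HAM% column for URIBL_JP_SURBL

def fp_share_of_all_mail(ham_hit_pct, ham_messages, total_messages):
    """Return false positives as a percentage of all messages in the corpus."""
    ham_fraction = ham_messages / total_messages
    return ham_hit_pct * ham_fraction

jp_fp_pct = fp_share_of_all_mail(JP_HAM_HIT_PCT, HAM_MESSAGES, TOTAL_MESSAGES)
one_in = 100 / jp_fp_pct  # "1 in N" messages

print(f"JP FPs: {jp_fp_pct:.4f}% of all messages, about 1 in {one_in:,.0f}")
```

The product comes out at roughly 0.0064% of all messages, in line with the 0.0065% estimate and the "less than 1 in 15,000" reading.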

Re: [SURBL-Discuss] ANN: new surbl client (still beta)
by Alain 12 Feb '05

Hi Jeff

> On Saturday, February 12, 2005, 2:34:20 AM, Alain Alain wrote:
>
> >> Generally speaking it may be better to apply this kind of
> >> filtering at the server level since there are economies of scale,
> >> especially in terms of things like DNS lookups and caching. If
> >> we suddenly get 100k more DNS clients, that could tax the name
> >> servers somewhat. If those same 100k users were using 100
> >> servers instead, the DNS loading would be quite a bit less. In
> >> that sense centralization is desirable.
>
> > Mmmm isn't the dns server from the ISP caching the dns requests? I
> > would think it doesn't make a big difference (except when a server is
> > rsync'ing). The difference could be that end users check their e-mail
> > not when arriving on the MTA, but later.
>
> One difference is that the ISP's mail server may see many of the
> same spams within a short period of time, and the lookups would
> probably tend to be cached over that time span. Individual users
> may POP or IMAP their messages at any random time, so the DNS
> cache hit rate may be lower for them.

This will only be the case for spam e-mail, not for domains inside ham e-mail.

>
> I think we're agreeing, but I've never tried to quantify the
> difference between these. We can propose that there's some
> difference but how much is unknown. I would suggest a pretty
> strong cache effect for mail servers however.

But the good news is: the more users, the more caching. So the burden on the nameservers will grow more slowly.

Alain

Re: [SURBL-Discuss] FP rate?
by Alain 12 Feb '05

Hi Jeff

> >> That said, here are some results Daniel Quinlan posted from the
> >> mass-checks on the SpamAssassin corpora around 26 January 2005:
> >>
> >> > Weekly mass-check results for SURBL:
> >>
> >> >OVERALL%     SPAM%      HAM%    S/O   RANK  SCORE  NAME
> >> >  217996    164295     53701  0.754   0.00   0.00  (all messages)
> >> > 100.000   75.3661   24.6339  0.754   0.00   0.00  (all messages as %)
> >> >  11.644   15.4490    0.0037  1.000   0.98   3.90  URIBL_SC_SURBL
> >> >  39.572   52.4976    0.0261  1.000   0.98   3.00  URIBL_JP_SURBL
> >> >  51.955   68.9236    0.0391  0.999   0.96   2.00  URIBL_OB_SURBL
> >> >   5.690    7.5492    0.0000  1.000   0.95   2.01  URIBL_AB_SURBL
> >> >  53.948   71.5238    0.1769  0.998   0.83   0.54  URIBL_WS_SURBL
> >> >   0.030    0.0396    0.0000  1.000   0.51   0.84  URIBL_PH_SURBL
> >>
> > Am I right with the following:
> > JP has 0.0261% FP on 24.6339% of all msg --> 0.0065% of all msg
> > (is less than 1 in 15,000)
>
> That sounds right, but the particular proportions of spam versus
> ham may not be meaningful, i.e. they may not be representative
> of an actual mail stream. So the percentages are probably more
> usefully compared only to spam or ham and not to a combined total
> of messages.

ok

>
> Certainly the relative percentages within spam or ham are
> meaningful and mostly useful with the caveat that the spam
> detection rates are wrong for quickly moving data in SC and AB
> since the test corpora cover too much time for them. (This is
> more true for spam than ham since spam domains vary quickly with
> time, but ham domains are relatively steady.)
>

ok

> >> SC and AB have much better real world results than shown above
> >> because their time period is much shorter than the test
> >> corpora's.
>
> > Yes, but maybe the FP's will grow faster ;-)
>
> That tends not to be the case. The SpamCop data is filtered
> multiple times and is human-checked at the front end. The SC FP
> rates are consistently among the lowest, and the spam detection
> rates are very high for a very small list. In short it's an
> effective strategy.
>

ok, and I am overall impressed with the low FP rates on all lists.

> >> Also note that the JP data is now removed from the WS data, and
> >> some old data was removed from WS. So the WS spam and ham hit
> >> rates have probably both decreased since this check was done.
> >> JP should be about the same.
>
> > That will show in the future. Is also a good thing.
>
> Yes, it's fairer to the data sources.
>
> >> > And if possible, has anybody statistics from FP's that were on
> >> > several of the sublists -at the same time-?
>
> > [snip]
>
> >> I don't think that is known yet. I had proposed setting up some
> >> test lists with combinations like this, but got no response. ;-)
> >>
> >> If it *is* known I think we'd all like to hear about it. :-)
>
> > I think it could be known to the great people that check the FP
> > reports. Normally they check against all sublists (I hope) and fix
> > them all.
>
> When we whitelist a domain, it's excluded from all SURBLs. The
> original data source is usually notified.
>
> > I know that not all FP's are reported and there are
> > probably no exact numbers, but it should give a good idea. Or am I
> > wrong?
>
> The FP reports are probably too few overall to be meaningful in
> terms of differentiating performance between lists. There just
> aren't that many, maybe a few a day on average.
>

Yes, but I wasn't thinking of differentiating between the lists; there are other results for that. What I was thinking of was the number of FP's that exist on more than one list. This is very useful information when combining lists. If almost no FP's occur on more than one list (at the same time), requiring appearance on at least 2 lists would be a very safe rule.

Alain

Re: [SURBL-Discuss] ANN: new surbl client (still beta)
by Jeff Chan 12 Feb '05

On Saturday, February 12, 2005, 2:34:20 AM, Alain Alain wrote:
>> Generally speaking it may be better to apply this kind of
>> filtering at the server level since there are economies of scale,
>> especially in terms of things like DNS lookups and caching. If
>> we suddenly get 100k more DNS clients, that could tax the name
>> servers somewhat. If those same 100k users were using 100
>> servers instead, the DNS loading would be quite a bit less. In
>> that sense centralization is desirable.

> Mmmm isn't the dns server from the ISP caching the dns requests? I
> would think it doesn't make a big difference (except when a server is
> rsync'ing). The difference could be that end users check their e-mail
> not when arriving on the MTA, but later.

One difference is that the ISP's mail server may see many of the same spams within a short period of time, and the lookups would probably tend to be cached over that time span. Individual users may POP or IMAP their messages at any random time, so the DNS cache hit rate may be lower for them.

I think we're agreeing, but I've never tried to quantify the difference between these. We can propose that there's some difference but how much is unknown. I would propose a pretty strong cache effect for mail servers however.

Jeff C.
--
"If it appears in hams, then don't list it."

ANN: new surbl client (still beta)
by test 12 Feb '05

Hi

I'm adding SURBL support to the urlbody plugin of SpamPal. SpamPal is a freeware Windows local proxy anti-spam application, with a userbase of almost 100,000.
(http://www.spampal.org/)

I use only multi. At the moment only JP, but this will become a configuration option. Probably also with a "two out of all" option.
- I've added a local skiplist with about the top half of the public "whitelist", no need to query those.
- Domains are only queried one time per e-mail (the local PC DNS service would probably cache those, but I added it anyway...).
- If there are more than 20 domains to check, only the first (or last) 5 are done.

Any suggestions are welcome.

Alain
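
The three client-side rules listed in this announcement (skiplist, one query per domain per e-mail, cap on large messages) can be sketched roughly as follows. This is not the plugin's actual code: the function name, parameters, and skiplist entries are hypothetical, and it assumes the "more than 20" check is applied after skiplisted and duplicate domains have been dropped, which the message does not specify.

```python
def domains_to_query(uri_domains, skiplist, limit=20, keep=5, keep_first=True):
    """Return the domains from one e-mail that should be looked up.

    - domains on the local skiplist are never queried
    - each domain is queried at most once per e-mail
    - if more than `limit` domains remain, only the first (or last)
      `keep` of them are checked
    """
    seen = set()
    candidates = []
    for domain in uri_domains:
        d = domain.lower().rstrip(".")  # normalise case and trailing dot
        if d in skiplist or d in seen:
            continue
        seen.add(d)
        candidates.append(d)

    if len(candidates) > limit:
        candidates = candidates[:keep] if keep_first else candidates[-keep:]
    return candidates

# Hypothetical skiplist entries, for illustration only.
skiplist = {"yahoo.com", "microsoft.com"}

small = domains_to_query(
    ["Example.com", "yahoo.com", "example.com", "spam.example"], skiplist
)
many = [f"d{i}.example" for i in range(25)]
first_five = domains_to_query(many, skiplist)
last_five = domains_to_query(many, skiplist, keep_first=False)
print(small, first_five, last_five)
```

Using a set for the skiplist keeps the per-URI lookup cost constant, so even a skiplist of a few thousand entries should add little local CPU load.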

RE: [SURBL-Discuss] rsync
by Chris Santerre 12 Feb '05

>I've got a pretty busy mail system (about 600k messages a day) so I sent
>an rsync request earlier this week but I haven't heard back yet. I also
>sent subscription requests on the announce and zones mailing lists and
>haven't heard anything on those. All of these requests came from
>support(a)fiber.net.
>
>If anyone can offer some advice it will be greatly appreciated.

Hang in there, Adam :) One of the guys is away at a conference, and another is swamped. I'm not trusted with the keys because I will lock myself out of the car with it running :)

I'll see what I can do.

--Chris

RE: [SURBL-Discuss] ANN: new surbl client (still beta)
by Chris Santerre 12 Feb '05

>I'm adding SURBL support to the urlbody plugin of SpamPal.
>SpamPal is a freeware Windows local proxy anti-spam
>application, with a userbase of almost 100,000.
>(http://www.spampal.org/)
>
>I use only multi. At the moment only JP, but this will become
>a configuration option. Probably also with a "two out of all" option.
>- I've added a local skiplist with about the top half of the
>public "whitelist", no need to query those.
>- Domains are only queried one time per e-mail (the local PC
>DNS service would probably cache those, but I added it anyway...).
>- If there are more than 20 domains to check, only the first (or
>last) 5 are done.
>
>Any suggestions are welcome.

I suggest more people implement it just like you did! Great job! Thanks for creating the skip list!

--Chris

FP rate?
by Alain 12 Feb '05

Hi

Does anyone have recent statistics on the FPs for the different sublists in multi?

And if possible, does anybody have statistics on FPs that were on several of the sublists at the same time?

Something like the DNS hits in:
http://www.surbl.org/permuted-hits.out.txt
(which Google didn't find a link to; maybe adding a link inside one of the www.surbl.org pages would be nice...)

I'm thinking of adding a score for each sublist to let the user decide on the FP safety. It is very easy for me to generate and seems easy to configure even for end users:

a*[sc] + b*[ws] + c*[ob] + d*[jp] + e*[ab] + f*[ph] >= 100

For example:
d could be 100 --> a hit on jp alone is spam
b could be 99 --> need at least one other entry
a, c could be 50 --> need more...

Alain
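
The weighted-sum rule proposed in this message could look roughly like the sketch below. The function and variable names are invented for illustration. The weights for jp, ws, sc, and ob follow the example values in the message; no values were given for ab and ph, so they are left out here and count as 0, which is an assumption and not a recommendation.

```python
# Example weights from the message: jp=100, ws=99, sc=50, ob=50.
# ab and ph were not given example values; they are omitted (treated as 0).
EXAMPLE_WEIGHTS = {"jp": 100, "ws": 99, "sc": 50, "ob": 50}

def sublist_score(hits, weights):
    """Sum the weights of every sublist the domain was found on."""
    return sum(weights.get(name, 0) for name in hits)

def is_spam(hits, weights=EXAMPLE_WEIGHTS, threshold=100):
    """Apply the rule a*[sc] + b*[ws] + ... >= threshold."""
    return sublist_score(hits, weights) >= threshold

print(is_spam({"jp"}))        # jp alone reaches the threshold
print(is_spam({"ws"}))        # ws alone falls one point short
print(is_spam({"ws", "sc"}))  # ws plus any other weighted list passes
print(is_spam({"sc", "ob"}))  # two 50-point lists together pass
```

Setting every weight to 50 with a threshold of 100 would give the "two out of all" behaviour mentioned elsewhere in the thread.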
