Hi all,
I added some experimental code to GetURI to automatically determine the
age of a domain, which works for about 99.2% of the domains I've seen in
SURBL, and, with a little bit of Gaussian math, the results are fricken'
*amazing* for classification!
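To give a rough idea of what's involved (a simplified sketch, not the
actual GetURI code, and assuming the system whois client and a registry
that prints a "Creation Date:" line -- date formats vary by registry):

  use strict;
  use warnings;
  use Time::Local qw(timegm);

  # Return the domain's age in days, or undef if no date was found.
  sub domain_age_days {
      my ($domain) = @_;
      return undef unless $domain =~ /^[a-z0-9.-]+$/i;  # don't shell out junk
      my $whois = `whois $domain`;
      return undef
          unless $whois =~ /Creation Date:\s*(\d{4})-(\d{2})-(\d{2})/i;
      my ($y, $m, $d) = ($1, $2, $3);
      my $created = timegm(0, 0, 0, $d, $m - 1, $y);
      return int((time - $created) / 86400);
  }

  my $age = domain_age_days('example.com');
  print defined $age ? "example.com: $age days old\n"
                     : "no creation date found\n";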
Unfortunately, I need a way to be able to do this without violating
registry whois terms of service, because they don't allow automated
queries, "except as reasonably required to register and update domains"
or somesuch...
Any ideas?
- Ryan
--
Ryan Thompson <ryan(a)sasknow.com>
SaskNow Technologies - http://www.sasknow.com
901-1st Avenue North - Saskatoon, SK - S7K 1Y4
Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon
Toll-Free: 877-727-5669 (877-SASKNOW) North America
In order to reduce false positives in the SURBL data, we would
like to have access to ham corpora. Does anyone know of any
public ham corpora, including just the URI domain names from the
hams? Or is there anyone who would be willing to run our URI
domain lists against their ham?
Does anyone know if messages from the Enron corpus have been
categorized for ham and spam?
http://www-2.cs.cmu.edu/~enron/
Thanks in advance for any suggestions, comments, thoughts....
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
I've whitelisted x10.com. They are a frequent FP since they
"advertise" so much to their own customers.
I'm surprised it was not whitelisted earlier. x10.com is
probably a significant contributor to the ham scores.
Jeff C.
Eric Kolve and I were looking at how to best set the default SpamCopURI
scores for the various SURBL lists and at first we tried looking at the
SpamAssassin 3.0 perceptron-generated scores as a possible guide:
> http://spamassassin.apache.org/full/3.0.x/dist/rules/50_scores.cf
>
> # The following block of scores were generated using the mass-checking
> # scripts, and a perceptron to determine the optimum scores which
> # resulted in minimum false positives or negatives. The scores are
> # weighted to produce roughly 1 false positive in 2500 non-spam messages
> # using the default threshold of 5.0.
> score URIBL_AB_SURBL 0 2.007 0 0.417
> score URIBL_OB_SURBL 0 1.996 0 3.213
> score URIBL_PH_SURBL 0 0.839 0 2.000
> score URIBL_SC_SURBL 0 3.897 0 4.263
> score URIBL_WS_SURBL 0 0.539 0 1.462
I was trying to figure out what the different score columns meant,
and Theo Van Dinter cited the documentation:
> $ perldoc Mail::SpamAssassin::Conf
> [...]
> If four valid scores are listed, then the score that is used
> depends on how SpamAssassin is being used. The first score is used
> when both Bayes and network tests are disabled (score set 0). The
> second score is used when Bayes is disabled, but network tests are
> enabled (score set 1). The third score is used when Bayes is
> enabled and network tests are disabled (score set 2). The fourth
> score is used when Bayes is enabled and network tests are enabled
> (score set 3).
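In other words, the score set index is just two flags combined. A toy
illustration (not SpamAssassin's actual internals):

  use strict;
  use warnings;

  # bit 0 = network tests enabled, bit 1 = Bayes enabled
  my ($net, $bayes) = (1, 1);          # both on => score set 3
  my @scores = (0, 2.007, 0, 0.417);   # URIBL_AB_SURBL from 50_scores.cf
  my $set = ($net ? 1 : 0) | ($bayes ? 2 : 0);
  print "score set $set uses score $scores[$set]\n";  # set 3, 0.417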
We wondered if we could somehow use those scores with SpamCopURI,
but were unable to come up with a good answer.
Theo suggested looking at spam versus ham rates as a good way to
set scores, and I mentioned:
> We have these test results from Justin from 25 June:
>
> OVERALL% SPAM% HAM% S/O RANK SCORE NAME
> 121405 22516 98889 0.185 0.00 0.00 (all messages)
> 100.000 18.5462 81.4538 0.185 0.00 0.00 (all messages as %)
> 13.453 70.3766 0.4925 0.993 1.00 1.00 SURBL_WS
> 3.807 20.3811 0.0334 0.998 0.50 1.00 SURBL_SC
> 2.650 14.2565 0.0071 1.000 0.50 1.00 SURBL_AB
> 0.019 0.0933 0.0020 0.979 0.50 1.00 SURBL_PH
> 12.624 67.6275 0.1001 0.999 0.50 1.00 SURBL_OB
>
> which shows a pretty high FP rate for WS, less for the others.
> Do you happen to have access to any more recent corpus check data
> like this? Could be useful to have another snapshot for a more
> complete picture.
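As an aside for anyone decoding the columns: S/O appears to be the spam
hit rate divided by the total hit rate, which we can check against the
WS line above:

  use strict;
  use warnings;

  # SURBL_WS from Justin's stats: spam% 70.3766, ham% 0.4925
  my ($spam_pct, $ham_pct) = (70.3766, 0.4925);
  printf "S/O = %.3f\n", $spam_pct / ($spam_pct + $ham_pct);  # 0.993

That reproduces the 0.993 in the table.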
That was followed up with more data and discussion:
> On Saturday, September 4, 2004, 10:13:11 PM, Theo Dinter wrote:
>> high spam + low ham is good from an FP standpoint, but having a "significant"
>> (for your definition thereof) ham hitrate means the score shouldn't be too
>> high. My handwaving scores would be something like:
[Theo's wild guess scores for Justin's June data: -- Jeff C.]
>> WS 1.2
>> SC 2.5
>> AB 3.5
>> OB 1.8
Theo then gave some of his own stats on a couple different corpora:
>> OVERALL% SPAM% HAM% S/O RANK SCORE NAME
>> 416072 365031 51041 0.877 0.00 0.00 (all messages)
>> 100.000 87.7327 12.2673 0.877 0.00 0.00 (all messages as %)
>> set1 30.923 35.2466 0.0000 1.000 0.99 0.00 URIBL_SC_SURBL
>> set1 72.231 82.3273 0.0274 1.000 0.98 1.00 URIBL_OB_SURBL
>> set1 19.375 22.0847 0.0000 1.000 0.98 1.00 URIBL_AB_SURBL
>> set1 74.883 85.2939 0.4310 0.995 0.74 0.00 URIBL_WS_SURBL
>> set1 0.001 0.0000 0.0059 0.000 0.48 0.00 URIBL_PH_SURBL
>
>> OVERALL% SPAM% HAM% S/O RANK SCORE NAME
>> 119215 67094 52121 0.563 0.00 0.00 (all messages)
>> 100.000 56.2798 43.7202 0.563 0.00 0.00 (all messages as %)
>> set3 39.217 69.6605 0.0288 1.000 0.98 1.00 URIBL_OB_SURBL
>> set3 10.340 18.3727 0.0000 1.000 0.97 0.00 URIBL_SC_SURBL
>> set3 5.998 10.6582 0.0000 1.000 0.94 1.00 URIBL_AB_SURBL
>> set3 42.730 75.5522 0.4797 0.994 0.73 0.00 URIBL_WS_SURBL
>> set3 0.008 0.0089 0.0058 0.608 0.49 0.00 URIBL_PH_SURBL
>
>> so for these results, I'd probably do something like:
>
>> WS 1.3
>> SC 4.0
>> AB 3.0
>> OB 2.2
>
>> since the hit rates and S/O are a bit higher for me, related to the fact I ran
>> more spam through than Justin did.
To which I added:
> Those final scores look like an excellent fit to the data to me.
and:
> Also while the PH spam hit rate [from Justin's stats] is low,
> the data is of hand checked phishing scams, which deserve to be
> blocked due to their potential danger and damage.
>
> Therefore I would tend to give PH a medium-high score like
> 3 to 5.
So we'll probably adjust the default scores on SpamCopURI
to something like:
WS 1.3
SC 4.0
AB 3.0
OB 2.2
PH 4.5
and we recommend SpamCopURI users do likewise. Please be
sure to use the latest version of SpamCopURI with
multi.surbl.org:
http://sourceforge.net/projects/spamcopuri/
http://search.cpan.org/dist/Mail-SpamAssassin-SpamCopURI/
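For anyone setting these by hand, and assuming the SA 3.0 rule names
from the 50_scores.cf excerpt above (SpamCopURI setups may name the
rules differently), the equivalent local.cf lines would be:

  score URIBL_WS_SURBL 1.3
  score URIBL_SC_SURBL 4.0
  score URIBL_AB_SURBL 3.0
  score URIBL_OB_SURBL 2.2
  score URIBL_PH_SURBL 4.5

A single score on a line applies across all four score sets.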
One thing that stood out for me is that the FP rate (ham%) for
ws.surbl.org is way too high at about 0.45 to 0.5% across
multiple corpora. That FP rate needs to be reduced for WS
to be more fully useful.
I think Chris or maybe Raymond suggested that they had a way to
reduce FPs in WS further. If so, ***please*** try to apply it.
We need to get the FPs to be much less than 0.5%. The other
lists have FP rates 5 to 50 times lower.
Basically the higher the FP rate, the less useful a list is.
Does anyone have other corpus stats to share, in particular
FP rates?
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
This is a forwarded message
From: Theo Van Dinter <felicity(a)kluge.net>
To: SURBL Discussion list <discuss(a)lists.surbl.org>, SpamAssassin Developers <spamassassin-dev(a)incubator.apache.org>
Date: Saturday, September 4, 2004, 10:36:53 AM
Subject: [SURBL-Discuss] checking plain domains in message bodies against SURBLs reportedly effective
===8<==============Original message text===============
On Sat, Sep 04, 2004 at 10:45:44AM -0600, Ryan Thompson wrote:
> Yep. Good idea, overall. There are a few gotchas:
>
TLD extensions sometimes map to file extensions. We might have to whitelist
> command.com, and the entire country of Poland. :-)
>
> Since the domain is in plain text and doesn't contain a protocol or
> subdomain (i.e., 'www'), I haven't yet seen a mail client that will
> display it as a clickable URL.
This is generally the tack we're taking in SpamAssassin -- if a general
MUA doesn't display it as a link, then we don't consider it a URL.
Another issue for the generic domains thing is performance -- lots of
messages have lots of things that could potentially look like a domain,
and querying for them all adds a bit of a load on the client and the
server.
For instance: /\b([a-zA-Z0-9_.-]{1,256}\.[a-zA-Z]{2,6})\b/
in theory (I haven't tested it) will grab anything that looks like a
generic domain name in text. If you check that list against a list of
valid TLDs, you'd probably end up with a decent list, but you'd hit the
issue quoted above, where it isn't clear whether "Go take a look at
command.com" refers to a URL or a filename.
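An untested sketch of that two-step approach, with an extra capture
added for the TLD (the TLD list here is obviously abbreviated):

  use strict;
  use warnings;

  my %valid_tld = map { $_ => 1 } qw(com net org info biz de pl);
  my $body = 'Go take a look at command.com or visit www.example.pl';

  my %candidates;
  while ($body =~ /\b([a-zA-Z0-9_.-]{1,256}\.([a-zA-Z]{2,6}))\b/g) {
      my ($domain, $tld) = (lc $1, lc $2);
      $candidates{$domain}++ if $valid_tld{$tld};
  }
  print "$_\n" for sort keys %candidates;
  # Still prints command.com -- a filename, not an intended URL.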
--
Randomly Generated Tagline:
"Brevity is the soul of lingerie." - Dorothy Parker
===8<===========End of original message text===========
Randy Brukardt of rrsoftware.com mentioned that checking
plain domains occurring in message bodies against SURBLs
was pretty productive. (E.g., look for domain.com in
addition to www.domain.com or http://www.domain.com).
Perhaps this would be something interesting to try experimentally,
or at least to think about.
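For anyone who wants to try it, a bare-bones SURBL lookup with Net::DNS
might look like this. The default domain below is, if I recall
correctly, SURBL's permanent test entry; the input should already be
reduced to its registrar-level form (domain.com, not www.domain.com):

  use strict;
  use warnings;
  use Net::DNS;

  my $domain = shift || 'surbl-org-permanent-test-point.com';
  my $res    = Net::DNS::Resolver->new;
  my $reply  = $res->query("$domain.multi.surbl.org", 'A');

  if ($reply) {
      printf "%s listed: %s\n", $domain, $_->address
          for grep { $_->type eq 'A' } $reply->answer;
  } else {
      print "$domain not listed\n";
  }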
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
Today I processed a pile of 11,000 spams donated by Raymond, as I'm
gradually gearing up for large-scale spamtrap processing.
I ended up with many obvious spam domains, but an even bigger pile of
domains that aren't quite as suspicious. Many are from last year, and
their name servers are not listed by the SBL. I don't want to add any
grey domains to a black pile.
While investigating many of them to determine if they deserved listing, I
repeatedly came across the term "safelist". These seem to be some kind of
opt-in list. What exactly is the term supposed to mean?
Joe
--
http://www.joewein.de/sw/jwSpamSpy/
On Friday, September 3, 2004, 3:10:29 PM, Raymond Dijkxhoorn wrote:
> Currently we are running with a somewhat frozen ws.surbl.org list. We are
> experiencing hardware trouble with one of the SURBL machines. New updates
> will be processed, but most likely activated only after we restore full
> functionality.
> The main SURBL site is not affected; it's only the WS updates that are
> involved.
> We are working hard to get the processing box back online.
We made a temporary workaround to get updates from Raymond and
others directly into ws.surbl.org until the other server comes
back. Once the other server is working again we will undo that
workaround. So ws is now getting updated with at least some of
the new data.
Jeff C.