Proposing a greylist

Chris Santerre

2 Sep 2004

4:09 p.m.

I am officially proposing a greylist surbl.

We are going to see more and more of this stuff. We might as well deal with it now. I'm suggesting a greylist for all spammers that ride that line. Like the euniverse junk we have been talking about.

1) We DO NOT include it in multi.
2) We SCREAM to the world that it WILL hit some legit, and that only hard liners should use.
3) We DON'T remove domains unless they go completely black, or have no NANAS hits for 3-4 months.
4) See number 2 again.
5) We tell people it is completely optional and to see number 2.

I predict it would be used more for personal emails. It also gives us an in-between mechanism, rather than list or no list. We get a grey list we desperately need.

Thoughts?

Chris Santerre System Admin and SARE Ninja http://www.rulesemporium.com http://www.surbl.org 'It is not the strongest of the species that survives, not the most intelligent, but the one most responsive to change.' Charles Darwin

Raymond Dijkxhoorn

2 Sep

4:23 p.m.

Hi!

...

I am officially proposing a greylist surbl.

We are going to see more and more of this stuff. We might as well deal with it now. I'm suggesting a greylist for all spammers that ride that line. Like the euniverse junk we have been talking about.

Bingo! I tried to get this one going some weeks ago, but the time wasn't right I guess. I hope it is now.

...

1)We DO NOT include it in multi.

We can include in multi, why not...

...

2)We SCREAM to the world that it WILL hit some legit, and that only hard liners should use. 3)We DON'T remove domains unless they go completely black, or have no NANAS hits for 3-4 months. 4)See number 2 again. 5)We tell people it is completely optional and to see number 2.

I predict it would be used more for personal emails. It also gives us an in-between mechanism, rather than list or no list. We get a grey list we desperately need.

Anyone ?

Bye, Raymond.

Mariano Absatz

6:02 p.m.

On Thu, 2 Sep 2004 16:23:45 +0200 (CEST), Raymond Dijkxhoorn raymond@prolocation.net wrote:

...

Hi!

...
I am officially proposing a greylist surbl.

We are going to see more and more of this stuff. We might as well deal with it now. I'm suggesting a greylist for all spammers that ride that line. Like the euniverse junk we have been talking about.

Bingo! I tried to get this one going some weeks ago, but the time wasn't right I guess. I hope it is now.

...
1)We DO NOT include it in multi.

We can include in multi, why not...

Nope... One of the ideas behind multi is that lazy sysadmins can put in only one rule, and if that hits, then 'surbl hit'.

I think this MUST NOT be in multi.

...

...
2)We SCREAM to the world that it WILL hit some legit, and that only hard liners should use. 3)We DON'T remove domains unless they go completely black, or have no NANAS hits for 3-4 months. 4)See number 2 again. 5)We tell people it is completely optional and to see number 2.

I predict it would be used more for personal emails. It also gives us an in-between mechanism, rather than list or no list. We get a grey list we desperately need.

Anyone ?

I concur with Steven's comment... this MUST NOT be called 'greylist' because that term is already being used for something completely different... I'd like a name that implies that it WILL have FPs... it can be the elegant 'unconfirmed.surbl.org' as Steven suggested or something along the lines of 'this-has-false-positives.surbl.org' or 'dont-use-this-if-you-re-stupid.surbl.org' :-)

I'm also thinking about the whitelists that apply to SURBLs... AFAIK, there are two kinds of entries in the whitelist that Jeff maintains...

One of them should also be applied to this list and is the one that contains second-level domain names that act as TLDs (like co.uk, com.mx, net.ar).

IIRC the other items Jeff adds to the whitelist are simply domains that hit a FP (like the ones I reported 10 minutes ago). These MUST NOT be whitelisted in the new list, since it would lose its meaning, 'cause that's Jeff's way to promptly react to FPs in any list.
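To illustrate why the first kind of whitelist entry matters for any URI list, here is a minimal sketch of reducing a hostname to the domain that would actually be looked up. The three entries in TWO_LEVEL_TLDS are just the examples named above; the function name and the rest of the structure are assumptions made for illustration, not the list's actual tooling.

```python
# Hypothetical, tiny sample of second-level names that behave like TLDs.
# A real exclusion list would be much longer and maintained separately.
TWO_LEVEL_TLDS = {"co.uk", "com.mx", "net.ar"}


def registered_domain(hostname):
    """Return the part of hostname that a URI list lookup would use."""
    labels = hostname.lower().rstrip(".").split(".")
    # If the last two labels act as a TLD, keep three labels instead of two,
    # so that "co.uk" itself is never treated as a listable domain.
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_TLDS:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])
```

Without the two-level check, www.example.co.uk would collapse to co.uk, and listing it would hit every site under that suffix.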

-- Mariano Absatz - El Baby el (dot) baby (AT) gmail (dot) com el (punto) baby (ARROBA:@) gmail (punto) com

Raymond Dijkxhoorn

6:05 p.m.

Hi!

...

...
We can include in multi, why not...

...

Nope... One of the ideas behind multi is that lazy sysadmins can put in only one rule, and if that hits, then 'surbl hit'.

Lazy sysadmins don't do anything ;)

...

I think this MUST NOT be in multi.

I still think it should be; it will also save lookups within most applications.

...

...
Anyone ?

...

I concur with Steven's comment... this MUST NOT be called 'greylist' because that term is already being used for something completely different... I'd like a name that implies that it WILL have FPs... it can be the elegant 'unconfirmed.surbl.org' as Steven suggested or something along the lines of 'this-has-false-positives.surbl.org' or 'dont-use-this-if-you-re-stupid.surbl.org' :-)

No, greylist is perhaps not the best name for a list like that; evil.surbl or something, or unconfirmed, fits better.

...

IIRC the other items Jeff adds to the whitelist are simply domains that hit a FP (like the ones I reported 10 minutes ago). These MUST NOT be whitelisted in the new list, since it would lose its meaning, 'cause that's Jeff's way to promptly react to FPs in any list.

That's no big problem, it's two separate files anyway...

Bye, Raymond.

Steven Champeon

6:10 p.m.

on Thu, Sep 02, 2004 at 01:02:12PM -0300, Mariano Absatz wrote:

...

...
We can include in multi, why not...

Nope... One of the ideas behind multi is that lazy sysadmins can put in only one rule, and if that hits, then 'surbl hit'.

I think this MUST NOT be in multi.

[X] STRONGLY AGREE

...

I concur with Steven's comment... this MUST NOT be called 'greylist' because that term is already being used for something completely different... I'd like a name that implies that it WILL have FPs... it can be the elegant 'unconfirmed.surbl.org' as Steven suggested or something along the lines of 'this-has-false-positives.surbl.org' or 'dont-use-this-if-you-re-stupid.surbl.org' :-)

Heh. That last is a certain guarantee that it will be used by the most idiotic of mail admins, though ;) Maybe something like

only-a-complete-moron-would-use-this.surbl.org

would be safer, as morons don't tend to think of themselves that way :)

'unconfirmed' is simple, though, and has some precedents in DNSBL usage. It's more a "quarantine on this if you like, but don't blame us as the list hasn't even been looked at yet and is probably the result of an automatic but error-prone process" kind of thing, like unconfirmed.dsbl.org.

http://dsbl.org/faq#lists

-- hesketh.com/inc. v: +1(919)834-2552 f: +1(919)834-2554 w: http://hesketh.com Buy "Cascading Style Sheets: Separating Content from Presentation, 2/e" today! http://www.amazon.com/exec/obidos/ASIN/159059231X/heskecominc-20/ref=nosim/

Jeff Chan

3 Sep

2:39 a.m.

On Thursday, September 2, 2004, 9:10:28 AM, Steven Champeon wrote:

...

only-a-complete-moron-would-use-this.surbl.org

But I don't want lists that only a moron would use.... ;-)

Jeff C.

Alex Broens (Ninja Bootcamp Participant)

2 Sep

4:43 p.m.

Chris Santerre wrote:

...

I am officially proposing a greylist surbl.

We are going to see more and more of this stuff. We might as well deal with it now. I'm suggesting a greylist for all spammers that ride that line. Like the euniverse junk we have been talking about.

1)We DO NOT include it in multi. 2)We SCREAM to the world that it WILL hit some legit, and that only hard liners should use. 3)We DON'T remove domains unless they go completely black, or have no NANAS hits for 3-4 months. 4)See number 2 again. 5)We tell people it is completely optional and to see number 2.

I predict it would be used more for personal emails. It also gives us an in-between mechanism, rather than list or no list. We get a grey list we desperately need.

Thoughts?

Supported...

1. How will it be decided what goes where?
2. Will this double the contribution work?
3. What about redundancies? Would grey be "allowed" to contain the same stuff as "black"?

..... lots more questions........

Alex

Jeff Chan

4:45 p.m.

On Thursday, September 2, 2004, 7:09:27 AM, Chris Santerre wrote:

...

I am officially proposing a greylist surbl.

...

We are going to see more and more of this stuff. We might as well deal with it now. I'm suggesting a greylist for all spammers that ride that line. Like the euniverse junk we have been talking about.

...

1)We DO NOT include it in multi. 2)We SCREAM to the world that it WILL hit some legit, and that only hard liners should use. 3)We DON'T remove domains unless they go completely black, or have no NANAS hits for 3-4 months. 4)See number 2 again. 5)We tell people it is completely optional and to see number 2.

...

I predict it would be used more for personal emails. It also gives us an in-between mechanism, rather than list or no list. We get a grey list we desperately need.

I'd rather focus on black lists for the upstream mail servers.

Greylists are messier, more time-consuming, difficult to categorize, error-prone, controversial, and subjective than black or white lists. We can already see how much effort a few borderline cases consume. Creating and maintaining these as a third category would multiply that.

If we make greylists, they will be misapplied, legitimate mails will be blocked, people will (somewhat rightly) complain, and our reputation will be damaged.

I know it would perhaps be more fun to play the "find every spammer" game, but I think we should instead focus on improving the quality of the data we already have.

When we can get the FP rate of WS below 0.01%, then maybe we can think about greylists.... ;-)

Jeff C.

Ryan Thompson

7:09 p.m.

Jeff Chan wrote to SURBL Discuss:

...

On Thursday, September 2, 2004, 7:09:27 AM, Chris Santerre wrote:

...
I am officially proposing a greylist surbl.

...
We are going to see more and more of this stuff. We might as well deal with it now. I'm suggesting a greylist for all spammers that ride that line. Like the euniverse junk we have been talking about.

...
1)We DO NOT include it in multi. 2)We SCREAM to the world that it WILL hit some legit, and that only hard liners should use. 3)We DON'T remove domains unless they go completely black, or have no NANAS hits for 3-4 months. 4)See number 2 again. 5)We tell people it is completely optional and to see number 2.

...
I predict it would be used more for personal emails. It also gives us an in-between mechanism, rather than list or no list. We get a grey list we desperately need.

I'd rather focus on black lists for the upstream mail servers.

Go ahead! :-)

...

Greylists are messier, more time-consuming, difficult to categorize, error-prone, controversial, and subjective than black or white lists. We can already see how much effort a few borderline cases consume. Creating and maintaining these as a third category would multiply that.

I disagree with all of your adjectives. :-)

Messier, error-prone, controversial, and subjective: If used as a *blacklist*, your description would fit. By *definition*, however, a greylist is the grey area that can't (yet) be classified as black or white. By *definition* it's where the controversial stuff lives. We need an in-between.

Further, I think *not* having a greylist leads to errors and controversy, because even the most careful submitters will (thanks to human nature) have a tendency to want to put domains *somewhere*. It's damned hard to admit that somedomain.com appears in a dozen local spams, has a bunch of NANAS hits, but, jeez, it's so *close*, but maybe, just maybe, they have some legit uses. A greylist ought to keep the size of our blacklist smaller, so that it really *is* as close to a pure blacklist as we can make it.

Borderline: The borderline cases will now have a proper home, and rely less on submitters' judgement.

Time consuming: *Definitely* not. We submitters beat our heads against the keyboard on a per-domain basis for the difficult to classify cases, in an attempt to list them *somewhere* (either as white or black). A greylist would allow us to spend *less* time on some of the really icky domains, and allow the numbers to work for us.

...

If we make greylists, they will be misapplied, legitimate mails will be blocked, people will (somewhat rightly) complain, and our reputation will be damaged.

This is exactly the objection I expected you'd make. I admire consistency. :-)

However, I take issue with "somewhat rightly complain". What you're talking about, in usability terms, is "affordance". Give somebody a screwdriver, and, with alarming frequency, they'll turn it around and use the handle to beat on something. When someone complains because the handle of their screwdriver is mangled, does that damage the manufacturer's reputation? A coffee mug is *exactly* the right size and shape for throwing. If I throw mine at that wall over there, and it shatters, is it a crappy coffee cup? "Affordance". Tool and mug manufacturers aren't going to restrict and devalue their products just so they don't afford striking and throwing.

If someone takes our greylist and says, "Hey! I could use this to block email", despite the big "May identify legitimate email" warning we're going to scream from the rooftops? We're a cut above: When did you see a coffee cup that said, "May break if thrown"? Actually, that would be less silly than some of the *other* product warnings I've seen...

...

I know it would perhaps be more fun to play the "find every spammer" game, but I think we should instead focus on improving the quality of the data we already have.

A list of grey domains could help accomplish that. See my next mail.

...

When we can get the FP rate of WS below 0.01%, then maybe we can think about greylists.... ;-)

Again, greylists might be one of the means to that end.

- Ryan

-- Ryan Thompson ryan@sasknow.com SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4 Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America

Jeff Chan

3 Sep

2:27 a.m.

On Thursday, September 2, 2004, 10:09:52 AM, Ryan Thompson wrote:

...

Further, I think *not* having a greylist leads to errors and controversy, because even the most careful submitters will (thanks to human nature) have a tendency to want to put domains *somewhere*. It's damned hard to admit that somedomain.com appears in a dozen local spams, has a bunch of NANAS hits, but, jeez, it's so *close*, but maybe, just maybe, they have some legit uses. A greylist ought to keep the size of our blacklist smaller, so that it really *is* as close to a pure blacklist as we can make it.

...

Borderline: The borderline cases will now have a proper home, and rely less on submitters' judgement.

A greylist could therefore become a dumping ground for submissions people were too lazy to research or categorize. Just because something is difficult to categorize does not mean it should get blocked, even for individual home users.

That could also result in some full spammers not going onto the blacklists where they belong, basically due to lack of enough effort to properly categorize them.

The usefulness of such a list would tend to undermine itself due to factors like that.

We need to try to think through as many of the consequences as possible ahead of time.

Jeff C.

David Hooton

2 Sep

4:47 p.m.

On Thu, 2 Sep 2004 10:09:27 -0400 , Chris Santerre csanterre@merchantsoverseas.com wrote:

...

I am officially proposing a greylist surbl.

We are going to see more and more of this stuff. We might as well deal with it now. I'm suggesting a greylist for all spammers that ride that line. Like the euniverse junk we have been talking about.

I think this is a good idea.

...

2)We SCREAM to the world that it WILL hit some legit, and that only hard liners should use.

We still need to define rules of engagement.

...

3)We DON'T remove domains unless they go completely black, or have no NANAS hits for 3-4 months.

Or we're proven to be out of line listing them..

-- Regards, David Hooton

Steven Champeon

5:23 p.m.

New subject: just don't call it a greylist (was: Re: Proposing a greylist)

on Fri, Sep 03, 2004 at 12:47:44AM +1000, David Hooton wrote:

...

On Thu, 2 Sep 2004 10:09:27 -0400 , Chris Santerre csanterre@merchantsoverseas.com wrote:

...
I am officially proposing a greylist surbl.

We are going to see more and more of this stuff. We might as well deal with it now. I'm suggesting a greylist for all spammers that ride that line. Like the euniverse junk we have been talking about.

I think this is a good idea.

Ditto. But please, for the love of god, don't call it a greylist. Call it what it is - an unconfirmed, FP-risky, anything-goes list.

unconfirmed.surbl.org

Greylisting is a completely different concept, and given how much trouble we've already had dealing with people in the antispam community because we insist on referring to "exclusion" of a domain from SURBL as "whitelisting", perhaps now is the time to get the terminology right.

Steve

Steven Champeon

5:51 p.m.

New subject: just don't call it a greylist (was: Re: Proposing a greylist)

on Thu, Sep 02, 2004 at 11:23:49AM -0400, Steven Champeon wrote:

...

...
I think this is a good idea.

Ditto. But please, for the love of god, don't call it a greylist. Call it what it is - an unconfirmed, FP-risky, anything-goes list.

unconfirmed.surbl.org

Greylisting is a completely different concept, and given how much trouble we've already had dealing with people in the antispam community because we insist on referring to "exclusion" of a domain from SURBL as "whitelisting", perhaps now is the time to get the terminology right.

More info on greylisting and a definition:

http://projects.puremagic.com/greylisting/

"The term Greylisting is meant to describe a general method of blocking spam based on the behavior of the sending server, rather than the content of the messages. Greylisting does not refer to any particular implementation of these methods. Consequently, there is no single Greylisting product. Instead, there are many products that incorporate some or all of the methods described here."

In a nutshell, greylisting is a way of refusing mail on the first attempt in order to provoke a legit MTA to resend a little later and a zombie not to try again. It's a neat idea, but not likely to survive the next generation of spam cannons and ends up slowing down your legit mail by unpredictable amounts, all dependent on the condition and config of the sending server.
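To make the distinction from a "grey" domain list concrete, here is a minimal sketch of the greylisting behavior summarized above: temporarily refuse the first delivery attempt for a given (client IP, sender, recipient) triplet and accept a retry after a delay. The class name, the 300-second delay and the in-memory dictionary are assumptions chosen for illustration, not taken from any particular greylisting product.

```python
import time


class Greylist:
    """Toy greylisting policy keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300.0):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.first_seen = {}        # triplet -> time of first delivery attempt

    def check(self, client_ip, sender, recipient, now=None):
        """Return 'tempfail' (SMTP 4xx) or 'accept' for this attempt."""
        now = time.time() if now is None else now
        key = (client_ip, sender.lower(), recipient.lower())
        # Remember the first attempt; later attempts reuse the stored time.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"
```

A legitimate MTA queues the message and retries after the delay, at which point check returns "accept"; a sender that never retries is never accepted. Nothing here looks at message content or URIs, which is why the term does not fit a list of domains.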

And for anyone who was confused about my disdain for the use of the term "whitelisting" for "exclusion of a domain from a blacklist", let me clarify that, too. For everyone else in the world, whitelisting is the process by which you say you will always accept mail from an address, host or domain, NOT simply excluding it from a blacklist, which just means it's in the mu state and is neither perfectly good nor perfectly evil ;)

http://catb.org/~esr/jargon/html/W/whitelist.html http://searchsecurity.techtarget.com/gDefinition/0,294236,sid14_gci896131,00... http://www.wordspy.com/words/whitelist.asp

etc.

Bret Miller

6:05 p.m.

...

1)We DO NOT include it in multi.

Please reconsider this... Including it in multi means a lot less DNS traffic, and that's a serious plus when you're using a greylist that you probably won't use to increase the spam score by much.

Bret ----------

Send your spam to: bretmiller@wcg.org Thanks for keeping the internet spam-free!

Steven Champeon

6:15 p.m.

on Thu, Sep 02, 2004 at 09:05:08AM -0700, Bret Miller wrote:

...

...
1)We DO NOT include it in multi.

Please reconsider this... Including it in multi means a lot less DNS traffic, and that's a serious plus when you're using a greylist that you probably won't use to increase the spam score by much.

No, no, no.

If anything, the unconfirmed list WILL contain FPs. It WILL certainly be a /superset/ of all the other lists. Including it in multi simply means corrupting multi for no real purpose, and is almost certainly going to mean that those least sophisticated users will get FPs as a result.

Jeff Chan

3 Sep

3:14 a.m.

On Thursday, September 2, 2004, 9:15:54 AM, Steven Champeon wrote:

...

on Thu, Sep 02, 2004 at 09:05:08AM -0700, Bret Miller wrote:

...
...
1)We DO NOT include it in multi.

Please reconsider this... Including it in multi means a lot less DNS traffic, and that's a serious plus when you're using a greylist that you probably won't use to increase the spam score by much.

...

No, no, no.

...

If anything, the unconfirmed list WILL contain FPs. It WILL certainly be a /superset/ of all the other lists. Including it in multi simply means corrupting multi for no real purpose, and is almost certainly going to mean that those least sophisticated users will get FPs as a result.

Agreed.

Jeff C.

David Hooton

5:37 a.m.

People,

After watching this thread, I am beginning to see that there are 2 different interpretations of what SURBLs are here for.

1. Purist's view - SURBLs are here to block domains that only appear in spam
2. Spam Nazi - if it's spam, block it at all costs

Neither is really right or wrong, but I am beginning to come around to Jeff's view that a list which has no clear rules of engagement is fraught with all kinds of difficulties and risks.

Getting back to the initial domains which sparked the controversy - flowgo, I think that perhaps there is room for a "MainSleaze" list which lists "Junk Mail" like flowgo etc. However, one which is just for domains which we're not sure about is definitely running like a lemming to a cliff.

Perhaps if we all slow down a bit and drew up some slightly stronger and less "grey" rules of engagement we might get the list up, but for now I agree with Jeff and being that he's our fearless leader we're going to need to try and find a compromise which will be more workable and structured than the currently possible domain dumping ground concept.

-- Regards, David Hooton

Jeff Chan

11:25 a.m.

On Thursday, September 2, 2004, 8:37:09 PM, David Hooton wrote:

...

Getting back to the initial domains which sparked the controversy - flowgo, I think that perhaps there is room for a "MainSleaze" list which lists "Junk Mail" like flowgo etc

If Mainsleazers use fixed mail servers, then just block the mail servers using a global or local RBL, or even block their IP addresses at the transport or routing layer.

If they're using zombies then they're a very good candidate for SURBLs.

How's that for a compromise?

Jeff C.

Rob McEwen

2:27 p.m.

Jeff said:

...

If Mainsleazers use fixed mail servers, then just block the mail servers using a global or local RBL, or even block their IP addresses at the transport or routing layer.

...

If they're using zombies then they're a very good candidate for SURBLs.

...

How's that for a compromise?

Jeff, up until this point, all your concerns and points made a lot of sense. Certainly, there are issues and questions you have raised which need more attention and thought.

However, this last point you made makes little sense. First, there is not much difference, for all practical purposes, between doing what you are suggesting and just throwing all these "mainsleazers" into SURBL... yet no one is suggesting or is in favor of that. We are not trying to "end-run around" SURBL by making it more strict in order to circumvent our regular standards. Instead, most of us see the "greylist" as more of an auditing tool or a factoring tool. Recall how some have already mentioned factoring unconfirmed.surbl.org into SpamAssassin's score, but at a lower value than the regular SURBL score. That way, where a regular SURBL hit might be enough to get a message blocked... an unconfirmed.surbl.org hit would take ADDITIONAL evidence (or rules) to get that message blocked. Also, another use for unconfirmed.surbl.org would be as an auditing tool, where an extra copy of mail that gets "hit" by unconfirmed.surbl.org (but NOT by multi.surbl.org) might go to a folder for review by the mail administrator, so that the mail administrator might create additional filtering "rules" for blocking this type of message in the future in a more precise, "surgical strike" manner which doesn't block all mail just for having that particular URI.
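The two uses described above (a lower-weighted score and an audit copy) can be sketched roughly as follows. All numbers are hypothetical and chosen only for illustration; they are not SpamAssassin defaults, and the function is a stand-in for a weighted-rule filter, not SpamAssassin's actual API.

```python
# Hypothetical weights: a confirmed hit counts for much more than an
# unconfirmed one. THRESHOLD is the score at which mail gets blocked.
SCORES = {"multi": 4.0, "unconfirmed": 1.0}
THRESHOLD = 5.0


def classify(other_rules_score, in_multi, in_unconfirmed):
    """Return (action, audit_copy) for one message.

    other_rules_score: points already assigned by unrelated rules.
    in_multi / in_unconfirmed: whether a URI in the message is on each list.
    """
    score = other_rules_score
    if in_multi:
        score += SCORES["multi"]
    elif in_unconfirmed:
        score += SCORES["unconfirmed"]
    action = "block" if score >= THRESHOLD else "deliver"
    # Audit copy only for mail hit by the unconfirmed list but NOT by multi.
    audit_copy = in_unconfirmed and not in_multi
    return action, audit_copy
```

With these weights a message scoring 1.5 from other rules is blocked on a multi hit but delivered (with an audit copy) on an unconfirmed hit; it takes additional evidence from other rules to push the unconfirmed case over the threshold.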

Finally, another reason for this greylist, as Chris and I have pointed out in the past, is that spammers will try to circumvent SURBL in the future by providing some little legit service "on the side". Certainly, it would be good to keep these types "on a short leash". If we ONLY do what we have been doing so far, there is a big loophole in SURBL.

A week or two ago, I had other related suggestions about this issue. (I don't know if it got much attention at the time). This post had suggestions for OTHER ways to deal with this potential loophole. (I'll try to find it and repost.)

Rob McEwen

John Lundin

3:50 p.m.

On Fri, Sep 03, 2004 at 08:27:10AM -0400, Rob McEwen wrote:

...

Finally, another reason for this greylist, as Chris and I have pointed out in the past, is that spammers will try to circumvent SURBL in the future by providing some little legit service "on the side". Certainly, it would be good to keep these types "on a short leash". If we ONLY do what we have been doing so far, there is a big loophole in SURBL.

...

From my perspective, this is the compelling reason to have another list. I'm interested in not missing the URIs that actively appear in spam with occasional appearance in ham, but want to depend on very few false positives out of WS.

Maybe there should be a dark multi, with one bit for confirmed spammers with some ham, and another for early warning entries. It would be nice to be able to evaluate them separately.

As an analog, SARE splits some rulesets (genlsubj, html, header) into categories of "hit ONLY spam", "have hit ham", and "hit a significant amount of ham." You can choose your level of safety and effectiveness. (If you want to get fancy, encode a confidence level. Two bits? ;-) )
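As a rough sketch of the "one bit per category" idea, assume a combined zone answers with an address of the form 127.0.0.N, where N is a bitmask. The bit assignments and category names below are hypothetical and exist only to show how the two kinds of entry could be evaluated separately.

```python
import ipaddress

# Hypothetical bit assignments for a separate "dark multi" zone.
FLAGS = {
    1: "confirmed-with-some-ham",
    2: "early-warning",
}


def decode(answer):
    """Map a DNS A-record answer such as '127.0.0.3' to category names."""
    last_octet = int(ipaddress.IPv4Address(answer)) & 0xFF
    return [name for bit, name in sorted(FLAGS.items()) if last_octet & bit]
```

A single lookup then yields every category at once, and a filter can score each category on its own.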

Jeff Chan

4:31 p.m.

On Friday, September 3, 2004, 6:50:25 AM, John Lundin wrote:

...

Maybe there should be a dark multi, with one bit for confirmed spammers with some ham, and another for early warning entries. It would be nice to be able to evaluate them separately.

...

As an analog, SARE splits some rulesets (genlsubj, html, header) into categories of "hit ONLY spam", "have hit ham", and "hit a significant amount of ham." You can choose your level of safety and effectiveness. (If you want to get fancy, encode a confidence level. Two bits? ;-) )

SARE and SpamAssassin in general have a different approach to detecting spam than SURBLs.

SA is usually used with elaborate rules and technologies to categorize spam based on multiple characteristics in headers and message bodies. SA was built to cut through some of the obfuscation of content and sender information that spammers shifted to when they stopped sending clear text messages from known mail servers. Zombies and compounding obfuscation make that approach a constant challenge.

SURBLs attempt to identify spam by finding exactly those URI domains which are used in spams. They cut right to the unavoidable core of what spammers usually do and that's to advertise a web site.

Because the focus of each technology is slightly different, assumptions made from the perspective of one technology may not fit the other perfectly. For example it's not always the case that SURBLs will be used with programs that can score messages with different weights for different rules. If the false positive rates were low enough, SURBLs could be used to block messages with just URI parsing, including in the MTA. That allows spam to be rejected at the transport layer without sending it through SpamAssassin, thus saving much processing time, cpu resources, etc. MTA uses of SURBL already exist, though we're still waiting for sendmail milters and postfix filters.
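For readers unfamiliar with how a lookup works outside SpamAssassin, here is a minimal sketch of checking one URI domain against a list with a plain DNS query. The zone name multi.surbl.org is the combined list the thread calls "multi"; the function names and the choice to treat any lookup failure as "not listed" are assumptions made for illustration.

```python
import socket


def query_name(domain, zone="multi.surbl.org"):
    """Build the DNS name to look up for a URI domain."""
    return f"{domain.lower().rstrip('.')}.{zone}"


def is_listed(domain, zone="multi.surbl.org"):
    """Return True if the domain resolves inside the list zone."""
    try:
        answer = socket.gethostbyname(query_name(domain, zone))
    except socket.gaierror:
        # No such record (or the lookup failed): treat as not listed.
        return False
    # List answers are conventionally loopback-range addresses.
    return answer.startswith("127.")
```

A check this simple has no way to weigh one hit against other evidence, which is why a false positive in the list translates directly into rejected mail when it is used at the MTA.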

It was logical for SURBLs to be used with SpamAssassin because SA provides a nice framework of message parsing, URI extraction, mail program interfaces, etc. but SURBLs can be used directly with MTAs and other mail-handling, spam-blocking programs. In those cases the classifications need to be extremely accurate. False positives are the largest obstacle to that use and so they need to be reduced.

Instead of finding ways to collect greylists full of questionable domains, we should be trying to find ways to improve the quality of the existing lists. That's where the most important and valuable progress can be made.

Jeff C.

Ryan Thompson

5:45 p.m.

Jeff Chan wrote to SURBL Discussion list:

...

It was logical for SURBLs to be used with SpamAssassin because SA provides a nice framework of message parsing, URI extraction, mail program interfaces, etc. but SURBLs can be used directly with MTAs and other mail-handling, spam-blocking programs. In those cases the classifications need to be extremely accurate. False positives are the largest obstacle to that use and so they need to be reduced.

Instead of finding ways to collect greylists full of questionable domains, we should be trying to find ways to improve the quality of the existing lists. That's where the most important and valuable progress can be made.

Nobody, and I mean *nobody* has suggested that our "unconfirmed" list be used to block *anything*. It seems like you're saying that the mere existence of such a list would completely undermine the entire SURBL effort. Despite your mostly well-placed arguments (which I did read and ponder), I don't believe you.

We're talking about two different things. You see SURBL in narrow terms; a list of definitely spammy domains. Others are suggesting that SURBL could be further augmented by including (as a completely separate list) domains with slightly less (collective) certainty.

You point out the differences between SA and SURBL. We're aware of them. Many, *many* people use SA, and will continue to do so, in conjunction with SURBL. For weighted-rule classifiers like SA, having "grey" data which can be scored appropriately further *increases* the accuracy of the overall filter. A domain greylist would therefore be a useful spam fighting tool.

Now, I gather that you think an unconfirmed or grey list would be a "distraction" to SURBL. I'm sorry you feel this way, simply because I believe a uc list would be a very close fit *with* SURBL, and that each project could benefit from mutual support.

UC could benefit from the public presence and established framework SURBL already has. SURBL could benefit from UC as an input data source, and, given some of the rough submission and checking criteria proposed by others and myself, domains from the grey list would very often bubble up to one of the "black" SURBLs with a *higher* degree of certainty than some of the other submissions hand-checked by one or two people.

So, I'd like to see uc as part of the SURBL effort. We don't need any more acronyms. IMHO, acronyms are a PITA, but YMMV. I'm not interested in flogging this poor little thread to death, fighting for the legitimacy of one new list against speculation. The only thing worse than statistics are pre-supposed statistics, especially when they're used in an attempt to dismiss what might be a really good idea.

"Edison! What in blue blazes are you doing with that wire?" "I'm coiling it up to make a new kind of light!" "Poppycock! We need to focus on making better candles, or light will suffer!"

OK, so that's a bit of a dramatization. ;-) We *do* still use candles today. Their purpose has shifted a bit, and, for the most part, they've improved. Mostly, they smell better.

What I'd really like to see is a proof-of-concept uc list with at least a thousand domains in total, submitted from a few different people, so we can take a look at our test data and decide whether we:

a) hit the nail on the head, and continue as-is,
b) need to do more work, and tighten the submission guidelines, or
c) came up with a really dumb idea, and scrap the list altogether.

We're starting to repeat ourselves in this thread a little bit (myself included), so I need a short break from the discussion. :-)

uc@sasknow.com is open for submissions! I'll post another mail (in a new thread) with some more information for anybody who'd like to participate in the test.

- Ryan

Jeff Chan

5:57 p.m.

On Friday, September 3, 2004, 8:45:26 AM, Ryan Thompson wrote:

...

uc@sasknow.com is open for submissions! I'll post another mail (in a new thread) with some more information for anybody who'd like to participate in the test.

Please don't divide our efforts.

If it's not good enough to be included in a definite blacklist, I'm not interested in it.

Jeff C.

Mariano Absatz

8:36 p.m.

On Fri, 3 Sep 2004 08:57:08 -0700, Jeff Chan jeffc@surbl.org wrote:

...

On Friday, September 3, 2004, 8:45:26 AM, Ryan Thompson wrote:

...
uc@sasknow.com is open for submissions! I'll post another mail (in a new thread) with some more information for anybody who'd like to participate in the test.

Please don't divide our efforts.

If it's not good enough to be included in a definite blacklist, I'm not interested in it.

Jeff C.

Well... don't get mad, Jeff.

The good part of an open technology is that it is, well, open.

I don't think Ryan & Chris are 'dividing efforts', but just doing some research 'standing on other people's shoulders' (which many times is the correct place to start).

SURBL, besides the current lists which you gather, compile or otherwise endorse by publishing under the surbl.org domain, is a new technology with possibly more uses than you or I ever thought of.

Remember when there was only Paul Vixie's RBL out there?... then it transformed into 'MAPS', then appeared other MAPS lists with other logical content (much like the several *.surbl.org) and at some point (maybe before that) other RBLs started to appear...

You have SPEWS with an 'anything spammish in the neighbourhood' policy, or SBL, or a myriad of other RBLs, public and private.

The public-use ones usually have a written policy (good or bad, applied or not) about getting into or out of the RBL. It's up to the user to decide whether to use it, in what context and for what purpose.

Now the same thing is about to start with SURBL and I think it is 'A Good Thing (TM)'.

You, as the person responsible for the SURBL project and the surbl.org domain state that you don't endorse the 'uc' list policy. That's fine, you're completely entitled to do so and that is good.

So Ryan can decide that anyway, he'd like to go along with it and he'll be responsible for it (at least during an initial testing phase). That's also fine.

Since you don't endorse this, Ryan should set up the SURBL within his own domain (like, say uc.surbl.sasknow.com or whatever domain he controls and likes) and that's it.

His effort doesn't even have to be named on the surbl.org site (though, after an initial testing phase, if this seems to work fine it MIGHT be so, but that's totally up to you).

Furthermore, you can politely ask him not to use the SURBL-Discuss mailing list for UC nominations and he can then set up his own mailing list wherever he wants and use it for that.

I don't think you should ban all uc.surbl discussions in this mailing list, but, as a thread gets very uc-related it may be declared O/T.

I think there's much to be won and little to lose by doing things this way... as the SURBL technology proves to be mature and useful, I think more and more SURBL technology lists will start to appear, maybe even for purposes other than identifying spam.

Let's just not get mad about this... let's simply say, the uc list is not an official SURBL-project surbl and it's not even hosted here. Period.

If the idea of uc has enough supporters, it will work; otherwise, it will not. Maybe in a couple of months, if it proves successful, we can talk again here about including it or not in the SURBL project, and you will be able to decide if you'd rather include it or not and in what context. The final decision, as the person responsible for the SURBL project, will be yours.

Regards.

-- Mariano Absatz - El Baby el (dot) baby (AT) gmail (dot) com el (punto) baby (ARROBA:@) gmail (punto) com

Bret Miller

10:59 p.m.

To further this idea, comparing it to DNSBLs is appropriate. There are a lot of DNSBLs you can use. Many agree that the sbl-xbl Spamhaus lists are high quality with no (or almost no) false positives. For that reason, a lot of us use them to block e-mail.

Bl.spamcop.net is another DNSBL, and while it's a fine list, the FP rate is much too high for me to block e-mail on it alone. However, it's highly useful in SpamAssassin, putting many messages over the spam score threshold.

While the SURBL staff has decided that they would rather not "divide their efforts" to support a project like this, it certainly doesn't mean that it shouldn't be done by someone else. At that point, the staff is right to ask that the discussion about its details be moved off this list. Perhaps the better place to discuss this is the SA general list as they are more open to just about any new project that helps SA detect spam accurately.

I know if I relied totally on any one method of filtering spam, a lot of spam would be getting through that isn't. It's because the combination of content and structure analysis with Bayesian analysis, DNSBLs and URIBLs all contributes to the end result.

Bret

Jeff Chan

3:56 p.m.

On Friday, September 3, 2004, 5:27:10 AM, Rob McEwen wrote:

...

However, this last point you made makes little sense. First, there is not much difference, for all practical purposes, between doing what you are suggesting and just throwing all these "mainsleazers" into SURBL... yet no one is suggesting or is in favor of that. We are not trying to "end-run around" SURBL by making it more strict in order to circumvent our regular standards. Instead, most of us see the "graylist" as more of an auditing tool or a factoring tool. Recall how some have already mentioned factoring the unconfirmed.surbl.org into SpamAssassin's score, but at a lower value than the regular SURBL score. That way, where a regular SURBL hit might be enough to get a message blocked... an unconfirmed.surbl.org hit would take ADDITIONAL evidence (or rules) to get that message blocked. Also, another use for unconfirmed.surbl.org would be as an auditing tool, where an extra copy of mail that gets "hit" by unconfirmed.surbl.org (but NOT by multi.surbl.org) might go to a folder for review by the mail administrator so that the mail administrator might create additional filtering "rules" for blocking this type of message in the future in a more precise, "surgical strike" manner which doesn't block all mail just for having that particular URI.
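
The auditing use described above can be sketched roughly as follows. This is only an illustration: the function name, the action labels, and the idea that a hit on the confirmed list leads straight to a block are assumptions made for the example, not anything the list operators have specified.

```python
from typing import List

def route_message(on_multi: bool, on_unconfirmed: bool) -> List[str]:
    """Decide what to do with a message based on which lists its URIs hit.

    on_multi:       a URI in the message is on the confirmed list
    on_unconfirmed: a URI in the message is on the (hypothetical) grey list
    """
    if on_multi:
        # Assumed policy: a confirmed hit is enough to block.
        return ["block"]
    if on_unconfirmed:
        # Grey hit only: deliver normally, but keep a copy so the
        # administrator can review it and write more precise rules.
        return ["deliver", "copy_to_review"]
    return ["deliver"]

print(route_message(on_multi=False, on_unconfirmed=True))
```

The point of the sketch is that a grey hit alone never blocks anything; it only creates extra review material.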

...

Finally, another reason for this greylist, as I and Chris have pointed out in the past, is that spammers will try to circumvent SURBL in the future by providing some little legit service "on the side". Certainly, it would be good to keep these types "on a short leash". If we ONLY do what we have been doing so far, there is a big loophole in SURBL.

Yes, I understand the points being made, but I feel there are many practical concerns weighing against this idea. I also understand the enthusiasm and fervor of those of us who want to "get every spammer," but I feel that doesn't always fit the model we have built.

Perhaps there's some disagreement on what constitutes a spammer. To me a spammer essentially sends only spam, usually for pills, cable descramblers, mortgages, etc. and steals services using zombies. Their sites are usually hosted at spam-friendly ISPs who won't take down their sites for being a spam destination, or in countries with no apparent spam laws or enforcement.

Anyone who sends mostly legitimate messages should not be blocked, and anyone not using zombies is trivially easily blocked using a conventional RBL of sending server IP addresses or even sender domains. Conventional RBLs typically list the spammers' mail server IP addresses or their sending domain allowing administrators to block on them. Either of those other solutions is vastly simpler and less costly in terms of cpu cycles and disk storage than content checking like we're doing with SURBLs. Conventional RBLs are also well-supported in MTAs, SpamAssassin and most anti-spam programs. The main problem is that zombies are used to get around that technology. Zombies spam from many different and new ip addresses more quickly than conventional RBLs can practically keep up with.
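
For readers unfamiliar with the two lookup styles compared above, here is a minimal sketch of how the DNS query names are usually formed. The zone `rbl.example.org` and both function names are placeholders invented for the example; no DNS query is actually sent.

```python
import ipaddress

def ip_rbl_query(ip: str, zone: str) -> str:
    """Query name for a conventional IP-based RBL: reversed octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def uri_list_query(domain: str, zone: str) -> str:
    """Query name for a domain-based (URI) list: domain + zone."""
    return f"{domain.lower().rstrip('.')}.{zone}"

# The sending server's IP is known as soon as it connects ...
print(ip_rbl_query("192.0.2.10", "rbl.example.org"))
# ... while the URI domain has to be extracted from the message body first.
print(uri_list_query("Spammy-Domain.example", "multi.surbl.org"))
```

The IP lookup needs nothing but the connecting address, which is why it is cheap; the URI lookup requires parsing message content before any query can be built.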

Zombies are the main reason we decided to do SURBLs; because URI checking was the ONLY way remaining to catch spam sent using rapidly shifting armies of zombied computers. Those who think the source of spams is irrelevant or that zombies don't matter are probably mostly hobbyists with small personal mail servers who can afford processing that would be impractical at ISPs or large mail servers. It's great that people use SURBLs on their personal servers and it's good for them to not get the spam, but actually stopping the spammers will require solutions that will work on a large scale, for example on many high volume inbound mail or spam filter servers. Only then will we make enough of a dent in the hard core, highly-abusive, zombie-using spammers to slow or stop them or to make spamming uneconomical for them.

There are at least 100k new zombies discovered every day. Those are the real problem, not someone's joke of the day site. SURBLs are designed to catch the otherwise uncatchable zombie spammers, not the trivially-blocked unwanted newsletter.

These grey cases are frankly a distraction from the goal of stopping the worst offenders and the biggest criminals. They also miss the biggest abusers. The priority should be on catching the biggest, most abusive spammers, and excluding the grey cases which confuse that effort and make it difficult for SURBLs to be more widely adopted due to false positives.

Jeff C.

Rob Mangiafico

4:30 p.m.

On Fri, 3 Sep 2004, Jeff Chan wrote:

...

These grey cases are frankly a distraction from the goal of stopping the worst offenders and the biggest criminals. They also miss the biggest abusers. The priority should be on catching the biggest, most abusive spammers, and excluding the grey cases which confuse that effort and make it difficult for SURBLs to be more widely adopted due to false positives.

100% agree! There is a reason many isp's and web hosts use spamhaus as a block at the MTA level, because the low FP rate is amazing. With SURBL, there is a strong possibility that with very low FP rates, it can be the number two tool for isp's and hosts to cut out large volumes of junk without impacting the every day users (some who like jokes every day).

Jeff, stick to your guns, work on getting these zombies out of our daily lives, and you'll have the best anti-spam tool on the market.

Rob Mangiafico CTO LexiConn

Bret Miller

5:42 p.m.

...

After watching this thread, I am beginning to see that there are 2 different interpretations of what SURBL's are here for.
1. Purist's view - SURBL's are here to block domains that only appear in spam
2. Spam Nazi - if it's spam block it at all costs

You hit the nail on the head there... Even in many of the add-on rule sets, there are rules which "hit only spam" and rules which "hit mainly spam, but hit on some ham". Jeff argues that only sites that "hit only spam" belong in SURBL. I don't see it that way. I definitely think there's room for a SURBL that hits on some ham-- you just score it lower so that a somewhat spammy-looking message might get pushed over the threshold because it includes a site that appears in a lot of spam but also appears in some ham.
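
The "score it lower" idea can be shown with a small additive-scoring sketch. All three constants are assumptions picked for illustration, not real SpamAssassin scores, and the function names are made up.

```python
BLACK_SCORE = 4.0  # assumed score for a hit on a confirmed ("black") list
GREY_SCORE = 1.5   # assumed, deliberately lower, score for a grey list hit
THRESHOLD = 5.0    # assumed spam threshold

def total_score(other_rules: float, on_black: bool, on_grey: bool) -> float:
    """Add the list score to whatever the other rules already contributed."""
    score = other_rules
    if on_black:
        score += BLACK_SCORE
    elif on_grey:
        score += GREY_SCORE
    return score

def is_spam(other_rules: float, on_black: bool, on_grey: bool) -> bool:
    return total_score(other_rules, on_black, on_grey) >= THRESHOLD

# A somewhat spammy message (4.0 from other rules) is pushed over by a grey hit.
print(is_spam(4.0, on_black=False, on_grey=True))
# A mostly clean message (2.0 from other rules) is not.
print(is_spam(2.0, on_black=False, on_grey=True))
```

With these numbers a grey hit matters only when other evidence already exists, which is the behaviour argued for above.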

Bret ----------

Send your spam to: bretmiller@wcg.org Thanks for keeping the internet spam-free!

Ryan Thompson

2 Sep

6:46 p.m.

Chris Santerre wrote to SURBL Discussion list (E-mail):

...

I am officially proposing a greylist surbl.

+1, [x], check, ditto, and good idea.

I've been wishing for one of these for a while.

...

We are going to see more and more of this stuff. We might as well deal with it now. I'm suggesting a greylist for all spammers that ride that line. Like the euniverse junk we have been talking about.

1)We DO NOT include it in multi.

It would reduce overall traffic and ease administration to include it in multi. Multi is just a bitmapped convenience to access multiple SURBLs. However, I'll concede to your point.
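
To make "bitmapped" concrete: a combined list can answer with a single address whose last octet has one bit set per member list, so one query covers several lists. The sketch below shows the decoding; the bit values in `LIST_BITS` and the presence of a `uc` bit are purely hypothetical and are not the actual assignments used by multi.

```python
import ipaddress

# Hypothetical bit assignments, for illustration only.
LIST_BITS = {"sc": 2, "ws": 4, "uc": 8}

def decode_multi(answer: str) -> list:
    """Return the names of the lists whose bit is set in the answer's last octet."""
    last_octet = int(ipaddress.IPv4Address(answer)) & 0xFF
    return [name for name, bit in LIST_BITS.items() if last_octet & bit]

print(decode_multi("127.0.0.12"))  # 12 = 4 + 8, so both ws and uc bits are set
```

A client that does not want the grey data would simply ignore that bit, at the cost of having to know which bit to ignore.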

...

2)We SCREAM to the world that it WILL hit some legit, and that only hard liners should use.

Yep. By now, this concept shouldn't be too hard to understand. That doesn't mean a select group of admins won't do stupid things with the data. (Just like we've all seen servers unequivocally block based on some of the "list almost everything" dynamic IP RBLs). But, the more loudly we scream this point, the less difficulty we'll have.

...

3)We DON'T remove domains unless they go completely black, or have no NANAS hits for 3-4 months.

..or are later shown to be completely whitehat. But, yeah, let's not forget that this proposed list will contain sites that are used by spammers, but may well have some small legit uses, as well... right?

...

4)See number 2 again. 5)We tell people it is completely optional and to see number 2.

...

I predict it would be used more for personal emails. It also gives us an in between mechanism. Rather than list or no list. We get a grey list we desperately need.

Yes. I would use it in production with SA, and just assign a lower score (maybe 1-2 points, depending on my *own* mass-check).

...

THoughts?

Good idea. Let's do it. I'll be able to submit more domains to *this* list than I could to ws. It will finally give meaning to that pile of domains I always end up with and get ulcers trying to classify as black or white.

- Ryan

Ryan Thompson

7:28 p.m.

OK, here are a few thoughts, after reading this thread, and making a few specific replies.

I fully support the idea of a list of grey domains. Even if it starts as a brand new list, and submissions to other SURBLs aren't affected, it's a good idea. However, after reading some of Jeff's objections, I think I can extend a few ideas to make this even more effective.

I like something along the lines of the "unconfirmed" (or "uc") idea. What if *all* as-yet-unverified SURBL submissions (for ws, and anybody else who wants to play with us) went to "uc", and we kept score of the number of submissions (perhaps modified by some assigned trust multiplier for the "coolness" or historical accuracy of a particular submitter). Once a domain reaches a certain score, it's added (probably manually, at first) to ws. Regular whitelisting mechanisms could still apply for both lists.

Concern: this slows down inclusion of domains into ws. I don't think it has to. The SURBL folks re-check submitted domains anyway. Here's what I see happening with submissions:

1. From relatively trusted submitters, they get added to uc right away, and go in the queue for ws
2. ws folks hand-check the submissions as usual. If they believe a domain is worthy of outright blacklisting, it's added to ws immediately, as usual. Otherwise, it stays on uc.
3. For domains already on uc, if more submissions come in for the domain, we have another metric to help accurately classify the domain.

We could come up with various levels of automation for this, but, at first, all three of these things could be done manually without very much extra work, compared to what we're doing now, as far as I know.
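
As a rough sketch of the score-keeping idea, assuming a per-submitter trust multiplier and a fixed promotion threshold: the class name, the threshold, the default trust value, and the rule that each submitter counts once per domain are all assumptions made for this example, not agreed policy.

```python
from collections import defaultdict

PROMOTE_THRESHOLD = 3.0  # assumed score at which a domain is queued for ws review
DEFAULT_TRUST = 0.5      # assumed multiplier for submitters with no track record

class UcList:
    def __init__(self, trust):
        self.trust = trust                 # submitter -> trust multiplier
        self.scores = defaultdict(float)   # domain -> accumulated score
        self.seen = defaultdict(set)       # domain -> submitters already counted

    def submit(self, domain, submitter):
        """Record one submission and return the domain's current score."""
        domain = domain.lower()
        if submitter not in self.seen[domain]:
            # Assumption: repeat submissions by the same person add nothing.
            self.seen[domain].add(submitter)
            self.scores[domain] += self.trust.get(submitter, DEFAULT_TRUST)
        return self.scores[domain]

    def ws_candidates(self):
        """Domains whose score suggests a human should consider them for ws."""
        return sorted(d for d, s in self.scores.items() if s >= PROMOTE_THRESHOLD)

uc = UcList({"alice": 2.0, "bob": 1.0})
uc.submit("spammy.example", "alice")
uc.submit("spammy.example", "bob")
print(uc.ws_candidates())
```

Note that `ws_candidates` only produces a review queue; the final listing decision stays manual, as proposed above.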

We gain more immediate benefit from submissions (i.e., they're more quickly worth *some* points in additive classifiers like SpamAssassin), and don't sacrifice any accuracy or efficiency in the outright blacklisting of domains. In fact, if we do it right, I believe we can more efficiently (i.e., faster, or at least with less person time), list domains accurately.

- Ryan

Ryan Thompson

7:35 p.m.

Ryan Thompson wrote to SURBL Discussion list:

...

1. From relatively trusted submitters, they get added to uc right away, and go in the queue for ws
2. ws folks hand-check the submissions as usual. If they believe a domain is worthy of outright blacklisting, it's added to ws immediately, as usual. Otherwise, it stays on uc.

Yes, another thing I thought of (but forgot to mention) is that it would probably help if there were still two submission buckets... one for ws, and one for uc. That way, submitters who themselves aren't completely confident in a domain can submit directly to uc, and possibly save a little bit of needless hand-checking. This (all of this, actually) would be a policy decision..so these are just my suggestions. :-)

- R

Jeff Chan

3 Sep

2:38 a.m.

On Thursday, September 2, 2004, 10:28:08 AM, Ryan Thompson wrote:

...

I like something along the lines of the "unconfirmed" (or "uc") idea. What if *all* as-yet-unverified SURBL submissions (for ws, and anybody else who wants to play with us) went to "uc", and we kept score of the number of submissions (perhaps modified by some assigned trust multiplier for the "coolness" or historical accuracy of a particular submitter). Once a domain reaches a certain score, it's added (probably manually, at first) to ws. Regular whitelisting mechanisms could still apply for both lists.

...

Concern: this slows down inclusion of domains into ws. I don't think it has to. The SURBL folks re-check submitted domains anyway. Here's what I see happening with submissions:

Something like this happened with BigEvil and MidEvil. MidEvil was Paul Barbeau's attempt to get new entries quicker than Chris could into BigEvil. They ran as two separate but related rulesets for quite a while.

Eventually when Chris had time, he would merge the MidEvil rules in batches into BigEvil. Chris, did you end up re-checking most of Paul's work? If so that arguably made for more work, not less.

I can't really speak to the internals of that process other than to note that it eventually gave way to be.surbl.org, then ws.surbl.org.

...

We could come up with various levels of automation for this, but, at first, all three of these things could be done manually without very much extra work, compared to what we're doing now, as far as I know.

Any lists that are not hand-checked will be full of errors. Automation should only be a first pass. The final pass must always be human checked.

Even then, human categorizers make mistakes. A greylist would increase those mistakes by having mushy criteria for inclusion and therefore encourage sloppy or incomplete work.

Yes, sometimes making a black or white decision is difficult, but it needs to be done in order to gain maximum downstream utility, IMO.

Jeff C.

discuss@lists.surbl.org

31 comments

12 participants

participants (12)

Alex Broens (Ninja Bootcamp Participant)
Bret Miller
Chris Santerre
David Hooton
Jeff Chan
John Lundin
Mariano Absatz
Raymond Dijkxhoorn
Rob Mangiafico
Rob McEwen
Ryan Thompson
Steven Champeon