Hello Group, This is my first post here, so let me introduce myself. I've been very active in fighting spam for the past 3-4 years. I recently became a submitter to the WS SURBL list, and I am hoping to start some action to help get a few things improved for all of us. I've created a number of rulesets for SARE (tripwire, OEM, Spoof) and released my own rules to the public; now I am focusing on SURBL_WS and doing what I can to improve all aspects of it.
First off, I am looking for people who use SpamAssassin and have access to a corpus of ham e-mail. The score for the WS SURBL test in 3.0 is very low. I am really focused on improving this and hope to get the score to 2 or 3 by the release of SA 3.1. I want to track down false positives (FPs) and get these whitelisted.
The statistics-set1.txt file shows:

51.999 78.4712 0.4756 URIBL_WS_SURBL

The first number is the % of all e-mails hit (spam and ham), the next is the % of spam, followed by the % of ham.
With detection at 78.5%, this rule is only outperformed by RAZOR2, which shows our potential. If we could reduce those FPs, it would greatly increase our scores and improve everyone's perception of how good we are. Even the most basic rule, HTML_MESSAGE, only hit on 75% of spam; we are doing more than great so far!
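For readers who want to work with such figures directly, here is a minimal sketch of reading one line in the layout quoted above. It assumes exactly four whitespace-separated fields (overall %, spam %, ham %, rule name), as in the single line shown; the real statistics file may carry more columns, and the names `RuleStats` and `parse_stats_line` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    overall_pct: float  # % of all mail (spam and ham) that hit the rule
    spam_pct: float     # % of spam that hit the rule
    ham_pct: float      # % of ham that hit the rule (the false positives)
    rule: str


def parse_stats_line(line: str) -> RuleStats:
    # Assumption: four whitespace-separated fields, as in the quoted line.
    overall, spam, ham, rule = line.split()
    return RuleStats(float(overall), float(spam), float(ham), rule)


stats = parse_stats_line("51.999 78.4712 0.4756 URIBL_WS_SURBL")
print(f"{stats.rule}: {stats.spam_pct:.1f}% of spam, {stats.ham_pct:.2f}% of ham")
```

The ham percentage is the number the rest of this thread is about: it is the share of legitimate mail the rule fired on.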
Frederic Tarasevicius Internet Information Services, Inc. http://www.i-is.com/ 810-794-4400 mailto:info@i-is.com
Fred,
This is my first post here, so let me introduce myself. I've been very active in fighting spam for the past 3-4 years. I recently became a submitter to the WS SURBL list, and I am hoping to start some action to help get a few things improved for all of us. I've created a number of rulesets for SARE (tripwire, OEM, Spoof) and released my own rules to the public; now I am focusing on SURBL_WS and doing what I can to improve all aspects of it.
Welcome aboard.
First off, I am looking for people who use SpamAssassin and have access to a corpus of ham e-mail. The score for the WS SURBL test in 3.0 is very low. I am really focused on improving this and hope to get the score to 2 or 3 by the release of SA 3.1. I want to track down false positives (FPs) and get these whitelisted.
We also want to get a higher score for that list, especially since it's really effective. So if you can help out with weeding out the 'bad' ones, that would be really appreciated.
Bye, Raymond.
On Saturday, August 28, 2004, 12:48:08 PM, Raymond Dijkxhoorn wrote: (Fred writes:)
First off, I am looking for people who use SpamAssassin and have access to a corpus of ham e-mail. The score for the WS SURBL test in 3.0 is very low. I am really focused on improving this and hope to get the score to 2 or 3 by the release of SA 3.1. I want to track down false positives (FPs) and get these whitelisted.
We also want to get a higher score for that list, especially since it's really effective. So if you can help out with weeding out the 'bad' ones, that would be really appreciated.
FPs hurt effectiveness because they discourage people from using WS or SURBLs in the first place. It is therefore crucial to get the FPs out. It's easy to lose focus on accuracy when we adopt the idea of "get as many spam domains as possible". That's good, but the primary goal should be to reduce FPs. The low automatic scoring of WS is a useful indication of the FP issue.
IMO, the best way to stop FPs is to keep them off the list in the first place. Here are some ideas for that:
1. Older registered domains should require a large amount of evidence before they are added. Outblaze only lists domains that have registrations 90 days old or newer. That policy prevents many FPs since professional spammers seem to change domains frequently. There is statistical evidence that a spam domain is only used for 3 days on average. Therefore, listing an old, established, real company with a 1990s registration, for example, should seem highly suspect.
2. If a domain has legitimate uses, it should not be added to any list. Yes that means a spam or two will be missed in a few borderline cases, but it's better to miss a few spams than to be used to block someone's possibly legitimate mail.
3. Legitimacy is something that's best determined by manual, human checking. Purely automated tools are probably not adequate. Therefore all list submissions should have careful, experienced human-checking.
Can anyone think of other ideas? Perhaps we should make these into some rules for list inclusion.
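The three ideas above could be sketched as a pre-listing check. This is only an illustration under stated assumptions, not an actual SURBL procedure: the `Submission` fields, the `may_list` function, and the evidence threshold of 10 are all hypothetical, and the 90-day figure is borrowed from the Outblaze policy mentioned in idea 1.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Borrowed from the Outblaze policy described above; used here only as a cutoff
# beyond which more evidence is demanded.
OLD_DOMAIN_AGE = timedelta(days=90)


@dataclass
class Submission:
    # All fields are hypothetical; a real submission record may look different.
    domain: str
    registered_on: date
    has_legitimate_uses: bool  # idea 2: outcome of a legitimacy check
    human_reviewed: bool       # idea 3: an experienced person looked at it
    evidence_count: int        # e.g. number of distinct spam samples


def may_list(sub: Submission, today: date, min_evidence_for_old: int = 10) -> bool:
    """Return True only if the submission passes all three ideas."""
    if sub.has_legitimate_uses:
        return False  # idea 2: better to miss a few spams
    if not sub.human_reviewed:
        return False  # idea 3: no purely automated listings
    is_old = (today - sub.registered_on) > OLD_DOMAIN_AGE
    if is_old and sub.evidence_count < min_evidence_for_old:
        return False  # idea 1: older domains need much more evidence
    return True


today = date(2004, 8, 28)
fresh = Submission("fresh-pills.example", date(2004, 8, 20), False, True, 2)
old = Submission("established.example", date(1996, 5, 1), False, True, 2)
print(may_list(fresh, today), may_list(old, today))
```

The order of the checks reflects the priority argued in this message: legitimacy and human review can veto a listing regardless of how much spam evidence exists.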
Jeff C.
Jeff wrote:
If a domain has legitimate uses, it should not be added to any list. Yes that means a spam or two will be missed in a few borderline cases, but it's better to miss a few spams than to be used to block someone's possibly legitimate mail.
I agree that it is better to err on the side of allowing a few spams through if it means preventing a false positive. This is especially true considering that SURBL ought to be merely one of a few or several items in one's spam fighting toolkit. However, I also think that this particular part of the discussion is going to need more hashing out because we seem to vacillate often here. But, in all fairness, such is to be expected because there are some very tough issues here!
For example, obviously, there are going to be many Fortune 500 companies who will get away with the worst kinds of harvesting of e-mails from web sites for spamming. Surely, most of the time, their legal departments will prevent this because their "deep pockets" cannot afford to pursue such risky business practices. But in the event that one DOES do this, we would obviously not want to include them in SURBL, even with their bad behavior.
But consider another example which leans toward the other side of the pendulum. An e-mail marketing company tries to play it both ways by (1) sometimes using harvested addresses (with spamtrap addresses included) when doing business with shady companies AND (2) other times using legitimate opt-in addresses with other seemingly legitimate companies... other than the fact that this "legitimate" company chose to do business with such a trashy marketing company ;)
In this last example, what would the official policy of SURBL be?
I'd say that, if all the e-mails in question were pure sales pitches, then blacklist the marketing company on SURBL, but don't blacklist the actual legitimate company. Agree?
But where this can be really tough is if the e-mail marketing company takes over distribution of the legit company's official newsletter, with URIs of the e-mail marketing company included (beacons, for example). This is where it gets more complicated. What should be done in THAT case?
Nevertheless, isn't there also a point where e-mail marketing companies should NOT get away with flagrant and repeated violations just because they decided to play it "both ways"? Couldn't this become a strategic and premeditated way for these companies to do an "end run" around SURBL: "Do a little legitimate business on the side and SURBL will stay off our back"?
Are there other examples which are even more controversial and/or difficult to decide on?
Certainly, I don't have all the answers, but I think I've asked some good questions.
(I don't mean to stir up trouble. I just want us to reach a consensus on this.)
Rob McEwen
On Saturday, August 28, 2004, 6:06:20 PM, Rob McEwen wrote:
For example, obviously, there are going to be many Fortune 500 companies who will get away with the worst kinds of harvesting of e-mails from web sites for spamming. Surely, most of the time, their legal departments will prevent this because their "deep pockets" cannot afford to pursue such risky business practices. But in the event that one DOES do this, we would obviously not want to include them in SURBL, even with their bad behavior.
But consider another example which leans toward the other side of the pendulum. An e-mail marketing company tries to play it both ways by (1) sometimes using harvested addresses (with spamtrap addresses included) when doing business with shady companies AND (2) other times using legitimate opt-in addresses with other seemingly legitimate companies... other than the fact that this "legitimate" company chose to do business with such a trashy marketing company ;)
In this last example, what would the official policy of SURBL be?
I'd say that, if all the e-mails in question were pure sales pitches, then blacklist the marketing company on SURBL, but don't blacklist the actual legitimate company. Agree?
But where this can be really tough is if the e-mail marketing company takes over distribution of the legit company's official newsletter, with URIs of the e-mail marketing company included (beacons, for example). This is where it gets more complicated. What should be done in THAT case?
Nevertheless, isn't there also a point where e-mail marketing companies should NOT get away with flagrant and repeated violations just because they decided to play it "both ways"? Couldn't this become a strategic and premeditated way for these companies to do an "end run" around SURBL: "Do a little legitimate business on the side and SURBL will stay off our back"?
Those are good questions, and yes they can get difficult.
The quick answer is to not list any of them if doing so would cause too much collateral damage of legitimate messages being blocked, but to convince the legitimate companies to not do business with spam operators. Certainly if the legal department of any of the legitimate companies was informed that they were doing business with a near-criminal organization, they would stop it immediately. But we had better be very accurate and correct in our reports, or they will quickly learn to ignore such reports.
And any quasi-spamhaus that was not using zombies could simply have their mail servers blocked by regular RBLs.
In that sense SURBLs were meant especially to help with the hard core professional criminal spammers who use zombies. They don't have legitimate mail servers that can be consistently blocked on, so we need to block on their web sites.
We perhaps burn too much energy on the borderline quasi-legitimate cases when they're not responsible for nearly as much abuse or spam as the really bad guys.
IMO it's better to whitelist them somewhat generously and focus on the hard core criminals who are not catchable in other ways.
I know this approach is frustrating to some of the dedicated (fixated? ;-) spam fighters, but it's necessary.
Comments?
Jeff C.
Hi!
For example, obviously, there are going to be many Fortune 500 companies who will get away with the worst kinds of harvesting of e-mails from web sites for spamming. Surely, most of the time, their legal departments will prevent this because their "deep pockets" cannot afford to pursue such risky business practices. But in the event that one DOES do this, we would obviously not want to include them in SURBL, even with their bad behavior.
What are your thoughts about leveling the lists? For example, we could make a new evil.surbl.org, where we also state 'don't use this at home, unless...'. Then we can shift those 'grey area domains' to the new list and we all can be happy.
There will be more and more trying to be gray, and it's not like a hardcore spammer can send out 1 legit mailing and be whitelisted all at once...
Any ideas?
Bye, Raymond.
----- Original Message -----
From: "Raymond Dijkxhoorn" raymond@prolocation.net
To: rob@pvsys.com; "SURBL Discussion list" discuss@lists.surbl.org
Sent: Sunday, August 29, 2004 10:32 AM
Subject: RE: [SURBL-Discuss] SURBL WS test scores in SA 3.0
Hi!
For example, obviously, there are going to be many Fortune 500 companies who will get away with the worst kinds of harvesting of e-mails from web sites for spamming. Surely, most of the time, their legal departments will prevent this because their "deep pockets" cannot afford to pursue such risky business practices. But in the event that one DOES do this, we would obviously not want to include them in SURBL, even with their bad behavior.
What are your thoughts about leveling the lists? For example, we could make a new evil.surbl.org, where we also state 'don't use this at home, unless...'. Then we can shift those 'grey area domains' to the new list and we all can be happy.
There will be more and more trying to be gray, and it's not like a hardcore spammer can send out 1 legit mailing and be whitelisted all at once...
Supported.... I'd even say ws.surbl.org should be this list, and let spamcop and the rest be more lenient. Adding another list would probably just complicate the choice, whereas by making ws. (if Bill approves) the more strict list, users have the choice to set their score accordingly.
Alex
On Sunday, August 29, 2004, 1:41:43 AM, Alex Broens wrote:
From: "Raymond Dijkxhoorn" raymond@prolocation.net
For example, obviously, there are going to be many Fortune 500 companies who will get away with the worst kinds of harvesting of e-mails from web sites for spamming. Surely, most of the time, their legal departments will prevent this because their "deep pockets" cannot afford to pursue such risky business practices. But in the event that one DOES do this, we would obviously not want to include them in SURBL, even with their bad behavior.
What are your thoughts about leveling the lists? For example, we could make a new evil.surbl.org, where we also state 'don't use this at home, unless...'. Then we can shift those 'grey area domains' to the new list and we all can be happy.
There will be more and more trying to be gray, and it's not like a hardcore spammer can send out 1 legit mailing and be whitelisted all at once...
Supported.... I'd even say ws.surbl.org should be this list, and let spamcop and the rest be more lenient. Adding another list would probably just complicate the choice, whereas by making ws. (if Bill approves) the more strict list, users have the choice to set their score accordingly.
I disagree. Making lists overly inclusive and increasing the false positives is how many anti-spam efforts fail. We should stay focussed on catching the hard core spammers since they are responsible for most of the abuse.
Also anyone not using zombies can be easily blocked with conventional RBLs at a vastly lower computational cost. There really isn't much point in adding anyone who sends spam from fixed IP addresses since they are dropped so much easier and faster with a regular RBL.
Jeff C.
----- Original Message -----
From: "Jeff Chan" jeffc@surbl.org
To: "SURBL Discussion list" discuss@lists.surbl.org
Sent: Sunday, August 29, 2004 11:15 AM
Subject: Re: [SURBL-Discuss] SURBL WS test scores in SA 3.0
On Sunday, August 29, 2004, 1:41:43 AM, Alex Broens wrote:
From: "Raymond Dijkxhoorn" raymond@prolocation.net
For example, obviously, there are going to be many Fortune 500 companies who will get away with the worst kinds of harvesting of e-mails from web sites for spamming. Surely, most of the time, their legal departments will prevent this because their "deep pockets" cannot afford to pursue such risky business practices. But in the event that one DOES do this, we would obviously not want to include them in SURBL, even with their bad behavior.
What are your thoughts about leveling the lists? For example, we could make a new evil.surbl.org, where we also state 'don't use this at home, unless...'. Then we can shift those 'grey area domains' to the new list and we all can be happy.
There will be more and more trying to be gray, and it's not like a hardcore spammer can send out 1 legit mailing and be whitelisted all at once...
Supported.... I'd even say ws.surbl.org should be this list, and let spamcop and the rest be more lenient. Adding another list would probably just complicate the choice, whereas by making ws. (if Bill approves) the more strict list, users have the choice to set their score accordingly.
I disagree. Making lists overly inclusive and increasing the false positives is how many anti-spam efforts fail. We should stay focussed on catching the hard core spammers since they are responsible for most of the abuse.
Jeff, if you have 25k users, see 15k of each spam flood, and the user base is totally mixed, then does that come from "hard core" spammers?
- Zombies or fixed IP? IMHO it's irrelevant.
- Who defines "most abuse" & how?
- There are spammers who have been around for years, sending from fixed IPs, and although they're so-called "whitehats", businesses with a reputation and an attitude (Dell?), users report that no matter what you do, an opt-out isn't respected....
Also anyone not using zombies can be easily blocked with conventional RBLs at a vastly lower computational cost.
Dunno..... In the last few days I've seen trash coming from dialups which weren't in any RBL. Only a fast entry in my local SURBL zone stopped the flood from reaching more than a couple of users. (1 minute update)
There really isn't much point in adding anyone who sends spam from fixed IP addresses since they are dropped so much easier and faster with a regular RBL.
IF they ever make it to an RBL. My thought is that they should complement each other. Lots of stuff from fixed IPs never makes it to Spamcop or Spamhaus if nobody reports it. They're not any better than SURBL, or the other way round.
If you use Spamcop intensely, depending on where you're based and what your user base is like, you'd be in trouble. The same "could" apply to SURBL. None will ever be the perfect solution; both will do magic if used correctly.
An admin filtering for an Austria based old ppl's home will hardly get a false positive from SURBL or Spamcop, while a US ISP will.
Oh well... politics... the more of them happening, the faster heads get heated up or small parties get formed. Will personally keep on reporting and hope my judgement doesn't cause anybody grief, and if it would, just kick me out.
Let's all enjoy Sunday and a great Formula 1 race in Belgium :-)
Alex
Hi!
Also anyone not using zombies can be easily blocked with conventional RBLs at a vastly lower computational cost.
Dunno..... In the last few days I've seen trash coming from dialups which weren't in any RBL. Only a fast entry in my local SURBL zone stopped the flood from reaching more than a couple of users. (1 minute update)
Yes, I noticed the same; they start with fresh proxies, it seems, for every run. Or at least a couple of fresh ones... ;)
So do you also send the ones you list locally to SURBL? If not, mail me in private so we can get those in also.
Oh well... politics... the more of them happening, the faster heads get heated up or small parties get formed. Will personally keep on reporting and hope my judgement doesn't cause anybody grief, and if it would, just kick me out.
Let's all enjoy Sunday and a great Formula 1 race in Belgium :-)
Oh well, let's hope the weather won't spoil it.
Bye, Raymond.
On Sunday, August 29, 2004, 4:33:15 AM, Raymond Dijkxhoorn wrote:
Also anyone not using zombies can be easily blocked with conventional RBLs at a vastly lower computational cost.
Dunno..... In the last few days I've seen trash coming from dialups which weren't in any RBL. Only a fast entry in my local SURBL zone stopped the flood from reaching more than a couple of users. (1 minute update)
Yes, I noticed the same; they start with fresh proxies, it seems, for every run. Or at least a couple of fresh ones... ;)
So do you also send the ones you list locally to SURBL? If not, mail me in private so we can get those in also.
Please don't add your local block lists into WS. Only add entries that would be appropriate for the entire world to use.
At 0.4% FPs, or whatever WS is, the list really is not as useful as it should be. We need to find ways to reduce the FPs, not increase them.
Jeff C.
On Sunday, August 29, 2004, 4:19:46 AM, Alex Broens wrote:
- Zombies or fixed IP? IMHO it's irrelevant.
No it's very relevant. Any spam that comes from a fixed IP can be blocked on a local or global RBL.
SURBLs are most useful to catch the ones that can't be caught that way due to zombies, etc.
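To make the distinction concrete, here is a rough sketch of how the two kinds of lookup are keyed. A conventional RBL is queried with the sending server's IP address (octets reversed, then the list zone), while a SURBL is queried with a domain taken from a URI in the message body. The function names are invented for illustration, `rbl.example` is a placeholder zone rather than a real list, and the sketch assumes the URI host has already been reduced to its registered domain, which real clients must do first. No DNS query is actually sent here.

```python
import ipaddress


def rbl_query_name(sender_ip: str, zone: str) -> str:
    """Build the DNS name for an IP-based RBL lookup (IPv4 only in this sketch)."""
    octets = str(ipaddress.IPv4Address(sender_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def surbl_query_name(uri_domain: str, zone: str = "ws.surbl.org") -> str:
    """Build the DNS name for a URI-domain lookup against a SURBL zone."""
    return uri_domain.lower().rstrip(".") + "." + zone


# Keyed on where the mail came from: useless if every run uses fresh zombies.
print(rbl_query_name("192.0.2.10", "rbl.example"))

# Keyed on what the mail advertises: stays the same whichever zombie sent it.
print(surbl_query_name("Spam-Domain.example"))
```

This is why the two approaches complement each other: the first key changes with every compromised sender, the second only when the spammer changes the advertised site.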
- Who defines "most abuse" & how?
Mainly the spammers do, by their own actions. Clearly breaking into someone's (insecure) computer and stealing services and bandwidth from it are abusive. Clearly sending 10,000 spams to get 1 through the filters is abusive. Those most highly abusive ones are the most important to catch. It's made important simply by their high level of abuse, if nothing else was even considered.
- There are spammers who have been around for years, sending from fixed IPs, and although they're so-called "whitehats", businesses with a reputation and an attitude (Dell?), users report that no matter what you do, an opt-out isn't respected....
So blacklist them locally or personally. We could never list dell.com because many people might mention them in legitimate emails.
Sometimes I wonder if people understand this new paradigm. ;-)
Dunno..... In the last few days I've seen trash coming from dialups which weren't in any RBL. Only a fast entry in my local SURBL zone stopped the flood from reaching more than a couple of users. (1 minute update)
Anything coming from zombied dialups is probably the kind of spam we want to list in SURBLs since there's already theft involved, though I'd still argue IP based RBLs would do it much more efficiently. RBLs are probably still a better solution, i.e. update the dialup RBLs to have the correct dialup pools.
Also I doubt that Dell uses zombied dialups to deliver their mail.
An admin filtering for an Austria based old ppl's home will hardly get a false positive from SURBL or Spamcop, while a US ISP will.
We need to be conservative in listing. It's much better to be able to provide an ISP or telco-grade solution that the old people's home can feel comfortable with than have a solution no-one can be comfortable with due to too many FPs.
Jeff C.
----- Original Message -----
From: "Jeff Chan" jeffc@surbl.org
To: "SURBL Discuss" discuss@lists.surbl.org
Sent: Sunday, August 29, 2004 2:21 PM
Subject: Re: [SURBL-Discuss] SURBL WS test scores in SA 3.0
SURBLs are most useful to catch the ones that can't be caught that way due to zombies, etc.
First time I hear this... and I'd bet most admins don't really care what or who catches it. I see the SpamcopUri module as another method to stop the trash reaching inboxes...
Please don't misunderstand me:
1. My reports have been judged to be reliable.
2. I'll publish the data and it's up to you guys to use it or not.
3. I will not publish data considered for local use only.
4. I'll do my best to avoid FPs because they'll backfire on me as well.
- There are spammers who have been around for years, sending from fixed IPs, and although they're so-called "whitehats", businesses with a reputation and an attitude (Dell?), users report that no matter what you do, an opt-out isn't respected....
So blacklist them locally or personally. We could never list dell.com because many people might mention them in legitimate emails.
Sometimes I wonder if people understand this new paradigm. ;-)
Dell was just a futile example. I wouldn't dare add them to a global listing; they're not even on a local list. But I get that itch more than once a week, when users get the same Dell Swiss newsletter in French, then German, and then again from Dell Germany...
Also I doubt that Dell uses zombied dialups to deliver their mail.
I never assumed they did, just said that they can get extremely annoying.
An admin filtering for an Austria based old ppl's home will hardly get a false positive from SURBL or Spamcop, while a US ISP will.
We need to be conservative in listing. It's much better to be able to provide an ISP or telco-grade solution that the old people's home can feel comfortable with than have a solution no-one can be comfortable with due to too many FPs.
I agree...which is why I keep my local list with the critical entries.
By now I hope my point is clear and that I can get back to copy/pasting URIs :)
thanks
Alex