Jeff's whitelists


William Stearns

15 Jul 2004

1:37 a.m.

Good evening, Jeff, I wanted to check in with you about some domains removed because of your whitelists. There are a few in particular I wanted to cover.

gevalia.com and joingevalia.com: gevalia does a _lot_ of spam.
nutrisystem.com: again, a lot of recent spam, just done with a bulk marketing firm as a front.
spamarrest.com: spammers themselves

Any sense on the above? Cheers,
- Bill

---------------------------------------------------------------------------
"Hackers do not feel that leisure time is automatically any more meaningful than work time. The desirability of both depends on how they are realized. From the point of a view of a meaningful life, the entire work/leisure duality must be abandoned. As long as we are living our work or our leisure, we are not even truly living. Meaning cannot be found in work or leisure but has to arise out of the nature of the activity itself. Out of passion. Social value. Creativity."
-- Andrew Leonard aleonard@salon.com http://salon.com/tech/col/leon/2001/02/05/hacker_ethic/index2.html
--------------------------------------------------------------------------
William Stearns (wstearns@pobox.com). Mason, Buildkernel, freedups, p0f, rsync-backup, ssh-keyinstall, dns-check, more at: http://www.stearns.org
--------------------------------------------------------------------------


Lindsay Snider

15 Jul 2004

1:47 a.m.

Speaking of questionable domains, how do people feel about link-builder.com? I get nothing but spam from them. -lindsay

On Wednesday 14 July 2004 07:37 pm, William Stearns wrote:

...

Good evening, Jeff, I wanted to check in with you about some domains removed because of your whitelists. There are a few in particular I wanted to cover.

gevalia.com and joingevalia.com: gevalia does a _lot_ of spam.
nutrisystem.com: again, a lot of recent spam, just done with a bulk marketing firm as a front.
spamarrest.com: spammers themselves

Any sense on the above? Cheers,

Bill

Jeff Chan

3:32 a.m.

On Wednesday, July 14, 2004, 4:47:49 PM, Lindsay Snider wrote:

...

Speaking of questionable domains, how do people feel about link-builder.com? I get nothing but spam from them. -lindsay

Their domain's been registered for about three years. If they were major spammers wouldn't they be shut down by now?

They may spam web site owners, but seem to have a removal procedure:

http://www.link-builder.com/

...

# "I am not interested in linking at all - please do not contact me again".

In either case they're not whitelisted in SURBLs.

Frankly I'm a lot more interested in catching the professional pharma, mortgage, etc. spammers who steal services, use zombies, etc. After they are eradicated we can go after the "annoyance" spammers. The worst offenders should be the top priority, especially since the criminal spammer doesn't seem to be stoppable in other ways.

Most quasi-legitimate companies probably can be stopped from spamming since they are subject to the legal system and their mostly legitimate ISP AUPs in ways that the criminal spammers aren't.

Jeff C.

Jeff Chan

3:26 a.m.

On Wednesday, July 14, 2004, 4:37:23 PM, William Stearns wrote:

...

Good evening, Jeff, I wanted to check in with you about some domains removed because of your whitelists. There are a few in particular I wanted to cover.

...

    gevalia.com and joingevalia.com: gevalia does a _lot_ of spam.

Gevalia is whitelisted as a publicly traded company. While they may or may not be spammers, it seems that actual violations of anti-spam laws would put them in big trouble and their lawyers would make them cease it.

http://www.gevalia.com/gevalia/customerservice/cs_content.jsp

...

OUR CUSTOMER LIST

When you enroll in any of the Gevalia programs or purchase items from our online catalog, we add your name to our Customer List. From time to time we share selected names and addresses from this List with other companies or services that offer fine quality food or merchandise by mail. If you prefer not to receive mailings from these companies, please advise us by using the mail, phone or e-mail address provided above. Please include your name and address if you are contacting us by mail or e-mail. We will ensure that your name is removed from the list we share with other organizations.

CONTACTS FROM US

We also use our Customer List for our own followup contacts with you. Many of our customers look forward to being contacted by Gevalia Customer Service Representatives by phone to be kept up-to-date about new coffees and services, programs and benefits. On the other hand, we recognize that some customers prefer not to be contacted by phone, even if they do love our coffees. Please be assured that we maintain an up-to-date "Do Not Call" list for customers who have advised us of their preference in this regard. If you do not wish to be called regarding our special offers, please let us know by mail, phone or email, and we will add you to our Do Not Call list.

We also maintain a number of e-mail mailboxes in an effort to support our online customers as effectively as we do on the phone or through the mail. We read every e-mail that we receive and do our best to respond within 48 hours, if a response is required. We save a very limited number of e-mail messages, but only until the issue is resolved. We do not record email addresses, unless you have voluntarily provided it to us in your online customer service registration.

If you did provide us with your e-mail address, we may notify you from time to time about Gevalia products or services that may be of interest to you. If you prefer not to receive these e-mail notifications, please follow the "Opt-out" instructions included with the e-mail or contact us at customer_service@gevalia.com We do not trade, sell or disclose e-mail addresses with other companies.

Opt out and information sharing are bad policies of course, but there nominally does appear to be a way to get off their lists.

...

    nutrisystem.com: again, a lot of recent spam, just done with a bulk marketing firm as a front.

Publicly listed, therefore subject to laws, unlike the hard core professional criminal spammers.

...

What information do we collect? How do we use it?

* When you order, we need to know your name, e-mail address, mailing address, credit card number, and expiration date. This allows us to process and fulfill your order and to notify you of your order status.
* We may also use the information we collect to occasionally notify you about important functionality changes to the web site, new NutriSystem services, and special offers we think you'll find valuable.

...

Newsletter

If a user wishes to subscribe to our newsletter, we ask only for an individual's email address. Out of respect for our users' privacy we provide a way to opt-out of these communications. Click here to unsubscribe to the nutrisystem.com newsletter.

Opt out. Not good, but there is an alleged removal procedure and the company would be susceptible to legal action if they did not comply.

...

    spamarrest.com: spammers themselves

Came from Joe in France.

They seem to be a legitimate company with reasonable spam policies:

http://spamarrest.com/privacy.jsp

But I don't really know much about them. I've never gotten a spam from them or one linking to them. Why would a nominally antispam company spam people?

Jeff C.

Frank Ellermann

16 Jul 2004

4:43 a.m.

Jeff Chan wrote:

[spamarrest.com]

...

Why would a nominally antispam company spam people?

That's a C/R system on the surface. I've got dozens of challenges from this organization, all spamming for their "services" (= C/R) incl. affiliate and Webmaster programs.

The "confirmations" are picture puzzles (like UOL etc.), but very often they don't work, so I'm not sure whose pictures they use. Maybe they are stolen and abused to create spammer accounts elsewhere, it won't surprise me.

Please stop to whitelist dubious organizations based on vague criteria like "publicly listed, subject to laws".

With that policy you would whitelist "viagra", because it is a product of Pfizer.com ("publicly listed"). Or you would whitelist CAN-SPAM spam ("subject to laws"). This leads nowhere, Bye, Frank

Jeff Chan

10:16 a.m.

On Thursday, July 15, 2004, 7:43:56 PM, Frank Ellermann wrote:

...

Jeff Chan wrote:

...

[spamarrest.com]

...
Why would a nominally antispam company spam people?

...

That's a C/R system on the surface. I've got dozens of challenges from this organization, all spamming for their "services" (= C/R) incl. affiliate and Webmaster programs.

...

The "confirmations" are picture puzzles (like UOL etc.), but very often they don't work, so I'm not sure whose pictures they use. Maybe they are stolen and abused to create spammer accounts elsewhere, it won't surprise me.

Interesting. I've not gotten a spam from them so I don't have any specific examples. Maybe you could forward one to me off list and explain how you think they got your address. I see two SpamCop reports about them currently. They may be one of those gray domains that we might not want blocking on. Right now I don't have enough information to say one way or the other.

...

Please stop to whitelist dubious organizations based on vague criteria like "publicly listed, subject to laws".

Actually it's pretty specific and not vague. In practice sc.surbl.org gets very few whitelist hits of any kind, and as a percentage of records per list, the whitelist hits for the other lists are very minor. I'm pretty confident that we're mostly catching the bad guys and not catching the good guys, though I'm always interested in specific counterexamples.

...

With that policy you would whitelist "via gra", because it is a product of Pfizer.com ("publicly listed").

pfizer.com is whitelisted. Via gra is not a company so it's not. In any case the legitimate drug makers and their products are much less of a problem than the actual pharmaspammers.

...

Or you would whitelist CAN-SPAM spam ("subject to laws").

I agree many of the CAN-SPAM policies are broken. Where they are broken we will probably not whitelist. (Nor are we attempting to whitelist CAN-SPAM activity; generally we let the data speak for itself.)

Jeff C.

Frank Ellermann

17 Jul 2004

10:42 p.m.

Jeff Chan wrote:

...

Maybe you could forward one to me off list and explain how you think they got your address.

It's the same idea as in any C/R system: The spammer forges an almost arbitrary MAIL FROM for his stuff. The challenge goes to the MAIL FROM, e.g.

http://www.spamcop.net/sc?id=z554014790zc260cd9fdfc657ecb5cd7e70a6026bf1z&action=display

Following this link you see the complete spamarrest challenge incl. the link to their "webmaster affiliate program". Their business model is to sell "spam filtered" addresses, where the filtering is done by the innocent bystanders (= forged address in the spam).

...

They may be one of those gray domains that we might not want blocking on.

Why should you want to support a commercial C/R system ? It's just UBE asking third parties (forged addresses) to filter the spam for their customers. If you find spamarrest.com in the SC data then that's no "error", therefore it should be reflected in sc.surbl.org

...

...
vague criteria like "publicly listed, subject to laws".

Actually it's pretty specific and not vague.

It's very vague, whose and which laws, resp. which lists ? In the case of sc.surbl.org the relevant rules are the SC rules. Of course you could remove (= whitelist) obvious errors like links to BBC reports in 419 spam. When you see a JoeJob or other cases of innocent bystanders that would be specific.

...

pfizer.com is whitelisted

Bad idea. Big companies do spam from time to time, in the past because they were ignorant, and today because shit happens.

...

(Nor are we attempting to whitelist CAN-SPAM activity; generally we let the data speak for itself.)

ACK, that's what I want, with minimal manual interventions to catch errors, JoeJobs, and innocent bystanders. Bye, Frank

Jeff Chan

10:52 p.m.

On Saturday, July 17, 2004, 1:42:26 PM, Frank Ellermann wrote:

...

Jeff Chan wrote:

...

...
Maybe you could forward one to me off list and explain how you think they got your address.

...

It's the same idea as in any C/R system: The spammer forges an almost arbitrary MAIL FROM for his stuff. The challenge goes to the MAIL FROM, e.g.

...

http://www.spamcop.net/sc?id=z554014790zc260cd9fdfc657ecb5cd7e70a6026bf1z&action=display

...

Following this link you see the complete spamarrest challenge incl. the link to their "webmaster affiliate program". Their business model is to sell "spam filtered" addresses, where the filtering is done by the innocent bystanders (= forged address in the spam).

...

...
They may be one of those gray domains that we might not want blocking on.

...

Why should you want to support a commercial C/R system ? It's just UBE asking third parties (forged addresses) to filter the spam for their customers. If you find spamarrest.com in the SC data then that's no "error", therefore it should be reflected in sc.surbl.org

It sounds like a spammer is abusing spamarrest.com's services. Is that correct? If so that should be reported back to spamarrest as abuse. Or is spamarrest *originating* these messages purely themselves? In other words is spamarrest actively, directly sending these out themselves? If the latter, I agree spamarrest should be listed. If the former, it's more like a Joe Job against spamarrest, same as if I listed Claranet.de in some spams. That would not make claranet a spammer, right?

Jeff C.

Frank Ellermann

18 Jul 2004

12:30 a.m.

Jeff Chan wrote:

...

It sounds like a spammer is abusing spamarrest.com's services. Is that correct?

No. The spammer uses one of his zombies (probably), some arbitrary address as "From", and another arbitrary address as "To". The "To" address happens to be a customer of spamarrest, and the "From" address in this example was...

drussell_tb AT xyzzy.claranet.de

Of course that's a bogus address, the spammers simply combine local parts like "drussel" plus junk like "_tb" with catch-all domains like xyzzy.claranet.de (in fact only "my" vanity host).

The spam is then sent to the spamarrest address (in this example From: drussel_tb@xyzzy To: anneliese@spamarrest)

Spamarrest doesn't know drussel_tb@xyzzy and therefore it sends a challenge to this address (= me). Because I'm not planning to sort Anneliese's spam I report this challenge via SC.
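The flow described above can be sketched in a few lines of Python. This is only a toy model of a generic challenge/response filter, not Spamarrest's actual implementation; the class name, the field names, and the `friend@example.org` address are hypothetical, and the two other addresses are the ones from the example above (with `.com` and `.claranet.de` filled in for illustration).

```python
from dataclasses import dataclass, field


@dataclass
class ChallengeResponseFilter:
    """Toy C/R filter: unknown envelope senders receive a challenge."""
    known_senders: set = field(default_factory=set)
    held: list = field(default_factory=list)

    def receive(self, mail_from, rcpt_to, body):
        """Return the challenge that would be sent, or None if delivered directly."""
        if mail_from.lower() in self.known_senders:
            return None  # sender already confirmed: deliver without a challenge
        # Unknown sender: hold the message and challenge the envelope sender.
        # Nothing here checks that mail_from really sent the message, so a
        # forged MAIL FROM makes an uninvolved third party receive the challenge.
        self.held.append((mail_from, rcpt_to, body))
        return {"to": mail_from, "on_behalf_of": rcpt_to,
                "text": "Please confirm that you sent this message."}


cr = ChallengeResponseFilter(known_senders={"friend@example.org"})
challenge = cr.receive("drussel_tb@xyzzy.claranet.de",
                       "anneliese@spamarrest.com", "spam body")
print(challenge["to"])
```

The challenge goes to whoever owns the forged address, which is the complaint made in this message: the filtering work is pushed onto the owner of the forged MAIL FROM.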

...

that should be reported back to spamarrest as abuse.

Exactly, that's what I do (using SC, several manual complaints had no effect at all).

...

Or is spamarrest *originating* these messages purely themselves?

No, that's very unlikely.

...

is spamarrest actively, directly sending these out themselves?

Sure, they send these challenges. Like UOL "anti spam", QuikCop, Earthlink, and Mailblocker. The latter allows me to report forgeries, as far as I'm concerned that's a more or less working C/R system. Allegedly Earthlink uses Brightmail to filter some spam (in other words this doesn't work). I'm not sure about QuikCop, whatever they do, they don't support SPF:

No forged xyzzy address (MAIL FROM) would pass a SPF filter.
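As a rough illustration of the SPF point, here is a minimal sketch that evaluates only the `ip4:` and `all` mechanisms of an SPF record. The record and the IP addresses are made-up documentation values, not the real policy of any domain in this thread, and real SPF evaluation (DNS lookups, `include`, `a`, `mx`, macros) is considerably more involved.

```python
import ipaddress

# Map SPF qualifiers to result names.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_check(record, client_ip):
    """Evaluate a simplified SPF record (ip4 and all mechanisms only)."""
    ip = ipaddress.ip_address(client_ip)
    for term in record.split()[1:]:  # skip the leading "v=spf1"
        qualifier = "+"
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        if term.startswith("ip4:"):
            if ip in ipaddress.ip_network(term[4:], strict=False):
                return RESULTS[qualifier]
        elif term == "all":
            return RESULTS[qualifier]
    return "neutral"  # no mechanism matched


# Hypothetical policy: only 192.0.2.0/24 may send mail for the domain.
record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_check(record, "192.0.2.10"))   # the domain's own mail server
print(spf_check(record, "203.0.113.7"))  # a zombie forging the MAIL FROM
```

Under these assumptions, a C/R system that checked SPF before sending a challenge could drop the forged message instead of challenging the forged address.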

Again mailblocker is the only C/R system where the abuse desk at least promised to forward my proposal to implement SPF. And I haven't seen mailblocker challenges for some time, so from my POV that's the only mentioned C/R system qualifying for your whitelist. OTOH I've never reported mailblocker challenges via SC, because they always had a link to report forgeries. Bye, Frank

Jeff Chan

12:56 a.m.

On Saturday, July 17, 2004, 3:30:59 PM, Frank Ellermann wrote:

...

Jeff Chan wrote:

...

...
It sounds like a spammer is abusing spamarrest.com's services. Is that correct?

...

No. The spammer uses one of his zombies (probably), some arbitrary address as "From", and another arbitrary address as "To". The "To" address happens to be a customer of spamarrest, and the "From" address in this example was...

...

drussell_tb AT xyzzy.claranet.de

...

Of course that's a bogus address, the spammers simply combine local parts like "drussel" plus junk like "_tb" with catch-all domains like xyzzy.claranet.de (in fact only "my" vanity host).

...

The spam is then sent to the spamarrest address (in this example From: drussel_tb@xyzzy To: anneliese@spamarrest)

...

Spamarrest doesn't know drussel_tb@xyzzy and therefore it sends a challenge to this address (= me). Because I'm not planning to sort Anneliese's spam I report this challenge via SC.

...

...
that should be reported back to spamarrest as abuse.

...

Exactly, that's what I do (using SC, several manual complaints had no effect at all).

...

...
Or is spamarrest *originating* these messages purely themselves?

...

No, that's very unlikely.

OK. That's pretty much how I was reading things. I don't think we should list spamarrest because there could be legitimate users of it and we don't want to block messages that happen to mention spamarrest, as that could easily lead to false positives. Remember that our standards of inclusion need to be higher than for personal use, regular sender domain or IP RBLs, etc. because the effects of URI blocking are a lot more widespread than the effects of blocking one zombied PC somewhere.

The quick answer is that spamarrest should authenticate its senders, perhaps in the same way as they authenticate their recipients. If they're not doing something like that, then their design is broken, but having a broken design is not enough reason to list them.

Jeff C.

Frank Ellermann

2:26 a.m.

Jeff Chan wrote:

...

I don't think we should list spamarrest because there could be legitimate users of it

That would be like a "legitimate user of illegal drugs". Or a legitimate buyer of generic viagra.

...

that could easily lead to false positives

There are no "false positives". The spamarrest challenges are spam, triggered by spam to spamarrest customers, and sent to the forged addresses in the original spam. Spamarrest.com is only interested to sell more of their snake oil, and as far as I'm concerned it's a criminal organization.

Complete with "webmaster affiliate program", exactly the same kind of marketing you find in XXX sites. Only the "product" is different, it's "spam filtering". The real work is not done by spamarrest, it's done by my ISP and me (for all forged @xyzzy addresses), or by your ISP and you (for all forged @surbl.org addresses), etc.

Spamarrest.com "sells" your and my bandwidth + harddisk space + time. There are no "legitimate users" or "false positives", it's theft.

...

their design is broken, but having a broken design is not enough reason to list them.

It's not only "broken", it's fraudulent. It's no free service, their users pay for this design, and what they really pay for are _our_ resources.

See also http://openrbl.org/ip/66/150/163/156.htm for other BL entries for the IP [66.150.163.156] in my example.

Bye, Frank

Jeff Chan

3:51 a.m.

On Saturday, July 17, 2004, 5:26:35 PM, Frank Ellermann wrote:

...

Jeff Chan wrote:

...

...
that could easily lead to false positives

...

There are no "false positives".

Yes, there could be. If I mention http://www.spamarrest.com/ in my message, and spamarrest.com is in a SURBL, then my message could get blocked. Similarly any other legitimate mentions of spamarrest's web site, including saying "it's a bad company," or "I use their services," or "I'm filing a complaint against them," for example, could get legitimate messages blocked. That is a classic false positive.

Please remember the URI (message body) false positives are really in a different category than sender IP or sender domain (message header/envelope) false positives. If an end user IP address or ISP mail server domain is listed in a conventional RBL, the effect is limited to that IP or sender domain. If a URI is listed in a SURBL, the effect could be as large as blocking all messages that happen to mention that URI, which is potentially much larger in scope. The potential for wide-reaching false positives is much greater with a SURBL than an envelope RBL.
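The difference in scope can be made concrete with a small sketch. It assumes the usual DNS list conventions: a URI list is queried with the domain found in the message body prepended to the list zone, while a conventional RBL is queried with the reversed octets of the sender IP. The function names, the `rbl.example` zone, and the naive "last two labels" domain reduction are illustrative assumptions; real clients handle two-level TLDs and other cases.

```python
import re

# Capture the host part of http/https URIs found in a message body.
URI_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def registered_domain(host):
    """Naive reduction to the last two labels (wrong for e.g. co.uk)."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def surbl_query_names(body, zone="sc.surbl.org"):
    """DNS names a URI-list client would look up for every URI in the body."""
    return sorted({f"{registered_domain(h)}.{zone}" for h in URI_HOST.findall(body)})


def rbl_query_name(sender_ipv4, zone="rbl.example"):
    """DNS name a conventional IP-based RBL client would look up."""
    return ".".join(reversed(sender_ipv4.split("."))) + "." + zone


complaint = "I'm filing a complaint against http://www.spamarrest.com/ today."
print(surbl_query_names(complaint))
print(rbl_query_name("66.150.163.156"))
```

The URI lookup depends only on what the body mentions, so a complaint about a listed domain triggers the same query as spam advertising it, whoever sends it. The IP lookup depends only on the connecting host.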

...

The spamarrest challenges are spam, triggered by spam to spamarrest customers, and sent to the forged addresses in the original spam. Spamarrest.com is only interested to sell more of their snake oil, and as far as I'm concerned it's a criminal organization.

...

Complete with "webmaster affiliate program", exactly the same kind of marketing you find in XXX sites. Only the "product" is different, it's "spam filtering". The real work is not done by spamarrest, it's done by my ISP and me (for all forged @xyzzy addresses), or by your ISP and you (for all forged @surbl.org addresses), etc.

...

Spamarrest.com "sells" your and my bandwidth + harddisk space + time. There are no "legitimate users" or "false positives", it's theft.

All of which is probably true, but not entirely relevant to the question of inclusion, especially when you agree spamarrest is not originating the messages purely themselves. A better answer may be that they have an abuse problem and should fix it.

Since spamarrest appears to be a legitimate company, I'd recommend reporting your spams to the relevant state and national governments' anti-spam folks. That should encourage spamarrest to fix their problems. Here are the Washington state and U.S. government reporting sites:

http://www.atg.wa.gov/junkemail/

https://rn.ftc.gov/pls/dod/wsolcq$.startup?Z_ORG_CODE=PU01

...

...
their design is broken, but having a broken design is not enough reason to list them.

...

It's not only "broken", it's fraudulent. It's no free service, their users pay for this design, and what they really pay for are _our_ resources.

...

See also http://openrbl.org/ip/66/150/163/156.htm for other BL entries for the IP [66.150.163.156] in my example.

As I said, our standards for inclusion are significantly higher than for conventional RBLs, because URI blocking is potentially much broader in scope. We really can't have every domain that's ever been abused a few times or caused someone to be annoyed in the lists, even if that would be fine for a personal policy, since doing so could quickly make the lists unusable for too many people.

The informal rule should be: if a given domain has any legitimate mentions in message body URIs, then it probably should not be listed.

Jeff C.

Frank Ellermann

19 Jul 2004

11:14 p.m.

Jeff Chan wrote:

...

If I mention http://www.spamarrest.com/ in my message, and spamarrest.com is in a SURBL, then my message could get blocked.

Sure, the same is true for any URL in SURBL. Apparently you are now planning to list sex sites only because they are sex sites. You're even making jokes about recipients who cannot complain if they don't get their daily XXX pics :-(

Use the raw SC data, don't introduce arbitrary whitelisting.

...

especially when you agree spamarrest is not originating the messages purely themselves. A better answer may be that they have an abuse problem and should fix it.

They have more than an abuse problem. I reported some of their challenges manually and never got an answer. They are spammers selling a pseudo-spam-solution.

...

I'd recommend reporting your spams to the relevant state and national governments' anti-spam folks.

I'm quite happy with my solution, i.e. report their challenges as spam via SpamCop. If you really think that it's a good idea to censor SC's data please rename this SURBL to jeff.surbl.org instead of SC.surbl.org, and please modify the description http://www.surbl.org/data.html

...

We really can't have every domain that's ever been abused a few times or caused someone to be annoyed in the lists

That's a technical problem, and you have solved it, something reported only a few times shouldn't show up in sc.surbl.org

But at the moment we're discussing arbitrary whitelisting of spamvertized URLs found more than only a few times in SpamCop reports. And spamarrest.com isn't an innocent bystander, it's their "business model" to harass third parties.

Bye, Frank

Jeff Chan

11:48 p.m.

On Monday, July 19, 2004, 2:14:46 PM, Frank Ellermann wrote:

...

Jeff Chan wrote:

...

...
If I mention http://www.spamarrest.com/ in my message, and spamarrest.com is in a SURBL, then my message could get blocked.

...

Sure, the same is true for any URL in SURBL. Apparently you are now planning to list sex sites only because they are sex sites. You're even making jokes about recipients who cannot complain if they don't get their daily XXX pics :-(

We're considering it. It has benefits and disadvantages. The benefit is that it would be an easy way to block sex sites, both in email and potentially in web proxies (assuming someone writes that code, for example into squid). The potential problem is that if it's misapplied it could create a large new set of false positives.

...

Use the raw SC data, don't introduce arbitrary whitelisting.

We can't use the raw, unwhitelisted SpamCop data since it could easily be poisoned. For example an abuser or spammer could submit http://www.claranet.de/ or http://www.google.com/ or http://www.spamcop.net/ to SpamCop a few times then those would be blocked. Obviously we can't allow that.
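A minimal sketch of the kind of filtering being described: count how often each domain is reported, require a minimum number of reports, and drop whitelisted domains regardless of count. The function name, the threshold of 3, and the `pillz.example` and `once.example` domains are hypothetical; the actual SURBL processing is not specified in this thread.

```python
from collections import Counter


def build_list(reported_domains, whitelist, min_reports=3):
    """Return domains reported at least min_reports times and not whitelisted."""
    counts = Counter(d.lower() for d in reported_domains)
    return sorted(d for d, n in counts.items()
                  if n >= min_reports and d not in whitelist)


# An abuser submits google.com five times to try to poison the list.
reports = ["google.com"] * 5 + ["pillz.example"] * 4 + ["once.example"]
print(build_list(reports, whitelist={"google.com", "spamcop.net"}))
```

In this sketch the threshold alone does not stop poisoning, since an abuser can simply submit a domain repeatedly. Only the whitelist keeps `google.com` out of the result.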

...

...
especially when you agree spamarrest is not originating the messages purely themselves. A better answer may be that they have an abuse problem and should fix it.

...

They have more than an abuse problem. I reported some of their challenges manually and never got an answer. They are spammers selling a pseudo-spam-solution.

I see it differently. Besides, if they have *any significant* legitimate use, we can't list them.

...

...
I'd recommend reporting your spams to the relevant state and national governments' anti-spam folks.

...

And spamarrest.com isn't an innocent bystander, it's their "business model" to harass third parties.

If their business model is to spam people, then the State of Washington could trivially use their anti-spam laws to shut them down. Did you report them?

http://www.atg.wa.gov/junkemail/

If not, then there's not much to say.

Jeff C.

Frank Ellermann

20 Jul 2004

5:08 a.m.

Jeff Chan wrote:

...

The benefit is that it would be an easy way to block sex sites, both in email and potentially in web proxies

If you get relevant input data which is also different from the existing SURBLs. See my reply to Rob, for e-mail the enlargement or viagra stuff isn't really better than XXX.

And filtering Web pages is a completely different business.

...

We can't use the raw, unwhitelisted SpamCop data since it could easily be poisoned.

Yes, I know this. It's good to remove errors and innocent bystanders. And it's good to require a minimal number of votes for the "democracy in action". But as soon as errors, joe jobs, and innocent bystanders are removed, and the number of votes is above the required minimum, the result should be clear. I've quoted the relevant part of data.html in my reply to Rob.

...

I see it differently.

Then you won't vote for spamarrest.com. As long as all votes are equal there's no problem. Otherwise it's arbitrariness.

...

Did you report them?

I report my spam via SpamCop. I'm not interested in the laws of Washington, Taiwan, or any other state you care to name, because I'm not yet planning to sue this or any other spammer.

But my vote in the case of spamarrest.com is clear, and you would nullify it :-( Where's that "democracy in action" as promised on your Web page, is it only _fictitious_ ? Bye.

Jeff Chan

8:03 a.m.

On Monday, July 19, 2004, 8:08:41 PM, Frank Ellermann wrote:

...

Jeff Chan wrote:

...

...
We can't use the raw, unwhitelisted SpamCop data since it could easily be poisoned.

...

Yes, I know this. It's good to remove errors and innocent bystanders. And it's good to require a minimal number of votes for the "democracy in action". But as soon as errors, joe jobs, and innocent bystanders are removed, and the number of votes is above the required minimum, the result should be clear.

I would contend abuse of spamarrest's services, as you originally described them, is closer to a Joe Job than outright spam by spamarrest.

As you described it, the abusers are triggering spamarrest messages to your mailbox by forging your return address on messages sent through spamarrest. That's more like abuse of a poorly designed system by a third party than spam from spamarrest itself. The correct answer to that is to get spamarrest to correct their broken system.

I don't believe in blacklisting systems simply because their broken designs are being abused. The intent of SURBLs is more to list the sites directly being advertised by spammers in deliberate spams. This does not appear to be such a case, as you described it.

...

...
I see it differently.

...

Then you won't vote for spamarrest.com. As long as all votes are equal there's no problem. Otherwise it's arbitrariness.

It's not at all arbitrary. I already explained the criteria. If a given domain is used by an otherwise legitimate company that may have some legitimate mentions in messages, then they should not be listed.

...

...
Did you report them?

...

I report my spam via SpamCop. I'm not interested in the laws of Washington, Taiwan, or any other state you care to name,

I'd propose that legal action may be better if spamarrest are actually spamming and not simply being abused.

...

because I'm not yet planning to sue this or any other spammer.

It's got nothing to do with suing them personally (in civil court) since it would be a criminal matter if they are breaking anti-spam laws.

Jeff C.

Frank Ellermann

26 Jul 2004

9 a.m.

Jeff Chan wrote:

...

I would contend abuse of spamarrest's services, as you originally described them, is closer to a Joe Job than outright spam by spamarrest.

IBTD. And if you read some of the 166 messages shown by...

http://www.google.com/groups?q=spamarrest.com+group%3A*.net-abuse.*+-group%3A*.sightings

...you'll find that I'm not exactly alone with this opinion.

[co.to vs. oo.to]

...

the site is oo.to, not co.to

Sorry. http://oo.cx/domains/home.php claims that this is a "iRedirector (subdomain edition)" in its <title>, and I get the same page at http://oo.to/domains/home.php, ditto nn.to, ditto uu.to

I found no abuse report links, but testing some of the domains mentioned in nanas they _might_ do something about abuse (or my old browser simply can't handle their weird JavaScripts (?)).

Bye, Frank

Jeff Chan

12:15 p.m.

On Monday, July 26, 2004, 12:00:01 AM, Frank Ellermann wrote:

...

Jeff Chan wrote:

...

...
I would contend abuse of spamarrest's services, as you originally described them, is closer to a Joe Job than outright spam by spamarrest.

...

IBTD. And if you read some of the 166 messages shown by...

...

http://www.google.com/groups?q=spamarrest.com+group%3A*.net-abuse.*+-group%3A*.sightings

...

...you'll find that I'm not exactly alone with this opinion.

I don't dispute that they have abusive and brain-dead policies and designs, but the key question is do they have any legitimate uses? If so we probably can't list them.

Jeff C.

Rob McEwen

19 Jul

11:57 p.m.

Frank,

I think you need to lighten up a bit and not take the jokes part so seriously.

Also, I think that Jeff is doing an excellent job, is very thorough, listens carefully to all sides and all evidence presented in disputes, and has excellent discernment and judgment.

A lot of projects like this one have been derailed or ruined by having someone in charge who did NOT have these qualities. We should all be grateful for Jeff's hard work and dedication.

Also, regarding the sex sites, this is a great idea because many businesses would prefer to block these types of e-mail. Also, many families (particularly with young children) desire a way to get their children connected with e-mail WITHOUT having to fear that their 8 year old is going to see vivid "double penetration" photos, for example. Imagine having to explain that one.

Also, I think that the idea of separating sex sites from spammers fully addresses Frank's (& others) concerns here.

Rob McEwen

-----Original Message----- From: discuss-bounces@lists.surbl.org [mailto:discuss-bounces@lists.surbl.org] On Behalf Of Frank Ellermann Sent: Monday, July 19, 2004 5:15 PM To: discuss@lists.surbl.org Subject: [SURBL-Discuss] Re: Jeff's whitelists

Jeff Chan wrote:

...

If I mention http://www.spamarrest.com/ in my message, and spamarrest.com is in a SURBL, then my message could get blocked.

Use the raw SC data, don't introduce arbitrary whitelisting.

...

especially when you agree spamarrest is not originating the messages purely themselves. A better answer may be that they have an abuse problem and should fix it.

They have more than an abuse problem. I reported some of their challenges manually and never got an answer. They are spammers selling a pseudo-spam-solution.

...

I'd recommend reporting your spams to the relevant state and national governments' anti-spam folks.

...

We really can't have every domain that's ever been abused a few times or caused someone to be annoyed in the lists

That's a technical problem, and you have solved it, something reported only a few times shouldn't show up in sc.surbl.org

Bye, Frank

_______________________________________________ Discuss mailing list Discuss@lists.surbl.org http://lists.surbl.org/mailman/listinfo/discuss

Jeff Chan

20 Jul

12:30 a.m.

New subject: RFC: sex site domain SURBL (Was: Re: Re: Jeff's whitelists)

On Monday, July 19, 2004, 2:57:47 PM, Rob McEwen wrote:

...

A lot of projects like this one have been derailed or ruined by having someone in charge who did NOT have these qualities. We should all be grateful for Jeff's hard work and dedication.

...

Also, regarding the sex sites, this is a great idea because many businesses would prefer to block these types of e-mail. Also, many families (particularly with young children) desire a way to get their children connected with e-mail WITHOUT having to fear

[...]

...

Also, I think that the idea of separating sex sites from spammers fully addresses Frank's (& others) concerns here.

Thanks for your support and comments Bob. I don't take any of the discussion personally and am just trying to apply some principles hopefully reasonably on these.

The reality is that the gray areas are often the most troublesome for only a small return, and the clearly professional spammers are likely responsible for a lot more of the spam than the occasionally rogue or abused quasi-legitimate company. So the professional spammers are probably more important to focus on catching.

I'd still like to hear anyone else's comments on a sex/adult SURBL. It would be a separate list, and could be useful, but I'm somewhat concerned about the potential for misuse.

Comments?

Jeff C.

Bill Landry

12:56 a.m.

New subject: sex site domain SURBL (Was: Re: Re: Jeff's whitelists)

----- Original Message ----- From: "Jeff Chan" jeffc@surbl.org

...

I'd still like to hear anyone else's comments on a sex/adult SURBL. It would be a separate list, and could be useful, but I'm somewhat concerned about the potential for misuse.

Comments?

I am all for a separate sex/adult list. If people don't wish to read the list description, but opt to blindly use it anyway, then any fallout is their problem. My vote is to move forward with implementing a new/separate sex/adult list.

Bill

Matt Yackley

2:09 a.m.

New subject: RFC: sex site domain SURBL (Was: Re: Re: Jeff's whitelists)

Jeff Chan said:

...

The reality is that the gray areas are often the most troublesome for only a small return, and the clearly professional spammers are likely responsible for a lot more of the spam than the occasionally rogue or abused quasi-legitimate company. So the professional spammers are probably more important to focus on catching.

I'd still like to hear anyone else's comments on a sex/adult SURBL. It would be a separate list, and could be useful, but I'm somewhat concerned about the potential for misuse.

Comments?

Jeff C.

Hi Jeff, I'm all for having a separate list for sex/adult sites. One of the major reasons I became more involved in mail filtering was the HR folks at work becoming worried about the possibility of sexual harassment lawsuits based on companies not doing anything about pornographic spam messages. While a surbl list for adult sites is not something that an ISP should implement, I see it as a great tool for corporate or individual use.

As long as the test is not installed by default, but instead the admins must manually set up the test, it should cut down on the potential for misuse. If an admin goes into his local.cf and creates a rule to check sex.surbl.org, well then they get what they deserve...whether that is a useful tool or a bunch of pissed off customers, that's up to them.

my $0.02

-matt

Jeff Chan

4:20 a.m.

New subject: RFC: sex site domain SURBL

Doing a little preliminary checking of this particular dataset leads me to wonder a little how appropriate it might be for SURBLs. In particular I found over a hundred whitelist hits of sites like aol.com, att.net, btopenworld.com, budweiser.com, clara.net, cnet.com, comcast.net, he.net, lsu.edu, match.com, mindspring.com, msn.com, rr.com, sina.com, texas.net, tripod.com, umich.edu, victoriassecret.com, washington.edu, etc.:

http://spamcheck.freeapp.net/adult.domains.whitelist-hits

that's after excluding the adult/urls list which had about 300 whitelist hits, including more hosting providers like terra.es, etc. Recall that our whitelists are not too complete, so there may be other legitimate domains that are included. We can't be blocking on aol.com, cnet.com, msn.com, etc.

Clearly some of these (shared hosting) sites may have been used to host sex content, but since RBLs are domain-based, and SURBLs are registrar-domain-based, I'm having some doubts about how useful this particular data source might be for SURBL use.

ftp://ftp.univ-tlse1.fr/pub/reseau/cache/squidguard_contrib/adult.tar.gz

Perhaps there are other lists of sex domains that are more selective?
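The whitelist check described above can be sketched as a simple set split. This is only a minimal illustration, assuming both the candidate list and the whitelist have already been reduced to plain domain names; the function name and the sample data are hypothetical and not taken from the actual processing scripts.

```python
def split_by_whitelist(domains, whitelist):
    """Split candidate domains into whitelist hits and the remainder."""
    candidates = {d.lower() for d in domains}
    allowed = {d.lower() for d in whitelist}
    hits = sorted(candidates & allowed)        # would be false positives if listed
    remaining = sorted(candidates - allowed)   # what is left after whitelisting
    return hits, remaining


# Hypothetical sample data for illustration only.
hits, remaining = split_by_whitelist(
    ["aol.com", "att.net", "sexonaol.com"],
    ["aol.com", "att.net", "msn.com"],
)
print(hits)
print(remaining)
```

A large number of hits for well-known domains is a hint that the source data and the whitelist disagree about what a "domain" is, rather than that those sites belong on a blocklist.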

Jeff C.

Jeff Chan

7:49 a.m.

New subject: RFC: sex site domain SURBL

I should have added that the preliminary extracted data is at:

http://spamcheck.freeapp.net/adult.domains.afterwhitelist

This is about 450k entries after whitelisting and some other cleanup. There are still a few bogus entries that would need to be cleaned up further.

The main problem is probably FPs; perhaps I should make it into a test list and ask people to see what kind of FPs they get. Unfortunately I probably can't run a zone file that large on my BIND nameserver (and I haven't converted over to rbldnsd yet).

Jeff C.

Raymond Dijkxhoorn

9:14 a.m.

New subject: RFC: sex site domain SURBL

Hi!

...

that's after excluding the adult/urls list which had about 300 whitelist hits, including more hosting providers like terra.es, etc. Recall that our whitelists are not too complete, so there may be other legitimate domains that are included. We can't be blocking on aol.com, cnet.com, msn.com, etc.

I am preparing a mail for terra.es, they should clean out their systems really fast. I get a *LOT* of spam with those URLs and most work also, so it seems their abuse dept. isn't running fast enough, or just supporting it. :(

Bye, Raymond.

Marc Kool

3:27 p.m.

New subject: RFC: sex site domain SURBL

Hi Jeff,

Jeff Chan wrote:

...

Doing a little preliminary checking of this particular dataset leads me to wonder a little how appropriate it might be for SURBLs. In particular I found over a hundred whitelist hits of sites like aol.com, att.net, btopenworld.com, budweiser.com, clara.net, cnet.com, comcast.net, he.net, lsu.edu, match.com, mindspring.com, msn.com, rr.com, sina.com, texas.net, tripod.com, umich.edu, victoriassecret.com, washington.edu, etc.:

http://spamcheck.freeapp.net/adult.domains.whitelist-hits

I did a quick check on a few domains and I do not share your conclusion.

# grep aol.com domains
adultaol.com
register.oscar.aol.com
sex-aol.com
sexonaol.com
usaol.com

# grep att.net domains
adultonly.home.att.net
borderjumper.home.att.net
brookeb.home.att.net
chrisd054.home.att.net
dating.home.att.net
divinenews.home.att.net
lilcindy.home.att.net
livevids.home.att.net
livevids2.home.att.net
livevids3.home.att.net
livevids4.home.att.net
models.home.att.net
models2.home.att.net
personals.home.att.net
pvelasquez.home.att.net
sasha69.home.att.net
sex-ads.home.att.net
sexworld.home.att.net
xxxmovies.home.att.net

# grep -w au.com domains
aotoys.au.com
condoms.au.com
freeporn.au.com
hornytoad.au.com
muff.au.com

So aol.com and att.net and au.com are not in the database and not blacklisted. no subdomain of aol.com is in the blacklist. For au.com and att.net there are only adult subdomains in the blacklist. This is ok.
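Because a plain grep matches substrings, the output above mixes entries that merely contain the search string with entries that are real subdomains of it. A minimal sketch for telling the cases apart, assuming the list is one hostname per line; the function name is hypothetical:

```python
def classify(entry, domain):
    """Describe how a list entry relates to a domain of interest."""
    entry = entry.lower().strip(".")
    domain = domain.lower().strip(".")
    if entry == domain:
        return "exact"        # the domain itself is listed
    if entry.endswith("." + domain):
        return "subdomain"    # a host underneath the domain is listed
    if domain in entry:
        return "substring"    # unrelated domain that only contains the text
    return "none"


for entry in ["adultaol.com", "register.oscar.aol.com", "sexonaol.com"]:
    print(entry, classify(entry, "aol.com"))
```

With this distinction, adultaol.com and sexonaol.com are separate registered domains, while register.oscar.aol.com is a host under aol.com.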

...

that's after excluding the adult/urls list which had about 300 whitelist hits, including more hosting providers like terra.es, etc. Recall that our whitelists are not too complete, so there may be other legitimate domains that are included. We can't be blocking on aol.com, cnet.com, msn.com, etc.

Clearly some of these (shared hosting) sites may have been used to host sex content, but since RBLs are domain-based, and SURBLs are registrar-domain-based, I'm having some doubts about how useful this particular data source might be for SURBL use.

ftp://ftp.univ-tlse1.fr/pub/reseau/cache/squidguard_contrib/adult.tar.gz

Perhaps there are other lists of sex domains that are more selective?

Jeff C.

The domain terra.es is also not in the domains list. It is indeed in the url list, e.g. personal.telefonica.terra.es/web/sex terra.es/personal2/amateursexual etc.

I think that the only database that can be used by SURBL is the domains database and that the url database is not suitable to be used by SURBL since URLs are difficult to translate to a DNS query string.

I assume that something went wrong when you verified the quality of the database. If you have any questions you can also contact me off list.

-Marc

David Hooton

3:58 p.m.

New subject: RFC: sex site domain SURBL

On Tue, 20 Jul 2004 15:27:52 +0200, Marc Kool m.kool@vioro.nl wrote:

...

Hi Jeff,

Jeff Chan wrote:

...
Doing a little preliminary checking of this particular dataset leads me to wonder a little how appropriate it might be for SURBLs. In particular I found over a hundred whitelist hits of sites like aol.com, att.net, btopenworld.com, budweiser.com, clara.net, cnet.com, comcast.net, he.net, lsu.edu, match.com, mindspring.com, msn.com, rr.com, sina.com, texas.net, tripod.com, umich.edu, victoriassecret.com, washington.edu, etc.:

http://spamcheck.freeapp.net/adult.domains.whitelist-hits

I did a quick check on a few domains and I do not share your conclusion.

# grep aol.com domains adultaol.com register.oscar.aol.com sex-aol.com sexonaol.com usaol.com

register.oscar.aol.com is the server used by AOL messenger and ICQ to login - how on earth does this count as an Adult Website, much less a sex site?!!

...

# grep att.net domains adultonly.home.att.net borderjumper.home.att.net brookeb.home.att.net chrisd054.home.att.net dating.home.att.net divinenews.home.att.net lilcindy.home.att.net livevids.home.att.net livevids2.home.att.net livevids3.home.att.net livevids4.home.att.net models.home.att.net models2.home.att.net personals.home.att.net pvelasquez.home.att.net sasha69.home.att.net sex-ads.home.att.net sexworld.home.att.net xxxmovies.home.att.net

Ahh the plot thickens... Subdomains..

...

# grep -w au.com domains aotoys.au.com condoms.au.com freeporn.au.com hornytoad.au.com muff.au.com

Still more..

...

So aol.com and att.net and au.com are not in the database and not blacklisted. no subdomain of aol.com is in the blacklist.

What is register.oscar.aol.com if it isn't a subdomain?

...

For au.com and att.net there are only adult subdomains in the blacklist. This is ok.

However SURBL's in general don't use subdomains, I've just run a test on my personal SURBL and SpamCopURI doesn't currently look at subdomains. I suspect because of the requirement for a lookup per domain level which would obviously both make things inefficient and also leave room for a denial of service.
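To make the cost of "a lookup per domain level" concrete, here is a rough sketch of the query names a client would have to generate if it checked every level of a hostname. The zone name sex.surbl.org is the hypothetical list discussed in this thread, and the function name and the max_levels cap are assumptions for illustration, not part of SpamCopURI.

```python
def per_level_queries(host, zone="sex.surbl.org", max_levels=4):
    """Build one DNS query name per domain level, shortest name first."""
    labels = host.lower().strip(".").split(".")
    names = []
    # Start with the last two labels and add one more label each step.
    for start in range(len(labels) - 2, -1, -1):
        names.append(".".join(labels[start:]) + "." + zone)
    # Cap the number of lookups so a very deep hostname cannot force
    # an unbounded number of DNS queries.
    return names[:max_levels]


for name in per_level_queries("xxxmovies.home.att.net"):
    print(name)
```

A hostname with four labels already needs three lookups instead of one, which is the inefficiency mentioned above; the cap is one possible way to limit the denial-of-service exposure.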

...

I assume that something went wrong when you verified the quality of the database.

I think the levels of understanding of what was in the DB and what SURBL was able to do were what went wrong.

Given my very quick testing I think it would probably be worth giving this data a try, we would most likely need to work out how to remove the subdomained entries - the list is huge, and efficiency we can gain by removing excess data would obviously be useful.

The data is somewhat preemptive - just because you have an adult content website doesn't always mean you are spamming, in fact I'm sure there are an awful lot of Adult sites which never spam.

I do however feel that there is a need for this kind of data, there are a lot of organisations which have liability concerns if their users receive pornographic messages (schools) and many people who find adult content offensive (churches etc).

I reckon let's give it a go for a while like we did 6dos - what's the worst that can happen? We might get another SURBL - well more content is always a good thing in that case :) -- Regards,

David Hooton

Marc Kool

7:59 p.m.

New subject: RFC: sex site domain SURBL

David Hooton wrote:

...

On Tue, 20 Jul 2004 15:27:52 +0200, Marc Kool m.kool@vioro.nl wrote:

...
Hi Jeff,

Jeff Chan wrote:

...
Doing a little preliminary checking of this particular dataset leads me to wonder a little how appropriate it might be for SURBLs. In particular I found over a hundred whitelist hits of sites like aol.com, att.net, btopenworld.com, budweiser.com, clara.net, cnet.com, comcast.net, he.net, lsu.edu, match.com, mindspring.com, msn.com, rr.com, sina.com, texas.net, tripod.com, umich.edu, victoriassecret.com, washington.edu, etc.:

http://spamcheck.freeapp.net/adult.domains.whitelist-hits

I did a quick check on a few domains and I do not share your conclusion.

# grep aol.com domains adultaol.com register.oscar.aol.com sex-aol.com sexonaol.com usaol.com

register.oscar.aol.com is the server used by AOL messenger and ICQ to login - how on earth does this count as an Adult Website, much less a sex site?!!

In my browser, when I type http://register.oscar.aol.com this is displayed:

AOL Instant Messenger is an adults-only service. Click Here if you are 18 or older. If you are under 18, click here to exit.

Seems 100% adult to me!

...

...
# grep att.net domains adultonly.home.att.net borderjumper.home.att.net brookeb.home.att.net chrisd054.home.att.net dating.home.att.net divinenews.home.att.net lilcindy.home.att.net livevids.home.att.net livevids2.home.att.net livevids3.home.att.net livevids4.home.att.net models.home.att.net models2.home.att.net personals.home.att.net pvelasquez.home.att.net sasha69.home.att.net sex-ads.home.att.net sexworld.home.att.net xxxmovies.home.att.net

Ahh the plot thickens... Subdomains..

...
# grep -w au.com domains aotoys.au.com condoms.au.com freeporn.au.com hornytoad.au.com muff.au.com

Still more..

...
So aol.com and att.net and au.com are not in the database and not blacklisted. no subdomain of aol.com is in the blacklist.

What is register.oscar.aol.com if it isn't a subdomain?

You're right, it is a subdomain of aol.com. If AOL uses this server to register for ICQ and other non-adult stuff *AND* uses it to register for adult stuff AND the 'default mode' (i.e. use only the subdomain in the URL) is for adult only, they are asking for problems.

...

...
For au.com and att.net there are only adult subdomains in the blacklist. This is ok.

However SURBL's in general don't use subdomains, I've just run a test on my personal SURBL and SpamCopURI doesn't currently look at subdomains. I suspect because of the requirement for a lookup per domain level which would obviously both make things inefficient and also leave room for a denial of service.

Hmmm. I am afraid that spammers will abuse this property of SpamCopURI.

...

...
I assume that something went wrong when you verified the quality of the database.

I think the levels of understanding of what was in the DB and what SURBL was able to do were what went wrong.

Given my very quick testing I think it would probably be worth giving this data a try, we would most likely need to work out how to remove the subdomained entries - the list is huge, and efficiency we can gain by removing excess data would obviously be useful.

The data is somewhat preemptive - just because you have an adult content website doesn't always mean you are spamming, in fact I'm sure there are an awful lot of Adult sites which never spam.

I do however feel that there is a need for this kind of data, there are a lot of organisations which have liability concerns if their users receive pornographic messages (schools) and many people who find adult content offensive (churches etc).

This is what I stated in the original proposal: let's make a SURBL list for adult-related URIs, not necessarily spammers. I know that SURBL is meant to fight spam, but it is so easy to extend it with functionality to ban emails that refer to adult sites that I think SURBL is the place to do it, instead of creating a new mechanism in SA.

...

I reckon let's give it a go for a while like we did 6dos - what's the worst that can happen? We might get another SURBL - well more content is always a good thing in that case :) -- Regards,

David Hooton

-Marc

21 Jul

2:36 a.m.

New subject: RFC: sex site domain SURBL

Hi Marc, At 10:59 20-07-2004, Marc Kool wrote:

...

In my browser, when I type http://register.oscar.aol.com this is displayed:

AOL Instant Messenger is an adults-only service. Click Here if you are 18 or older. If you are under 18, click here to exit.

Seems 100% adult to me!

The url does not give you access to any "adult" material. I don't see any rationale for this being classified as a sex site. It has zero percent porn. :)

Regards, -sm

Jeff Chan

3:25 a.m.

New subject: RFC: sex site domain SURBL

On Tuesday, July 20, 2004, 10:59:27 AM, Marc Kool wrote: (David Hooton wrote:)

...

...
However SURBL's in general don't use subdomains, I've just run a test on my personal SURBL and SpamCopURI doesn't currently look at subdomains. I suspect because of the requirement for a lookup per domain level which would obviously both make things inefficient and also leave room for a denial of service.

...

Hmmm. I am afraid that spammers will abuse this property of SpamCopURI.

Actually the design decision to reduce subdomains to base domains was made to eliminate the abuse by spammers of using randomized subdomains....

Since AOL, ATT, MSN, or other legitimate ISPs and their subdomains are not often professional spammer destinations, it seemed more important to catch the deliberate randomizers. It looks like that may be less so for sex sites.

...

This is what I stated in the original proposal: let's make a SURBL list for adult-related URIs, not necessarily spammers. I know that SURBL is meant to fight spam, but it is so easy to extend it with functionality to ban emails that refer to adult sites that I think SURBL is the place to do it, instead of creating a new mechanism in SA.

I agree about some of the value in this, certainly for squid use. I can think of a few different ways to proceed:

1. Discard all subdomains: probably too drastic for squid use since some legitimate sites could be lost, but probably appropriate for SURBL use.

2. Fold subdomains to registrar domains: creates too many false positives (at least for SURBL use) of sites hosted on otherwise legitimate hosting providers like att.net, etc. Would also break some squid matches.

3. Include the subdomains (the fully qualified-domain names) in the list as they appear in the data: this will prevent the registrar domains (like att.net) from matching in SURBLs, and it's also faithful to the original data, which can be a good thing in general and is probably preferable for squid use.

The main problem is that most code for using SURBLs on the client (mail server) side tries to reduce the subdomains down to base domains. So it will tend not to match deliberately included subdomains. That can be an ok thing for SURBLs. Essentially it tells SURBLs to ignore the subdomains. If we wanted SURBLs to actually match these spam sites we'd check the full subdomains.

For Squid use #3 is probably the most desirable since it best captures the original data.

So #3 would probably get the best results for both squid and SURBLs (by side effect of not matching the registrar domains). It's probably the best compromise under the current designed uses of both squid and SURBLs.
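The three options above, and their effect on a client that reduces hostnames before lookup, can be sketched roughly as follows. This is an illustration under a simplifying assumption: the base domain is taken to be the last two labels, which is wrong for many country-code domains. All function names and the sample data are hypothetical.

```python
def base(host):
    # Simplification: treat the last two labels as the registrar domain.
    return ".".join(host.lower().split(".")[-2:])


def build_list(entries, option):
    """Build the published list from raw entries under options 1, 2 or 3."""
    if option == 1:    # discard every entry that has a subdomain
        return {e for e in entries if e.count(".") == 1}
    if option == 2:    # fold every entry to its base domain
        return {base(e) for e in entries}
    if option == 3:    # keep entries exactly as they appear in the data
        return set(entries)
    raise ValueError("option must be 1, 2 or 3")


def reducing_client_matches(url_host, listed):
    # Models a client that reduces to the base domain before looking it up.
    return base(url_host) in listed


data = ["xxxmovies.home.att.net", "sexonaol.com"]
for option in (1, 2, 3):
    listed = build_list(data, option)
    print(option, sorted(listed), reducing_client_matches("www.att.net", listed))
```

Only option 2 makes an ordinary att.net URL match for a reducing client, which is the false-positive case; option 3 keeps the original entries while leaving att.net itself unlisted.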

Comments?

Jeff C.

Jeff Chan

20 Jul

8:39 p.m.

New subject: RFC: sex site domain SURBL

On Tuesday, July 20, 2004, 6:58:15 AM, David Hooton wrote:

...

On Tue, 20 Jul 2004 15:27:52 +0200, Marc Kool m.kool@vioro.nl wrote:

...

...
I did a quick check on a few domains and I do not share your conclusion.

I think we have a slight case of culture clash here. This adult data is meant to be used in a proxy server where the data is apparently matched literally against URI data from web requests, etc.

SURBLs are designed to be used with specific email message body scanning programs that attempt to reduce the domains found in message body URIs to their registrar (base) domain so that subdomains like "models.home.att.net" are reduced to the base domain "att.net" before being included in a SURBL or checked against a SURBL.

The main reason we did this was to defeat the "random subdomain" spammers who generate random subdomains to try to defeat simple URI pattern matching or to key their spams to confirm the recipient addresses. Examples might be "abc1.xyz.spammerdomain.com" and "abc2.xyz.spammerdomain.com". Those we want to reduce to just "spammerdomain.com" since the randomized/keyed versions may occur only once and the sc.surbl.org data engine tries to increase the likelihood of inclusion in the list with an increasing number of reports.
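A small sketch of why the reduction matters for report counting: randomized hostnames that each appear once collapse into a single base domain whose count keeps growing. This is not the actual sc.surbl.org data engine; the function name is hypothetical, and the two-level suffix set is a tiny illustrative stand-in for a real list of such suffixes.

```python
from collections import Counter

# Illustrative only: a real implementation needs a full list of
# suffixes under which registrations happen at the third level.
TWO_LEVEL_SUFFIXES = {"co.uk", "com.au"}


def base_domain(host):
    """Reduce a hostname to its registrar (base) domain."""
    labels = host.lower().strip(".").split(".")
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


reports = [
    "abc1.xyz.spammerdomain.com",
    "abc2.xyz.spammerdomain.com",
    "models.home.att.net",
]
counts = Counter(base_domain(host) for host in reports)
print(counts)
```

Each randomized hostname is seen once, but after reduction spammerdomain.com has two reports. The same reduction turns models.home.att.net into att.net, which is the side effect discussed in this thread.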

It may be useful to read about the sc.surbl.org data:

http://www.surbl.org/data.html

and the related Implementation Guidelines:

http://www.surbl.org/implementation.html

to gain a clearer understanding of some of our design decisions.

So both Marc's and David's comments make sense in those differing contexts. The two contexts differ mainly in their handling of subdomains:

...

...
# grep aol.com domains adultaol.com register.oscar.aol.com sex-aol.com sexonaol.com usaol.com

...

register.oscar.aol.com is the server used by AOL messenger and ICQ to login - how on earth does this count as an Adult Website, much less a sex site?!!

And more importantly in my first try at processing the data for use as a SURBL, "register.oscar.aol.com" got reduced to "aol.com". :-(

...

...
# grep att.net domains adultonly.home.att.net borderjumper.home.att.net

[...]

...

Ahh the plot thickens... Subdomains..

...

...
# grep -w au.com domains aotoys.au.com condoms.au.com

[...]

...

...
For au.com and att.net there are only adult subdomains in the blacklist. This is ok.

...

However SURBL's in general don't use subdomains, I've just run a test on my personal SURBL and SpamCopURI doesn't currently look at subdomains. I suspect because of the requirement for a lookup per domain level which would obviously both make things inefficient and also leave room for a denial of service.

[...]

...

...
I assume that something went wrong when you verified the quality of the database.

...

I think the levels of understanding of what was in the DB and what SURBL was able to do were what went wrong.

...

Given my very quick testing I think it would probably be worth giving this data a try, we would most likely need to work out how to remove the subdomained entries - the list is huge, and efficiency we can gain by removing excess data would obviously be useful.

Good suggestion, but perhaps slightly tricky to implement, depending on the data.

I can easily use a regex to delete entries with subdomains like "xxxmovies.home.att.net" so that "att.net" does not get on the list. But that would only be effective if the deliberately randomized domains like "abc.xyz.spammerdomain.com" were reduced to "spammerdomain.com" in the source data, otherwise we would lose both.

In other words, if the data is a literal transcription of everything found in spams, including randomized URIs like "abc.xyz.spammerdomain.com," then we will lose the latter if I discard all subdomains.
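A minimal sketch of the regex filter described above, showing the loss it causes when the source data is literal. The pattern and function name are assumptions for illustration; the pattern simply keeps entries with exactly two labels.

```python
import re

# Matches hostnames made of exactly two labels, e.g. "example.com".
NO_SUBDOMAIN = re.compile(r"^[a-z0-9-]+\.[a-z0-9-]+$")


def drop_entries_with_subdomains(entries):
    """Keep only entries that have no subdomain part."""
    return [e for e in entries if NO_SUBDOMAIN.match(e.lower())]


entries = [
    "xxxmovies.home.att.net",
    "abc.xyz.spammerdomain.com",
    "sexonaol.com",
]
print(drop_entries_with_subdomains(entries))
```

The filter keeps att.net off the list as intended, but spammerdomain.com also disappears because it only ever occurred with a randomized subdomain in front of it.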

So Marc, can you tell us if the randomized domains that spammers frequently used are reduced to the base domains in the adult data, i.e. "spammerdomain.com" and not "abc.xyz.spammerdomain.com"?

Jeff C.

Marc Kool

21 Jul

1:55 a.m.

New subject: RFC: sex site domain SURBL

Jeff Chan wrote:

...

On Tuesday, July 20, 2004, 6:58:15 AM, David Hooton wrote:

...
On Tue, 20 Jul 2004 15:27:52 +0200, Marc Kool m.kool@vioro.nl wrote:

...
...
I did a quick check on a few domains and I do not share your conclusion.

I think we have a slight case of culture clash here. This adult data is meant to be used in a proxy server where the data is apparently matched literally against URI data from web requests, etc.

SURBLs are designed to be used with specific email message body scanning programs that attempt to reduce the domains found in message body URIs to their registrar (base) domain so that subdomains like "models.home.att.net" are reduced to the base domain "att.net" before being included in a SURBL or checked against a SURBL.

This is new for me and it is clear.

...

The main reason we did this was to defeat the "random subdomain" spammers who generate random subdomains to try to defeat simple URI pattern matching or to key their spams to confirm the recipient addresses. Examples might be "abc1.xyz.spammerdomain.com" and "abc2.xyz.spammerdomain.com". Those we want to reduce to just "spammerdomain.com" since the randomized/keyed versions may occur only once and the sc.surbl.org data engine tries to increase the likelihood of inclusion in the list with an increasing number of reports.

It may be useful to read about the sc.surbl.org data:

Yep, the reasons why this is done are clear, but the approach is not flawless. There are ISPs (say myisp.net) that give customers a subdomain, e.g. myspamsite.myisp.net, which cannot be included in SURBL. I also assume that the percentage of this type of domain is not so big...

*snip*

...

...
Given my very quick testing I think it would probably be worth giving this data a try, we would most likely need to work out how to remove the subdomained entries - the list is huge, and efficiency we can gain by removing excess data would obviously be useful.

Good suggestion, but perhaps slightly tricky to implement, depending on the data.

I can easily use a regex to delete entries with subdomains like "xxxmovies.home.att.net" so that "att.net" does not get on the list. But that would only be effective if the deliberately randomized domains like "abc.xyz.spammerdomain.com" were reduced to "spammerdomain.com" in the source data, otherwise we would lose both.

In other words, if the data is a literal transcription of everything found in spams, including randomized URIs like "abc.xyz.spammerdomain.com," then we will lose the latter if I discard all subdomains.

So Marc, can you tell us if the randomized domains that spammers frequently used are reduced to the base domains in the adult data, i.e. "spammerdomain.com" and not "abc.xyz.spammerdomain.com"?

Nope :-(

...

Jeff C.

There are indeed "different cultures":
surbl: fight spam, of which lots is adult related
squidguard: block adult sites, of which only a small percentage spams

_I assume that most sites that (want to) fight spam also (want to) block adult sites_.

For the record: my original proposal would make sex.surbl.org more of a squidguard-based list than a surbl-based list.

One of the reasons to propose sex.surbl.org was the fact that SURBL lists lag behind reality. In July I received 156 spams of which 16 were not detected by SA+SOME_SARE_RULES+OWN_RULES+SURBL because the SURBL lists were not updated fast enough (the 16 spams were marked as spam at a later time because then SURBL marked them and the SA rating went up). This is not meant to criticize anybody, just to state a fact.

I observed that many spams from new domains
- share IP addresses
- automatically forward you to a known sex site (in the squidguard database)
and proposed sex.surbl.org

I hate to say it :-) but if the implementation gives too many headaches, the proposal as it is now can be disregarded.

However, I see some value for the squidguard adult database to be used by software behind spamtraps: if an URI is retrieved and redirects you to a known sex site, the URI can be added automatically (= fast) to a SURBL list.
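The spamtrap idea above could look roughly like this. It is only a sketch: the redirect chain is assumed to be collected by some separate fetcher that is not shown, and the function name, host names and the known-adult set are all hypothetical.

```python
from urllib.parse import urlsplit


def host_to_list(redirect_chain, known_adult_hosts):
    """Return the host of the first URI in the chain if the final hop
    lands on a known adult host, otherwise None."""
    if not redirect_chain:
        return None
    first_host = urlsplit(redirect_chain[0]).hostname
    final_host = urlsplit(redirect_chain[-1]).hostname
    if final_host in known_adult_hosts:
        return first_host
    return None


# Hypothetical chain recorded while following a URI found in a spamtrap message.
chain = ["http://brand-new-domain.example/offer", "http://known-adult.example/"]
print(host_to_list(chain, {"known-adult.example"}))
```

The caller would then add the returned host to the list automatically, which is where the speed advantage over manual reporting would come from.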

Marc

Jeff Chan

2:37 a.m.

New subject: RFC: sex site domain SURBL

On Tuesday, July 20, 2004, 4:55:01 PM, Marc Kool wrote:

...

Jeff Chan wrote:

...

...
The main reason we did this was to defeat the "random subdomain" spammers who generate random subdomains to try to defeat simple URI pattern matching or to key their spams to confirm the recipient addresses. Examples might be "abc1.xyz.spammerdomain.com" and "abc2.xyz.spammerdomain.com". Those we want to reduce to just "spammerdomain.com" since the randomized/keyed versions may occur only once and the sc.surbl.org data engine tries to increase the likelihood of inclusion in the list with an increasing number of reports.

It may be useful to read about the sc.surbl.org data:

...

Yep, the reasons why this is done are clear but are not flawless. There are ISPs myisp.net that give customers a subdomain: e.g. myspamsite.myisp.net which can not be included in SURBL. I also assume that the percentage of these type of domains is not so big...

Yes, I think they are rare because a legitimate ISP would not want a major spam site on their domain, even a subdomain, for damage to their reputation, etc. Any ISP that would willingly host a spam site on a subdomain of their own domain we would, I think, consider a rogue ISP, which I would not feel too bad about blocking entirely. But few ISPs seem to put themselves into this position, which is perhaps why big spammers use so many custom domains.

I think you're right; I can't really think of many examples of this actually happening, so our design compromise perhaps seems reasonable. :-)

[...]

...

For the record: my original proposal would make sex.surbl.org more of a squidguard-based list than a surbl-based list.

Right, which is fine. Please see my next message for some proposed solutions to this.

...

One of the reasons to propose sex.surbl.org was the fact that the SURBL lists lag behind reality. In July I received 156 spams, of which 16 were not detected by SA+SOME_SARE_RULES+OWN_RULES+SURBL because the SURBL lists were not updated fast enough (the 16 spams were marked as spam at a later time because by then SURBL listed them and the SA rating went up). This is not meant to criticize anybody, just to state a fact.

...

I observed that many spams from new domains

- share IP addresses
- automatically forward you to a known sex site (in the squidguard database)

and proposed sex.surbl.org

There will always be some lag, but once caught, SURBLs have the potential to limit the spread of the spams, at least ones with the same URIs mentioned repeatedly.

Note that the next version of the sc data engine will cut this lag quite dramatically, especially for those resolving to frequently appearing spammer IP blocks. For more info on the proposed next version of this data engine, please see:

http://www.surbl.org/faq.html#numbered

...

However, I see some value for the squidguard adult database to be used by software behind spamtraps: if a URI is retrieved and redirects you to a known sex site, the URI can be added automatically (= fast) to a SURBL list.

...

Marc

I agree RBLs are a convenient and fast way to get data out. It takes good advantage of the existing DNS infrastructure.
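The spamtrap idea quoted above (follow a URI and list it if it lands on a known site) might look something like this rough sketch. Everything named here is an assumption for illustration: KNOWN_ADULT_DOMAINS stands in for the squidguard adult database, and the redirects dictionary stands in for real HTTP fetches so the example stays self-contained.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the squidguard adult database.
KNOWN_ADULT_DOMAINS = {"adult-example.test"}


def host_of(url: str) -> str:
    """Lower-cased hostname of a URL, or '' if there is none."""
    return (urlsplit(url).hostname or "").lower()


def final_url(url: str, redirects: dict, max_hops: int = 10) -> str:
    """Follow a redirect map (a stand-in for HTTP fetches) to its end."""
    seen = set()
    for _ in range(max_hops):
        if url in seen or url not in redirects:
            break
        seen.add(url)
        url = redirects[url]
    return url


def candidate_for_listing(url, redirects, known=KNOWN_ADULT_DOMAINS):
    """Return the starting host if the URL redirects to a known site."""
    start = host_of(url)
    end = host_of(final_url(url, redirects))
    on_known_site = end in known or any(end.endswith("." + d) for d in known)
    if start and end != start and on_known_site:
        return start
    return None


redirects = {"http://new-spam.test/a": "http://www.adult-example.test/"}
print(candidate_for_listing("http://new-spam.test/a", redirects))
```

A host that does not redirect, or that redirects somewhere unknown, yields None and would be left for the normal reporting path.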

Jeff C.

Frank Ellermann

20 Jul 20 Jul

4:13 a.m.

Rob McEwen wrote:

...

I think that Jeff is doing an excellent job, is very thorough, listens carefully to all sides and all evidence presented in disputes, and has excellent discernment and judgment.

Sure, I fully agree with this statement. And it's good to remove obvious errors and innocent bystanders from surbl lists manually. As far as possible, manual interventions always come after the fact.

And it's better to find technical solutions, e.g. a minimal number of sightings, because that's something working even without manual interventions.

But things start to get messy if Jeff defines some SpamCop reports manually as erroneous although the SC users and staff consider them as valid spam reports.

...

regarding the sex sites, this is a great idea because many businesses would prefer to block these types of e-mail.

On my main address XXX spam is very rare, and I doubt that a sex.surbl.org would help much. It's difficult to define spam, but my definition would try to avoid "content". Enlargement and viagra spam is not "better" than XXX spam, but far more popular.

Actually I'm not really worried about a sex.surbl.org as long as the source of the data is clear. I'm more worried about an SC.surbl.org no longer reflecting the SC input data as defined on http://www.surbl.org/data.html

| many independent spam reports by SpamCop users are required
| in order to get a domain onto the list [...]
| few if any legitimate sites make it through the reporting
| threshold and simple, short whitelist [...]
| This is a democratic effect, improved by manual de-selection
| of legitimate domains by SpamCop users when they submit their
| reports. More reports means more votes that a given site is
| indeed spam. The quality of data is reinforced by the
| conscientious efforts of good people in reporting the spam.
| In this sense it is democracy in action.

These are high standards, and if I report spamarrest.com, then this is my vote in this process. If there are enough votes for say spamarrest.com, and it's neither an error nor an innocent bystander, then the "democracy in action" should result in a host spamarrest.com.sc.surbl.org = 127.0.0.2
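For readers unfamiliar with how such a host record is consulted, here is a small sketch of the lookup side. The function names are made up for this example. Treating any answer in 127.0.0.0/8 as "listed" is an assumption based on common DNS blocklist practice; the message above only mentions 127.0.0.2.

```python
import ipaddress
import socket

LISTED_RANGE = ipaddress.ip_network("127.0.0.0/8")  # assumed convention


def surbl_query_name(domain: str, zone: str = "sc.surbl.org") -> str:
    """Build the DNS name to query: the domain prepended to the zone."""
    return f"{domain.lower().rstrip('.')}.{zone}"


def is_listed(answer) -> bool:
    """Interpret an A-record answer; None means no record was found."""
    if answer is None:
        return False
    return ipaddress.ip_address(answer) in LISTED_RANGE


def lookup(domain: str, zone: str = "sc.surbl.org"):
    """Resolve the query name; any resolver failure is treated as 'no record'."""
    try:
        return socket.gethostbyname(surbl_query_name(domain, zone))
    except socket.gaierror:
        return None


print(surbl_query_name("spamarrest.com"))
```

A caller would combine these as is_listed(lookup("somedomain.example")); the lookup itself needs a working resolver, so it is not run here.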

Bye, Frank

Jeff Chan

4:35 a.m.

On Monday, July 19, 2004, 7:13:33 PM, Frank Ellermann wrote:

...

But things start to get messy if Jeff defines some SpamCop reports manually as erroneous although the SC users and staff consider them as valid spam reports.

I agree it's not good to override the SpamCop reports, but there will always be a need to have whitelists to prevent Joe Jobs and deliberate poisoning of the data.

I also understand that you would like spamarrest listed, but as we discussed, it does not seem they are creating the spams themselves (abusers are initiating it), and they probably have some legitimate uses, so we really can't list them.

As a measure of how good the SpamCop data is, the actual whitelist hit log is commendably sparse:

http://spamcheck.freeapp.net/whitelist-hits.new.log

So the sc.surbl.org thresholding, etc. of the SpamCop data appears to be working pretty well.
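As a rough illustration of how a report threshold and a whitelist could interact, consider the sketch below. It is not the real sc.surbl.org engine: the THRESHOLD value, the WHITELIST contents, and the function name are all hypothetical, chosen only to show the shape of the logic.

```python
# Hypothetical values; the real threshold and whitelist are not given here.
THRESHOLD = 3
WHITELIST = {"example.com"}


def select_for_listing(report_counts, whitelist=WHITELIST, threshold=THRESHOLD):
    """Split domains that pass the threshold into listed and whitelist hits."""
    listed, whitelist_hits = [], []
    for domain, n in sorted(report_counts.items()):
        if n < threshold:
            continue  # too few independent reports to consider
        if domain in whitelist:
            whitelist_hits.append(domain)  # would be logged, not listed
        else:
            listed.append(domain)
    return listed, whitelist_hits


listed, hits = select_for_listing(
    {"spammerdomain.com": 5, "example.com": 7, "rare.test": 1}
)
print(listed, hits)
```

In this picture, a short whitelist-hit log means few whitelisted domains ever pass the threshold in the first place.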

P.S. Can anyone read Korean and tell us what the oo.to site is? Are they spammers? Should we list them?

Jeff C.

Frank Ellermann

5:43 a.m.

Jeff Chan wrote:

...

Can anyone read Korean and tell us what the oo.to site is?

http://co.to redirects to http://iidd.org/ and there's a page http://iidd.org/html/join.htm with...

| <TITLE>International Instant Domain Development's WorldWide
| SUB-DOMAIN FORWARDING service is for FREE!</TITLE>

...sounds like free hosting and / or free redirections. On the main page is a "BAN : Spam, pirate Software, illegality" statement and a "police" link, and the "police" page has a

| mailto:-_-barkzeno@hitel.net?subject=only KOREAN or
| ENGLISH&body=I am sorry, we do not have a CHINESE
| translator.please write in English or korean.

Whatever that means. IMHO you can't list the complete site, it's like OrgDNS or one of the DynDNS SLDs. Here's the list of their SLDs as found on join.html:

4.to cc.to ce.to co.to con.cn dd.to gg.gg hh.to if.to joa.to kk.to kom.cn kp.to lil.to mini.to mm.to pc.to xx.to zz.to

Bye, Frank