Listed at WS_URI_RBL
X-Spam-Status: No, hits=5.1 required=10.0 tests=RATWR10_MESSID=0.111, WS_URI_RBL=5
Thanks, -b.
On Wednesday, September 1, 2004, 5:47:11 AM, Bitz Bitz wrote:
Listed at WS_URI_RBL
X-Spam-Status: No, hits=5.1 required=10.0 tests=RATWR10_MESSID=0.111, WS_URI_RBL=5
Thanks, -b.
I've whitelisted: funnygreetings.com
It belongs to euniverse. I've added it to these other euniverse domains:
euniverse.com flowgo.com skilljam.com cupidjunction.com dietingplans.com intelligentx.com netlaughter.com cutestuf.com madblast.com infobeat.com gossipflash.com funnygreetings.com
Jeff C.
Hi!
It belongs to euniverse. I've added it to these other euniverse domains:
euniverse.com flowgo.com skilljam.com cupidjunction.com dietingplans.com intelligentx.com netlaughter.com cutestuf.com madblast.com infobeat.com gossipflash.com funnygreetings.com
Uh?!
flowgo, please look them up ...
Some examples:
http://listserv.nic.it/cgi-bin/wa?A2=ind0403&L=anti-spam&F=&S=&a... http://www.toastedspam.com/stupid/disptext/usadrugs.net_0004
Are you positive about that one?
Bye, Raymond.
Raymond Dijkxhoorn wrote:
Uh?!
flowgo, please look them up ...
On the other hand, a *LOT* of home-user ISP subscribers VERY *VERY* specifically signed up for their... output. For a variety of other reasons I've had to whitelist them for users anyway in a number of cases, but they *are* providing a service that people have signed up for.
I'd /dev/null them at the MTA if I had my choice (they *can* usually be identified at that level; never mind SA scoring and SURBL hits!), but as an ISP mail administrator I have to grit my teeth and put up with them. :(
-kgd Hates the cutesy "Spam your friends with our really bad jokes" sites, we does.
On Wednesday, September 1, 2004, 11:42:23 AM, Jeff Chan wrote:
I've whitelisted: funnygreetings.com
It belongs to euniverse. I've added it to these other euniverse domains:
euniverse.com flowgo.com skilljam.com cupidjunction.com dietingplans.com intelligentx.com netlaughter.com cutestuf.com madblast.com infobeat.com gossipflash.com funnygreetings.com
The reason for whitelisting all of them is that they all belong to euniverse. While I agree that these "spam to your friends with jokes, greetings, prayers, whatever" sites are stupid and highly abuse-prone, they do have some legitimate uses and should probably not be blocked globally.
The other rationale is that euniverse is either a spamhaus or not. While it's possible they're highly clueless in their subscription policies, it seems odd to me that one part of their operation would be somewhat responsible, and another part would be blatantly spamming. Unless they've partitioned their mail servers along those lines, they would risk getting them all shut down by their ISP's AUP, and that would not make business sense for them.
Also I place organizations that use their own mail servers in a different class than those who are using zombies, or otherwise illegally stealing services to deliver their mail, or are hosted or sending from spam-friendly ISPs in rogue nations that we are all already aware of. Anyone who has a fixed mail server can be trivially and much more efficiently blocked using a regular RBL and they probably don't need to be in a SURBL. They would be more efficiently handled in a RBL such as sbl.spamhaus.org. If the SBL sighting is correct, perhaps euniverse already is.
That all said, I'm willing to consider taking flowgo.com off the whitelist if people agree that domain is more spammy than legitimate.
Does euniverse use any zombies, stolen services or spam-friendly ISPs?
Jeff C.
on Wed, Sep 01, 2004 at 01:31:25PM -0700, Jeff Chan wrote:
The reason for whitelisting all of them is that they all belong to euniverse. While I agree that these "spam to your friends with jokes, greetings, prayers, whatever" sites are stupid and highly abuse-prone, they do have some legitimate uses and should probably not be blocked globally.
Let's all try not to lose sight of the fact that SURBL is not a "block all spam" service. It is a list of domains known never to appear in any legitimate mail. If flowgo/euniverse/killer26374medzpilz.bix/whoever is a spamhaus with fixed IP space, we'll block them anyway, with a broad mix of antispam tools. If they're spamming through zombies, SURBL may or may not be useful in stopping them.
But let's not make SURBL into a "version two" antispam system, trying to solve every problem, or we'll spend even more time arguing here. I like SURBL because it covers the last 2% of the spam that my filters don't catch, but lets me quarantine it so I can figure out why my other filters didn't catch it. I'd never use SURBL for rejecting mail; that would be a potential source of backscatter for innocent victims of joe jobs given my setup here, and probably given the setups of many other folks here.
Jeff Chan writes:
The reason for whitelisting all of them is that they all belong to euniverse. While I agree that these "spam to your friends with jokes, greetings, prayers, whatever" sites are stupid and highly abuse-prone, they do have some legitimate uses and should probably not be blocked globally.
The other rationale is that euniverse is either a spamhaus or not. While it's possible they're highly clueless in their subscription policies, it seems odd to me that one part of their operation would be somewhat responsible, and another part would be blatantly spamming. Unless they've partitioned their mail servers along those lines, they would risk getting them all shut down by their ISP's AUP, and that would not make business sense for them.
Also I place organizations that use their own mail servers in a different class than those who are using zombies, or otherwise illegally stealing services to deliver their mail, or are hosted or sending from spam-friendly ISPs in rogue nations that we are all already aware of. Anyone who has a fixed mail server can be trivially and much more efficiently blocked using a regular RBL and they probably don't need to be in a SURBL. They would be more efficiently handled in a RBL such as sbl.spamhaus.org. If the SBL sighting is correct, perhaps euniverse already is.
That all said, I'm willing to consider taking flowgo.com off the whitelist if people agree that domain is more spammy than legitimate.
BTW -- one problem I've observed with flowgo as it relates to SURBL is that users forward their URLs a *lot*. So even if flowgo sends spam, a mail that contains a flowgo URL often is not at all spammy -- just a person-to-person "here's a funny webpage" mail.
--j.
On Wednesday, September 1, 2004, 2:03:25 PM, Justin Mason wrote:
BTW -- one problem I've observed with flowgo as it relates to SURBL is that users forward their URLs a *lot*. So even if flowgo sends spam, a mail that contains a flowgo URL often is not at all spammy -- just a person-to-person "here's a funny webpage" mail.
Good to know, and another reason to whitelist.
Hope people aren't expecting to be able to forward pill spams to their friends. ;-)
Jeff C.
Alright, alright... I see where Jeff is coming from...
I even later discovered that I had already whitelisted madblast.com locally, because I had seen it in a forwarded mail that was blocked.
I hate to see such a slimy company whitelisted. In fact, when I went online to check out infobeat.com, I saw a link that said "toolbar". When I clicked on this link, it didn't ask me to install the toolbar; it announced that it was, in fact, in the process of installing the toolbar (I got out of there ASAP), thus proving their sliminess!
Nevertheless, I see how/why the goal of getting the FPs to a point where users can "set it and forget it" without having to audit SURBL-blocked mail is more important than stopping a few more spams from EUniverse. Therefore, I support Jeff's decision.
Not that Jeff "needs" my support... but I thought I should mention this since I previously was very pessimistic about whitelisting these domains.
Rob McEwen
On Wednesday, September 1, 2004, 7:36:26 PM, Rob McEwen wrote:
Nevertheless, I see how/why the goal of getting the FPs to a point where users can "set it and forget it" without having to audit SURBL-blocked mail is more important than stopping a few more spams from EUniverse. Therefore, I support Jeff's decision.
Rob McEwen
Steve Champeon's definition that: "SURBL is ... a list of domains known never to appear in any legitimate mail" seems quite useful to keep in mind.
I'd like to get SURBLs to a point where an ISP or telco could use them to block at the MTA level, i.e. with a Postfix filter. For that to happen would require an extremely low false positive rate. That may never be possible, but it should be a goal.
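To make that MTA-level idea concrete, here is a minimal sketch in Python (not an official SURBL or Postfix example; the multi.surbl.org zone name comes from this thread, and the 127.0.0.x convention is the usual DNSBL behavior) of how a filter could test a URI domain via DNS:

    # Minimal sketch of a SURBL lookup: listed names resolve to a 127.0.0.x
    # address, unlisted names return NXDOMAIN.
    import socket

    def surbl_listed(domain, zone="multi.surbl.org"):
        """Return the A record for domain.zone if listed, else None."""
        try:
            return socket.gethostbyname(f"{domain}.{zone}")
        except socket.gaierror:
            return None

    # An MTA-side content filter could reject or quarantine a message when
    # any domain extracted from its body returns a 127.0.0.x answer here.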
Also I don't like falsely accusing someone of being a spammer or blocking someone's legitimate mail. I hope others feel the same way.
Jeff C.
Jeff Chan wrote:
While I agree that these "spam to your friends with jokes, greetings, prayers, whatever" sites are stupid and highly abuse-prone, they do have some legitimate uses and should probably not be blocked globally.
IBTD. You could split your whitelist into "Jeff found some potentially legitimate use" and "really innocent bystanders".
The first white list should not be used to overrule SpamCop reports in sc.surbl.org. Thousands of SC users have an idea why they report spam, and these ideas don't necessarily match your personal definition of "potentially legitimate use".
Spam is about consent and not about "potentially legitimate use" or similar vague constructs.
euniverse is either a spamhaus or not.
It's not that simple. We've already discussed this problem with the pyramid scheme "spamarrest.com", a spammer styling itself as "anti-spam". IIRC they never made it as a candidate for sc.surbl.org; the technical definition of spam works as expected. It's unnecessary to add your personal definition of "potentially legitimate use" to sc.surbl.org if there is a way to catch obvious errors like BBC links in 419 spam.
it seems odd to me that one part of their operation would be somewhat responsible, and another part would be blatantly spamming.
Yes, that's odd. But this shouldn't be your problem, it's their weird business model. Please use the SC input as is, don't try to censor it.
I place organizations that use their own mail servers in a different class than those who are using zombies
For the SC input there should be only two classes: Obvious errors or votes as defined on your "I have a dream" page.
| we judge spam messages based on what they say, not where they come from.
There are no "rogue nations". The average admin in China is like the average admin in Florida. China is only bigger.
| More reports means more votes that a given site is indeed spam. The quality of data is reinforced by the conscientious efforts of good people in reporting the spam. In this sense it is democracy in action.
Nothing about "potentially legitimate use" on the SC data page. IMHO that's a feature and no bug. Simply tune the technical definition of spam until it matches your ideas of "potentially legitimate use". Manual interventions should be _exceptions_ for the sc.surbl.org zone. Less work for you, and prepared to run in unattended mode. Bye, Frank
Frank Ellermann writes:
IBTD. You could split your whitelist into "Jeff found some potentially legitimate use" and "really innocent bystanders".
There's another issue to think about, when you're talking about SURBL listings. A domain listed in SURBL may not have anything to do with the *sender* of the message; it matches the domains mentioned inside a message *that may have been sent by someone else*.
I think this means that the SURBL situation is uniquely different from most DNSBLs. Generally a DNSBL matches against the *sender* of a message. If a sender is listed, their messages and only their messages are blocked.
But in the SURBL case, a listing means that their messages, forwarded copies of their messages, cut-and-pastes from parts of their messages, etc. will also be hit.
This inherently means that for a certain class of borderline domains, a listing will result in more FPs even if the original sender has spammy tendencies.
--j.
On Tuesday, September 7, 2004, 10:32:08 AM, Justin Mason wrote:
There's another issue to think about, when you're talking about SURBL listings. A domain listed in SURBL may not have anything to do with the *sender* of the message; it matches the domains mentioned inside a message *that may have been sent by someone else*.
I think this means that the SURBL situation is uniquely different from most DNSBLs. Generally a DNSBL matches against the *sender* of a message. If a sender is listed, their messages and only their messages are blocked.
But in the SURBL case, a listing means that their messages, forwarded copies of their messages, cut-and-pastes from parts of their messages, etc. will also be hit.
This inherently means that for a certain class of borderline domains, a listing will result in more FPs even if the original sender has spammy tendencies.
--j.
In other words, content blocking is quite different from envelope or sender blocking. It's easy to get somewhat stuck thinking in terms of a sender-blocking paradigm, but that's not what we're doing here.
This is part of the reason content blocking in general is contentious and controversial. There is a lot of potential for abuse, collateral damage, wide-reaching mistakes, etc.
In a gamer's analogy, we have the BFG9000 with quad damage and friendly fire is on.
We need to be careful in applying this tool or we can easily do more harm than good.
Jeff C.
On Tue, 7 Sep 2004, Justin Mason wrote:
There's another issue to think about, when you're talking about SURBL listings. A domain listed in SURBL may not have anything to do with the *sender* of the message; it matches the domains mentioned inside a message *that may have been sent by someone else*.
I think this means that the SURBL situation is uniquely different from most DNSBLs. Generally a DNSBL matches against the *sender* of a message. If a sender is listed, their messages and only their messages are blocked.
But in the SURBL case, a listing means that their messages, forwarded copies of their messages, cut-and-pastes from parts of their messages, etc. will also be hit.
This inherently means that for a certain class of borderline domains, a listing will result in more FPs even if the original sender has spammy tendencies.
--j.
Yes, but at a deeper level, SURBL is actually a better anti-spam tool because of that phenomenon. Spammers are a service industry; they send out junk because they get paid by somebody else to do so. (I.e., the sending of the junk is not intrinsically valuable in itself; it's done because somebody else finds value in it as an advertising medium.)
So each forwarding of a spam message is that much more exposure and value-add for the actual slime-merchant.
If we can make that reference anathema, then we take away its value and reduce the effectiveness of that advertising medium, thus reducing the profit motive. Which ultimately will be the only real way to stop spam. As long as there's good money to be made in a particular activity (spam, drugs, smuggling, etc) people will do it, regardless of how hard it is to do.
This is also why SURBL is useful for blog cleaning, etc. It hits the references to the slime-merchant's goods.
However, Justin, Jeff, et al. are correct. We need to be careful in how we target this weapon, lest it get branded a loose cannon.
On Tuesday, September 7, 2004, 10:16:06 PM, David Funk wrote:
So each forwarding of a spam message is that much more exposure and value-add for the actual slime-merchant.
If we can make that reference anathema, then we take away its value and reduce the effectiveness of that advertising medium, thus reducing the profit motive. Which ultimately will be the only real way to stop spam. As long as there's good money to be made in a particular activity (spam, drugs, smuggling, etc) people will do it, regardless of how hard it is to do.
But who forwards pill or mortgage spams to their (non-anti-spam) friends? No one I know...
The real question is about the quality of our data, and we're not going to improve that by including a bunch of questionable domains. People who use our lists need to be sure they're catching real spam with them and not ham.
Jeff C.
On Tuesday, September 7, 2004, 11:04:51 PM, Jeff Chan wrote:
But who forwards pill or mortgage spams to their (non-anti-spam) friends? No one I know...
On the other hand, people forward grey/hammy joke of the day, image of the week, newsletter articles, etc. to friends all the time. Therefore we probably don't want to list those.
Jeff C.
Jeff Chan wrote:
On the other hand, people forward grey/hammy joke of the day, image of the week, newsletter articles, etc. to friends all the time. Therefore we probably don't want to list those.
In the case of SC it would be against SC's rules if I report a mail from a friend only because I don't like it. Or in other words, if I report a "joke of the day" it wasn't sent by a "friend" (= somebody I know), and if I reported it ten times then I really got it ten times (catch-all vanity host, spam sent to Message-Ids or completely forged @xyzzy addresses)...
...unfortunately invisible for SURBL, because I generally use the "quick reporting" system for spam sent to bogus addresses.
But from time to time when I'm really pissed I use the "normal" reporting (= visible for SURBL), and of course then I'd be very unhappy if my "votes" are discarded for non-technical reasons.
Bye, Frank
On Wednesday, September 8, 2004, 5:46:05 AM, Frank Ellermann wrote:
In the case of SC it would be against SC's rules if I report a mail from a friend only because I don't like it. Or in other words, if I report a "joke of the day" it wasn't sent by a "friend" (= somebody I know), and if I reported it ten times then I really got it ten times (catch-all vanity host, spam sent to Message-Ids or completely forged @xyzzy addresses)...
Yes, but the problem is that people report hams (non-spams) to SpamCop all the time.
1. If people report a spam that mentions ebay ten times to SpamCop, should we blacklist ebay? Of course not.
2. Are they breaking the rules of SpamCop? Probably.
3. Does this actually happen? Yes.
Therefore we need to be able to override the SpamCop reports sometimes. This affects only a very tiny fraction of the cases and is used only in exceptional circumstances.
Jeff C.
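As an illustration of the override step Jeff describes, here is a hypothetical Python sketch (not the actual SURBL build scripts; domain names are illustrative):

    # SpamCop-reported domains go into the zone unless they are on the
    # hand-maintained whitelist; whitelist hits are logged for later review.
    WHITELIST = {"ebay.com", "yahoo.com"}   # illustrative entries only

    def build_sc_zone(report_counts, whitelist=WHITELIST):
        """report_counts: {domain: number of SpamCop reports}."""
        zone, whitelist_hits = [], []
        for domain, reports in sorted(report_counts.items()):
            if domain in whitelist:
                whitelist_hits.append((domain, reports))  # reviewed, not listed
            else:
                zone.append(domain)
        return zone, whitelist_hits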
On Wednesday, September 8, 2004, 4:33:53 PM, Jeff Chan wrote:
Therefore we need to be able to override the SpamCop reports sometimes. This affects only a very tiny fraction of the cases and is used only in exceptional circumstances.
Jeff C.
As proof, here is the log of whitelist hits for SC:
http://spamcheck.freeapp.net/whitelist-hits.new.log
As you can see, it is very sparse and accurate.
Jeff C.
----- Original Message ----- From: "Jeff Chan" jeffc@surbl.org
As proof, here is the log of whitelist hits for SC:
http://spamcheck.freeapp.net/whitelist-hits.new.log
As you can see, it is very sparse and accurate.
Jeff, you might want to consider alphabetizing your list so that it's easier to spot duplicate entries like myquickpaypro.com. Although the duplicate entries probably don't hurt anything, it's just a thought...
Bill
On Wednesday, September 8, 2004, 5:17:54 PM, Bill Landry wrote:
Jeff, you might want to consider alphabetizing your list so that it's easier to spot duplicate entries like myquickpaypro.com. Although the duplicate entries probably don't hurt anything, it's just a thought...
Bill
Done:
http://spamcheck.freeapp.net/whitelist-hits.new.log.sort
Jeff C.
Jeff Chan wrote:
Done: http://spamcheck.freeapp.net/whitelist-hits.new.log.sort
Thanks, bookmarked. Bye, Frank
On Tuesday, September 7, 2004, 9:10:50 AM, Frank Ellermann wrote:
IBTD. You could split your whitelist into "Jeff found some potentially legitimate use" and "really innocent bystanders".
The first white list should not be used to overrule SpamCop reports in sc.surbl.org. Thousands of SC users have an idea why they report spam, and these ideas don't necessarily match your personal definition of "potentially legitimate use".
Spam is about consent and not about "potentially legitimate use" or similar vague constructs.
Every form of spam classification can make errors. Therefore there must be some form of feedback or error correction, or other strategies to deal with misclassifications.
Whitelisting is one strategy.
Another is trying to get enough spam reports or even trapped spam to be able to get some meaningful statistical impression about spammyness. If 1000 people report a domain as spammy, it probably is. If only 1 person says it's spammy it may be less likely.
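A toy version of that counting idea (the threshold and domain names are made up for illustration, not SURBL policy):

    def spammy_domains(report_counts, min_reports=10):
        """Treat a domain as spammy only if enough people reported it."""
        return {d for d, n in report_counts.items() if n >= min_reports}

    # 1000 reports clears the bar easily; a single report does not.
    print(spammy_domains({"pillz.example": 1000, "oneoff.example": 1}))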
It would be great to hear about other strategies. Does anyone have any ideas, research, etc. into this?
euniverse is either a spamhaus or not.
It's not that simple. We've already discussed this problem with the pyramid scheme "spamarrest.com", a spammer styling itself as "anti-spam". IIRC they never made it as a candidate for sc.surbl.org; the technical definition of spam works as expected. It's unnecessary to add your personal definition of "potentially legitimate use" to sc.surbl.org if there is a way to catch obvious errors like BBC links in 419 spam.
In grey cases, we must sometimes apply some judgement in order to prevent false positives. It's not fun or easy, but it needs to be done, or else SURBLs could rapidly become much less useful.
it seems odd to me that one part of their operation would be somewhat responsible, and another part would be blatantly spamming.
Yes, that's odd. But this shouldn't be your problem, it's their weird business model. Please use the SC input as is, don't try to censor it.
The point is to determine whether the organization is a spam gang or not. I agree with your point, though, that we should be free to list any part of an organization that is mostly spammy, even if other parts are not.
I place organizations that use their own mail servers in a different class than those who are using zombies
For the SC input there should be only two classes: Obvious errors or votes as defined on your "I have a dream" page.
Perhaps my obvious errors are not the same as your obvious errors. ;-)
In case of disagreement we must whitelist or there is potential for FPs. When in doubt, we whitelist (or exclude in the first place for manual blacklists).
| we judge spam messages based on what they say, not where they come from.
There are no "rogue nations". The average admin in China is like the average admin in Florida. China is only bigger.
I assume most people are aware that many of the professional spammer sites seem to be hosted in China, Brazil and Korea, and that they continue to be hosted there. Therefore we can assume any anti-spam laws or abuse policies are not being enforced there.
That said, it *is* the content that matters. Pill spammers, mortgage spammers, warez spammers, porn spammers, etc. all can be blocked, regardless of where they host or zombie.
| More reports means more votes that a given site is indeed spam. The quality of data is reinforced by the conscientious efforts of good people in reporting the spam. In this sense it is democracy in action.
Nothing about "potentially legitimate use" on the SC data page. IMHO that's a feature and no bug. Simply tune the technical definition of spam until it matches your ideas of "potentially legitimate use". Manual interventions should be _exceptions_ for the sc.surbl.org zone. Less work for you, and prepared to run in unattended mode. Bye, Frank
They *are* the exceptions. Most of the SpamCop reports get into sc.surbl.org.
Jeff C.
Jeff Chan wrote:
there must be some form of feedback or error correction, or other strategies to deal with misclassifications.
Whitelisting is one strategy.
ACK, but where and as far as possible I'd prefer a technical definition like the "BI" (Breidbarth Index) in Usenet.
E.g. whitelisting TLD .edu is almost the same bad idea as blacklisting TLD .biz.
Another is trying to get enough spam reports or even trapped spam to be able to get some meaningful statistical impression about spammyness. If 1000 people report a domain as spammy, it probably is. If only 1 person says it's spammy it may be less likely.
You could combine these strategies using the SC input: If the SC data matches whitelisted domains, then something is wrong:
Either the domain shouldn't be whitelisted w.r.t. the SC zone, or it should be reported as an "IB link" (innocent bystander) to deputies@admin.spamcop
Both cases require some manual intervention, unfortunately, but at least you would catch erroneous WL entries.
Does anyone have any ideas, research, etc. into this?
You're already using good ideas like "age of registration", and if this data isn't available (see *.whois.rfc-ignorant.org) it is their problem; treat it as "registered yesterday".
In grey cases, we must sometimes apply some judgement in order to prevent false positives.
Sure, but that judgement should consider the source or zone of the data. SC and SC.SURBL.ORG are not exactly the same as OB or WS. Minus obvious errors, abuses, and bugs, SC.SURBL.ORG is designed to run on auto-pilot.
we should be free to list any part of an organization that is mostly spammy, even if other parts are not.
Indeed, and a .edu TLD, hosting by Schlund, or a NYSE ticker symbol has nothing to do with spam vs. ham. Anybody can be hit by an idiot spammer in his own domain, so what? As soon as the problem is solved the listings expire.
Perhaps my obvious errors are not the same as your obvious errors. ;-)
Not sure. My definition of "obvious error" for the SC zone would be "I'd report it as innocent bystander to deputies@".
If your definition is very different, and if the reason for this difference is related to other SURBL zones, then maybe one general whitelist covering all zones is not good enough.
[rogue nations]
I assume most people are aware that many of the professional spammer sites seem to be hosted in China, Brazil and Korea, and that they continue to be hosted there. Therefore we can assume any anti-spam laws or abuse policies are not being enforced there.
TTBOMK that's no longer true for Korea. They have some kind of "anti-spam" law; it predates CAN-SPAM and is not really worse.
Some ISPs and registrars are "rogue" (e.g. SpamCast, ChinaNet, DirectI), and many are clueless or ignorant, but it's not related to "nations". Unless you're prepared to identify the U.S. as the top spammer nation of the known universe. ;-)
Most of the SpamCop reports get into sc.surbl.org.
That's good. Use the rest that doesn't make it in to check your whitelists and automated procedures. Maybe feed it to the new "unconfirmed" SURBL (?). BTW, of course a new uc.surbl.org shouldn't be a part of multi.surbl.org; it's too dangerous.
Bye, Frank
On Wednesday, September 8, 2004, 7:13:36 AM, Frank Ellermann wrote:
Jeff Chan wrote:
there must be some form of feedback or error correction, or other strategies to deal with misclassifications.
Whitelisting is one strategy.
ACK, but where and as far as possible I'd prefer a technical definition like the "BI" (Breidbarth Index) in Usenet.
Here's a definition (note there is no H in the name):
http://www.stopspam.org/usenet/mmf/breidbart.html
"The BI is a measure of how spammy a spammed news article is. It is the sum of the square root of the number of groups each copy of a spam article is posted to. So if you post 10 copies of an article, each cross-posted to 4 groups, the BI is 20. Other ways of reaching the BI=20 mark (a threshhold used by some cancellers) is to post 20 copies, each to just one group, 4 copies to 25 groups each, or 8 articles to 6 groups each and one more to just one group. (for BI=20.6)"
It's interesting, but probably does not apply in the mail spam area directly. I suppose we could count how often a domain appears on multiple SURBLs, but some of the SURBL data feeds are unitary, i.e. we can't see how many reports went into the listing, only whether a domain is listed or not.
This sort of idea could perhaps be useful for categorizing spamtrap data however, especially across multiple spamtraps.
But I think your complaint is that there are no objective criteria for whitelisting. That's fair, but there always must be some subjective judgement applied, especially when we can't see the entire universe of mail spam in the same way that the entire universe of Usenet spam *is easily visible*.
It's also definitely not the case that we can see the entire mail ham universe, so there really can't be a generally knowable measure of the spam/ham ratio of a given domain.
This is somewhat a question of philosophy and science: to know what is knowable and what is not, i.e. epistemology.
Since spammyness versus legitimacy is not easily measured purely objectively, we must reserve the right to make judgements.
If you have a BI or something similar for *mail* spam, then please share it.
E.g. whitelisting TLD .edu is almost the same bad idea as blacklisting TLD .biz.
Not to worry, neither is going to happen. Such things would be too powerful and probably unnecessary. Our focus is more on the spammyness of individual domains.
Another is trying to get enough spam reports or even trapped spam to be able to get some meaningful statistical impression about spammyness. If 1000 people report a domain as spammy, it probably is. If only 1 person says it's spammy it may be less likely.
You could combine these strategies using the SC input: If the SC data matches whitelisted domains, then something is wrong:
Either the domain shouldn't be whitelisted w.r.t. the SC zone. or it should be reported as "IB link" (innocent bystander) to deputies@admin.spamcop
That's fine, but reporting IB to SpamCop does not take them out of sc.surbl.org. That still must be done on our side.
In fact we should probably also be reporting whitelist hits back to SpamCop as innocent bystanders. The actual number of meaningful whitelist hits is much smaller than you may be assuming.
The feed from SpamCop into sc.surbl.org is one way from them to us.
Both cases require some manual intervention, unfortunately, but at least you would catch erroneous WL entries.
Take a look at the whitelist hit log for sc.surbl.org and tell me how many you think are erroneous:
http://spamcheck.freeapp.net/whitelist-hits.new.log
I see approximately zero. :-)
Does anyone have any ideas, research, etc. into this?
You're already using good ideas like "age of registration", and if this data isn't available (see *.whois.rfc-ignorant.org) it is their problem; treat it as "registered yesterday".
Yes, domain age is a good one. Most professional spammers register many fresh domains every day, use them for a few days at most, then change to another.
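A sketch of that age-of-registration heuristic (the threshold is illustrative, and the WHOIS lookup itself is left out since that part varies by registry):

    from datetime import datetime, timezone

    def looks_freshly_registered(creation_date, max_age_days=5):
        """creation_date: timezone-aware datetime from WHOIS, or None.
        Per Frank's suggestion, a missing date is treated as 'registered
        yesterday', i.e. suspicious."""
        if creation_date is None:
            return True
        age = datetime.now(timezone.utc) - creation_date
        return age.days <= max_age_days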
In grey cases, we must sometimes apply some judgement in order to prevent false positives.
Sure, but that judgement should consider the source or zone of the data. SC and SC.SURBL.ORG are not exactly the same as OB or WS. Minus obvious errors, abuses, and bugs, SC.SURBL.ORG is designed to run on auto-pilot.
Whitelisting is still needed across all SURBLs. Otherwise things like yahoo.com or ebay.com could get added.
we should be free to list any part of an organization that is mostly spammy, even if other parts are not.
Indeed, and a .edu TLD, hosting by Schlund, or a NYSE ticker symbol has nothing to do with spam vs. ham. Anybody can be hit by an idiot spammer in his own domain, so what? As soon as the problem is solved the listings expire.
Yes, we do not whitelist every domain that Schlund ever registers. Any individual domain hosted at Schlund or any .edu domain can get listed if they spam.
The point about Schlund is that we should not consider them a blackhat registrar because they have a few abusers. That fact does not stop us from listing their customers' domains.
The registrar information was meant to be a little extra hint about whether a domain is spammy. There are some registrars that seem to register a lot of spam domains. Sometimes that can be a signal that a domain is spammy. Sometimes it is not.
Perhaps my obvious errors are not the same as your obvious errors. ;-)
Not sure. My definition of "obvious error" for the SC zone would be "I'd report it as innocent bystander to deputies@".
That's fine, but reporting innocent bystanders does not take them off any SURBL lists. Only whitelisting or taking them out of the source data can do that.
If your definition is very different, and if the reason for this difference is related to other SURBL zones, then maybe one general whitelist covering all zones is not good enough.
I disagree. If a domain is legit, we whitelist. Otherwise we allow them to get listed. It doesn't matter what the list is.
[rogue nations]
I assume most people are aware that many of the professional spammer sites seem to be hosted in China, Brazil and Korea, and that they continue to be hosted there. Therefore we can assume any anti-spam laws or abuse policies are not being enforced there.
TTBOMK that's no longer true for Korea. They have some kind of "anti-spam" law; it predates CAN-SPAM and is not really worse.
Some ISPs and registrars are "rogue" (e.g. SpamCast, ChinaNet, DirectI), and many are clueless or ignorant, but it's not related to "nations". Unless you're prepared to identify the U.S. as the top spammer nation of the known universe. ;-)
Most legitimate US or European ISPs will shut down spam sites or spam senders. One important point of SURBLs is to be able to catch spam sites that *don't* get shut down. It really doesn't matter where they are. All that matters is that there *are* hosts that allow them to stay up. Those we need to catch. And we do.
Most of the SpamCop reports get into sc.surbl.org.
That's good. Use the rest that doesn't make it in to check your whitelists and automated procedures.
I believe the question of whitelist hits is fully answered by looking at the actual ones:
http://spamcheck.freeapp.net/whitelist-hits.new.log
But I also agree that we should review whitelist hits to make sure they're legitimate. We're quite careful about what goes onto our whitelists in the first place, so it should not be a major problem.
When I re-write my data engine, it will handle all the lists in a consistent manner and we should be able to get better reporting across all lists about new additions, new whitelist hits, etc.
Jeff C.
Jeff Chan writes:
Here's a definition (note there is no H in the name):
http://www.stopspam.org/usenet/mmf/breidbart.html
"The BI is a measure of how spammy a spammed news article is. It is the sum of the square root of the number of groups each copy of a spam article is posted to. So if you post 10 copies of an article, each cross-posted to 4 groups, the BI is 20. Other ways of reaching the BI=20 mark (a threshhold used by some cancellers) is to post 20 copies, each to just one group, 4 copies to 25 groups each, or 8 articles to 6 groups each and one more to just one group. (for BI=20.6)"
As a matter of interest -- and I should just ask Seth Breidbart ;) -- does this deal with hashbusters? I.e., if a message is 80% hashbuster strings and 20% payload, it's not so easy to automate BI calculation. (cf. dcc, Pyzor, Razor, AOL's paper at CEAS, et al.)
--j.
On Wednesday, September 8, 2004, 8:02:33 PM, Justin Mason wrote:
As a matter of interest -- and I should just ask Seth Breidbart ;) -- does this deal with hashbusters? I.e., if a message is 80% hashbuster strings and 20% payload, it's not so easy to automate BI calculation. (cf. dcc, Pyzor, Razor, AOL's paper at CEAS, et al.)
--j.
I'm not sure I understand the question. It seems to me that BI is a calculation based on counts of crossposting per message and does not consider content.
I guess you're saying that detection of multiple postings could be thrown off by hash busting, when the crossposting is done by posting to different newsgroups individually and not overtly listed in the headers.
Jeff C.
Jeff Chan writes:
I'm not sure I understand the question. It seems to me that BI is a calculation based on counts of crossposting per message and does not consider content.
I guess you're saying that detection of multiple postings could be thrown off by hash busting, when the crossposting is done by posting to different newsgroups individually and not overtly listed in the headers.
yep.
--j.
Jeff Chan wrote:
[Seth Breidbart]
note there is no H in the name
Oops, sorry, I checked only the wrong version with Google; completely stupid. For the correct spelling I should have used the link in my signature.
It's interesting, but probably does not apply in the mail spam area directly.
Yes, it's only the principle that is interesting: use some simple and objective criteria as far as possible. You already have some good algorithms, like "domains inherit the values known for other domains with the same IP".
That's the stuff I like; it makes sense, and it always works the same way, independent of your mood or caffeine level.
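A rough sketch of that IP-inheritance rule (the data and names here are made up for illustration; the real lists do this against their own databases):

    import socket

    KNOWN_BAD_IPS = {"192.0.2.10"}   # IPs already associated with listed domains

    def inherits_bad_reputation(domain):
        """A new domain resolving to a known-bad IP inherits that reputation."""
        try:
            ip = socket.gethostbyname(domain)
        except socket.gaierror:
            return False   # unresolvable: no inheritance either way
        return ip in KNOWN_BAD_IPS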
there always must be some subjective judgement applied, especially when we can't see the entire universe of mail spam
Sure, but you can still try to minimize these judgement calls. In the case of SC you have quantitative data, a good sample.
we must reserve the right to make judgements.
Of course, there are and always will be errors in the SC data, joe jobs / bogus links / abuses. It's good to fix these bugs, and remove or prevent any corresponding sc.surbl.org entries.
But it's wrong to do more than this. If SC says that inkjets (or whatever the name was) is spamvertized, then it should be listed.
reporting IB to SpamCop does not take them out of sc.surbl.org. That still must be done on our side.
Yes, maybe these efforts could be combined somehow. If you're in contact with Julian, maybe he's willing to share this info. They are interested in "IBs", but there might be reasons why they cannot publish this data.
The actual number of meaningful whitelist hits is much smaller than you may be assuming.
No assumption, I'm only worried that the manual interventions for whitelisting could get out of hand, or end in arbitrariness.
http://spamcheck.freeapp.net/whitelist-hits.new.log I see approximately zero. :-)
Did you count tripod.cl? That's an extremely ignorant hoster of many spamvertized pages. Wanadoo.es also had some dubious customers. Or does this data exclude spamvertized subdomains?
The point about Schlund is that we should not consider them a blackhat registrar because they have a few abusers.
ACK. Some years ago there was a problem with this hoster, but they changed. Like Joker (as registrar), and maybe we can say the same for SpamCast (as ISP) in 2006.
There are some registrars that seem to register a lot of spam domains.
DirectI. In theory this should be better in 2006. The new ICANN WDPRS for almost all gTLDs started this year; therefore the problems should be obvious early in 2005, and then rogue registrars will fix their procedures or risk their accreditation.
maybe one general whitelist covering all zones is not good enough.
I disagree. If a domain is legit, we whitelist. Otherwise we allow them to get listed. It doesn't matter what the list is.
That's a point where we have to agree to disagree. I support the published definition of SC.SURBL.ORG with the "democracy in action". Which has nothing to do with your personal ideas of "legit". Heck, we're not on the same continent, we're in completely different cultures, there's almost no chance that our definitions of "legit" match.
We obviously agree on "don't harm innocents" as an excuse to overrule SC votes, but that's not exactly the same as "legit".
Bye, Frank -- Whois Data Problem Report System http://wdprs.internic.net/
On Thursday, September 9, 2004, 2:49:41 PM, Frank Ellermann wrote:
Jeff Chan wrote:
(Breidbart Index)
Yes, it's only the principle that is interesting: use some simple and objective criteria as far as possible. You already have some good algorithms, like "domains inherit the values known for other domains with the same IP".
That's the stuff I like; it makes sense, and it always works the same way
Yes, and we will use them where we can.
http://spamcheck.freeapp.net/whitelist-hits.new.log I see approximately zero. :-)
Did you count tripod.cl? That's an extremely ignorant hoster of many spamvertized pages. Wanadoo.es also had some dubious customers. Or does this data exclude spamvertized subdomains?
Too many legitimate mentions. These are large hosting providers. We can't block on them. We generally don't list subdomains, only registrar domains.
(Chris and Ryan and Raymond, don't even think about proposing a subdomain list. LOL! ;-)
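A simplified sketch of reducing a host name to its registrar domain (the real lists need a full table of two-level TLDs; this tiny set is only for illustration):

    TWO_LEVEL_TLDS = {"co.uk", "com.au", "com.br"}   # incomplete example set

    def registrar_domain(host):
        parts = host.lower().rstrip(".").split(".")
        if len(parts) >= 3 and ".".join(parts[-2:]) in TWO_LEVEL_TLDS:
            return ".".join(parts[-3:])   # e.g. something.co.uk
        return ".".join(parts[-2:])       # e.g. xyzzy.claranet.de -> claranet.de

    print(registrar_domain("www.xyzzy.claranet.de"))   # -> claranet.de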
There are some registrars that seem to register a lot of spam domains.
DirectI. In theory this should be better in 2006. The new ICANN WDPRS for almost all gTLDs started this year; therefore the problems should be obvious early in 2005, and then rogue registrars will fix their procedures or risk their accreditation.
Excellent. It's about time ICANN cracked down on rogue registrars.
We obviously agree on "don't harm innocents" as an excuse to overrule SC votes, but that's not exactly the same as "legit".
I think that needs to be key. Minimize collateral damage, and maximize spam listings. We try to optimize both simultaneously.
There will always be disagreement about that optimization point. That is natural. (It's also a PITA.)
For someone to suggest that we have not "drawn a line" is ignorant and unfair. I think the 60,000+ spammers we have listed would feel otherwise also.
Jeff C.
Jeff Chan wrote:
Chris and Ryan and Raymond, don't even think about proposing a subdomain list. LOL! ;-)
What's the problem with this idea? It would be only one level above the real host, so for, say, claranet.de you would have to consider www.claranet.de and xyzzy.claranet.de, but you would ignore www.xyzzy.claranet.de or more.levels.xyzzy.claranet.de.
Then if I start to spamvertize my site you catch me without hitting any other user.claranet.de (let alone www.claranet.de)
Assuming that my ISP doesn't need weeks to cancel my account after I start to spam, the xyzzy entry will expire soon.
It's about time ICANN cracked down on rogue registrars.
I'll believe it when I see it. These registrars pay ICANN's budget, don't they?
There will always be disagreement about that optimization point. That is natural. (It's also a PITA.)
Sometimes your criteria appear a bit obscure to me. Of course some people may love a "joke of the day" mail; that's okay, if they like it they won't report it as spam.
But others don't like any unsolicited jokes, and they would report it as spam. In that case the joke-of-the-day site _is_ spamming, and it's okay to list them, even if they also have some real fans with a "legit" interest in their joke of the day. In that case you can't avoid collateral damage, whatever you do.
Bye, Frank