-----Original Message-----
From: Jeff Chan [mailto:jeffc@surbl.org]
Sent: Monday, October 25, 2004 5:01 PM
To: SURBL Discuss
Subject: Re: [SURBL-Discuss] free host: greatnow.com
On Monday, October 25, 2004, 1:23:32 PM, Chris Santerre wrote:
From: Jeff Chan [mailto:jeffc@surbl.org]
If we're thinking about setting up a blog list (as we were earlier), then it might be useful to test the data before using it, don't you agree?
I don't see how dumping lists with arbitrary FPs onto UC helps either UC or SURBLs. In fact it's one of the bad things we predicted: that a grey list would become a dumping ground with some FPs and some domains that belong on a blocklist, all sitting there underclassified, unchecked or ignored.
They are NOT going unchecked. UC is still in beta form right now. So we are testing. Most people have no clue where the server is as it is NOT part of SURBL, so UC.SURBL.ORG doesn't work. Not a dumping ground at all. It will be as active as WS.
I fully intend to mirror most of what goes into WS into UC. UC will simply have a different policy. Grey domains need to be considered. UC will do that. You said yourself earlier you didn't want to be any part of a list that handled grey domains. That it would waste time. So you don't have to worry about UC.
UC will get as much attention to detail as I put into WS. I just won't delete grey domains, like I do now. I will instead list them in UC.
How about a blog spam SURBL? Or is all blog spam grey?
You want a separate list for blog spammers? Have at it. I'll add what I can to it.
--Chris
On Monday, October 25, 2004, 2:28:25 PM, Chris Santerre wrote:
You want a separate list for blog spammers? Have at it. I'll add what I can to it.
The list you got from jayallen.org was one that was of interest for a blogger list. But IMO the proper way to use any new data is to test it first, publicly and widely. Everyone has slightly different use of email, so it's good to test on more system and more than one set of mail to better find potential problems.
Jeff C. -- "If it appears in hams, then don't list it."
On Monday, October 25, 2004, 4:10:16 PM, Jeff Chan wrote:
it's good to test on more system and more than one set of mail to better find potential problems.
more than one systems....
Anyway did we find any other public blog spam data besides jayallen?
IIRC Matthew Hunter was starting to collect some blog spam data. Matthew, how is that going?
Does anyone else know of other sources?
Jeff C. -- "If it appears in hams, then don't list it."
On Mon, Oct 25, 2004 at 04:24:10PM -0700, Jeff Chan jeffc@surbl.org wrote:
On Monday, October 25, 2004, 4:10:16 PM, Jeff Chan wrote:
it's good to test on more system and more than one set of mail to better find potential problems.
more than one systems.... Anyway did we find any other public blog spam data besides jayallen? IIRC Matthew Hunter was starting to collect some blog spam data. Matthew, how is that going?
Fairly well by my primary measure, which is stopping spam comments and trackbacks on my blog. I've learned that the number of domains actually being used for this sort of thing is very small compared to email spam. I've added the 24 domains of my own to the MT-blacklist list from 2004/08/29 and that has sufficed to block everyone who is trying to spam my blog. 0 false positives on blog spam attempts, but I'm not using the same list to block on email.
Almost all of the attempts have been against the domains I had to add myself, not those already on the list. This suggests to me that there is a small number of blog spammers behind most of it. They buy a few domains, do a run, and when they start to get blocked buy a few more domains. It's very much a stop and start thing.
If anyone can point me to a tool to run a URIBL against a spam corpus, I'll report back results against my own personal spam collection. Or I could just post the updated list somewhere.
On Mon, Oct 25, 2004 at 10:56:35PM -0500, Matthew Hunter matthew@infodancer.org wrote:
On Mon, Oct 25, 2004 at 04:24:10PM -0700, Jeff Chan jeffc@surbl.org wrote:
On Monday, October 25, 2004, 4:10:16 PM, Jeff Chan wrote:
it's good to test on more system and more than one set of mail to better find potential problems.
more than one systems.... Anyway did we find any other public blog spam data besides jayallen? IIRC Matthew Hunter was starting to collect some blog spam data. Matthew, how is that going?
Fairly well by my primary measure, which is stopping spam comments and trackbacks on my blog. I've learned that the number of domains actually being used for this sort of thing is very small compared to email spam. I've added the 24 domains of my own to the MT-blacklist list from 2004/08/29 and that has sufficed to block everyone who is trying to spam my blog. 0 false positives on blog spam attempts, but I'm not using the same list to block on email.
It should be noted that my definition of "false positive" may differ from that of the SURBL overall. In particular, I don't consider a domain a false positive if someone has attempted to blog-spam me with it -- even if the domain has legitimate uses. The domains I am being spammed with are very obviously porn-related; as my blog is not porn-related they are clearly spam.
Whether someone who is into porn and/or willing to pay for porn would have a legitimate use for these domains I can't say. So there might be FPs from a SURBL perspective. But not from mine.
On Monday, October 25, 2004, 9:20:11 PM, Matthew Hunter wrote:
It should be noted that my definition of "false positive" may differ from that of the SURBL overall. In particular, I don't consider a domain a false positive if someone has attempted to blog-spam me with it -- even if the domain has legitimate uses. The domains I am being spammed with are very obviously porn-related; as my blog is not porn-related they are clearly spam.
Whether someone who is into porn and/or willing to pay for porn would have a legitimate use for these domains I can't say. So there might be FPs from a SURBL perspective. But not from mine.
It's definitely good to know about that, as it's a fairly important difference in philosophies. To me it says that such data may be more appropriate for protecting blogs than they might be for filtering mail. Of course either is useful, but perhaps in different applications, as you seem to suggest.
Jeff C. -- "If it appears in hams, then don't list it."
On Monday, October 25, 2004, 8:56:35 PM, Matthew Hunter wrote:
On Mon, Oct 25, 2004 at 04:24:10PM -0700, Jeff Chan jeffc@surbl.org wrote:
On Monday, October 25, 2004, 4:10:16 PM, Jeff Chan wrote:
it's good to test on more system and more than one set of mail to better find potential problems.
more than one systems.... Anyway did we find any other public blog spam data besides jayallen? IIRC Matthew Hunter was starting to collect some blog spam data. Matthew, how is that going?
Fairly well by my primary measure, which is stopping spam comments and trackbacks on my blog. I've learned that the number of domains actually being used for this sort of thing is very small compared to email spam. I've added the 24 domains of my own to the MT-blacklist list from 2004/08/29 and that has sufficed to block everyone who is trying to spam my blog. 0 false positives on blog spam attempts, but I'm not using the same list to block on email.
Almost all of the attempts have been against the domains I had to add myself, not those already on the list. This suggests to me that there is a small number of blog spammers behind most of it. They buy a few domains, do a run, and when they start to get blocked buy a few more domains. It's very much a stop and start thing.
Thanks for the update and sharing some interesting results.
If anyone can point me to a tool to run a URIBL against a spam corpus, I'll report back results against my own personal spam collection. Or I could just post the updated list somewhere.
I assume most of the SpamAssassin folks use the built-in mass-check facility to do that, and there is a stats script to process the results, though I don't have the reference for it handy.
Jeff C. -- "If it appears in hams, then don't list it."
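
For anyone who wants a rough check without setting up SpamAssassin, the idea of running a domain list against both a spam corpus and a ham corpus can be sketched in a few lines of Python. This is only an illustration, not the mass-check facility mentioned above: the function names (registered_domain, domains_in, count_hits, evaluate) are made up for this example, messages are assumed to be plain strings already loaded from the corpus, and the domain reduction naively keeps the last two labels, so it will mishandle names under two-level TLDs.

```python
import re
from collections import Counter

# Captures the host part of http/https URLs found in a message body.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def registered_domain(host):
    """Reduce a host to its last two labels.

    Assumption: this is a naive stand-in for real registered-domain
    logic and does not handle two-level TLDs such as co.uk.
    """
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def domains_in(text):
    """Return the set of reduced domains mentioned in URLs in one message."""
    return {registered_domain(h) for h in URL_HOST.findall(text)}


def count_hits(messages, listed):
    """Count, per listed domain, how many messages contain it in a URL."""
    hits = Counter()
    for text in messages:
        for domain in domains_in(text) & listed:
            hits[domain] += 1
    return hits


def evaluate(listed, spam, ham):
    """Return (spam_hits, ham_hits) for a candidate domain list.

    Any domain with a nonzero count in ham_hits is a candidate false
    positive in the "if it appears in hams, then don't list it" sense.
    """
    return count_hits(spam, listed), count_hits(ham, listed)


if __name__ == "__main__":
    # Hypothetical data; the .test names are placeholders, not real listings.
    listed = {"spam-example.test", "grey-example.test"}
    spam = ["Buy now http://www.spam-example.test/x"]
    ham = ["Meeting notes are at http://grey-example.test/notes"]
    spam_hits, ham_hits = evaluate(listed, spam, ham)
    print("spam hits:", dict(spam_hits))
    print("ham hits (possible FPs):", dict(ham_hits))
```

Counting per message rather than per URL keeps one heavily linked message from dominating the totals. Running the same list over more than one person's ham is what would surface the differences in mail use discussed earlier in the thread.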