On Mon, Oct 25, 2004 at 04:24:10PM -0700, Jeff Chan jeffc@surbl.org wrote:
On Monday, October 25, 2004, 4:10:16 PM, Jeff Chan wrote:
it's good to test on more system and more than one set of mail to better find potential problems.
more than one system.... Anyway, did we find any other public blog spam data besides jayallen? IIRC Matthew Hunter was starting to collect some blog spam data. Matthew, how is that going?
Fairly well by my primary measure, which is stopping spam comments and trackbacks on my blog. I've learned that the number of domains actually being used for this sort of thing is very small compared to email spam. I've added 24 domains of my own to the MT-blacklist list from 2004/08/29, and that has sufficed to block everyone who is trying to spam my blog. Zero false positives on blog spam attempts, but I'm not using the same list to block email.
Almost all of the attempts have been against the domains I had to add myself, not those already on the list. This suggests to me that a small number of blog spammers are behind most of it. They buy a few domains, do a run, and when they start to get blocked, buy a few more domains. It's very much a stop-and-start thing.
If anyone can point me to a tool to run a URIBL against a spam corpus, I'll report back results against my own personal spam collection. Or I could just post the updated list somewhere.
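For what it's worth, here is a rough sketch of what such a check might look like. It is not an existing tool. It assumes the blacklist is a plain set of domain names and the corpus is a list of message bodies, and it does a local lookup rather than a real DNS-based URIBL query. The names `extract_hosts`, `is_listed`, and `score_corpus` are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Loose pattern for http(s) URLs in message text (an assumption; real
# spam often obfuscates URLs in ways this will miss).
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(text):
    """Return the set of lowercased hostnames from http(s) URLs in text."""
    hosts = set()
    for url in URL_RE.findall(text):
        url = url.rstrip(".,;:)!?")  # drop trailing sentence punctuation
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL, skip it
        if host:
            hosts.add(host.lower().rstrip("."))
    return hosts


def is_listed(host, blacklist):
    """True if host or one of its parent domains is in the blacklist.

    The bare top-level label (e.g. "com") is never checked.
    """
    labels = host.split(".")
    return any(
        ".".join(labels[i:]) in blacklist for i in range(len(labels) - 1)
    )


def score_corpus(messages, blacklist):
    """Return (hits, total): messages with at least one listed host."""
    hits = sum(
        1
        for body in messages
        if any(is_listed(h, blacklist) for h in extract_hosts(body))
    )
    return hits, len(messages)
```

Usage would be something like `score_corpus(bodies, domains)`, where `bodies` holds the text of each spam in the corpus and `domains` is the list loaded into a set. Running the same function over a ham corpus would give a false positive count.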