[SURBL-Discuss] RE: Applying SURBL against blog comment spammers

Chris Santerre csanterre at merchantsoverseas.com
Fri Sep 3 12:10:40 CEST 2004

>-----Original Message-----
>From: Jeff Chan [mailto:jeffc at surbl.org]
>Sent: Thursday, September 02, 2004 9:04 PM
>To: spamassassin-users at incubator.apache.org
>Cc: SURBL Discuss
>Subject: Re: Applying SURBL against blog comment spammers
>On Thursday, September 2, 2004, 5:43:26 PM, Loren Wilton wrote:
>>> Given the lack of commonality, it may not make much sense to
>>> add to the mail spam lists, since it would be an extra 2000+
>>> records that would probably not get hits on mail.
>>> The MT-Blacklist doesn't seem to update too frequently (the
>>> last new record was from 8/29) and has about 2000 records.
>>> Matthew's list was pretty sparse so far.  So I'm still
>>> pondering things.
>> Just from a technical/philosophical point, I think a separate list is
>> desirable.  Although I agree that making it part of multi would probably be
>> the way to go, and I agree with the basic concept that "spam is spam".
>> However, I think the reasons for a separate list are:
>> 1.    Separate source feed.  A new list allows the source feed to be more
>>       easily documented.
>> 2.    (As stated) little overlap with email spammers, at least so far.
>> 3.    Probably a different update cycle and removal (from old age) cycle
>>       requirement.
>> The different means of updating and possibly different aging method are high
>> on my list of reasons for suggesting a separate list.  On the other hand,
>> having it part of multi would be nice, since (I assume, possibly
>> incorrectly) one query could check a lot of lists based on the bitmap.
>Correct.  I'm still wavering if a blog spam list should be part
>of multi.  There are programs that use multi but (unadvisedly)
>don't differentiate between the source lists.  That kind of
>argues for keeping multi focussed on only mail spam and making
>a blog spam list separate.  On the other hand there's much less
>overhead in adding a list internally to multi than setting up
>a whole new list.
>> It probably would also be good to devote some thought to how entries will be
>> added to this list and validated.  We surely don't want some annoyed blog
>> spammer spamming the list with every valid domain they can find!
>Yes, data quality is always an issue.  Any of these ventures will
>struggle if spammers are able to poison the data.  Keeping
>legitimate domains out of any feed is key and provisions would
>need to be made for that.

As always, I agree. I think any new idea should be kept out of multi until
more testing is done.
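As a rough illustration of the bitmap point in the quoted discussion, here is a minimal Python sketch (not SURBL's official client) of how a program might decode a multi.surbl.org answer and keep the source lists apart. It assumes an answer of the form 127.0.0.N, where each list sets one bit of N; the bit-to-list mapping below is an assumption for illustration only, and the real assignments should be taken from SURBL's own documentation.

```python
import ipaddress
import socket

ZONE = "multi.surbl.org"

# ASSUMED bit assignments, for illustration only; check SURBL's docs for the real ones.
LIST_BITS = {
    2: "sc",
    4: "ws",
    8: "be",
    16: "ob",
    32: "ab",
    64: "jp",
}

LOOPBACK_ANSWERS = ipaddress.IPv4Network("127.0.0.0/24")


def query_name(domain: str) -> str:
    """Build the DNS name to look up, e.g. example.com -> example.com.multi.surbl.org."""
    return f"{domain.rstrip('.').lower()}.{ZONE}"


def decode(answer: str) -> list[str]:
    """Map a 127.0.0.N answer to the names of the lists whose bits are set in N."""
    addr = ipaddress.IPv4Address(answer)
    if addr not in LOOPBACK_ANSWERS:
        raise ValueError(f"unexpected answer {answer!r}; expected 127.0.0.N")
    last_octet = int(addr) & 0xFF
    # Report each source list separately instead of collapsing to "listed".
    return [name for bit, name in sorted(LIST_BITS.items()) if last_octet & bit]


def lookup(domain: str) -> list[str]:
    """Query multi over DNS; a name that is not listed fails to resolve (gaierror)."""
    try:
        answer = socket.gethostbyname(query_name(domain))
    except socket.gaierror:
        return []
    return decode(answer)
```

For example, `decode("127.0.0.20")` gives `["ws", "ob"]` under the assumed mapping. A client that only trusts one source would check something like `"ws" in lookup(domain)` rather than treating any non-empty answer as a hit, which is the differentiation Jeff says some multi users skip.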

Having said that, if a blog spam domain meets all the requirements we would
use for a SURBL entry now, then why not list it? And why not list it in the
regular WS list? To be clear, I would only add hand-checked domains like those
I found in JM's example: preemptive listings. Again, only domains that are
obvious, like the examples I listed. If there is any question of a blog spam
domain being used for legitimate purposes, then it follows the very same rules
we have now. Don't blacklist. Add to unclassified ;)


