RE: Applying SURBL against blog comment spammers - Discuss

3 Sep 2004

      ...
-----Original Message-----
From: Jeff Chan [mailto:jeffc@surbl.org]
Sent: Thursday, September 02, 2004 9:04 PM
To: spamassassin-users@incubator.apache.org
Cc: SURBL Discuss
Subject: Re: Applying SURBL against blog comment spammers
On Thursday, September 2, 2004, 5:43:26 PM, Loren Wilton wrote:
...
...
Given the lack of commonality, it may not make much sense to
add to the mail spam lists, since it would be an extra 2000+
records that would probably not get hits on mail.
The MT-Blacklist doesn't seem to update too frequently (the
last new record was from 8/29) and has about 2000 records.
Matthew's list was pretty sparse so far.  So I'm still
pondering things.
...
Just from a technical/philosophical point, I think a separate list is
desirable.  Although I agree that making it part of multi would probably
be the way to go, and I agree with the basic concept that "spam is spam".
...
However, I think the reasons for a separate list are:
1.    Separate source feed.  A new list allows the source feed to be
more easily documented.
2.    (As stated) little overlap with email spammers, at least so far.
3.    Probably a different update cycle and removal (from old age) cycle
requirement
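To make the "removal from old age" point concrete, here is a minimal sketch of an aging pass over a list. Everything in it is an assumption for illustration: the function name expire_old, the idea of storing a last-seen date per domain, and the 30-day window are hypothetical and do not describe how any SURBL list is actually maintained.

```python
from datetime import date, timedelta


def expire_old(entries, today, max_age_days):
    """Return only the entries last seen within max_age_days of today.

    entries maps a domain name to the date it was last reported
    (a hypothetical record format, chosen for this sketch).
    """
    cutoff = today - timedelta(days=max_age_days)
    return {domain: seen for domain, seen in entries.items() if seen >= cutoff}


if __name__ == "__main__":
    blog_list = {
        "recent-spam.example": date(2004, 8, 29),
        "stale-spam.example": date(2004, 6, 1),
    }
    # With a 30-day window only the recently seen entry survives.
    print(expire_old(blog_list, date(2004, 9, 3), 30))
```

A separate list could simply run this with a different max_age_days than the mail lists use, which is the practical meaning of a different aging cycle.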
...
The different means of updating and possibly different aging method are
high on my list of reasons for suggesting a separate list.  On the other
hand, having it part of multi would be nice, since (I assume, possibly
incorrectly) that one query could check a lot of lists based on the
bitmap.
Correct.  I'm still wavering if a blog spam list should be part
of multi.  There are programs that use multi but (unadvisedly)
don't differentiate between the source lists.  That kind of
argues for keeping multi focussed on only mail spam and making
a blog spam list separate.  On the other hand there's much less
overhead in adding a list internally to multi than setting up
a whole new list.
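As a rough sketch of what "differentiating between the source lists" means for a program that queries multi: the answer to a single lookup is a 127.0.0.x address whose last octet is treated as a bitmask, one bit per source list. The bit values below are placeholders chosen for illustration (the "blog" bit in particular is hypothetical and does not exist); a real client should take the assignments from the SURBL documentation.

```python
# Placeholder bit assignments, for illustration only.
# "blog" is a hypothetical list that is not part of multi.
LIST_BITS = {
    4: "ws",
    128: "blog",
}


def decode_multi(address):
    """Return the names of the source lists flagged in a 127.0.0.x answer."""
    octets = address.split(".")
    if len(octets) != 4 or octets[:3] != ["127", "0", "0"]:
        raise ValueError("not a 127.0.0.x answer: %r" % address)
    mask = int(octets[3])
    return [name for bit, name in sorted(LIST_BITS.items()) if mask & bit]


def is_mail_spam(address):
    """True only when a mail-spam list is flagged, ignoring the blog bit."""
    return "ws" in decode_multi(address)


if __name__ == "__main__":
    print(decode_multi("127.0.0.132"))  # both placeholder bits set
    print(is_mail_spam("127.0.0.128"))  # only the hypothetical blog bit
```

A program that treats any answer at all as a hit would score the second case as mail spam, which is the concern about adding a blog list to multi.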
...
It probably would also be good to devote some thought to how entries
will be added to this list and validated.  We surely don't want some
annoyed blog spammer spamming the list with every valid domain they can
find!
Yes, data quality is always an issue.  Any of these ventures will
struggle if spammers are able to poison the data.  Keeping
legitimate domains out of any feed is key and provisions would
need to be made for that.
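One possible shape for such a provision, sketched under assumptions: submitted domains are checked against a whitelist of known legitimate domains before they reach the feed. The function name filter_submissions and the whitelist contents are hypothetical, and this is not a description of how SURBL data is actually vetted.

```python
def filter_submissions(submitted, whitelist):
    """Split submitted domains into (accepted, rejected).

    A domain is rejected when it, or any parent domain of it,
    appears in the whitelist of known legitimate domains.
    """
    accepted, rejected = [], []
    for raw in submitted:
        domain = raw.strip().lower().rstrip(".")
        if not domain:
            continue
        labels = domain.split(".")
        # "www.example.org" -> {"www.example.org", "example.org", "org"}
        candidates = {".".join(labels[i:]) for i in range(len(labels))}
        if candidates & whitelist:
            rejected.append(domain)
        else:
            accepted.append(domain)
    return accepted, rejected


if __name__ == "__main__":
    legit = {"example.org"}
    print(filter_submissions(["Spam-Pills.example", "www.example.org"], legit))
```

Anything that lands in the rejected pile would still need a human look, since a whitelist match says only that the submission should not be listed automatically.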
As always I agree. I think any new idea should be kept out of multi until
more testing is done.
Having said that, if a blog spam matches all the requirements we would use in
a SURBL entry now, then why not list it? And why not list it in the regular WS
list? I'm saying I would only add hand checked domains like those I found in
JM's example. Preemptive listings. Again, only domains that are obvious,
like the examples I listed. If there is any question of a blog spam domain
being used legitimately, then it follows the very same rules we have now. Don't
blacklist. Add to unclassified ;)
--Chris