-----Original Message-----
From: Jeff Chan [mailto:jeffc@surbl.org]
Sent: Thursday, September 02, 2004 9:04 PM
To: spamassassin-users@incubator.apache.org
Cc: SURBL Discuss
Subject: Re: Applying SURBL against blog comment spammers
On Thursday, September 2, 2004, 5:43:26 PM, Loren Wilton wrote:
Given the lack of commonality, it may not make much sense to add to the mail spam lists, since it would be an extra 2000+ records that would probably not get hits on mail.
The MT-Blacklist doesn't seem to update too frequently (the last new record was from 8/29) and has about 2000 records. Matthew's list was pretty sparse so far. So I'm still pondering things.
Just from a technical/philosophical standpoint, I think a separate list is desirable, although I agree that making it part of multi would probably be the way to go, and I agree with the basic concept that "spam is spam".
However, I think the reasons for a separate list are:
- Separate source feed. A new list allows the source feed to be more easily documented.
- (As stated) little overlap with email spammers, at least so far.
- Probably a different update cycle and removal (from old age) cycle requirement.
The different means of updating and the possibly different aging method are high on my list of reasons for suggesting a separate list. On the other hand, having it part of multi would be nice, since (I assume, possibly incorrectly) one query could check a lot of lists based on the bitmap.
Correct. I'm still wavering if a blog spam list should be part of multi. There are programs that use multi but (unadvisedly) don't differentiate between the source lists. That kind of argues for keeping multi focussed on only mail spam and making a blog spam list separate. On the other hand there's much less overhead in adding a list internally to multi than setting up a whole new list.
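To make the bitmap point concrete, here is a rough sketch of how a client could tell the source lists apart in a single multi answer. It assumes the answer comes back as a 127.0.0.x address whose last octet is a bitmask. The bit values and the "blog" list name below are hypothetical, chosen only for illustration, and decode_multi_answer and is_mail_spam are made-up helper names, not part of any existing tool.

```python
import ipaddress
from typing import List

# Hypothetical bit assignments, for illustration only.
# Consult the SURBL documentation for the real values.
LIST_BITS = {
    2: "sc",
    4: "ws",
    8: "blog",  # hypothetical bit for a blog spam list
}

# Lists this (hypothetical) mail client actually cares about.
MAIL_LISTS = {"sc", "ws"}


def decode_multi_answer(answer: str) -> List[str]:
    """Return the names of the source lists flagged in a 127.0.0.x answer."""
    packed = ipaddress.IPv4Address(answer).packed
    if packed[:3] != b"\x7f\x00\x00":
        raise ValueError("not a 127.0.0.x answer: %s" % answer)
    mask = packed[3]
    return [name for bit, name in sorted(LIST_BITS.items()) if mask & bit]


def is_mail_spam(answer: str) -> bool:
    """True only if at least one mail spam list flagged the domain."""
    return any(name in MAIL_LISTS for name in decode_multi_answer(answer))
```

A program that treats any answer as a hit would flag a blog-only listing as mail spam; checking the individual bits, as is_mail_spam does here, avoids that.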
It probably would also be good to devote some thought to how entries will be added to this list and validated. We surely don't want some annoyed blog spammer spamming the list with every valid domain they can find!
Yes, data quality is always an issue. Any of these ventures will struggle if spammers are able to poison the data. Keeping legitimate domains out of any feed is key and provisions would need to be made for that.
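One possible provision, sketched very roughly: screen submitted domains against a whitelist of known legitimate domains before anything reaches the feed. The function name filter_submissions and the whitelist contents are hypothetical, and this is only one way such a check might look, not a description of how any existing SURBL feed is built.

```python
from typing import Iterable, List, Set, Tuple


def filter_submissions(
    candidates: Iterable[str], whitelist: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split candidates into (accepted, rejected).

    A candidate is rejected if it equals a whitelisted domain or is a
    subdomain of one. Names are lowercased and a trailing dot is dropped.
    """
    accepted: List[str] = []
    rejected: List[str] = []
    for raw in candidates:
        domain = raw.strip().lower().rstrip(".")
        if any(domain == w or domain.endswith("." + w) for w in whitelist):
            rejected.append(domain)
        else:
            accepted.append(domain)
    return accepted, rejected
```

Accepted domains would still need a hand check before listing; the whitelist only catches the obvious poisoning attempts.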
As always, I agree. I think any new idea should be kept out of multi until more testing is done.
Having said that, if a blog spam domain matches all the requirements we would use for a SURBL entry now, then why not list it? And why not list it in the regular WS list? I'm saying I would only add hand-checked domains like those I found in JM's example. Preemptive listings. Again, only domains that are obvious, like the examples I listed. If there is any question of a blog spam domain being used for legitimate purposes, then it follows the very same rules we have now. Don't blacklist. Add to unclassified ;)
--Chris