A great source you probably already know about, Wiki URL blacklists, especially the ones edited on a Wiki:
http://moinmaster.wikiwikiweb.de/BadContent http://www.emacswiki.org/cgi-bin/wiki?BannedContent
Not quite sure how these are edited:
http://spammers.chongqed.org/ or http://blacklist.chongqed.org/ http://www.jayallen.org/blacklist.txt
Open source web proxy filter ... maybe willing to share their URL lists? Of course, this is not a spammer list as far as I know, but perhaps it can be used to amplify and verify SURBL whitelist (to eliminate things) or blacklists (to cross-check an addition).
Huge list of URL blacklists:
http://spamlinks.openrbl.org/filter-bl.htm
Worth trying something more elaborate?
Provide the Wiki folks with a better infrastructure for banning URLs used in Wiki spam (which I'm fairly confident will correlate well with email spam).
1. Get multiple wikis to use a standard format for bad content lists, feed into a SURBL-based Wiki blacklist.
2. All SURBL blacklists can be used on supporting Wikis.
So, SURBL gets a new blacklist (the best kind, one fed with a different type of source), Wikis get a much wider blacklist, etc.
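A rough sketch of what the merge step in item 1 might look like, in Python. It assumes each wiki exposes its bad content list as plain text with one URL or hostname per line and `#` comment lines. That format, the function names, and the `min_sources` threshold are all hypothetical; the existing wiki pages are not known to follow them.

```python
from collections import Counter
from urllib.parse import urlsplit


def parse_bad_content(text):
    """Return the set of lowercase hostnames found in one list.

    Assumed (hypothetical) format: one URL or bare hostname per line;
    blank lines and lines starting with '#' are ignored.
    """
    hosts = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # urlsplit only fills in .hostname when a '//' separator is present,
        # so bare hostnames get one prepended.
        if "//" not in line:
            line = "//" + line
        host = urlsplit(line).hostname
        if host:
            hosts.add(host.rstrip(".").lower())
    return hosts


def merge_lists(lists, min_sources=2):
    """Keep hostnames that appear in at least min_sources of the lists."""
    counts = Counter()
    for text in lists:
        # Each list contributes a set, so a host counts once per source.
        counts.update(parse_bad_content(text))
    return sorted(h for h, n in counts.items() if n >= min_sources)
```

Requiring agreement between several wikis before listing a host is just one possible way to keep a single vandalized page from poisoning the combined list. A real feed would probably also need to reduce hostnames to their registered domains.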
Daniel
Apparently, this is now just a copy of mt-blacklist:
http://www.jayallen.org/comment_spam/
It looks a little stagnant, but my previous cooperation suggestion stands. :-)
Daniel
On Sunday, December 5, 2004, 3:52:31 PM, Daniel Quinlan wrote:
A great source you probably already know about, Wiki URL blacklists, especially the ones edited on a Wiki:
http://moinmaster.wikiwikiweb.de/BadContent http://www.emacswiki.org/cgi-bin/wiki?BannedContent
Not quite sure how these are edited:
http://spammers.chongqed.org/ or http://blacklist.chongqed.org/ http://www.jayallen.org/blacklist.txt
Open source web proxy filter ... maybe willing to share their URL lists? Of course, this is not a spammer list as far as I know, but perhaps it can be used to amplify and verify SURBL whitelist (to eliminate things) or blacklists (to cross-check an addition).
Huge list of URL blacklists:
Worth trying something more elaborate?
Provide the Wiki folks with a better infrastructure for banning URLs used in Wiki spam (which I'm fairly confident will correlate well with email spam).
- Get multiple wikis to use a standard format for bad content lists, feed into a SURBL-based Wiki blacklist.
- All SURBL blacklists can be used on supporting Wikis.
So, SURBL gets a new blacklist (the best kind, one fed with a different type of source), Wikis get a much wider blacklist, etc.
I think new data sources can help the SURBL project if:
1. They have spam URI domains. Some of the wiki or block blacklists may not actually come from spams.
2. They are updated pretty frequently, preferably several times a day at least.
3. They have false positive rates at least as low as WS.
With these in mind, would anyone like to help us research some of these other possible sources that Daniel brings up? Multiple opinions could be useful.
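For criterion 3, one crude way to compare a candidate source against WS might be to count how many of its entries also show up in a set of domains seen in ham. This is only a proxy for a real false positive rate (it counts list entries, not messages), and the function names and the `ham_domains` input are assumptions made for illustration:

```python
def ham_hit_rate(candidate, ham_domains):
    """Fraction of candidate entries that also appear in ham."""
    candidate = set(candidate)
    if not candidate:
        return 0.0
    return len(candidate & set(ham_domains)) / len(candidate)


def passes_fp_check(candidate, baseline, ham_domains):
    """True if the candidate's ham hit rate is no worse than the baseline's."""
    return ham_hit_rate(candidate, ham_domains) <= ham_hit_rate(
        baseline, ham_domains
    )
```

Here `baseline` would be the current WS data and `candidate` the wiki or blog list under review.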
Jeff C. -- "If it appears in hams, then don't list it."
On Monday, December 6, 2004, 10:12:43 PM, Jeff Chan wrote:
On Sunday, December 5, 2004, 3:52:31 PM, Daniel Quinlan wrote:
A great source you probably already know about, Wiki URL blacklists, especially the ones edited on a Wiki:
Apparently wiki abuse.
Wiki abuse, based on about 1.1k jayallen entries plus 2000 more updates.
Not quite sure how these are edited:
http://spammers.chongqed.org/ or http://blacklist.chongqed.org/
Blog and wiki abuse.
Blog abuse.
Open source web proxy filter ... maybe willing to share their URL lists? Of course, this is not a spammer list as far as I know, but perhaps it can be used to amplify and verify SURBL whitelist (to eliminate things) or blacklists (to cross-check an addition).
GPLed Linux software for content filtering; no specific spam filtering as far as I can tell. Charges money for blocklist updates, but the filter data can probably be extracted from the 150 MB program to get daily updates. Unclear if it's worth pursuing.
I think new data sources can help the SURBL project if:
- They have spam URI domains. Some of the wiki or block blacklists may not actually come from spams.
These all pretty much fail my first criterion, that they should be about email spam, but then Daniel said they were mostly wiki and blog data....
While there may be some overlap between wiki and blog versus email spam, the last time I checked the jayallen data there wasn't much overlap and there were probably too many FPs for our purposes.
I'd rather concentrate on sources of spam-specific URI data.
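If anyone wants to recheck the overlap, a minimal Python sketch along these lines might do, assuming both lists have already been reduced to plain domain names. The function name and the report keys are hypothetical:

```python
def overlap_report(candidate, reference):
    """Summarize how much two domain lists have in common."""
    candidate, reference = set(candidate), set(reference)
    shared = candidate & reference
    return {
        "shared": len(shared),
        # Share of the candidate list that the reference already has.
        "candidate_coverage": len(shared) / len(candidate) if candidate else 0.0,
        # Share of the reference list that the candidate would also catch.
        "reference_coverage": len(shared) / len(reference) if reference else 0.0,
    }
```

A low `candidate_coverage` for a wiki or blog list against email spam data would match what I saw with the jayallen data.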
Jeff C. -- "If it appears in hams, then don't list it."