Joe Wein has created a new version of the data which Raymond is putting into ws.surbl.org. The new version is 60% smaller and contains only domains that were added under an extensive and strict set of criteria which Joe devised.
As a result, the jw additions to ws are 60% smaller and probably a lot better in terms of false positives. The earlier cut of the data had included some domains which Joe had seeded from other RBLs and which therefore did not have the extensive checking which Joe's current process applies to new entries.
Joe used the seeding from other RBLs to initially populate his data, but the new data gathered purely with his own processing should be much better.
Bottom line is that the FP rate of WS should drop.
Does anyone have any current FP rates to share?
Jeff C.
Good afternoon, Jeff, all,
On Wed, 4 Aug 2004, Jeff Chan wrote:
> Joe Wein has created a new version of the data which Raymond is putting into ws.surbl.org. The new version is 60% smaller and contains only domains that were added under an extensive and strict set of criteria which Joe devised.

I don't suppose this was done between 3:46 and 4:16 AM EST this morning?
Cheers,
- Bill
---------------------------------------------------------------------------
"I've discovered that using VMS is a lot like driving a nail with your head: sure, you eventually get something practical done, but it usually results in a headache and some blood loss."
(Courtesy of Dmitri dmitri@users.sourceforge.net and Sean A. Simpson)
--------------------------------------------------------------------------
William Stearns (wstearns@pobox.com). Mason, Buildkernel, freedups, p0f, rsync-backup, ssh-keyinstall, dns-check, more at: http://www.stearns.org
--------------------------------------------------------------------------
Hi Bill,
> > Joe Wein has created a new version of the data which Raymond is putting into ws.surbl.org. The new version is 60% smaller and contains only domains that were added under an extensive and strict set of criteria which Joe devised.
>
> I don't suppose this was done between 3:46 and 4:16 AM EST this morning?

I think it was around that time that I inserted the smaller list. Missing domains found :)
Bye, Raymond.
Hi SURBL-Discuss,
Allow me to introduce myself. I've been programming for over two decades (e.g. DR DOS, Novell iFolder). I have lived in Japan since 1993. I joined the fight against spam a little over a year ago, when I started developing my own filter.
I soon ended up with a continuous stream of interesting data about spam, which I thought might be useful to others. Last October I started publishing domain names and WHOIS details for spam domains that my filter found.
In May I finally launched jwSpamSpy 1.0, my client-based filter. I might develop a server-based equivalent in the future.
The spam data comes out of about a dozen mailboxes / spamtraps with a total of about 600 spams a day, which yield about 70 new blacklisted domains per day on average (if I had more spamtraps I could harvest even more data). jwSpamSpy also harvests data from NDNs for spoofed-sender spam. If anybody gets a lot of bounces, I might be interested in that at some point :-)
The blacklist data is published several times a day. Since January I've been distributing deltas to my blacklist via a mailing list (one posting per day).
In addition to domain names, I harvest 419er addresses for the purpose of blocking 419 scam mails. I've collected some 6300 "419" emails in a little over a year so far.
Raymond and Jeff recently invited me to provide my data feed for SURBL. I agreed because anything that helps people block more spam is worth supporting. There was just one catch: more than a year ago I had started the list by seeding it with other people's lists, which in hindsight was not a good idea, as I couldn't vouch for all entries and occasionally had to drop entries. Recent entries (from early December 2003 onward) use a quite rigorous protocol that has served me well. I have had but a single complaint about more than 6000 domains from that data set.
Regards
Joe Wein
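The daily deltas Joe mentions could be derived by comparing two snapshots of the blacklist. The following is only a minimal sketch under stated assumptions: the thread does not describe the actual delta format or how jwSpamSpy produces it, so the function name `blacklist_delta` and the example domains are hypothetical.

```python
def blacklist_delta(previous, current):
    """Return (added, removed) as sorted lists of domains.

    Assumption: each snapshot is an iterable of domain names, one per
    entry. Names are lower-cased and stripped so that case or whitespace
    differences do not show up as spurious changes.
    """
    prev = {d.strip().lower() for d in previous if d.strip()}
    curr = {d.strip().lower() for d in current if d.strip()}
    return sorted(curr - prev), sorted(prev - curr)


# Hypothetical snapshots from two consecutive days.
yesterday = ["spam-example.test", "pills-example.test"]
today = ["pills-example.test", "Loans-Example.test", "casino-example.test"]

added, removed = blacklist_delta(yesterday, today)
print("added:", added)
print("removed:", removed)
```

Using sets keeps the comparison independent of the order in which domains appear in either snapshot.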
Joe,
> Raymond and Jeff recently invited me to provide my data feed for SURBL. I agreed because anything that helps people block more spam is worth supporting. There was just one catch: more than a year ago I had started the list by seeding it with other people's lists, which in hindsight was not a good idea, as I couldn't vouch for all entries and occasionally had to drop entries. Recent entries (from early December 2003 onward) use a quite rigorous protocol that has served me well. I have had but a single complaint about more than 6000 domains from that data set.

Welcome aboard, we are very happy that you want to share the dataset. It's been very useful over the last few days already.
If people want to check out the projects Joe is working on, see:
http://www.joewein.de/sw/index.htm
His SpamSpy especially would be very helpful for end users.
Bye, Raymond.
On Thursday, August 5, 2004, 12:36:02 AM, Joe Wein wrote:
> Raymond and Jeff recently invited me to provide my data feed for SURBL. I agreed because anything that helps people block more spam is worth supporting. There was just one catch: more than a year ago I had started the list by seeding it with other people's lists, which in hindsight was not a good idea, as I couldn't vouch for all entries and occasionally had to drop entries. Recent entries (from early December 2003 onward) use a quite rigorous protocol that has served me well. I have had but a single complaint about more than 6000 domains from that data set.

Hi Joe,
Welcome to the SURBL community, and thanks very much for sharing your data with us. I agree that sharing information about spam URI domains can help the Internet community in general to fight spam. Thanks also for your personal introduction and the background information on the data being used with and coming from jwSpamSpy.
As you know, Raymond is feeding the data from your blacklist into the SURBL list: ws.surbl.org. I had announced the change earlier, but it may be worth repeating that we are now using only the more recent entries which you describe above and which pass your rigorous new screening protocol from December 2003 onward. So we're currently probably getting only the best jwSpamSpy data into ws.surbl.org and have certainly eliminated a few false positives in the process. (I'd like to still ask people to provide any false positives or FP stats they have for ws or any of the lists.)
Like you, at this point one of our main concerns is reducing false positives, and in that sense your processing of the data from traps could be considered a good model of how such checking should be done. You may want to consider publishing that process if you haven't already.
Cheers, and Thanks again,
Jeff C.
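Restricting the feed to entries from December 2003 onward amounts to a date cutoff on when each domain was added. A minimal sketch, assuming each entry carries the date it was added: the thread does not say how the data is stored, "early December 2003" is not pinned to an exact day, and the record layout, the `CUTOFF` value and the example domains below are all hypothetical.

```python
from datetime import date

# Assumption: the thread only says "early December 2003"; the exact day
# used here is a placeholder.
CUTOFF = date(2003, 12, 1)


def recent_entries(entries, cutoff=CUTOFF):
    """Keep only domains added on or after the cutoff date.

    entries: iterable of (domain, date_added) pairs (hypothetical layout).
    """
    return [domain for domain, added_on in entries if added_on >= cutoff]


# Hypothetical records: two predate the cutoff, two follow it.
entries = [
    ("seeded-example.test", date(2003, 8, 14)),
    ("old-example.test", date(2003, 11, 30)),
    ("new-example.test", date(2003, 12, 5)),
    ("newer-example.test", date(2004, 7, 20)),
]

kept = recent_entries(entries)
# Fraction by which the list shrinks once older entries are dropped.
reduction = 1 - len(kept) / len(entries)
print(kept)
print(f"list shrinks by {reduction:.0%}")
```

The same reduction figure is what the "60% smaller" number in the announcement describes for the real data; the sample values here are made up and do not reproduce it.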
Joe Wein wrote:
> If anybody gets a lot of bounces, I might be interested in that at some point :-)

LOL. I could offer about 1000 per day. But that's supposed to drop as soon as SpamAssassin supports SPF, and the spammers get the idea.
Bye, Frank