On Thursday, July 1, 2004, 11:47:37 AM, Don Newcomer wrote:
Hats off to the folks who brought out the new SURBL checks! Here are my top 15 rule hits over the past 20 hours, and look where my 4 URIBLs come in:
18832 - HTML_MESSAGE (0.100) - 50_scores.cf
10296 - BAYES_99 (5.400) - 50_scores.cf
9810 - OB_URI_RBL (4.0) - surbl.cf
9403 - MIME_HTML_ONLY (0.320) - 50_scores.cf
8367 - WS_URI_RBL (3.0) - surbl.cf
7922 - CLICK_BELOW (0.100) - 50_scores.cf
5220 - HTML_LINK_CLICK_HERE (0.100) - 50_scores.cf
5102 - SPAMCOP_URI_RBL (3.0) - surbl.cf
4470 - MIME_MISSING_BOUNDARY (1.838) - 50_scores.cf
4401 - MY_SHRT_IMG (0.848) - coding_html.cf
4285 - MK_BAD_HTML_05 (0.3) - coding_html.cf
4118 - NO_REAL_NAME (0.160) - 50_scores.cf
4111 - AB_URI_RBL (5.0) - surbl.cf
3887 - SARE_FROM_SPAM_WORD3 (0.100) - 70_sare_header.cf
3678 - MIME_HTML_NO_CHARSET (0.561) - 50_scores.cf
Thanks much for the data and the compliments, Don! I'm forwarding your results to the SURBL discussion list.
It's interesting to see how well ob is detecting spam. My hat is off in thanks to the OutBlaze folks for providing the data.
Still looking for anyone's spam detection rates and false positive rates with all the lists:
sc.surbl.org - SpamCop spamvertised sites
ws.surbl.org - sa-blacklist, BigEvil and other data
ob.surbl.org - OutBlaze spamvertised sites
ab.surbl.org - AbuseButler spamvertised sites
ds.surbl.org - 6dos data (beta)
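Each of these lists is published as a DNS zone. A minimal Python sketch of a lookup, assuming the usual SURBL convention of prepending the registered domain to the zone name (so `example.com` is checked as `example.com.sc.surbl.org`) and treating any A record in 127.0.0.0/8 as a listing. `SURBL_ZONES`, `query_name`, `is_listed_answer` and `lookup` are hypothetical names; the sketch does not reduce full hostnames to registered domains, and the 2004 zones above may no longer answer:

```python
import ipaddress
import socket

# Hypothetical list of the SURBL zones named in this thread.
SURBL_ZONES = ["sc.surbl.org", "ws.surbl.org", "ob.surbl.org",
               "ab.surbl.org", "ds.surbl.org"]


def query_name(domain: str, zone: str) -> str:
    """Build the DNS name to look up: the domain prepended to the list zone."""
    # Assumes the caller already reduced the URI host to its registered domain.
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed_answer(addr: str) -> bool:
    """Treat any A record inside 127.0.0.0/8 as a listing (an assumption)."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network("127.0.0.0/8")


def lookup(domain: str, zone: str) -> bool:
    """Return True if the zone answers for the domain, False otherwise."""
    try:
        addr = socket.gethostbyname(query_name(domain, zone))
    except socket.gaierror:
        return False  # NXDOMAIN or resolver failure: treat as not listed
    return is_listed_answer(addr)
```

For example, `[z for z in SURBL_ZONES if lookup("example.com", z)]` returns the zones that list a domain. NXDOMAIN and resolver errors are both treated as "not listed", which is the conservative choice for a filter.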
Jeff C.
Here are my counts since 6:50 PM yesterday for all URI_RBL rules sorted by spam and ham:
URI_RBL spam counts:
3577 - AB_URI_RBL (5.0) - surbl.cf
2499 - DS_URI_RBL (0.33) - surbl.cf
7282 - OB_URI_RBL (4.0) - surbl.cf
4279 - SPAMCOP_URI_RBL (3.0) - surbl.cf
5458 - WS_URI_RBL (3.0) - surbl.cf
URI_RBL ham counts:
231 - DS_URI_RBL (0.33) - surbl.cf
18 - OB_URI_RBL (4.0) - surbl.cf
1 - SPAMCOP_URI_RBL (3.0) - surbl.cf
29 - WS_URI_RBL (3.0) - surbl.cf
Interesting that AB_URI_RBL has no false positives yet... Still, we haven't released spam filtering to our users yet, so my Bayes training is based pretty much on all of the SA rulesets' interpretation of spam (which isn't necessarily a bad thing).
Don Newcomer
Senior Manager, Systems Infrastructure
Systems Department
Library and Information Services
Dickinson College
P.O. Box 1773
Carlisle, PA 17013
717-245-1256 (Voice)
717-245-1690 (FAX)
newcomer@dickinson.edu
On Thu, 1 Jul 2004, Jeff Chan wrote:
Jeff Chan mailto:jeffc@surbl.org http://www.surbl.org/
Discuss mailing list Discuss@lists.surbl.org http://lists.surbl.org/mailman/listinfo/discuss
How do you collect the stats like this? I want to do this too :)
/ Martin
It's just a shell script using awk, grep, sed, and sort. Not a big deal really.
Don Newcomer
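Don's actual script isn't included in the thread. As a hypothetical stand-in for the same grep/awk/sort pipeline, here is a Python sketch that tallies rule hits separately for spam and ham and lists the top rules. The log format it parses (`result: <Y|.> <score> - RULE,RULE,...`, with `Y` for spam and `.` for ham) is an assumption, as are the function names; adjust the regular expression to match your own logs:

```python
import re
from collections import Counter

# Hypothetical log line format (an assumption, not taken from the thread):
#   ... result: <Y|.> <score> - RULE_A,RULE_B,... <other fields>
# where "Y" marks spam and "." marks ham.
RESULT_RE = re.compile(r"result: ([Y.]) \S+ - (\S+)")


def tally_rule_hits(lines):
    """Count rule hits separately for spam and ham messages."""
    spam, ham = Counter(), Counter()
    for line in lines:
        m = RESULT_RE.search(line)
        if not m:
            continue  # not a result line
        verdict, rules = m.groups()
        target = spam if verdict == "Y" else ham
        target.update(r for r in rules.split(",") if r)
    return spam, ham


def report(counts, contains="", top=15):
    """Format the top-N rules whose names contain `contains` as 'count - RULE'."""
    rows = [(n, rule) for rule, n in counts.items() if contains in rule]
    rows.sort(key=lambda row: (-row[0], row[1]))  # most hits first, then by name
    return [f"{n} - {rule}" for n, rule in rows[:top]]
```

`report(spam, contains="URI_RBL")` would give a URI_RBL-only list like Don's. The rule scores and .cf file names in his output would need a separate lookup against the rule files, which this sketch leaves out.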
On Fri, 2 Jul 2004, Martin wrote:
How do you collect the stats like this? I want to do this too :)
/ Martin
Hi Don,
Why not post it?
Bye, Raymond.
On Friday, July 2, 2004, 6:06:16 AM, Don Newcomer wrote:
Interesting that AB_URI_RBL has no false positives yet... Still, we haven't released spam filtering to our users yet, so my Bayes training is based pretty much on all of the SA rulesets' interpretation of spam (which isn't necessarily a bad thing).
Thanks much for the data Don, particularly the false positive hits. Does anyone else have any to share? If so please post them here.
ab.surbl.org is based on SpamCop data plus some manual reports, as is sc.surbl.org, but ab uses a different inclusion criterion: it takes the 500 most often reported sites over 7 days (less www. duplicates and whitelist hits), whereas sc has an arbitrary inclusion threshold of 10 reports over 4 days. A single FP for sc is pretty good, though zero is better. :-)
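The two inclusion policies can be contrasted in a small Python sketch. It assumes reports arrive as hypothetical `(domain, timestamp)` pairs and interprets "less www. duplicates" as folding `www.example.com` into `example.com`; the function names are illustrative, and the real ab and sc pipelines may differ:

```python
from collections import Counter
from datetime import datetime, timedelta


def _normalize(domain: str) -> str:
    """Fold 'www.' variants onto the bare domain (assumed reading of 'less www. duplicates')."""
    domain = domain.lower().rstrip(".")
    return domain[4:] if domain.startswith("www.") else domain


def _counts(reports, now: datetime, window: timedelta, normalize: bool = False) -> Counter:
    """Count reports per domain whose timestamp falls inside [now - window, now]."""
    cutoff = now - window
    counts = Counter()
    for domain, when in reports:
        if cutoff <= when <= now:
            counts[_normalize(domain) if normalize else domain] += 1
    return counts


def ab_style(reports, now: datetime, whitelist=frozenset(), top: int = 500) -> list:
    """ab-like policy: the `top` most-reported domains over 7 days, minus whitelist hits."""
    counts = _counts(reports, now, timedelta(days=7), normalize=True)
    # Whitelist entries are assumed to be bare (already normalized) domains.
    ranked = [d for d, _ in counts.most_common() if d not in whitelist]
    return ranked[:top]


def sc_style(reports, now: datetime, threshold: int = 10) -> list:
    """sc-like policy: every domain with at least `threshold` reports over 4 days."""
    counts = _counts(reports, now, timedelta(days=4))
    return sorted(d for d, n in counts.items() if n >= threshold)
```

The design difference shows up directly: ab's list size is fixed and its effective cutoff floats with report volume, while sc's cutoff is fixed and its list size floats.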
ob is pretty impressive in terms of hit rate and relatively low FP rate, at least as a percentage of hits.
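The "percentage of hits" measure is easy to compute from Don's counts. A minimal sketch that treats every ham hit as a false positive, which is only approximate since Don's spam/ham split comes from SpamAssassin's own verdicts rather than hand review; `COUNTS` and `fp_share` are illustrative names:

```python
# Don's counts from this thread: rule -> (spam hits, ham hits).
COUNTS = {
    "AB_URI_RBL": (3577, 0),
    "DS_URI_RBL": (2499, 231),
    "OB_URI_RBL": (7282, 18),
    "SPAMCOP_URI_RBL": (4279, 1),
    "WS_URI_RBL": (5458, 29),
}


def fp_share(spam_hits: int, ham_hits: int) -> float:
    """Fraction of a rule's hits that landed on ham (0.0 if it never fired)."""
    total = spam_hits + ham_hits
    return ham_hits / total if total else 0.0


# Print rules by spam hits, most first, with the share of hits on ham.
for rule, (spam, ham) in sorted(COUNTS.items(), key=lambda kv: -kv[1][0]):
    print(f"{rule:16} {spam:5d} spam {ham:4d} ham {fp_share(spam, ham):7.2%} of hits on ham")
```

By this measure sc is at about 0.02% of its hits, ob about 0.25% and ws about 0.53%, while ds, still in beta and scored at only 0.33, is around 8.5%.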
Note that ds.surbl.org (based on 6dos data) is now up on 5 name servers, so it may be OK to use on production servers for beta testing.
Please note that I probably won't be able to check email for about a week, so hopefully others will help answer SURBL questions, etc.
Cheers,
Jeff C.