On Thursday, February 10, 2005, 7:46:23 AM, John Delisle wrote:
> Couldn't this be automated using other spam detection techniques? I.e., SpamAssassin detects a message as 100% spam, but the URL is not in SURBL; SpamAssassin sends the email to a central repository, and any URLs are parsed and added to SURBL.
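(The flow being suggested would look roughly like the sketch below: pull the URL domains out of a message that already scored as certain spam, skip anything multi.surbl.org already knows about, and hand the rest off to some central collector. The domain extraction and the collector here are placeholders, not anything we actually run.)

import re
import socket

def extract_domains(body):
    # Very naive URL-domain extraction; real parsing (and reduction to
    # the registered base domain) is much fussier than this.
    return {d.lower() for d in re.findall(r'https?://([A-Za-z0-9.-]+)', body)}

def listed_in_surbl(domain, zone="multi.surbl.org"):
    # A domain is listed if <domain>.<zone> resolves to a 127.0.0.x address.
    try:
        addr = socket.gethostbyname(f"{domain}.{zone}")
        return addr.startswith("127.0.0.")
    except socket.gaierror:
        return False

def collect_candidates(spam_body):
    # Domains from a message already judged to be spam that SURBL doesn't
    # know about yet; these would go to the hypothetical central repository.
    return sorted(d for d in extract_domains(spam_body)
                  if not listed_in_surbl(d))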
It's very difficult to fully automate spam detection because not everyone agrees on what constitutes spam. In some borderline cases, one person's spam may be another person's ham.
As a global list, we need to be very conservative about adding records so as not to create false positives (FPs). For that reason, we seek to add only hosts that are pretty much universally agreed to be spam sources.
Hopefully it's somewhat clear from the website that we have different sources of data:
http://www.surbl.org/lists.html
JP and OB are based on large spam traps. They're mostly automated, but with some specific techniques for keeping out false positives. Outblaze, for example, only adds domains that were registered within the last 6 months and that have recently been sending a lot of spam. JP has an elaborate system for weeding out FPs before they get onto their list.
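To put the Outblaze rule in concrete terms, the gate is roughly the sketch below. The 180-day window is the "last 6 months" from above; the volume cutoff is a number I made up, since I don't know their actual figure, and how the registration date and trap counts are obtained is outside the sketch.

from datetime import datetime, timedelta

RECENT_REGISTRATION = timedelta(days=180)   # the "last 6 months" rule
MIN_RECENT_SPAM = 500                       # placeholder volume cutoff

def ob_style_candidate(registered_on, recent_trap_hits, now=None):
    # registered_on: registration date (e.g. from whois);
    # recent_trap_hits: spams seen from this domain in the traps lately.
    now = now or datetime.utcnow()
    newly_registered = (now - registered_on) <= RECENT_REGISTRATION
    spewing_now = recent_trap_hits >= MIN_RECENT_SPAM
    return newly_registered and spewing_now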
SC and AB are based on SpamCop reports. Both have inclusion thresholds, so only the most commonly reported spam domains get added. SpamCop reports have already been hand-checked, though the quality of the checking and reporting varies, so in a sense the data is multiply filtered before it gets published as SC and AB. (I'm redoing the way the SC data is handled in a way that should be even better, if I ever get around to it.)
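The threshold idea is simple enough to sketch: count how often each domain turns up in the (already human-checked) reports over some window and publish only the ones reported often enough. The cutoff below is a placeholder, not the real SC or AB value.

from collections import Counter

REPORT_THRESHOLD = 10    # placeholder; not the real SC or AB cutoff

def domains_over_threshold(reported_domains, threshold=REPORT_THRESHOLD):
    # reported_domains: one entry per (already human-checked) report.
    counts = Counter(d.lower() for d in reported_domains)
    return sorted(d for d, n in counts.items() if n >= threshold)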
WS is a manual list, meaning most of the entries are added by hand and checked by a human.
All the lists have FPs, some more than others. FPs are what prevent SURBLs from being used, say, at the MTA level at a telco, and it would be nice to eliminate them entirely. It's bad to have someone's ham marked as spam.
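The most basic guard is the rule in my signature: before a candidate goes onto a list, check it against the domains seen in known ham and drop it if it appears there. A tiny sketch of that check (where the ham corpus comes from is up to the list maintainer):

def safe_to_list(candidates, ham_domains):
    # "If it appears in hams, then don't list it": drop any candidate
    # that also shows up in known-good mail before publishing.
    ham = {d.lower() for d in ham_domains}
    return sorted({d.lower() for d in candidates} - ham)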
So yes, some parts of data collection can be automated, but quite a bit more engineering and thought needs to go into it than that.
Jeff C. -- "If it appears in hams, then don't list it."