We've changed the list settings slightly so that replies to
this announcement list go to the discussion list, and replies
on the discussion list go to the list by default rather than
to the poster.
discuss(a)lists.surbl.org
Hope that's ok with everyone. Please reply off list if it's
not. :-)
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
We've made a document describing some of the general properties
which code using SURBLs should have in order to use the data as
it was designed and intended. We hope these comments may be
useful to developers. Our Implementation Guidelines are brief
and copied below.
http://www.surbl.org/implementation.html
Implementation Guidelines
Here are some very brief guidelines for folks writing software to
use SURBL lists. Your code should:
1. Extract URIs from message bodies. (Extraction of URIs from
message bodies should ideally include full resolution of
redirections into the final target domain name. This can be a
non-trivial problem.)
2. Extract base (registrar) domains from those URIs. This
includes removing any and all leading host names, subdomains,
www., randomized subdomains, etc. In order to determine the base
domain it may be necessary to use a table of country code TLDs
(ccTLDs) such as the partially incomplete one SURBL uses.
3. Not do name resolution on the domains.
4. Look up the domain name in the SURBL by prepending it to
the name of the SURBL, e.g., domainundertest.com.sc.surbl.org,
then doing Address record DNS resolution on the resulting
combined name. A non-result indicates lack of inclusion in the
list. A result of 127.0.0.2 represents inclusion, i.e., probable
spam.
5. Handle numeric IPs in URIs similarly, but reverse the octet
ordering before comparison against the RBL. This is standard
practice for RBLs. For example, http://1.2.3.4/ is checked as
4.3.2.1.sc.surbl.org.
SURBL lists unusually have both names and numbers in the same
list. For example, the test entries 2.0.0.127 and test.surbl.org
are both in all SURBL lists, alongside actual spam domains and
addresses. Numbered addresses in SURBLs should have occurred in
spams as numbers, e.g., literally http://1.2.3.4/. Additional
SURBL test points are mentioned in the News & Notes section.
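As a rough illustration of guidelines 2, 4 and 5, here is a minimal Python sketch of how the lookup name might be built and queried. The function names, the tiny TWO_LEVEL_SUFFIXES set (a stand-in for a real ccTLD table, not SURBL's own), and the choice to treat any resolution failure as "not listed" are assumptions made for the example. Redirect resolution (guideline 1) is not attempted.

```python
import ipaddress
import socket

# Illustrative stand-in for a ccTLD table; NOT the table SURBL actually uses.
TWO_LEVEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def base_domain(host):
    """Reduce a host name to its base domain (guideline 2)."""
    labels = host.lower().rstrip(".").split(".")
    # Keep three labels when the last two form a known two-level suffix.
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def surbl_query_name(host, zone="sc.surbl.org"):
    """Build the DNS name to look up for a URI host (guidelines 4 and 5)."""
    try:
        ip = ipaddress.IPv4Address(host)
    except ValueError:
        # Not a dotted-quad address: prepend the base domain to the zone.
        return base_domain(host) + "." + zone
    # Numeric IP: reverse the octet order before appending the zone.
    return ".".join(reversed(str(ip).split("."))) + "." + zone


def is_listed(host, zone="sc.surbl.org"):
    """Do an A record lookup on the combined name; 127.0.0.2 means listed."""
    try:
        answer = socket.gethostbyname(surbl_query_name(host, zone))
    except socket.gaierror:
        # Assumption: any resolution failure is treated as "not listed".
        return False
    return answer == "127.0.0.2"
```

With these definitions, surbl_query_name("www.domainundertest.com") gives domainundertest.com.sc.surbl.org and surbl_query_name("1.2.3.4") gives 4.3.2.1.sc.surbl.org, matching the examples in the guidelines. The base domain itself is never resolved (guideline 3); only the combined name is looked up.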
__
Please send me any comments, updates, revisions, corrections,
questions, etc...
Jeff C.
I've updated the SURBL web site to use frames and have freshened
the content slightly. Please let me know if you spot any broken
links, etc.
http://www.surbl.org/
Also added "An Open Letter To Operators Of Redirection Sites"
in which we try to appeal to redirection sites to deny their
services to spam URI domains (e.g., spammers' web sites).
Redirection sites may become an increasing problem if we're
successful in blocking spams with their sites directly linked.
http://www.surbl.org/redirect.html
Comments, revisions, questions, suggestions on any of that are
welcomed.
Cheers,
Jeff C.
I probably should have introduced this second SURBL list
that can be used together with or in place of sc.surbl.org
before mentioning that its name was changing from sa.surbl.org
to ws.surbl.org. :-) Note that the two lists have different
data sources, so strictly speaking one is not a replacement for
the other. They're two different lists. sc uses URI domains
from SpamCop reports. The data source for ws data is described
below. Both lists have merits and we'd encourage you to consider
trying both.
Here's an announcement with the additional update that
we've changed the *sample rule names* for the ws list to use
"WS" instead of "SA":
__
http://www.surbl.org/ (with some live links)
More SURBL lists
In addition to the first SpamCop URI-derived SURBL sc.surbl.org, we
are pleased to host another RBL compatible with the SpamCopURI or
URIDNSBL SpamAssassin plugins, or any other software that can
check message body domains against a name-based RBL. Data for the
second SURBL ws.surbl.org comes from the domains in Bill Stearns'
SpamAssassin blacklist: sa-blacklist. This is a large list of
spam domains, including those found in spam message body URIs.
Both ws.surbl.org and sc.surbl.org SURBLs can be used in the same
SA installation by using two sets of rules.
An SA 2.63 rule and score using SpamCopURI (but not the SpamCop
data!) looks like this:
uri WS_URI_RBL eval:check_spamcop_uri_rbl('ws.surbl.org','127.0.0.2')
describe WS_URI_RBL URI's domain appears in sa-blacklist data at ws.surbl.org
tflags WS_URI_RBL net
score WS_URI_RBL 3.0
An SA 3.0 rule and score using URIBL's urirhsbl looks like this:
urirhsbl URIBL_WS_SURBL ws.surbl.org. A
header URIBL_WS_SURBL eval:check_uridnsbl('URIBL_WS_SURBL')
describe URIBL_WS_SURBL Contains a URL listed in the WS SURBL blocklist
tflags URIBL_WS_SURBL net
score URIBL_WS_SURBL 3.0
More details about ws.surbl.org are available in the section
"Additional SURBLs for spam URI testing" (copied below).
Please note that the name of this list is being changed from
sa.surbl.org to ws.surbl.org. If you were using the old name in
your rules please update them to the new name.
...
Additional SURBLs for spam URI testing
Additional SURBLs that list domains occurring in spam message
bodies may be used with the same routines that use the
sc.surbl.org RBL.
sa-blacklist available as RBL: ws.surbl.org
In cooperation with Bill Stearns, SURBL is making his
sa-blacklist SpamAssassin blacklist available as the RBL
ws.surbl.org. It can be used in the same way as sc.surbl.org, for
example by adding urirhsbl and SpamCopURI rules as described in
the Quick Start section at the top of this document. Like sc,
ws.surbl.org is available through DNS and, for large-volume mail
servers, as rsynced BIND and rbldns zone files. Raymond
Dijkxhoorn has graciously agreed to host the ws.surbl.org zone
files from his rsync server along with sc.surbl.org's. Please
contact him at rsync(a)surbl.org for rsync access.
Both sc and ws RBLs can be used in the same installation. The
choice of using either or both or none is yours. Their data
differs somewhat, and we'll try to briefly describe and link some
of the differences here. Bill's list is rather large at about
9600 domains. It consists of domains found in spam message body
URIs and some spam sender and spam operator domains. Given that
the former are more relevant to isolate these days, most of the
recent additions to Bill's list have been URI domains. Those are
also the domains most relevant for use with the message body
checking approach which we propose throughout this site.
The data in sa-blacklist and therefore ws.surbl.org differ from
the SpamCop URI report data described above in that the list is
about ten times larger, more stable, and may have a slightly
higher false positive rate. Bill's policy for inclusion and
cleaning of the sa-blacklist is quite sound, however, so folks
should feel comfortable giving this list a try in addition to the
sc list. ws may currently detect some spam that sc misses, and
vice versa, but it's worth mentioning that the current sc is a
working prototype and that we expect the performance of sc to
improve as we tune the sc data engine further. sc just got out of
the gate, yet it already has some worthy competition in ws.
Thanks Bill!
Because ws is larger and more stable, the zone files for it get
a six-hour TTL compared to 10 minutes for sc. Due to the
differences between the time scales, sizes, and data sources of
ws and sc, we probably won't be offering a combined ws plus sc
list. For example it would be difficult to say what TTL a merged
list should get, and you probably would not want a megabyte-plus
BIND zone file refreshing every 10 minutes. For those using
rsynced zone files that would probably not be an issue, but for
those using BIND, the DNS traffic could well be.
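To put rough numbers on the refresh concern, here is a back-of-the-envelope Python calculation. It assumes, purely for illustration, that a secondary fetches the whole zone once per interval and that the zone is exactly 1 MB; real transfers depend on the zone's settings and on whether anything changed. The function name is hypothetical.

```python
def daily_transfer_mb(zone_size_mb, interval_minutes):
    """Data fetched per day if the full zone is pulled once per interval."""
    refreshes_per_day = 24 * 60 / interval_minutes
    return zone_size_mb * refreshes_per_day


# Assumed 1 MB zone, refreshed every 10 minutes versus every six hours.
print(daily_transfer_mb(1.0, 10))   # 144 refreshes a day
print(daily_transfer_mb(1.0, 360))  # 4 refreshes a day
```

Under those assumptions a 10 minute interval moves 36 times as much data per day as a six-hour one, which is the kind of gap that makes a single TTL for a merged list hard to choose.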
We encourage you to give ws.surbl.org a try.
Please note that the name of this list is being changed from
sa.surbl.org to ws.surbl.org. If you were using the old name in
your rules please update them to the new name.
Jeff C.
Given the probable need to improve whitelisting, I've added a
log of domains that would go onto sc.surbl.org but are then
prevented from getting onto the list by the whitelist(s):
http://www.surbl.org/whitelist-hits.new.log
That goes along with the log of new additions to sc.surbl.org,
i.e., essentially a blacklisting log:
http://www.surbl.org/top-sites-domains.new.log
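The relationship between the two logs can be pictured with a small Python sketch. This is not the SURBL engine's code; the function name and the example domains are hypothetical, and it assumes whitelisting is a simple case-insensitive exact match on the domain.

```python
def apply_whitelist(candidates, whitelist):
    """Split candidate domains into (to_list, whitelist_hits), keeping order."""
    allowed = {d.lower() for d in whitelist}
    to_list, hits = [], []
    for domain in candidates:
        # A whitelisted candidate is logged as a hit instead of being listed.
        (hits if domain.lower() in allowed else to_list).append(domain)
    return to_list, hits


to_list, hits = apply_whitelist(
    ["spamvertised.example", "Popular.example"],  # candidates from reports
    ["popular.example"],                          # whitelist entries
)
print(to_list)  # would feed the blacklisting log
print(hits)     # would feed the whitelist-hits log
```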
I've also grabbed a copy of 500 popular web site domains for
addition to the whitelist. A couple of the recent whitelist hits
have been from it. So far they seem reasonable.
Whitelisting will continue in the next version of the engine,
hopefully with some larger data sets.
Blacklisting based on SpamCop URI domain data will hopefully
be more stable and broader in the next version also. In other
words, there should be significantly less activity on the
blacklist log since the list itself will be more stable.
(For example under the current system you may see some domains
that come off the list then get back on it.... Pay no attention
to the man behind the curtain... :-) There should be a lot less
of that.)
Jeff C.
A question about whether SpamCopURI would support using the
alternative SURBL ws.surbl.org came up, so I thought I'd address
that for everyone. Any program that knows how to extract URIs
from message bodies, then domains from the URIs, then compare
those domains against an RBL can use any or all of the SURBL
lists. Therefore SpamCopURI will work with ws.surbl.org just
fine. (Noting of course that the ws results won't necessarily
be related to the SpamCop-derived data in the sc list.)
All you need to do is add a rule with the name of that list:
uri SA_URI_RBL eval:check_spamcop_uri_rbl('ws.surbl.org','127.0.0.2')
describe SA_URI_RBL URI's domain appears in sa-blacklist data at ws.surbl.org
tflags SA_URI_RBL net
score SA_URI_RBL 3.0
(Likewise in SpamAssassin 3.0 with urirhsbl:)
urirhsbl URIBL_SA_SURBL ws.surbl.org. A
header URIBL_SA_SURBL eval:check_uridnsbl('URIBL_SA_SURBL')
describe URIBL_SA_SURBL Contains a URL listed in the SA SURBL blocklist
tflags URIBL_SA_SURBL net
score URIBL_SA_SURBL 3.0
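For software other than SpamAssassin, the point that any program able to extract URIs and domains can use any or all of the SURBL lists might look like the following Python sketch. The regular expression, the function names and the sample message are assumptions for illustration only; reducing hosts to their base domains and the DNS lookups themselves are left out for brevity.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple pattern for http(s) URIs in plain text; real mail needs more.
URI_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def uri_hosts(body):
    """Return the distinct host names found in http(s) URIs, in order seen."""
    hosts = []
    for match in URI_RE.finditer(body):
        try:
            host = (urlsplit(match.group(0)).hostname or "").rstrip(".")
        except ValueError:
            continue  # skip URIs that urlsplit rejects as malformed
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def query_names(hosts, zones=("sc.surbl.org", "ws.surbl.org")):
    """Pair every host with every SURBL zone to get the DNS names to query."""
    return [f"{host}.{zone}" for host in hosts for zone in zones]


body = "Buy at http://domainundertest.com/offer or HTTP://DomainUnderTest.com/x today"
print(query_names(uri_hosts(body)))
```

Adding or dropping a list is then just a change to the zones tuple, which mirrors adding or removing a rule in the SpamAssassin configurations above.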
You can run either SURBL or both if you like. Note that
ws has a higher spam detection rate (currently) but also
a somewhat higher false positive rate than sc. Here's
a corpus check Dan Quinlan ran:
OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
   11189     1200     9989  0.107   0.00   0.00  (all messages)
 100.000  10.7248  89.2752  0.107   0.00   0.00  (all messages as %)
   6.095  56.2500   0.0701  0.999   1.00   1.00  URIBL_SC_SURBL
   6.855  59.7500   0.5006  0.992   0.98   1.00  URIBL_SBL
   9.545  72.8333   1.9421  0.974   0.95   0.01  T_URIBL_SA_SURBL
   0.116   0.5000   0.0701  0.877   0.58   0.01  T_URIBL_DSBL
SA_SURBL above reflects the old name for ws; SC_SURBL is
sc.surbl.org. ws detected ~73% of spams in the spam corpus
with a ~1.9% FP rate in the ham corpus. sc detected ~56%
with a <0.1% FP rate.
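For readers wondering how the S/O column relates to the others: the figures appear consistent with S/O = SPAM% / (SPAM% + HAM%). That formula is inferred from the numbers in this table rather than stated in the original, so treat the following Python check as a sketch under that assumption. The function name is made up for the example.

```python
def spam_over_overall(spam_pct, ham_pct):
    """Share of a rule's hit percentage that comes from the spam corpus."""
    return spam_pct / (spam_pct + ham_pct)


# (SPAM%, HAM%) pairs copied from the corpus check table.
rows = {
    "URIBL_SC_SURBL": (56.2500, 0.0701),
    "URIBL_SBL": (59.7500, 0.5006),
    "T_URIBL_SA_SURBL": (72.8333, 1.9421),
    "T_URIBL_DSBL": (0.5000, 0.0701),
}

for name, (spam_pct, ham_pct) in rows.items():
    print(f"{name:18s} {spam_over_overall(spam_pct, ham_pct):.3f}")
```

Rounded to three places this gives 0.999, 0.992, 0.974 and 0.877, the same values as the S/O column, so a value near 1 means nearly all of a rule's hits were on spam.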
We're still tuning how the SpamCop data is used, so hopefully
the sc hit rate will improve and FPs decrease in the next
version of the sc data engine.
Cheers,
Jeff C.
Hello SURBL users,
Please note that the name of the SURBL derived from Bill Stearns'
sa-blacklist is being changed from sa.surbl.org to ws.surbl.org .
If you were using the old name in your rules or configs please
update them to the new name.
We will keep DNS queries up on the old name for a week or so but
will probably drop them after that. This is only a name change
for that list. Functionality should remain the same.
Cheers,
Jeff C.
Devin Carraway has written a plugin for the Perl-based MTA qpsmtpd
to compare domains from message body URIs to SURBL domain
lists. Here's his announcement of what I believe is the first
MTA use of SURBL. Congrats and thanks to Devin!
__
Date: Tue, 13 Apr 2004 02:07:15 -0700
From: Devin Carraway <qpsmtpd(a)devin.com>
Subject: qpsmtpd plugin
Saw today's slashdot article on SURBL -- glad to see someone's taken up
the idea. I had thought of something similar, but somehow hadn't
connected it with "oh yeah, they're already hostnames, make a DNSBL out
of it."
You commented that it'd be nice to see support for it in MTAs, so I
wrote a plugin for qpsmtpd to do it. Qpsmtpd, if you haven't
encountered it, is a replacement smtpd for qmail and postfix, with a
primary emphasis on detecting and declining spam during the initial SMTP
transaction.
http://www.nntp.perl.org/group/perl.qpsmtpd/1216
http://devin.com/qpsmtpd/uribl
--
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/
SpamCop's Spamvertised sites page is up but not currently
serving data. I've taken this opportunity to make sure that
the SURBL engine does the right thing when there's no new data
coming in. When that happens the sc.surbl.org list stays
unchanged except for domains that may come off the list due
to expiration of old reports.
Once the data feed is up again, sc.surbl.org should pick up
where it left off and things should continue to operate normally.
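The behaviour described here, where the list only shrinks while no reports arrive, can be pictured with a short Python sketch. This is not the actual data engine; the function name, the example domains, the dates and the four-day window are all invented for illustration.

```python
from datetime import datetime, timedelta


def expire_old(last_reported, now, max_age):
    """Keep only domains whose most recent report is within max_age of now."""
    return {d: t for d, t in last_reported.items() if now - t <= max_age}


now = datetime(2004, 4, 14, 12, 0)
last_reported = {
    "old.example": now - timedelta(days=5),       # last seen five days ago
    "recent.example": now - timedelta(hours=6),   # last seen six hours ago
}

# With no new reports arriving, the only possible change is removal by age.
print(sorted(expire_old(last_reported, now, timedelta(days=4))))
```

Once reports resume, new timestamps would simply be written into the same mapping, so the list picks up from whatever survived expiry.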
As an aside, the next version of the data engine will have a much
longer memory, especially of spam domains and IP addresses so
there won't be nearly as much churn in the domains. There will
also be more domains on the list.
Cheers,
Jeff C.