[SURBL-Discuss] Whitelist Please

Justin Mason jm at jmason.org
Thu Sep 9 05:02:33 CEST 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Jeff Chan writes:
> On Wednesday, September 8, 2004, 7:13:36 AM, Frank Ellermann wrote:
> > Jeff Chan wrote:
>  
> >> there must be some form of feedback or error correction,
> >> or other strategies to deal with misclassifications.
> 
> >> Whitelisting is one strategy.
> 
> > ACK, but where and as far as possible I'd prefer a technical
> > definition like the "BI" (Breidbarth Index) in Usenet.
> 
> Here's a definition (note there is no H in the name):
> 
>   http://www.stopspam.org/usenet/mmf/breidbart.html
> 
> "The BI is a measure of how spammy a spammed news article is. It
> is the sum of the square root of the number of groups each copy
> of a spam article is posted to. So if you post 10 copies of an
> article, each cross-posted to 4 groups, the BI is 20. Other ways
> of reaching the BI=20 mark (a threshold used by some cancellers)
> are to post 20 copies, each to just one group, 4 copies to 25
> groups each, or 8 articles to 6 groups each and one more to just
> one group. (for BI=20.6)"
> 
> It's interesting, but probably does not apply directly in the mail
> spam area.  I suppose we could measure how often a domain
> appears on multiple SURBLs, but some of the SURBL
> data feeds are unitary, i.e. we can't see how many reports
> went into the listing, only whether a domain is listed or not.
> 
> This sort of idea could perhaps be useful for categorizing
> spamtrap data however, especially across multiple spamtraps.
> 
> But I think your complaint is that there are no objective
> criteria for whitelisting.  That's fair, but there must always
> be some subjective judgement applied, especially when
> we can't see the entire universe of mail spam in the same
> way that the entire universe of Usenet spam *is easily
> visible*.
> 
> It's also definitely not the case that we can see the entire
> mail ham universe, so there really can't be a generally knowable
> measure of the spam/ham ratio of a given domain.
> 
> This is somewhat a question of philosophy and science: to
> know what is knowable and what is not, i.e. epistemology.
> 
> Since spamminess versus legitimacy cannot easily be measured
> purely objectively, we must reserve the right to make
> judgements.
> 
> If you have a BI or something similar for *mail* spam, then
> please share it.
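A minimal Python sketch of the quoted Usenet definition, useful mainly for checking the arithmetic of its examples. The function name `breidbart_index` is hypothetical, and the input is assumed to be one entry per copy giving the number of groups that copy was posted to.

```python
import math

def breidbart_index(groups_per_copy):
    """Sum, over every copy of an article, of the square root of the
    number of newsgroups that copy was posted to."""
    return sum(math.sqrt(n) for n in groups_per_copy)

# The examples from the quoted definition:
print(breidbart_index([4] * 10))                 # 10 copies x 4 groups -> 20.0
print(breidbart_index([1] * 20))                 # 20 copies x 1 group  -> 20.0
print(breidbart_index([25] * 4))                 # 4 copies x 25 groups -> 20.0
print(round(breidbart_index([6] * 8 + [1]), 1))  # 8 x 6 groups + 1 x 1 -> 20.6
```

Because the square root grows slowly, one copy cross-posted to n groups scores sqrt(n), while n separate copies score n, so the measure depends on being able to count how many copies of "the same" article were sent.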

As a matter of interest -- and I should just ask Seth Breidbart ;) -- does
this deal with hashbusters?  i.e. if a message is 80% hashbuster strings
and 20% payload, every copy is unique, so it's not so easy to automate
counting the copies for a BI calculation.  (cf. DCC, Pyzor, Razor, AOL's
paper at CEAS, et al.)
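A rough illustration of why hashbusters complicate copy counting, assuming a hypothetical 20-word payload padded with 80 random words per copy. This is not how DCC, Pyzor or Razor actually work; 3-word shingles and Jaccard similarity are swapped in here only as a generic example of fuzzy matching, and all helper names are made up.

```python
import hashlib
import random
import string

# Hypothetical spam payload: 20 distinct words (the "20%" part).
PAYLOAD = " ".join(f"word{i}" for i in range(20))

def with_hashbuster(payload, seed, n_words=80):
    """Append n_words random 8-letter tokens (the "80%" hashbuster part)."""
    rng = random.Random(seed)
    junk = ["".join(rng.choice(string.ascii_lowercase) for _ in range(8))
            for _ in range(n_words)]
    return payload + " " + " ".join(junk)

def shingles(text, k=3):
    """Set of overlapping k-word sequences in the text."""
    words = text.split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

copy_a = with_hashbuster(PAYLOAD, seed=1)
copy_b = with_hashbuster(PAYLOAD, seed=2)

# An exact digest treats the two copies as unrelated messages.
digest_a = hashlib.sha1(copy_a.encode()).hexdigest()
digest_b = hashlib.sha1(copy_b.encode()).hexdigest()
print(digest_a == digest_b)  # False

# Shingle overlap survives, but only the payload shingles are shared,
# so the junk dilutes the similarity well below that of identical copies.
print(round(jaccard(shingles(copy_a), shingles(copy_b)), 2))
```

With these assumptions the two copies share only their payload shingles, so any threshold for "same message" has to be tuned against the padding ratio, which is the automation problem the question raises.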

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFBP8fIQTcbUG5Y7woRAoXsAKC3A/Q3SOdc+Ektl33sTN0f6MzIIgCgj8Ub
7CVIu93lSfLreyrwQ1++OPo=
=oTky
-----END PGP SIGNATURE-----