[SURBL-Discuss] Whitelist Please
Justin Mason
jm at jmason.org
Thu Sep 9 05:02:33 CEST 2004
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jeff Chan writes:
> On Wednesday, September 8, 2004, 7:13:36 AM, Frank Ellermann wrote:
> > Jeff Chan wrote:
>
> >> there must be some form of feedback or error correction,
> >> or other strategies to deal with misclassifications.
>
> >> Whitelisting is one strategy.
>
> > ACK, but where and as far as possible I'd prefer a technical
> > definition like the "BI" (Breidbarth Index) in Usenet.
>
> Here's a definition (note there is no H in the name):
>
> http://www.stopspam.org/usenet/mmf/breidbart.html
>
> "The BI is a measure of how spammy a spammed news article is. It
> is the sum of the square root of the number of groups each copy
> of a spam article is posted to. So if you post 10 copies of an
> article, each cross-posted to 4 groups, the BI is 20. Other ways
> of reaching the BI=20 mark (a threshhold used by some cancellers)
> is to post 20 copies, each to just one group, 4 copies to 25
> groups each, or 8 articles to 6 groups each and one more to just
> one group. (for BI=20.6)"
>
> It's interesting, but probably does not apply in the mail
> spam area directly. I suppose we could say how often does
> a domain appear on multiple SURBLs, but some of the SURBL
> data feeds are unitary, i.e. we can't see how many reports
> went into the listing, only whether a domain is listed or not.
>
> This sort of idea could perhaps be useful for categorizing
> spamtrap data however, especially across multiple spamtraps.
>
> But I think your complaint is that there are no objective
> criteria for whitelisting. That's fair, but there always
> must be some subjective judgement applied, especially when
> we can't see the entire universe of mail spam in the same
> way that the entire universe of Usenet spam *is easily
> visible*.
>
> It's also definitely not the case that we can see the entire
> mail ham universe, so there really can't be a generally knowable
> measure of the spam/ham ratio of a given domain.
>
> This is somewhat a question of philosophy and science: to
> know what is knowable and what is not, i.e. epistemology.
>
> Since spamminess versus legitimacy is not easily measured
> purely objectively, we must reserve the right to make
> judgements.
>
> If you have a BI or something similar for *mail* spam, then
> please share it.
As a matter of interest -- and I should just ask Seth Breidbart ;) -- does
the BI deal with hashbusters? i.e. if a message is 80% hashbuster strings
and 20% payload, it's not so easy to automate the BI calculation. (cf. DCC,
Pyzor, Razor, AOL's paper at CEAS, et al.)
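
For reference, the arithmetic in the quoted BI definition can be sketched
as below. This is only an illustration: the function name breidbart_index
is made up for this example, and it assumes the copies of an article have
already been identified as copies, which is exactly the step that
hashbusters make hard. The input is one number per posted copy, giving
how many groups that copy was cross-posted to.

```python
import math


def breidbart_index(groups_per_copy):
    """Hypothetical helper: sum of sqrt(group count) over all copies.

    groups_per_copy holds one integer per posted copy, i.e. the number
    of newsgroups that copy was cross-posted to. Grouping postings into
    "copies of the same article" is assumed to have been done already.
    """
    return sum(math.sqrt(n) for n in groups_per_copy)


# The four cases given in the quoted definition.
cases = {
    "10 copies, 4 groups each": [4] * 10,
    "20 copies, 1 group each": [1] * 20,
    "4 copies, 25 groups each": [25] * 4,
    "8 copies to 6 groups, plus 1 copy to 1 group": [6] * 8 + [1],
}

for label, postings in cases.items():
    print(f"{label}: BI = {breidbart_index(postings):.1f}")
```

The first three cases come out at BI = 20.0 and the last at about 20.6,
matching the figures in the quoted text. Nothing here tries to decide
whether two messages with different hashbuster padding are the same
article.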
- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS
iD8DBQFBP8fIQTcbUG5Y7woRAoXsAKC3A/Q3SOdc+Ektl33sTN0f6MzIIgCgj8Ub
7CVIu93lSfLreyrwQ1++OPo=
=oTky
-----END PGP SIGNATURE-----