Thu Sep 9 07:20:59 CEST 2004

On Wednesday, September 8, 2004, 8:02:33 PM, Justin Mason wrote:
> Jeff Chan writes:

>> Here's a definition (note there is no H in the name):
>>   http://www.stopspam.org/usenet/mmf/breidbart.html
>> "The BI is a measure of how spammy a spammed news article is. It
>> is the sum of the square root of the number of groups each copy
>> of a spam article is posted to. So if you post 10 copies of an
>> article, each cross-posted to 4 groups, the BI is 20. Other ways
>> of reaching the BI=20 mark (a threshhold used by some cancellers)
>> is to post 20 copies, each to just one group, 4 copies to 25
>> groups each, or 8 articles to 6 groups each and one more to just
>> one group. (for BI=20.6)"

> As a matter of interest -- and I should just ask Seth Breidbart ;) -- does
> this deal with hashbusters?   ie. if a message is 80% hashbuster strings,
> and 20% payload, it's not so easy to automate BI calculation.  (cf. dcc,
> Pyzor, Razor, AOL's paper at CEAS, et al.)

> - --j.

I'm not sure I understand the question.  It seems to me that
BI is a calculation based on counts of crossposting per message
and does not consider content.
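
As a rough illustration of that reading, here is a minimal Python sketch of the quoted definition. The function name breidbart_index and the idea of passing one group count per posted copy are my own assumptions for the example, not anything from the stopspam.org page:

```python
import math

def breidbart_index(group_counts):
    """Sum the square root of the number of groups for each posted copy.

    group_counts is a hypothetical input: one integer per copy of the
    article, giving how many newsgroups that copy was cross-posted to.
    """
    return sum(math.sqrt(n) for n in group_counts)

# The four cases from the quoted definition:
ten_by_four = breidbart_index([4] * 10)          # 10 copies, 4 groups each
twenty_by_one = breidbart_index([1] * 20)        # 20 copies, 1 group each
four_by_25 = breidbart_index([25] * 4)           # 4 copies, 25 groups each
eight_plus_one = breidbart_index([6] * 8 + [1])  # 8 copies to 6 groups, 1 to 1

print(ten_by_four, twenty_by_one, four_by_25, round(eight_plus_one, 1))
```

Note that only the counts go in; the message body never enters the calculation.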

I guess you're saying that detection of multiple postings could
be thrown off by hash busting when the copies are posted to
different newsgroups individually, rather than crossposted with
all the groups listed overtly in the headers.
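
If that is the concern, a small Python sketch may make it concrete. It assumes, purely for illustration, that a canceller groups copies by an exact SHA-1 of the body before summing; the function bi_by_body_hash and the sample payload are hypothetical, and real tools (dcc, Pyzor, Razor) use fuzzier matching than this:

```python
import hashlib
import math
from collections import defaultdict

def bi_by_body_hash(postings):
    """Group postings by an exact hash of the body and sum sqrt(groups).

    postings is a hypothetical list of (body, n_groups) pairs, one per
    posted copy. Returns a dict mapping body digest to its BI.
    """
    totals = defaultdict(float)
    for body, n_groups in postings:
        digest = hashlib.sha1(body.encode("utf-8")).hexdigest()
        totals[digest] += math.sqrt(n_groups)
    return dict(totals)

payload = "Buy now!"

# 20 identical copies, each posted to a single group.
plain = [(payload, 1) for _ in range(20)]

# The same 20 postings, each with a different random-looking suffix.
busted = [(f"{payload} xq{i:04d}", 1) for i in range(20)]

print(max(bi_by_body_hash(plain).values()))   # one article, all copies counted
print(max(bi_by_body_hash(busted).values()))  # every copy looks like a new article
```

Under that assumption the identical copies add up to a single article at the BI=20 mark, while the hash-busted copies each look like a separate article with a BI of 1.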

Jeff C.

