-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jeff Chan writes:
On Wednesday, September 8, 2004, 8:02:33 PM, Justin Mason wrote:
Jeff Chan writes:
Here's a definition (note there is no H in the name):
http://www.stopspam.org/usenet/mmf/breidbart.html
"The BI is a measure of how spammy a spammed news article is. It is the sum of the square root of the number of groups each copy of a spam article is posted to. So if you post 10 copies of an article, each cross-posted to 4 groups, the BI is 20. Other ways of reaching the BI=20 mark (a threshhold used by some cancellers) is to post 20 copies, each to just one group, 4 copies to 25 groups each, or 8 articles to 6 groups each and one more to just one group. (for BI=20.6)"
As a matter of interest -- and I should just ask Seth Breidbart ;) -- does this deal with hashbusters? i.e., if a message is 80% hashbuster strings and 20% payload, it's not so easy to automate the BI calculation. (cf. DCC, Pyzor, Razor, AOL's paper at CEAS, et al.)
- --j.
I'm not sure I understand the question. It seems to me that BI is a calculation based on counts of crossposting per message and does not consider content.
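To make the count-only nature of the calculation concrete, here is a minimal sketch of the definition quoted above. The function name `breidbart_index` and the list-of-group-counts input are my own assumptions for illustration, not anything taken from an actual canceller.

```python
from math import sqrt

def breidbart_index(groups_per_copy):
    """Sum of sqrt(number of groups) over every posted copy.

    groups_per_copy: one integer per copy of the article, giving the
    number of newsgroups that copy was cross-posted to (hypothetical
    input format, chosen for this sketch).
    """
    return sum(sqrt(n) for n in groups_per_copy)

# The worked examples from the quoted definition:
print(breidbart_index([4] * 10))                  # 10 copies, 4 groups each
print(breidbart_index([1] * 20))                  # 20 copies, 1 group each
print(breidbart_index([25] * 4))                  # 4 copies, 25 groups each
print(round(breidbart_index([6] * 8 + [1]), 1))   # 8 copies to 6 groups, plus 1 to one group
```

Note that the only input is the list of counts; nothing about the message body enters the sum. Deciding which articles count as copies of each other has to happen before this step.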
I guess you're saying that detection of multiple postings could be thrown off by hash busting when the spam is posted as separate copies to individual newsgroups, rather than as one article with all the groups listed overtly in its headers.
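A rough sketch of why that matters, assuming copies are grouped by an exact hash of the body. That is a deliberately naive stand-in, not a description of how DCC, Pyzor, Razor, or any canceller actually matches messages, and the helper names `with_hashbuster` and `count_copies_by_exact_hash` are made up for this example.

```python
import hashlib
import random
import string

def with_hashbuster(payload, rng, length=40):
    """Append a random junk string to the payload (hypothetical spammer trick)."""
    junk = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return payload + "\n" + junk

def count_copies_by_exact_hash(articles):
    """Group article bodies by SHA-1 of their full text and count each group."""
    counts = {}
    for body in articles:
        digest = hashlib.sha1(body.encode("utf-8")).hexdigest()
        counts[digest] = counts.get(digest, 0) + 1
    return counts

rng = random.Random(0)  # fixed seed so the run is repeatable
payload = "Buy now!"

plain = [payload] * 20                                        # 20 identical copies
busted = [with_hashbuster(payload, rng) for _ in range(20)]   # 20 copies, each with its own junk

print(len(count_copies_by_exact_hash(plain)))    # number of distinct "articles" seen
print(len(count_copies_by_exact_hash(busted)))
```

The identical copies collapse into one group of 20, so the count of copies feeding the BI is right. With hashbusters, every copy lands in its own group of one, so an exact-match counter sees 20 unrelated single postings and never gets near the threshold.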
yep.
- --j.