Differences being a matter of taste in how the classification is defined
...which reminds me... I keep meaning to ask what constitutes an FP when discussed on this list. Basically, this isn't always so black & white:
Consider the following classifications:
A. Definite hand-typed HAM
B. Closed Loop Opt-In NEWSLETTER (topically applicable to the recipient)
C. NEWSLETTER (topically applicable to the recipient) from a reputable organization (no harvesting, few or no NANAS, no SpamHaus) where the person didn't actually subscribe, but likes to read it... maybe it came because they previously bought something or left a "receive other offers/info" checkbox checked
D. More "spammy" NEWSLETTER (but topically applicable to the recipient) where the mailer is fairly "clean" (some NANAS, no SpamHaus), but the user didn't explicitly Opt-in. Maybe they left a "receive other offers" checkbox checked in the past when filling out something else or ordering something else.
E. More "spammy" ADVERTISEMENT (but topically applicable to the recipient) where the mailer is very "clean" (no harvesting, few NANAS, no SpamHaus), but the user didn't explicitly Opt-in. Maybe they left a "receive other offers" checkbox checked in the past when filling out something else or ordering something else
F. Definite spam (to varying degrees).
(I'm sure someone else could have done a better job of listing hard-to-differentiate categories)
Of course, it is not always possible to know if an e-mail is "topically applicable to the recipient". But assuming that you do, it is hard for Mail Administrators to distinguish between B, C, and D. It is also sometimes hard to distinguish between E & F.
The overwhelming percentage of Spam IS very distinguishable from A-E because of things like obfuscation techniques, SpamTrap recipients, location of sender's server, past history of sender, etc.
Still, this whole issue makes me question, "how good are Ham Corpuses?"
Moreover, when a particular SURBL gets an FP rating of .002%, I think, "that's great"... but then I wonder, "is this .002% actual human written correspondence, or is it a newsletter, etc?"
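To make that concern concrete, here is a rough sketch of how the same list could show a different FP rate depending on which of the categories A-E a given ham corpus accepts as ham. The message counts and hit counts are made up purely for illustration, and the function name fp_rate is my own; nothing here reflects a measured corpus or any real SURBL.

```python
from typing import Dict, Set

def fp_rate(corpus: Dict[str, int], hits: Dict[str, int], ham: Set[str]) -> float:
    """Percent of messages in the chosen ham categories that the list flagged."""
    total = sum(corpus[c] for c in ham)
    flagged = sum(hits.get(c, 0) for c in ham)
    return 100.0 * flagged / total if total else 0.0

# Hypothetical corpus: number of messages per category (A-D from the list above).
corpus = {"A": 90_000, "B": 6_000, "C": 3_000, "D": 1_000}
# Hypothetical number of those messages that hit the list being tested.
hits = {"A": 0, "B": 0, "C": 1, "D": 1}

strict = fp_rate(corpus, hits, {"A"})                 # only hand-typed ham counts
broad = fp_rate(corpus, hits, {"A", "B", "C", "D"})   # newsletters count as ham too

print(f"strict ham definition: {strict:.4f}% FP")
print(f"broad ham definition:  {broad:.4f}% FP")
```

With these invented numbers, the broad definition gives .002% while the strict one gives zero, so the headline figure alone doesn't say whether any hand-typed correspondence was hit.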
Rob McEwen