Re: [SURBL-Discuss] general questions.....

23 Nov 2004


On Tuesday, November 23, 2004, 12:14:59 PM, Rob McEwen wrote:
...
> A. Definite hand-typed HAM
...
> B. Closed Loop Opt-In NEWSLETTER (topically applicable to the recipient)
...
> C. NEWSLETTER (topically applicable to the recipient) from reputable
> organization (no harvesting, few/none NANAS, no SpamHaus) where the person
> didn't actually subscribe, but likes to read it... maybe it came because
> they previously bought something or left checked a "receive other
> offers/info" checkbox
...
> D. More "spammy" NEWSLETTER (but topically applicable to the recipient)
> where the mailer is fairly "clean" (some NANAS, no SpamHaus), but the user
> didn't explicitly Opt-in. Maybe they left a "receive other offers" checkbox
> checked in the past when filling out something else or ordering something
> else.
...
> E. More "spammy" ADVERTISEMENT (but topically applicable to the recipient)
> where the mailer is very "clean" (no harvesting, few NANAS, no SpamHaus),
> but the user didn't explicitly Opt-in. Maybe they left a "receive other
> offers" checkbox checked in the past when filling out something else or
> ordering something else

All of the above should probably be considered ham for SURBL
purposes.  What matters more than the *sending style* is what
other *uses the domain name* or IP in the URI might have.
Remember that we're not blocking sending methods.  We're
blocking URI mentions like domains.  Therefore what matters
is not how the message is sent (newsletter, hand-sent, etc.)
but ***what the domain might be used for***.  We don't want to
block on legitimate domains.  All of your examples above
are for legitimate or mostly legitimate domains.
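
To make the point about URI mentions concrete, here is a minimal sketch of
pulling the host names out of the URIs in a message body, which is the kind
of data a URI blocklist is checked against. The function name
`uri_hosts` and the sample body are hypothetical, and this is an assumption
about the general approach, not a description of how any particular SURBL
client works. A real client would also typically reduce each host to its
registered domain, which this sketch does not attempt.

```python
import re
from urllib.parse import urlsplit

# Matches http/https URIs up to the next whitespace, quote, or angle bracket.
URI_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def uri_hosts(body):
    """Return the set of lowercased host names mentioned in URIs in body."""
    hosts = set()
    for uri in URI_PATTERN.findall(body):
        host = urlsplit(uri).hostname  # lowercased by urlsplit, None if absent
        if host:
            hosts.add(host)
    return hosts

# Hypothetical message body; how it was sent plays no part in the result.
body = "See HTTP://Shop.Example.com/offer and https://example.org/news today"
print(sorted(uri_hosts(body)))
```

Note that nothing about the sending method (newsletter, hand-typed, etc.)
enters into it; only the hosts mentioned in the body do.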
...
> F. Definite spam (to varying degrees).
...
> Of course, it is not always possible to know if an e-mail is "topically
> applicable to the recipient". But assuming that you do, it is hard for Mail
> Administrators to distinguish between B, C, and D. It is also sometimes hard
> to distinguish between E & F.

A better question might be whether the mail is "topically
applicable to ANY recipient."  Since we are a global blocklist,
we need to think globally and act on behalf of ALL users,
not just one particular recipient.
Therefore we want to list domains that are pretty much
universally regarded as spammy like cheappillz4u. biz,
0emsoftwarez. info, etc., and almost certainly not some
plumbing fixture manufacturer's open subscription newsletter.
...
> The overwhelming percentage of Spam IS very distinguishable from A-E because
> of things like obfuscation techniques, SpamTrap recipients, location of
> sender's server, past history of sender, etc.

I agree.  We want to list only that extremely obvious spam.
Usually it's for pills, mortgage, warez, gambling, porn, etc.
...
> Still, this whole issue makes me question, "how good are Ham Corpuses".
...
> Moreover, when a particular SURBL gets an FP rating of .002%, I think,
> "that's great"... but then I wonder, "is this .002% actual human written
> correspondence, or is it a newsletter, etc?"
...
> Rob McEwen

As has been noted, once you get down to 1 part in 50,000
(0.002%), it's very easy for a minor misclassification
to have a huge impact on the FP numbers.
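
As a rough illustration of that sensitivity, the arithmetic can be sketched
as below. The corpus size of 50,000 hams and the helper name `fp_rate` are
assumptions chosen only to match the "1 part in 50,000" figure; they are not
measurements from any real corpus.

```python
def fp_rate(false_positives, ham_total):
    """Fraction of ham messages that were wrongly flagged."""
    return false_positives / ham_total

ham_total = 50_000                      # assumed corpus size
baseline = fp_rate(1, ham_total)        # 1 part in 50,000
one_more = fp_rate(2, ham_total)        # a single extra misclassified message

print(f"baseline FP rate: {baseline:.4%}")   # 0.0020%
print(f"with one more FP: {one_more:.4%}")   # 0.0040%
print(f"ratio: {one_more / baseline:.1f}x")
```

At this scale one mislabeled message in the ham corpus doubles the reported
rate, which is why the absolute number should be read with caution.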
Ham corpora do have errors, both FP and FN.  Usually
FPs can only be detected by hand-checking them again.
Even highly-experienced spam-fighters make errors when
classifying their ham and spam initially.  To err is human.
There are also problems with the representativeness
of messages in corpora.  It's not always easy to put
together large and broad enough collections of ham
to meaningfully reflect the larger corpus of all messages
in general.
Measurements like these are quite hard to do well.
Corpus checks are probably best for relative differences
between algorithms, etc.  I.e. is performance increasing
or decreasing with a given change in coding, inclusion
policies, etc.
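
A minimal sketch of that kind of relative check follows: run two versions of
a list against the same ham corpus and look at the change in hits, not at
the absolute rate. Everything here is hypothetical (the function name
`ham_hits`, the toy corpus, and the example domains); each ham message is
assumed to have already been reduced to the set of domains it mentions.

```python
def ham_hits(listed, ham_messages):
    """Count ham messages that mention at least one listed domain."""
    return sum(1 for domains in ham_messages if domains & listed)

# Toy ham corpus: one set of mentioned domains per message.
ham_messages = [
    {"example.org"},
    {"example.com", "example.net"},
    {"example.net"},
    set(),
]

before = {"spam.example"}                 # list before a policy change
after = {"spam.example", "example.net"}   # list after the change

delta = ham_hits(after, ham_messages) - ham_hits(before, ham_messages)
print(f"change in ham hits: {delta:+d}")
```

Because both runs use the same corpus, labeling errors in it affect both
counts alike, so the sign of the change is more trustworthy than either
count taken alone.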
Jeff C.
--
"If it appears in hams, then don't list it."
