On Friday, September 24, 2004, 11:24:57 PM, Ryan Thompson wrote:
Do we want to say anything in this document (or possibly another document) about whitelisting criteria? There are really three main categories:
1. Blacklist material (that's what your policy addresses very well)
1.5. "Almost" blacklist material (the grey ones), à la the "UC" list: domains that are almost totally spammers, but may have a few borderline uses
2. Domains that should not be listed, but are not necessarily of "whitelist" merit. These are mostly the domains where insufficient data (or effort) exists to make a determination, which, for good or for ill, is where the bulk of our human efforts are currently focused.
3. Domains that are white; i.e., have definite legitimate uses
OK, that's four. If we really want to reduce FPs, we need to carefully consider *all* of these categories when analysing potential domains. I spend just as much time pulling domains out of ham as I do pulling domains out of spam.
The distinction between 2 and 3 is almost as difficult as the distinction between 1 and 2 sometimes.
- Ryan
I agree with 1 and 3, but another way to look at the undecided middle ground might be this: if a domain or IP has not proven to be blacklist material, and has not been falsely listed (and so is not in need of whitelisting), then perhaps it can be ignored until it moves into category 1 or 3.
I know that goes against the feelings of people who want to catch every spam, and I understand that feeling myself, but in *practical terms* it may be a *useful* solution.
Yes, that misses some marginal and probable spammers, but it lets us focus on the first category, which is probably the most important to find in terms of the volume of spam it produces. The others can consume a lot of time and effort without producing the level of performance that catching the *major* spammers in the first category can.
I realize you guys are trying to sort out some of the stuff in the middle and I understand some of the reasons for wanting to do it, but I think working on the more clear cases gets us the most results for our efforts.
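To make the idea concrete, here is a rough Python sketch of the triage described above. The function name, the hit counts, the `currently_listed` flag, and the threshold of 10 are all assumptions made up for illustration; they do not describe any existing tool or an agreed policy.

```python
def triage(spam_hits, ham_hits, currently_listed, min_spam_hits=10):
    """Hypothetical triage for one domain or IP.

    spam_hits / ham_hits: how often the domain was seen in spam / ham corpora.
    currently_listed: whether the domain is on the blacklist right now.
    min_spam_hits: illustrative cutoff for "proven blacklist material".
    """
    if ham_hits > 0:
        # "If it appears in hams, then don't list it."
        # Only a falsely listed domain needs whitelisting (category 3).
        return "whitelist" if currently_listed else "ignore"
    if spam_hits >= min_spam_hits:
        # Clear, high-volume spammer (category 1).
        return "list"
    # Undecided middle ground: leave it until it moves into 1 or 3.
    return "ignore"


if __name__ == "__main__":
    print(triage(spam_hits=50, ham_hits=0, currently_listed=False))
    print(triage(spam_hits=50, ham_hits=3, currently_listed=True))
    print(triage(spam_hits=2, ham_hits=0, currently_listed=False))
```

The point of the sketch is that only two outcomes call for human effort; everything else falls through to "ignore" until more data shows up.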
Jeff C. -- "If it appears in hams, then don't list it."