Jeff Chan wrote:
there must be some form of feedback or error correction, or other strategies to deal with misclassifications.
Whitelisting is one strategy.
ACK, but wherever possible I'd prefer a technical definition like the "BI" (Breidbart Index) in Usenet.
E.g. whitelisting TLD .edu is almost the same bad idea as blacklisting TLD .biz.
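For readers who don't know the BI: as commonly defined on Usenet, it is the sum of the square roots of the number of newsgroups each copy of an article was posted to. A minimal sketch of that arithmetic follows; the function name and the sample numbers are only illustrative, not anything SURBL uses.

```python
import math

def breidbart_index(newsgroups_per_copy):
    """Sum of the square roots of the number of groups each copy was posted to."""
    return sum(math.sqrt(n) for n in newsgroups_per_copy)

# Illustrative input: one copy crossposted to 9 groups, another to 16 groups.
print(breidbart_index([9, 16]))  # 3 + 4 = 7.0
```

The point is that such a score depends only on countable facts about the postings, not on who posted them or where.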
Another is trying to get enough spam reports or even trapped spam to be able to get some meaningful statistical impression about spamminess. If 1000 people report a domain as spammy, it probably is. If only 1 person says it's spammy, it may be less likely.
You could combine these strategies using the SC input: If the SC data matches whitelisted domains, then something is wrong:
Either the domain shouldn't be whitelisted w.r.t. the SC zone, or it should be reported as "IB link" (innocent bystander) to deputies@admin.spamcop
Both cases require some manual intervention, unfortunately, but at least you would catch erroneous WL entries.
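A rough sketch of the cross-check described above, assuming the SC input is available as a mapping from domain to report count. The function name, the `min_reports` threshold, and the example domains are hypothetical; this is not how SURBL actually processes its data.

```python
def cross_check(report_counts, whitelist, min_reports=2):
    """Split reported domains into whitelist conflicts and listing candidates.

    report_counts: dict mapping domain -> number of reports (assumed format)
    whitelist:     set of whitelisted domains
    min_reports:   hypothetical threshold; a single report is weak evidence
    """
    # Any reported domain that is also whitelisted needs a manual look:
    # either the WL entry is wrong or the report is an innocent bystander.
    conflicts = sorted(d for d in report_counts if d in whitelist)
    # Non-whitelisted domains with enough reports are listing candidates.
    candidates = sorted(
        d for d, n in report_counts.items()
        if n >= min_reports and d not in whitelist
    )
    return conflicts, candidates

reports = {"spam.example": 1000, "bystander.example": 3, "rare.example": 1}
wl = {"bystander.example"}
print(cross_check(reports, wl))
```

Here "bystander.example" comes out as a conflict for manual review, and only "spam.example" is a candidate for listing.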
Does anyone have any ideas, research, etc. into this?
You're already using good ideas like "age of registration", and if this data isn't available (see *.whois.rfc-ignorant.org), that is their problem: treat it as "registered yesterday".
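The "registered yesterday" fallback could look roughly like this. It assumes the creation date has already been extracted from whois and is `None` when unavailable; the function name is made up for illustration.

```python
from datetime import date, timedelta

def registration_age_days(created, today):
    """Age of a registration in days; missing whois data counts as 1 day old."""
    if created is None:
        # No usable whois data: treat the domain as "registered yesterday".
        created = today - timedelta(days=1)
    return (today - created).days

today = date(2004, 10, 1)
print(registration_age_days(date(2004, 1, 1), today))
print(registration_age_days(None, today))
```

With this convention a domain that hides its registration data never looks older, and therefore never looks more trustworthy, than a brand-new one.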
In grey cases, we must sometimes apply some judgement in order to prevent false positives.
Sure, but that judgement should consider the source or zone of the data. SC and SC.SURBL.ORG are not exactly the same as OB or WS. Minus obvious errors, abuses, and bugs, SC.SURBL.ORG is designed to run on auto-pilot.
we should be free to list any part of an organization that is mostly spammy, however, even if other parts are not.
Indeed, and being in TLD .edu, being hosted by Schlund, or having a NYSE ticker symbol has nothing to do with spam vs. ham. Anybody can be hit by an idiot spammer in his own domain, so what? As soon as the problem is solved, the listings expire.
Perhaps my obvious errors are not the same as your obvious errors. ;-)
Not sure. My definition of "obvious error" for the SC zone would be "I'd report it as innocent bystander to deputies@".
If your definition is very different, and if the reason for this difference is related to other SURBL zones, then maybe one general whitelist covering all zones is not good enough.
[rogue nations]
I assume most people are aware that many of the professional spammer sites seem to be hosted in China, Brazil and Korea, and that they continue to do so. Therefore we can assume any anti-spam laws or abuse policies are not being enforced there.
TTBOMK that's no longer true for Korea. They have some kind of "anti-spam" law; it predates CAN-SPAM and is not really worse.
Some ISPs and registrars are "rogue" (e.g. SpamCast, ChinaNet, DirectI), many are clueless or ignorant, but it's not related to "nations". Unless you're prepared to identify the U.S. as the top spammer nation of the known universe. ;-)
Most of the SpamCop reports get into sc.surbl.org.
That's good. Use the rest, which doesn't make it, to check your whitelists and automatic procedures. Maybe feed it to the new "unconfirmed" SURBL (?). BTW, of course a new uc.surbl.org shouldn't be a part of multi.surbl.org; it's too dangerous.
Bye, Frank