On Friday, September 3, 2004, 6:50:25 AM, John Lundin wrote:
Maybe there should be a dark multi, with one bit for confirmed spammers with some ham, and another for early warning entries. It would be nice to be able to evaluate them separately.
As an analog, SARE splits some rulesets (genlsubj, html, header) into categories of "hit ONLY spam", "have hit ham", and "hit a significant amount of ham." You can choose your level of safety and effectiveness. (If you want to get fancy, encode a confidence level. Two bits? ;-) )
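To make the quoted proposal concrete: a combined list could answer with an address of the form 127.0.0.N, where each bit of N flags one sublist, so a client can evaluate the sublists separately. This is only a sketch. The bit values and the names CONFIRMED_WITH_HAM and EARLY_WARNING are hypothetical, chosen for illustration, and are not real SURBL assignments.

```python
import ipaddress

# Hypothetical bit assignments for a "dark" multi list; NOT real SURBL values.
CONFIRMED_WITH_HAM = 0x02  # confirmed spammers that also show up in some ham
EARLY_WARNING = 0x04       # newly seen, not yet confirmed


def decode_dark_multi(answer: str) -> set[str]:
    """Return the names of the sublists flagged in a 127.0.0.N answer."""
    octets = ipaddress.IPv4Address(answer).packed
    if octets[:3] != bytes([127, 0, 0]):
        raise ValueError(f"unexpected answer: {answer}")
    mask = octets[3]
    flags = set()
    if mask & CONFIRMED_WITH_HAM:
        flags.add("confirmed_with_ham")
    if mask & EARLY_WARNING:
        flags.add("early_warning")
    return flags
```

A client that only trusts confirmed entries would test for "confirmed_with_ham" and ignore "early_warning".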
SARE and SpamAssassin in general have a different approach to detecting spam than SURBLs.
SA is usually used with elaborate rules and technologies to categorize spam based on multiple characteristics in headers and message bodies. SA was built to cut through some of the obfuscation of content and sender information that spammers shifted to when they stopped sending clear text messages from known mail servers. Zombies and compounding obfuscation make that approach a constant challenge.
SURBLs attempt to identify spam by finding exactly those URI domains which are used in spam. They cut right to the unavoidable core of what spammers usually do, which is to advertise a web site.
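The URI-domain approach can be sketched roughly as follows: pull the URIs out of a message body, reduce each host to a domain, and build the DNS name to query. The function names are made up for this example, and keeping only the last two labels is a deliberate simplification; a real client would need a list of two-level TLDs (co.uk and the like) and would handle numeric IP hosts separately.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple URI matcher; real messages need more careful parsing.
URI_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def uri_domains(body: str) -> set[str]:
    """Collect the domains of all http(s) URIs found in a message body."""
    domains = set()
    for uri in URI_RE.findall(body):
        try:
            host = urlsplit(uri).hostname
        except ValueError:  # malformed URI, e.g. unbalanced IPv6 brackets
            continue
        if not host:
            continue
        labels = host.rstrip(".").lower().split(".")
        # Simplification: treat the last two labels as the registered domain.
        domains.add(".".join(labels[-2:]))
    return domains


def query_names(body: str, zone: str = "multi.surbl.org") -> list[str]:
    """Build the DNS names that would be looked up for this message."""
    return sorted(f"{domain}.{zone}" for domain in uri_domains(body))
```

Each resulting name would then be resolved with an ordinary DNS A query; an answer means the domain is listed, and NXDOMAIN means it is not.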
Because the focus of each technology is slightly different, assumptions made from the perspective of one technology may not fit the other perfectly. For example, it's not always the case that SURBLs will be used with programs that can score messages with different weights for different rules. If the false positive rates were low enough, SURBLs could be used to block messages with just URI parsing, including in the MTA. That allows spam to be rejected at the transport layer without sending it through SpamAssassin, saving processing time, CPU resources, etc. MTA uses of SURBL already exist, though we're still waiting for sendmail milters and postfix filters.
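A minimal sketch of what such a block-only check at the MTA might look like, in contrast to weighted scoring: one listed domain is enough to refuse the message. The function smtp_verdict and the is_listed callable are hypothetical; in a real deployment is_listed would perform the DNS lookup against the list, and the reply text would be whatever the site prefers.

```python
from typing import Callable, Iterable


def smtp_verdict(domains: Iterable[str], is_listed: Callable[[str], bool]) -> str:
    """Return an SMTP reply line: reject on the first listed URI domain."""
    for domain in domains:
        if is_listed(domain):
            # No scoring and no weights: a single hit rejects the message,
            # which is why false positives matter so much in this mode.
            return f"550 5.7.1 Message rejected: URI domain {domain} is listed"
    return "250 2.0.0 OK"
```

For example, with a stand-in set in place of the DNS lookup:

```python
listed = {"spamvertised.example"}
print(smtp_verdict(["example.org", "spamvertised.example"], listed.__contains__))
```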
It was logical for SURBLs to be used with SpamAssassin because SA provides a nice framework of message parsing, URI extraction, mail program interfaces, etc., but SURBLs can also be used directly with MTAs and other mail-handling, spam-blocking programs. In those cases the classifications need to be extremely accurate. False positives are the largest obstacle to that use, so they need to be reduced.
Instead of finding ways to collect greylists full of questionable domains, we should be trying to find ways to improve the quality of the existing lists. That's where the most important and valuable progress can be made.
Jeff C.