-----Original Message-----
From: Jeff Chan [mailto:jeffc@surbl.org]
Sent: Monday, November 22, 2004 8:41 PM
To: SURBL Discussion list
Subject: Re: [SURBL-Discuss] general questions.....
On Monday, November 22, 2004, 5:25:14 PM, Justin Mason wrote:
Important to note that SURBL *can* increase its efficiency by changing its methods -- i.e. adding more data sources, modifying the moderation model, etc.
I like to think so too, but one of Terry's hypotheses is that detecting spam in the remaining variance (the ~15% currently undetected) may require some "third dimension of spam" and that about half of that variance may be truly "noise" and therefore inherently undetectable (paraphrasing him from off-list discussions). But he doesn't have data to support that claim yet, just empirical observations across different classification systems.
It's good to hear that Henry Stern is getting a PhD for his work in this area; it's certainly worthy of that honor. It's not a particularly easy problem.
Jeff C.
Wow, that was a good email. It makes me think about things from a higher level than the trenches. The whole thing has to be thought of in sections. If we are thinking of JUST SURBL, then I agree that catching the remaining 15% requires more manpower thrown at the overall project. I say overall because there are other antispam projects that support SURBL and that would also be MUCH better with more help.
Looking at it from another view, the 15% IS caught! The bigger picture is antispam. You throw DNSRBL, SURBL, BAYES, SARE, and SA at the problem together, and classification jumps by the order of magnitude you wanted, which for most end users can be 99.99%. The remaining differences are tastes in the definition of the classification, which is a human trait that can't be removed.
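To make that layering idea concrete, here is a minimal sketch in Python of score-based combination in the SpamAssassin style: independent checks (a URI blocklist like SURBL, a DNSBL lookup, a Bayes-like test) each add a weight, and the total is compared to a threshold. The check names, weights, stubbed logic, and the 5.0 threshold are all made up for illustration and are not the real SA rule set.

    # Hypothetical illustration only -- not SpamAssassin code.
    from typing import Callable, Dict, Tuple, List

    # Each check returns True if its signal fires for the given message text.
    # Real checks would do DNS lookups (SURBL, DNSBLs) or run a Bayes
    # classifier; here they are stubbed so the example stays self-contained.
    def surbl_hit(msg: str) -> bool:
        return "spam-domain.example" in msg        # URI found on a URI blocklist

    def dnsbl_hit(msg: str) -> bool:
        return "X-Relay: 192.0.2.1" in msg         # relay listed on a DNS blocklist

    def bayes_spammy(msg: str) -> bool:
        return msg.lower().count("free") >= 3      # stand-in for a Bayes score

    # Illustrative rule names and weights (not actual SA rules or scores).
    CHECKS: Dict[str, Tuple[Callable[[str], bool], float]] = {
        "URIBL_SURBL":   (surbl_hit, 3.0),
        "RCVD_IN_DNSBL": (dnsbl_hit, 2.5),
        "BAYES_HIGH":    (bayes_spammy, 3.5),
    }

    THRESHOLD = 5.0  # illustrative spam threshold

    def classify(msg: str) -> Tuple[bool, float, List[str]]:
        """Sum the weights of the checks that fire; flag as spam over the threshold."""
        hits = [name for name, (check, _) in CHECKS.items() if check(msg)]
        score = sum(CHECKS[name][1] for name in hits)
        return score >= THRESHOLD, score, hits

    if __name__ == "__main__":
        msg = ("FREE FREE FREE offer at http://spam-domain.example/\n"
               "X-Relay: 192.0.2.1")
        is_spam, score, hits = classify(msg)
        print(f"spam={is_spam} score={score} hits={hits}")

The point of the sketch is the same as the paragraph above: a message that slips past any single check still gets caught once several weak signals fire together.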
But I believe there is still a huge leap SURBL can make in classification, with an increase in data mining, research, and a little more help from major ISPs and registrars.
Thanks for that informative email, Jeff!! You saved me a google ;)
--Chris