On Monday, November 22, 2004, 5:25:14 PM, Justin Mason wrote:
Important to note that SURBL *can* increase its efficiency by changing its methods, i.e., adding more data sources, modifying the moderation model, etc.
I like to think so too, but one of Terry's hypotheses is that detecting spam in the remaining variance (the ~15% currently undetected) may require some "third dimension of spam," and that about half of that variance may be truly "noise" and therefore inherently undetectable (I'm paraphrasing him from off-list discussions). He doesn't have data to support that claim yet, though, just empirical observations across different classification systems.
It's good to hear that Henry Stern is getting a PhD for his work in this area; the work is worthy of that honor. It's not a particularly easy problem.
Jeff C. -- "If it appears in hams, then don't list it."