On Saturday, April 23, 2005, 3:42:57 PM, Paul Shields wrote:
Below are some stats from our incoming mail since midnight 23/04/05. I'm not going to go into too deep an analysis as it's the weekend and I'm too 'tired' to do that ;).
The total number of messages with at least one ??_SURBL hit over the last 22 hours was around 1.4 million. The counts below show how many triggered as 'spam' ("result: Y"), and how many didn't trigger ("result: ."). This is based on our SpamAssassin default threshold of 8, but we have many custom rules, so the spam threshold is really only meaningful for our config - YMMV ;). We don't currently block or tag via SURBL/RBL at the MTA layer - everything goes through SA.
XS popped up reasonably frequently in all SURBL hits - and we had 10456 XS spam hits where the domain wasn't listed in any other URIBL (I'm unable to say how many false positives are in that list, but our default tagging threshold in SA is high and FPs are vanishingly small).
Anyway - make of it what you will. SURBL rocks - XS *may* be a useful addition.
Cheers
Paul
Hi Paul,
Thanks very much for sharing your data. Your results for the other lists look about as expected in terms of FPs and spam detection. Summarizing your numbers:
AB:          521886 spam     604 ham
WS:          996200 spam   12578 ham
JP:         1234602 spam    4376 ham
OB:         1139181 spam   36760 ham
SC:          751549 spam    1095 ham
PH:             383 spam       1 ham

XS:          939134 spam    6283 ham
XS unique:    10456 spam    5300 ham
For XS, the spam-to-ham ratio on its unique hits is only about 2:1 (10456 spam to 5300 ham), which means it has too many FPs, and it doesn't hit much unique spam. Both are reasonable given the lack of significant legitimate domain filtering and the high inclusion threshold. We will work to improve those much further before we propose adding XS to the production data in multi.
In terms of spam-to-ham ratios, OB is underperforming the other current lists, judging by your data. I'm cc'ing Suresh at Outblaze so he can see the measurements you got.
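For anyone who wants to check the ratios, here is a rough Python sketch that recomputes spam-to-ham ratios from the counts summarized above. The names `counts` and `spam_to_ham_ratio` are just illustrative and not part of any SURBL or SpamAssassin tooling; the numbers are Paul's as quoted.

```python
# Spam and ham hit counts per list, copied from the summary above.
counts = {
    "AB": (521886, 604),
    "WS": (996200, 12578),
    "JP": (1234602, 4376),
    "OB": (1139181, 36760),
    "SC": (751549, 1095),
    "PH": (383, 1),
    "XS": (939134, 6283),
    "XS unique": (10456, 5300),
}


def spam_to_ham_ratio(spam, ham):
    """Return spam hits per ham hit (hypothetical helper for illustration)."""
    if ham == 0:
        return float("inf")  # no ham hits at all
    return spam / ham


ratios = {name: spam_to_ham_ratio(s, h) for name, (s, h) in counts.items()}

# Print the lists from worst (lowest ratio) to best (highest ratio).
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<10} {ratio:8.1f}:1")
```

On these numbers, XS unique comes out at roughly 2:1 and OB at roughly 31:1, the lowest of the current production lists.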
All the lists need to hit less ham, and more aggressive checking and whitelisting is probably needed, assuming the data sources don't change their inclusion policies. I hope to address this in future.
Jeff C.
--
"If it appears in hams, then don't list it."