[SURBL-Discuss] XS hits

Jeff Chan jeffc at surbl.org
Sun Apr 24 08:06:54 CEST 2005


On Saturday, April 23, 2005, 3:42:57 PM, Paul Shields wrote:
> Below are some stats from our incoming mail since midnight 23/04/05. I'm not 
> going to go into too deep an analysis as it's the weekend and I'm too 
> 'tired' to do that ;).

> Total nbr of messages with at least one ??_SURBL hit over the last 22 hours 
> was around 1.4 million. The counts below show how many triggered as 'spam' 
> ("result: Y"), and how many didn't trigger ("result: \."). This is based on 
> our Spam Assassin default threshold of 8, but we have many custom rules so 
> spam threshold is really only meaningful to our config - YMMV ;). We don't 
> currently block or tag via SURBL/RBL at the MTA layer - everything goes 
> through SA.

> XS popped up reasonably frequently in all SURBL hits - and we had 10456 XS 
> spam hits where it wasn't listed in any other URIBL (unable to say how many 
> false-positives out of that list, but our default tagging threshold is high 
> in SA and FP's are vanishingly small).

> Anyway - make of it as you will. SURBL rocks anyway - XS *may* be a useful 
> addition.


> Cheers

> Paul

Hi Paul,

Thanks very much for sharing your data.  Your results look about
as expected for the other lists in terms of FPs and spam
detection.  Summarizing your numbers:

AB:       521886 spam      604 ham
WS:       996200 spam    12578 ham
JP:      1234602 spam     4376 ham
OB:      1139181 spam    36760 ham
SC:       751549 spam     1095 ham
PH:          383 spam        1 ham

XS:       939134 spam     6283 ham
XS unique: 10456 spam     5300 ham

For the XS-only hits it looks like the spam to ham ratio is only
about 2:1 (10456 to 5300), which means it has too many FPs.  XS
also doesn't hit much unique spam.  That is reasonable given the
lack of significant legitimate domain filtering and the high
inclusion threshold.  We will work to improve those much further
before we propose adding it to the production data in multi.

In terms of ratios of the current lists, OB is underperforming
the others, judging by your data.  I'm ccing Suresh at Outblaze
so he can see the measurements you got.
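
For anyone who wants to reproduce the ratios discussed here, the following is a minimal sketch that divides spam hits by ham hits for each row of the summary above. The counts are copied from that table. The names `counts`, `spam_to_ham`, `production` and `worst` are made up for illustration, and treating every message that did not trigger as ham follows the summary rather than any independent check.

```python
# Spam and ham hit counts per list, copied from the summary table above.
# "ham" here means "did not reach the SA threshold", as in that summary.
counts = {
    "AB": (521886, 604),
    "WS": (996200, 12578),
    "JP": (1234602, 4376),
    "OB": (1139181, 36760),
    "SC": (751549, 1095),
    "PH": (383, 1),
    "XS": (939134, 6283),
    "XS unique": (10456, 5300),
}


def spam_to_ham(spam, ham):
    """Return the number of spam hits per ham hit."""
    return spam / ham


ratios = {name: spam_to_ham(s, h) for name, (s, h) in counts.items()}

# Lists already in production are everything except the two XS rows.
production = [name for name in counts if not name.startswith("XS")]
worst = min(production, key=ratios.get)

# Print the lists from best to worst spam-to-ham ratio.
for name in sorted(ratios, key=ratios.get, reverse=True):
    print(f"{name:10s} {ratios[name]:8.1f} : 1")
print("lowest ratio among production lists:", worst)
```

On these numbers the roughly 2:1 figure corresponds to the "XS unique" row, and OB has the lowest ratio among the production lists. PH rests on a single ham hit, so its ratio should not be read too closely.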

All the lists need to hit less ham, and more aggressive checking
and whitelisting is probably needed, assuming the data sources
don't change their inclusion policies.  I hope to address this
in future.

Jeff C.
--
"If it appears in hams, then don't list it."


