[SURBL-Discuss] XS hits

Alain coc454402 at sneakemail.com
Sun Apr 24 13:58:22 CEST 2005


Hi Paul

On 4/24/05, Paul Shields paul.shields-at-blueyonder.co.uk |surbl list|
<...> wrote:
> Below are some stats from our incoming mail since midnight 23/04/05. I'm not
> going to go into too deep an analysis as it's the weekend and I'm too
> 'tired' to do that ;).
> 
> Total nbr of messages with at least one ??_SURBL hit over the last 22 hours
> was around 1.4 million. The counts below show how many triggered as 'spam'
> ("result: Y"), and how many didn't trigger ("result: \."). This is based on
> our Spam Assassin default threshold of 8, but we have many custom rules so
> spam threshold is really only meaningful to our config - YMMV ;). We don't
> currently block or tag via SURBL/RBL at the MTA layer - everything goes
> through SA.

Wow, that's very useful information.  Thanks an awful lot.

I have placed the numbers in a small table (also based on Jeff's table):

"Not tagged" means a SURBL hit on a message that SA did not mark as
spam.  It doesn't mean that those e-mails aren't spam, just that they
passed the -conservative- filter.

              Spam  Not tagged  % not tagged  % spam total
AB          521886         604        0,116%        37,28%
WS          996200       12578        1,247%        71,16%
JP         1234602        4376        0,353%        88,19%
OB         1139181       36760        3,126%        81,37%
SC          751549        1095        0,145%        53,68%
PH             383           1        0,260%         0,03%
XS          939134        6283        0,665%        67,08%
XS-unique    10456        5300       33,638%         0,75%
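
For anyone who wants to check the "% not tagged" column, here is a
minimal Python sketch.  It assumes the column is simply
not tagged / (spam + not tagged); the counts are copied from the table,
and the function name is only for illustration.

```python
# Counts copied from the table above: list -> (spam, not tagged).
counts = {
    "AB": (521886, 604),
    "WS": (996200, 12578),
    "JP": (1234602, 4376),
    "OB": (1139181, 36760),
    "SC": (751549, 1095),
    "PH": (383, 1),
    "XS": (939134, 6283),
    "XS-unique": (10456, 5300),
}


def pct_not_tagged(spam, not_tagged):
    """Percentage of a list's hits that SA did not mark as spam."""
    return 100.0 * not_tagged / (spam + not_tagged)


for name, (spam, not_tagged) in counts.items():
    print(f"{name:10s} {pct_not_tagged(spam, not_tagged):7.3f}%")
```

The "% spam total" column is relative to the overall message count of
around 1.4 million, which isn't given exactly, so it is left out here.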

Looking at the percentages I do see that JP (and SC) are good
predictors for an e-mail being spam and thus very usable if only a few
checks can be made (for example if scoring isn't an option).

Comparing XS with WS and OB it's clear that XS is a better predictor
than those two lists...

However, that doesn't mean they have more FP's.  It's possible that
WS, OB and XS catch many spams that pass the other lists and so pass
through this SA setup.  In that case those lists would be very
useful.

Looking at XS-unique I do wonder how many "unique" hits the other
lists catch and how many of those unique hits pass the filter.  It's
possible there's a big overlap between the lists (partly seen at
http://www.surbl.org/permuted-hits.out.txt).

It's possible that XS is so much faster than the other lists that it's
catching very new spam that's still passing the other lists.

Some remarks :

- While the total number of e-mails is quite high, it's based on just
one day; maybe one or two big spam runs are skewing the results.

- It would also be very useful to have info about the unique results
of the other lists and combinations of them.
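
A rough Python sketch of how such per-list "unique" counts could be
made.  It assumes that for each message we know the set of lists that
hit and whether SA tagged it; that input format and the sample data are
made up, not taken from Paul's logs.

```python
from collections import Counter


def unique_hits(messages):
    """Count messages that hit exactly one list.

    messages: iterable of (set_of_lists_hit, tagged_as_spam) pairs.
    Returns (spam, not_tagged): two Counters keyed by list name.
    """
    spam, not_tagged = Counter(), Counter()
    for lists_hit, tagged in messages:
        if len(lists_hit) == 1:
            (name,) = lists_hit  # the single list that hit
            if tagged:
                spam[name] += 1
            else:
                not_tagged[name] += 1
    return spam, not_tagged


# Made-up sample, only to show the shape of the input.
sample = [
    ({"XS"}, False),
    ({"XS", "JP"}, True),
    ({"WS"}, True),
    ({"XS"}, True),
    ({"OB", "WS"}, False),
]
spam, not_tagged = unique_hits(sample)
print(spam, not_tagged)
```

Counting combinations works the same way, by keying on
frozenset(lists_hit) instead of on a single list name.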


BTW, as written before, a large corpus of domains that have occurred
several times over a period of time in not-tagged e-mails could be a
very effective first filter to avoid automatic inclusion.  The nice
thing is that this info doesn't need to be very new; I would even
ignore the last week...  A domain that occurred on several separate
days in the past in several not-tagged e-mails is probably hammy.
Of course a manual inclusion in the blacklist should still be possible.
This info is probably more useful than the creation date of the domain
and probably easier to produce (just a lookup in an internal list).
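
To make the idea concrete, a small Python sketch.  The function name,
the thresholds (3 separate days, ignoring the last 7 days) and the
sample domains are all my own assumptions, not anything that exists in
the SURBL tools.

```python
from collections import defaultdict
from datetime import date, timedelta


def hammy_domains(sightings, today, min_days=3, ignore_recent_days=7):
    """Return domains seen in not-tagged mail on enough separate days.

    sightings: iterable of (domain, day) pairs from not-tagged e-mails.
    Sightings within the last ignore_recent_days days are skipped, so a
    brand new spam domain cannot get itself onto the list.
    """
    cutoff = today - timedelta(days=ignore_recent_days)
    days_seen = defaultdict(set)
    for domain, day in sightings:
        if day < cutoff:
            days_seen[domain].add(day)
    return {d for d, days in days_seen.items() if len(days) >= min_days}


# Made-up sightings using reserved example domains.
sightings = [
    ("example.com", date(2005, 4, 1)),
    ("example.com", date(2005, 4, 5)),
    ("example.com", date(2005, 4, 10)),
    ("example.net", date(2005, 4, 20)),  # too recent, ignored
    ("example.net", date(2005, 4, 21)),
    ("example.net", date(2005, 4, 22)),
    ("example.org", date(2005, 4, 1)),   # same day twice counts once
    ("example.org", date(2005, 4, 1)),
]
hammy = hammy_domains(sightings, today=date(2005, 4, 24))
print(hammy)
```

A domain in that set would be skipped for automatic inclusion only; a
manual listing would not consult it.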

Alain


