[SURBL-Discuss] RFC: pj.surbl.org - list from Joe Wein and Prolocation data

Chris Santerre csanterre at merchantsoverseas.com
Fri Sep 17 15:46:56 CEST 2004

>-----Original Message-----
>From: Jeff Chan [mailto:jeffc at surbl.org]
>Sent: Thursday, September 16, 2004 7:01 PM
>To: SURBL Discuss
>Subject: Re: [SURBL-Discuss] RFC: pj.surbl.org - list from Joe Wein and
>Prolocation data
>On Wednesday, September 15, 2004, 7:06:34 PM, David Hooton wrote:
>> On Wed, 15 Sep 2004 16:43:32 -0700, Jeff Chan <jeffc at surbl.org> wrote:
>>> we thought it might be useful to make the PJ data available as
>>> a separate list, at least within multi.surbl.org, the combined
>>> SURBL.  We'd like to get your comments on this.
>> I think having a separate list makes sense if the data quality is
>> different to that of the pooled data it was previously connected to.
>>> We're also wondering whether the PJ data should be taken out of
>>> WS, or left in, if we do make PJ a distinct list.  
>> No point in lowering the hitrate of the superset, any additional score
>> added to a spam is better than none at all.
>>> Please comment,
>> The greater choice and control we provide SURBL users the better.  If
>> we have the ability to sustainably break data out like this and
>> provide ongoing data quality ratings to aid score adjustments I think
>> we should do it.
>Thanks for your feedback David.  Does anyone else have comments
>about the possibility of PJ?  Making separate lists from the WS
>data is a little different from the direction we've been going
>lately, so it would be nice to get comments on it.  We're still
>somewhat undecided about whether to do it or not....
>As you can see from the first message about this, the FP rates
>of PJ look significantly lower than WS as a whole.

As usual, I'm thinking differently from everyone else :)

I do NOT like the idea of more lists. 

1) The lists are dynamic, so FP rates will change.
2) Too many lists make it more difficult for the devs to do GA and perceptron
runs on all of them, causing a slowdown in scoring for SA and others.
3) Run a diff and find out where we have our FPs.
4) More lookups for multi.
5) Too many list options will drive some potential users away.
6) K.I.S.S.
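
Point 3 could be sketched roughly as follows. This is only a minimal
illustration in Python, assuming each list is available as plain text with
one domain per line and that a hand-maintained set of known-legitimate
domains exists. The function names and sample domains are hypothetical and
not part of any SURBL tooling.

```python
def diff_lists(pj_domains, ws_domains):
    """Split two domain lists into PJ-only, WS-only, and shared sets."""
    # Normalize: trim whitespace, lowercase, and skip blank lines.
    pj = {d.strip().lower() for d in pj_domains if d.strip()}
    ws = {d.strip().lower() for d in ws_domains if d.strip()}
    return {"pj_only": pj - ws, "ws_only": ws - pj, "both": pj & ws}


def fp_by_source(parts, known_good):
    """For each part of the diff, return the entries that are known-good,
    i.e. the false positives. known_good is assumed to be lowercase."""
    return {name: domains & known_good for name, domains in parts.items()}


if __name__ == "__main__":
    # Hypothetical sample data; real input would come from the list files.
    pj = ["spam-a.example", "shared.example"]
    ws = ["shared.example", "good.example"]
    parts = diff_lists(pj, ws)
    for name, fps in fp_by_source(parts, {"good.example"}).items():
        print(name, sorted(fps))
```

Splitting the FPs by where they appear shows which data source is
contributing them, without needing a separate published list.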

The only reason I see for having more lists is if the data is of a
specifically different kind throughout the whole list.

i.e.: phishing, UC, regular spam, blog, etc.

His list data is the same kind as WS. So really....why separate?

We just keep getting our FP rate lower and it will all be good.

--Chris (The devil's advocate.)
