-----Original Message-----
From: Jeff Chan [mailto:jeffc@surbl.org]
Sent: Thursday, September 16, 2004 7:01 PM
To: SURBL Discuss
Subject: Re: [SURBL-Discuss] RFC: pj.surbl.org - list from Joe Wein and Prolocation data
On Wednesday, September 15, 2004, 7:06:34 PM, David Hooton wrote:
On Wed, 15 Sep 2004 16:43:32 -0700, Jeff Chan <jeffc@surbl.org> wrote:
we thought it might be useful to make the PJ data available as a separate list, at least within multi.surbl.org, the combined SURBL. We'd like to get your comments on this.
I think having a separate list makes sense if the data quality is different to that of the pooled data it was previously connected to.
We're also wondering whether the PJ data should be taken out of WS, or left in, if we do make PJ a distinct list.
No point in lowering the hit rate of the superset; any additional score added to a spam is better than none at all.
Please comment,
The greater choice and control we provide SURBL users, the better. If we have the ability to sustainably break data out like this and provide ongoing data quality ratings to aid score adjustments, I think we should do it.
Thanks for your feedback, David. Does anyone else have comments about the possibility of PJ? Making separate lists from the WS data is a little different from the direction we've been going lately, so it would be nice to get comments on it. We're still somewhat undecided about whether to do it or not...
As you can see from the first message about this, the FP rates of PJ look significantly lower than WS as a whole.
As usual, I'm thinking differently from everyone else :)
I do NOT like the idea of more lists.
1) The lists are dynamic, so FP rates will change.
2) Too many lists make it more difficult for the devs to GA and perceptron run all of them, causing a slowdown in scoring for SA and others.
3) Run a diff and find out where we have our FPs.
4) More lookups for multi.
5) Too many list options will drive some potential users away.
6) K.I.S.S.
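Point 3 above could be sketched roughly as follows. This assumes the WS entries, the PJ entries, and a hand-checked set of known false positives are each available as a plain list of domains. The function name and the sample domains are made up for illustration; they are not real SURBL data or tooling.

```python
def split_false_positives(ws, pj, known_fps):
    """Partition known false positives by which part of WS they come from.

    ws, pj and known_fps are iterables of domain strings (hypothetical inputs).
    """
    ws, pj, known_fps = set(ws), set(pj), set(known_fps)
    ws_only = ws - pj  # WS entries that did not come from the PJ data
    return {
        "pj": sorted(known_fps & pj),            # FPs contributed by PJ
        "ws_only": sorted(known_fps & ws_only),  # FPs from the rest of WS
        "unlisted": sorted(known_fps - ws - pj), # FPs on neither list
    }


# Made-up example data, not real SURBL entries.
ws = ["spam-a.example", "spam-b.example", "legit-x.example", "legit-y.example"]
pj = ["spam-a.example", "legit-x.example"]
known_fps = ["legit-x.example", "legit-y.example", "legit-z.example"]

result = split_false_positives(ws, pj, known_fps)
print(result)
```

If most of the known FPs land in the "ws_only" bucket, that would point at cleaning up the non-PJ sources rather than splitting the list.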
The only reason I see for having more lists is if the data is of a distinctly different kind throughout the whole list,
i.e. phishing, UC, regular spam, blog, etc.
His list data is the same kind as WS. So really... why separate?
We just keep getting our FP rate lower and it will all be good.
--Chris (The devil's advocate.)