[SURBL-Discuss] RFC: pj.surbl.org - list from Joe Wein and Prolocation data

Jeff Chan jeffc at surbl.org
Sun Sep 19 02:30:54 CEST 2004


On Friday, September 17, 2004, 6:46:56 AM, Chris Santerre wrote:
> I do NOT like the idea of more lists.

> 1) The lists are dynamic, so FP rates will change.

It's true that FP rates vary over time for all lists, but PJ's
FP rate looks consistently lower than WS's.

> 2) Too many lists make it more difficult for the devs to GA and perceptron
> run all of them, causing a slowdown in scoring for SA and others.

While it's true that a PJ list would be one more rule for the
SpamAssassin mass checks to score, I doubt that one more list
would slow it down significantly in the larger picture.  Mass
checks are already scoring a gazillion other rules....

> 3) Run a diff and find out where we have our FPs.

WS and PJ differ in about 26k records out of 46k, perhaps too
many to check by hand.  Or did you mean diffing just the FPs?
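
For illustration, a diff of the two lists could be sketched as
below.  This is only a minimal sketch: it assumes each list is
available as a plain sequence of domain names, and the function
name and sample domains are hypothetical.

```python
def diff_lists(ws, pj):
    """Compare two domain lists and report where they differ.

    ws and pj are iterables of domain names (assumed format).
    """
    ws_set, pj_set = set(ws), set(pj)
    return {
        "only_ws": sorted(ws_set - pj_set),  # listed in WS but not in PJ
        "only_pj": sorted(pj_set - ws_set),  # listed in PJ but not in WS
        "both": sorted(ws_set & pj_set),     # listed in both
    }


# Made-up sample data, not real list contents.
result = diff_lists(
    ["a.example", "b.example", "c.example"],
    ["b.example", "d.example"],
)
```

Intersecting a set of known FPs with "only_ws" or "only_pj" would
narrow the output to just the FPs, which is a far smaller set to
check by hand.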

> 4) More lookups for multi

multi doesn't work that way.  We can have an infinite number of
lists in multi (for the same overall universe of domains and IPs)
and it's still just one lookup per wild URI.  That's a major
advantage of a combined list: one lookup gets you all the lists.

Remember that the PJ records are already in multi, as part of WS,
so there would be no new records added by having PJ separate,
just some changed return codes and some slightly longer TXT
records with "[PJ]" added.
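
To make the one-lookup point concrete, here is a rough sketch of
how a client might decode a combined return code into list names.
It assumes the answer is an address whose last octet is a bitmask
with one bit per list.  The bit values below, including the one
for PJ, are hypothetical and chosen only for illustration; they
are not the actual multi assignments.

```python
# Hypothetical bit assignments, for illustration only.
LIST_BITS = {2: "sc", 4: "ws", 8: "ob", 16: "ab", 32: "pj"}


def decode_multi(address):
    """Return the list names encoded in the last octet of a return code."""
    octets = address.split(".")
    if len(octets) != 4 or octets[0] != "127":
        raise ValueError("not a multi-style return code: %r" % address)
    mask = int(octets[3])
    # One DNS answer carries every list membership at once.
    return [name for bit, name in sorted(LIST_BITS.items()) if mask & bit]


# A record on both WS and PJ would set both bits: 4 + 32 = 36.
lists = decode_multi("127.0.0.36")
```

Under these assumed values, splitting PJ out of WS changes only
which bits are set in the answer; the number of lookups stays at
one per URI.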

> 5) Too many list options will drive some potential users away.

Most users probably just use the defaults.  We would want to
add PJ to the default configs for SA3, if we do it.

> 6) K.I.S.S.

> The only reason I see having more lists is if the data is specifically
> different throughout the whole list.

> i.e.: phishing, UC, regular spam, blog, etc....

> His list data is the same kind as WS. So really....why separate?

sc, ws, ob and ab all have email spam URI data, but they're
all separate lists because they represent different types of
data sources (human reports, manual lists, filtered traps, etc.).

I actually wanted the JW data to be separate in the beginning
because it was a distinctly different and new data source with
a different inclusion process, different spamtrap feeds, etc.

> We just keep getting our FP rate lower and it will all be good.

We definitely need to get the FPs in WS lower, independent of
anything else.  FPs only hurt WS and make it less useful to
people.

Jeff C.


