On Sunday, July 13, 2008, 9:12:25 AM, Joseph Brennan wrote:
Jeff Chan jeffc@surbl.org wrote:
I think we probably can't reveal the exact listing criteria in case they're useful for the bad guys. I know it's somewhat inappropriate to ask for comments without revealing details. I suppose I'm asking for general responses then. :)
So you'll keep ob, but take some undisclosed action to improve its accuracy. Sounds worthwhile to me.
Thanks! Yes, we would never get rid of OB entirely. It does have some good data, but with too many FPs. The goal is to keep as much of the good data as possible while eliminating most of the bad. Unfortunately some of the good data may be thrown out with the bad; the baby with the bathwater, so to speak. IMO FPs are much worse than FNs, so accepting some increase in FNs is a fair price for a decrease in FPs. Still trying to decide whether it's worth doing...
Jeff C.
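Jeff's point that FPs are much worse than FNs can be made concrete as a weighted error cost. The sketch below is illustrative only: the weight and the FP/FN counts are made-up numbers, not SURBL data, and the actual tradeoff SURBL uses is not disclosed in this thread.

```python
# Illustrative only: FP_WEIGHT and the counts below are invented,
# not real SURBL measurements.
FP_WEIGHT = 50  # treat one false positive as costly as 50 false negatives

def list_cost(false_positives, false_negatives, fp_weight=FP_WEIGHT):
    """Weighted cost of a blocklist's errors; lower is better."""
    return fp_weight * false_positives + false_negatives

# Hypothetical "before": 100 FPs, 1000 FNs on a test corpus.
before = list_cost(100, 1000)   # 50*100 + 1000 = 6000
# After tightening the listing criteria: 20 FPs, but 2000 FNs.
after = list_cost(20, 2000)     # 50*20 + 2000 = 3000
print(before, after)            # 6000 3000
```

Under this weighting, doubling the FNs is still a clear win if it cuts FPs by 80%, which is the "some increase in FNs balances out a decrease in FPs" argument in numbers.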
Hi Jeff, At 09:30 13-07-2008, Jeff Chan wrote:
Thanks! Yes, we would never get rid of OB entirely. It does have some good data, but with too many FPs. The goal is to keep as much of the good data as possible while eliminating most of the bad. Unfortunately some of the good data may be thrown out with the bad; the baby with the bathwater, so to speak. IMO FPs are much worse than FNs, so accepting some increase in FNs is a fair price for a decrease in FPs. Still trying to decide whether it's worth doing...
If there are too many false positives, it's better to throw out the bad data at the risk of losing some good data. You can run a test feed to determine the effectiveness of each approach.
Regards, -sm
On Sunday, July 13, 2008, 10:02:04 AM, SM wrote:
If there are too many false positives, it's better to throw out the bad data at the risk of losing some good data. You can run a test feed to determine the effectiveness of each approach.
Which brings up a good point: does anyone have any current test data about the OB list that they can share, i.e., false positive rate especially compared to the other SURBL lists?
Cheers,
Jeff C.
On Sun, Jul 13, 2008 at 12:20:18PM -0700, Jeff Chan wrote:
Which brings up a good point: does anyone have any current test data about the OB list that they can share, i.e., false positive rate especially compared to the other SURBL lists?
Here's the stats for a recent run of mine using the at-time-of-receipt network test results:
OVERALL    SPAM%     HAM%     S/O   RANK  SCORE  NAME
      0   946808    55766   0.944   0.00   0.00  (all messages)
0.00000  94.4377   5.5623   0.944   0.00   0.00  (all messages as %)
 60.480  64.0420   0.0018   1.000   1.00   0.00  URIBL_JP_SURBL
 42.395  44.8921   0.0000   1.000   0.98   0.00  URIBL_SC_SURBL
 36.647  38.8052   0.0000   1.000   0.98   0.00  URIBL_AB_SURBL
 27.543  29.1633   0.0341   0.999   0.95   0.00  URIBL_WS_SURBL
 43.095  45.6271   0.1022   0.998   0.91   0.00  URIBL_OB_SURBL
  0.712   0.7537   0.0072   0.991   0.83   0.00  URIBL_PH_SURBL
I don't know what the overlap and such is on these.
FWIW: It seems like every time I report a SURBL FP it's because of OB. So I'm all for cleaning up the list/listing criteria.
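As a sanity check on the table above, the S/O column appears to be SPAM% / (SPAM% + HAM%), i.e. the fraction of the per-corpus hit rates that lands on spam. The values below are copied from the table; the formula itself is an inference from the numbers, not stated in the thread.

```python
# (SPAM%, HAM%) pairs copied from the mass-check table above.
rows = {
    "URIBL_JP_SURBL": (64.0420, 0.0018),
    "URIBL_SC_SURBL": (44.8921, 0.0000),
    "URIBL_AB_SURBL": (38.8052, 0.0000),
    "URIBL_WS_SURBL": (29.1633, 0.0341),
    "URIBL_OB_SURBL": (45.6271, 0.1022),
    "URIBL_PH_SURBL": (0.7537, 0.0072),
}

for name, (spam_pct, ham_pct) in rows.items():
    s_o = spam_pct / (spam_pct + ham_pct)
    print(f"{name}: S/O = {s_o:.3f}")
# OB reproduces as 0.998, matching the table. Its HAM% (0.1022) is
# by far the highest of the six lists (WS is next at 0.0341), which
# is consistent with OB generating most of the FP reports.
```

Note that S/O computed this way weights spam and ham rates equally regardless of corpus sizes; with 946,808 spam vs 55,766 ham messages, the raw hit-count ratio would look even more favorable, which is why the per-corpus HAM% is the more honest FP signal here.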
On Sunday, July 13, 2008, 1:39:01 PM, Theo Dinter wrote:
On Sun, Jul 13, 2008 at 12:20:18PM -0700, Jeff Chan wrote:
Which brings up a good point: does anyone have any current test data about the OB list that they can share, i.e., false positive rate especially compared to the other SURBL lists?
Here's the stats for a recent run of mine using the at-time-of-receipt network test results:
OVERALL    SPAM%     HAM%     S/O   RANK  SCORE  NAME
      0   946808    55766   0.944   0.00   0.00  (all messages)
0.00000  94.4377   5.5623   0.944   0.00   0.00  (all messages as %)
 60.480  64.0420   0.0018   1.000   1.00   0.00  URIBL_JP_SURBL
 42.395  44.8921   0.0000   1.000   0.98   0.00  URIBL_SC_SURBL
 36.647  38.8052   0.0000   1.000   0.98   0.00  URIBL_AB_SURBL
 27.543  29.1633   0.0341   0.999   0.95   0.00  URIBL_WS_SURBL
 43.095  45.6271   0.1022   0.998   0.91   0.00  URIBL_OB_SURBL
  0.712   0.7537   0.0072   0.991   0.83   0.00  URIBL_PH_SURBL
I don't know what the overlap and such is on these.
FWIW: It seems like every time I report a SURBL FP it's because of OB. So I'm all for cleaning up the list/listing criteria.
Thanks Theo. Can you describe what the time factors are on the corpora checks? If we make a change to the OB data, how (soon) would it be reflected in the checks?
Jeff C.
On Sat, Jul 19, 2008 at 01:06:02AM -0700, Jeff Chan wrote:
Here's the stats for a recent run of mine using the at-time-of-receipt network test results:

OVERALL    SPAM%     HAM%     S/O   RANK  SCORE  NAME
[...]
 43.095  45.6271   0.1022   0.998   0.91   0.00  URIBL_OB_SURBL
Thanks Theo. Can you describe what the time factors are on the corpora checks? If we make a change to the OB data, how (soon) would it be reflected in the checks?
My runs cover the last 60 days of the live mail flow, so I would expect the numbers to improve daily as the older mail ages out of the window.
Theo Van Dinter writes:
On Sat, Jul 19, 2008 at 01:06:02AM -0700, Jeff Chan wrote:
Here's the stats for a recent run of mine using the at-time-of-receipt network test results:

OVERALL    SPAM%     HAM%     S/O   RANK  SCORE  NAME
[...]
 43.095  45.6271   0.1022   0.998   0.91   0.00  URIBL_OB_SURBL
Thanks Theo. Can you describe what the time factors are on the corpora checks? If we make a change to the OB data, how (soon) would it be reflected in the checks?
My runs cover the last 60 days of the live mail flow, so I would expect the numbers to improve daily as the older mail ages out of the window.
If you click through on the ruleqa site (http://ruleqa.spamassassin.org), the details page contains hit-rates by time (in weeks), and even a graph at the bottom that may have a finer resolution.
--j.
"Jeff Chan" jeffc@surbl.org wrote in message news:182909761.20080713093057@surbl.org...
On Sunday, July 13, 2008, 9:12:25 AM, Joseph Brennan wrote:
Jeff Chan jeffc@surbl.org wrote:
I think we probably can't reveal the exact listing criteria in case they're useful for the bad guys. I know it's somewhat inappropriate to ask for comments without revealing details. I suppose I'm asking for general responses then. :)
So you'll keep ob, but take some undisclosed action to improve its accuracy. Sounds worthwhile to me.
Thanks! Yes, we would never get rid of OB entirely. It does have some good data, but with too many FPs. The goal is to keep as much of the good data as possible while eliminating most of the bad. Unfortunately some of the good data may be thrown out with the bad; the baby with the bathwater, so to speak. IMO FPs are much worse than FNs, so accepting some increase in FNs is a fair price for a decrease in FPs. Still trying to decide whether it's worth doing...
Jeff C.
Sounds like a great idea to me too -- fewer FPs is always a good thing...
Cheers, Jeremy