On Sunday, July 13, 2008, 9:12:25 AM, Joseph Brennan wrote:
Jeff Chan jeffc@surbl.org wrote:
I think we probably can't reveal the exact listing criteria in case they're useful for the bad guys. I know it's somewhat inappropriate to ask for comments without revealing details. I suppose I'm asking for general responses then. :)
So you'll keep ob, but take some undisclosed action to improve its accuracy. Sounds worthwhile to me.
Thanks! Yes, we would never get rid of OB entirely. It does have some good data, but with too many FPs. The goal is to keep as much of the good data as possible while eliminating most of the bad. Unfortunately some of the good data may be thrown out with the bad; the baby with the bathwater, so to speak. IMO FPs are much worse than FNs, so accepting some increase in FNs is a fair price for a decrease in FPs. Still trying to decide whether it's worth doing...
Jeff C.
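Jeff's point that FPs are much worse than FNs can be made concrete as a weighted error cost. The sketch below is illustrative only: the weight and the FP/FN counts are made-up numbers, not SURBL data, and the actual tradeoff SURBL uses is not disclosed in this thread.

```python
# Illustrative only: FP_WEIGHT and the counts below are invented,
# not real SURBL measurements.
FP_WEIGHT = 50  # treat one false positive as costly as 50 false negatives

def list_cost(false_positives, false_negatives, fp_weight=FP_WEIGHT):
    """Weighted cost of a blocklist's errors; lower is better."""
    return fp_weight * false_positives + false_negatives

# Hypothetical "before": 100 FPs, 1000 FNs on a test corpus.
before = list_cost(100, 1000)   # 50*100 + 1000 = 6000
# After tightening the listing criteria: 20 FPs, but 2000 FNs.
after = list_cost(20, 2000)     # 50*20 + 2000 = 3000
print(before, after)            # 6000 3000
```

Under this weighting, doubling the FNs is still a clear win if it cuts FPs by 80%, which is the "some increase in FNs balances out a decrease in FPs" argument in numbers.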
Hi Jeff, At 09:30 13-07-2008, Jeff Chan wrote:
Thanks! Yes, we would never get rid of OB entirely. It does have some good data, but with too many FPs. The goal is to keep as much of the good data as possible while eliminating most of the bad. Unfortunately some of the good data may be thrown out with the bad; the baby with the bathwater, so to speak. IMO FPs are much worse than FNs, so accepting some increase in FNs is a fair price for a decrease in FPs. Still trying to decide whether it's worth doing...
If there are too many false positives, it's better to throw out the bad data at the risk of losing some good data. You can run a test feed to determine the effectiveness of each approach.
Regards, -sm
On Sunday, July 13, 2008, 10:02:04 AM, SM wrote:
If there are too many false positives, it's better to throw out the bad data at the risk of losing some good data. You can run a test feed to determine the effectiveness of each approach.
Which brings up a good point: does anyone have any current test data about the OB list that they can share, i.e., false positive rate especially compared to the other SURBL lists?
Cheers,
Jeff C.
On Sun, Jul 13, 2008 at 12:20:18PM -0700, Jeff Chan wrote:
Which brings up a good point: does anyone have any current test data about the OB list that they can share, i.e., false positive rate especially compared to the other SURBL lists?
Here's the stats for a recent run of mine using the at-time-of-receipt network test results:
OVERALL    SPAM%     HAM%     S/O   RANK  SCORE  NAME
      0   946808    55766   0.944   0.00   0.00  (all messages)
0.00000  94.4377   5.5623   0.944   0.00   0.00  (all messages as %)
 60.480  64.0420   0.0018   1.000   1.00   0.00  URIBL_JP_SURBL
 42.395  44.8921   0.0000   1.000   0.98   0.00  URIBL_SC_SURBL
 36.647  38.8052   0.0000   1.000   0.98   0.00  URIBL_AB_SURBL
 27.543  29.1633   0.0341   0.999   0.95   0.00  URIBL_WS_SURBL
 43.095  45.6271   0.1022   0.998   0.91   0.00  URIBL_OB_SURBL
  0.712   0.7537   0.0072   0.991   0.83   0.00  URIBL_PH_SURBL
I don't know what the overlap and such is on these.
FWIW: It seems like every time I report a SURBL FP it's because of OB. So I'm all for cleaning up the list/listing criteria.
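As a sanity check on the table above, the S/O column appears to be SPAM% / (SPAM% + HAM%), i.e. the fraction of the per-corpus hit rates that lands on spam. The values below are copied from the table; the formula itself is an inference from the numbers, not stated in the thread.

```python
# (SPAM%, HAM%) pairs copied from the mass-check table above.
rows = {
    "URIBL_JP_SURBL": (64.0420, 0.0018),
    "URIBL_SC_SURBL": (44.8921, 0.0000),
    "URIBL_AB_SURBL": (38.8052, 0.0000),
    "URIBL_WS_SURBL": (29.1633, 0.0341),
    "URIBL_OB_SURBL": (45.6271, 0.1022),
    "URIBL_PH_SURBL": (0.7537, 0.0072),
}

for name, (spam_pct, ham_pct) in rows.items():
    s_o = spam_pct / (spam_pct + ham_pct)
    print(f"{name}: S/O = {s_o:.3f}")
# OB reproduces as 0.998, matching the table. Its HAM% (0.1022) is
# by far the highest of the six lists (WS is next at 0.0341), which
# is consistent with OB generating most of the FP reports.
```

Note that S/O computed this way weights spam and ham rates equally regardless of corpus sizes; with 946,808 spam vs 55,766 ham messages, the raw hit-count ratio would look even more favorable, which is why the per-corpus HAM% is the more honest FP signal here.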
On Sunday, July 13, 2008, 1:39:01 PM, Theo Dinter wrote:
On Sun, Jul 13, 2008 at 12:20:18PM -0700, Jeff Chan wrote:
Which brings up a good point: does anyone have any current test data about the OB list that they can share, i.e., false positive rate especially compared to the other SURBL lists?
Here's the stats for a recent run of mine using the at-time-of-receipt network test results:
OVERALL    SPAM%     HAM%     S/O   RANK  SCORE  NAME
      0   946808    55766   0.944   0.00   0.00  (all messages)
0.00000  94.4377   5.5623   0.944   0.00   0.00  (all messages as %)
 60.480  64.0420   0.0018   1.000   1.00   0.00  URIBL_JP_SURBL
 42.395  44.8921   0.0000   1.000   0.98   0.00  URIBL_SC_SURBL
 36.647  38.8052   0.0000   1.000   0.98   0.00  URIBL_AB_SURBL
 27.543  29.1633   0.0341   0.999   0.95   0.00  URIBL_WS_SURBL
 43.095  45.6271   0.1022   0.998   0.91   0.00  URIBL_OB_SURBL
  0.712   0.7537   0.0072   0.991   0.83   0.00  URIBL_PH_SURBL
I don't know what the overlap and such is on these.
FWIW: It seems like every time I report a SURBL FP it's because of OB. So I'm all for cleaning up the list/listing criteria.
Thanks Theo. Can you describe what the time factors are on the corpora checks? If we make a change to the OB data, how (soon) would it be reflected in the checks?
Jeff C.
On Sat, Jul 19, 2008 at 01:06:02AM -0700, Jeff Chan wrote:
Here's the stats for a recent run of mine using the at-time-of-receipt network test results:

OVERALL    SPAM%     HAM%     S/O   RANK  SCORE  NAME
[...]
 43.095  45.6271   0.1022   0.998   0.91   0.00  URIBL_OB_SURBL
Thanks Theo. Can you describe what the time factors are on the corpora checks? If we make a change to the OB data, how (soon) would it be reflected in the checks?
My runs cover the last 60 days of the live mail flow, so I would expect the numbers to improve daily as the older mail ages out of the window.
Theo Van Dinter writes:
On Sat, Jul 19, 2008 at 01:06:02AM -0700, Jeff Chan wrote:
Here's the stats for a recent run of mine using the at-time-of-receipt network test results:

OVERALL    SPAM%     HAM%     S/O   RANK  SCORE  NAME
[...]
 43.095  45.6271   0.1022   0.998   0.91   0.00  URIBL_OB_SURBL
Thanks Theo. Can you describe what the time factors are on the corpora checks? If we make a change to the OB data, how (soon) would it be reflected in the checks?
My runs cover the last 60 days of the live mail flow, so I would expect the numbers to improve daily as the older mail ages out of the window.
If you click through on the ruleqa site (http://ruleqa.spamassassin.org), the details page contains hit-rates by time (in weeks), and even a graph at the bottom that may have a finer resolution.
--j.
"Jeff Chan" jeffc@surbl.org wrote in message news:182909761.20080713093057@surbl.org...
On Sunday, July 13, 2008, 9:12:25 AM, Joseph Brennan wrote:
Jeff Chan jeffc@surbl.org wrote:
I think we probably can't reveal the exact listing criteria in case they're useful for the bad guys. I know it's somewhat inappropriate to ask for comments without revealing details. I suppose I'm asking for general responses then. :)
So you'll keep ob, but take some undisclosed action to improve its accuracy. Sounds worthwhile to me.
Thanks! Yes, we would never get rid of OB entirely. It does have some good data, but with too many FPs. The goal is to keep as much of the good data as possible while eliminating most of the bad. Unfortunately some of the good data may be thrown out with the bad; the baby with the bathwater, so to speak. IMO FPs are much worse than FNs, so accepting some increase in FNs is a fair price for a decrease in FPs. Still trying to decide whether it's worth doing...
Jeff C.
Sounds like a great idea to me too -- fewer FPs is always a good thing...
Cheers, Jeremy