I'm replying to both as I don't know if Terry is on the list.....
-----Original Message-----
From: Terry Sullivan [mailto:terry@pantos.org]
Sent: Monday, January 10, 2005 11:55 AM
To: discuss@lists.surbl.org
Subject: [SURBL-Discuss] Re: quick poll on SURBL hit %
Chris Santerre wrote:
Just curious as to what average percent of spam people see SURBL
hitting. In a non-scientific manner, I average about 85% ...
I've run multiple analyses on historical datasets, and get a consistent
*average* of 82%-86%, so 84% is a decent estimate.
The most noteworthy statistical characteristic of the SURBL hit rate
over time is the large *variance* in hit rate. Some days, the SURBL hit
rate I observe in my data is in the 60%'s, while other days it's in the
90%'s. The fluctuation appears to be at least somewhat periodic in
nature (several "low" days in a row, followed by several "high" days).
I've not actually run the numbers, but my totally informal, *purely gut*
sense is that the magnitude of that variance may have diminished lately,
but the periodic pattern persists. These periodic fluctuations imply
that there is probably some systematic cause underlying this variance,
and that cause is itself almost certainly periodic in nature.
That is interesting! I wonder if this has become a metric for actual spam
traffic? Could it coincide with weekends? Don't suppose you could graph that
data over a 365-day period?
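For what it's worth, here is a rough sketch of how the weekend idea could be
checked. It assumes a per-day log of (date, total spam, SURBL hits); that
layout, the function names, and the synthetic numbers are all made up for
illustration and are not Terry's data or method. It averages the hit rate by
weekday and computes a simple autocorrelation, where a bump at lag 7 would
hint at a weekly cycle.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

def daily_hit_rates(records):
    """records: iterable of (date, spam_total, surbl_hits).
    Returns {date: hit rate in percent}, skipping days with no spam."""
    return {d: 100.0 * hits / total for d, total, hits in records if total > 0}

def weekday_means(rates):
    """Average hit rate per weekday (0 = Monday ... 6 = Sunday)."""
    buckets = defaultdict(list)
    for d, rate in rates.items():
        buckets[d.weekday()].append(rate)
    return {wd: mean(vals) for wd, vals in sorted(buckets.items())}

def autocorrelation(series, lag):
    """Sample autocorrelation of a list of numbers at the given lag."""
    n = len(series)
    if lag <= 0 or lag >= n:
        return 0.0
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[i] - m) * (series[i + lag] - m) for i in range(n - lag))
    return num / denom

def synthetic_records(days=28):
    """Made-up data: 90% hit rate on weekdays, 60% on weekends.
    Starts on Monday 2005-01-03. Purely hypothetical, for testing only."""
    start = date(2005, 1, 3)
    out = []
    for i in range(days):
        d = start + timedelta(days=i)
        hits = 600 if d.weekday() >= 5 else 900
        out.append((d, 1000, hits))
    return out

if __name__ == "__main__":
    rates = daily_hit_rates(synthetic_records())
    series = [rates[d] for d in sorted(rates)]
    print(weekday_means(rates))
    print(autocorrelation(series, 7), autocorrelation(series, 3))
```

With real logs you would swap synthetic_records() for whatever reads your own
per-day counts. If the weekday averages are flat and the lag-7 value is
nothing special, the cycle is probably not a weekend effect.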
I have a feeling if I clean up my
results a bit, that number would be even higher.
I've talked about this with Jeff several times, and he's even shared
some of my comments with this list. No one in the anti-spam world likes
hearing this, but there is very strong evidence of a "hard" statistical
detection limit right around 85%. This limit appears to be more or
less independent of data set or detection method.
Actually Jeff and I have discussed this, and I finally understood it :) I
also agree with the 85% rule of yours. And we seem to be hitting it very
nicely! I'm not sure Bayes even hits that close to 85%!
--Chris