I'm replying to both, as I don't know if Terry is on the list.
-----Original Message-----
From: Terry Sullivan [mailto:terry@pantos.org]
Sent: Monday, January 10, 2005 11:55 AM
To: discuss@lists.surbl.org
Subject: [SURBL-Discuss] Re: quick poll on SURBL hit %
Chris Santerre wrote:
Just curious as to what average percent of spam people see SURBL hitting. In a non-scientific manner, I average about 85% ...
I've run multiple analyses on historical datasets, and get a consistent *average* of 82%-86%, so 84% is a decent estimate.
The most noteworthy statistical characteristic of the SURBL hit rate over time is the large *variance* in hit rate. Some days, the SURBL hit rate I observe in my data is in the 60% range, while other days it's in the 90% range. The fluctuation appears to be at least somewhat periodic in nature (several "low" days in a row, followed by several "high" days). I've not actually run the numbers, but my totally informal, *purely gut* sense is that the magnitude of that variance may have diminished lately, but the periodic pattern persists. These periodic fluctuations imply that there is probably some systematic cause underlying this variance, and that cause is itself almost certainly periodic in nature.
That is interesting! I wonder if this has become a metric for actual spam traffic? Could it coincide with weekends? Don't suppose you could graph that data over a 365-day period?
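For anyone who wants to check the weekend idea against their own logs, here is a rough Python sketch. It is not Terry's method, and everything in it is an assumption: the input is a hypothetical list of (day, spam_total, surbl_hits) tuples that you would have to build from your own mail logs, and the function names are made up for illustration. It computes the daily hit rate, its mean and spread, the average per weekday, and a simple lag autocorrelation (a clearly positive value at lag 7 would hint at a weekly cycle).

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_hit_rates(counts):
    """Map each day to its SURBL hit rate in percent.

    counts: iterable of (day, spam_total, surbl_hits), where day is a
    datetime.date. Days with no spam are skipped. (Hypothetical input format.)
    """
    return {day: 100.0 * hits / total for day, total, hits in counts if total > 0}


def summarize(rates):
    """Return (mean, population standard deviation) of the daily hit rates."""
    values = list(rates.values())
    return mean(values), pstdev(values)


def weekday_means(rates):
    """Average hit rate per weekday (0 = Monday ... 6 = Sunday)."""
    buckets = defaultdict(list)
    for day, rate in rates.items():
        buckets[day.weekday()].append(rate)
    return {wd: mean(vals) for wd, vals in sorted(buckets.items())}


def autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at the given positive lag."""
    n = len(values)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mu = mean(values)
    denom = sum((v - mu) ** 2 for v in values)
    if denom == 0:
        return 0.0  # a constant series has no variation to correlate
    num = sum((values[i] - mu) * (values[i + lag] - mu) for i in range(n - lag))
    return num / denom
```

To look for a weekly pattern, sort the daily rates by date, pass the values to autocorrelation with lag=7, and compare the Saturday/Sunday entries from weekday_means against the weekday ones.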
I have a feeling if I clean up my results a bit, that number would be even higher.
I've talked about this with Jeff several times, and he's even shared some of my comments with this list. No one in the anti-spam world likes hearing this, but there is very strong evidence of a "hard" statistical detection limit right around 85%. This limit appears to be more or less independent of data set or detection method.
Actually Jeff and I have discussed this, and I finally understood it :) I also agree with the 85% rule of yours. And we seem to be hitting it very nicely! I'm not sure Bayes even hits that close to 85%!
--Chris