[SURBL-Announce] SURBL stats: list hit rates measured in DNS queries, whitelist hits increased to 90 days

Jeff Chan jeffc at surbl.org
Sun Nov 14 03:24:00 CET 2004


To explore whether some intersections of existing lists might
have fewer false positives, we've created statistics measuring,
for each individual list and for some permutations of those
lists, how many of all DNS hits against the blocklists they
account for.

For completeness, we've also included AB and PH as individual
(not permuted) lists. The output can be found at:

  http://www.surbl.org/permuted-hits.out.txt

[sc][ws][ob][jp]   762 records of 82592  67115 hits of 232463 is 28%
[sc][ws][ob]       857 records of 82592  67325 hits of 232463 is 28%
[sc][ws][jp]       899 records of 82592  70622 hits of 232463 is 30%
[sc][ws]          1066 records of 82592  71725 hits of 232463 is 30%
[sc][ob][jp]       788 records of 82592  69526 hits of 232463 is 29%
[sc][ob]           916 records of 82592  70292 hits of 232463 is 30%
[sc][jp]           934 records of 82592  73050 hits of 232463 is 31%
[sc]              1193 records of 82592  75597 hits of 232463 is 32%
[ws][ob][jp]     16383 records of 82592 144989 hits of 232463 is 62%
[ws][ob]         21793 records of 82592 150159 hits of 232463 is 64%
[ws][jp]         33123 records of 82592 186633 hits of 232463 is 80%
[ws]             58471 records of 82592 209710 hits of 232463 is 90%
[ob][jp]         17145 records of 82592 150595 hits of 232463 is 64%
[ob]             44636 records of 82592 168053 hits of 232463 is 72%
[jp]             34669 records of 82592 196112 hits of 232463 is 84%
[ab]               368 records of 82592  61920 hits of 232463 is 26%
[ph]               996 records of 82592    307 hits of 232463 is 0%

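The records column shows the size of each list or intersection,
out of 82,592 records in total. The hits column shows how many of
the 232,463 DNS queries that hit any blocklist were matched by
that list or intersection. The percentage is hits divided by the
total, rounded down to a whole number.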
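As a quick check on how the percentage column is derived, here is
a small sketch that re-creates a row from its two counts. It
assumes the percentage is computed with integer division (hits *
100 // total), which is consistent with every row above; for
example [ws][ob] is 64.59% but shown as 64%, and [ab] is 26.64%
but shown as 26%. The function name format_row is hypothetical.

```python
# Sketch: re-create a row of permuted-hits.out.txt from its counts.
# Assumption: the percentage is rounded down (integer division),
# which matches every row in the table above.

TOTAL_RECORDS = 82592   # "records of 82592" in the table
TOTAL_HITS = 232463     # "hits of 232463" in the table


def format_row(label: str, records: int, hits: int) -> str:
    """Format one row in a fixed-width layout like the table above."""
    pct = hits * 100 // TOTAL_HITS  # integer division rounds down
    return (f"{label:<16} {records:>5} records of {TOTAL_RECORDS} "
            f"{hits:>6} hits of {TOTAL_HITS} is {pct}%")


rows = [
    ("[ws][ob]", 21793, 150159),  # 64.59% is shown as 64%
    ("[ab]", 368, 61920),         # 26.64% is shown as 26%
    ("[ph]", 996, 307),           # 0.13% is shown as 0%
]
for label, records, hits in rows:
    print(format_row(label, records, hits))
```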

This is run nightly around midnight using the script:

  http://www.surbl.org/permuted-hits
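The permutation idea can be sketched roughly as follows. This is a
hypothetical illustration, not the contents of the permuted-hits
script: it assumes each list is available as a set of domains and
that DNS hit counts per domain have already been tallied into a
dict. With the four lists SC, WS, OB and JP it yields the 15
(2^4 - 1) combinations that appear in the table.

```python
# Hypothetical sketch of computing list intersections and their hits;
# not the actual permuted-hits script.
from itertools import combinations


def permuted_hits(lists, hits_per_domain):
    """Return {combination: (records, hits)} for every non-empty combo.

    lists: dict mapping list name -> set of listed domains
    hits_per_domain: dict mapping domain -> DNS queries that hit it
    """
    names = sorted(lists)
    results = {}
    # Largest combinations first, down to single lists.
    for size in range(len(names), 0, -1):
        for combo in combinations(names, size):
            # Domains that appear on *every* list in the combination.
            common = set.intersection(*(lists[n] for n in combo))
            hits = sum(hits_per_domain.get(d, 0) for d in common)
            results[combo] = (len(common), hits)
    return results
```

For example, with toy data where "b.com" is on SC, WS and JP and
received 3 queries, permuted_hits(...)[("jp", "sc", "ws")] would be
(1, 3). Intersections can only shrink as lists are added, which is
why the record and hit counts fall as more lists are combined.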

This gives some measure of the performance of the different lists,
though it likely undercounts lists whose data change rapidly, since
it's based on the previous ten days of data. The more quickly
changing lists, like AB and SC, have higher detection rates in
actual, real-time operation. The stats above also do not take
false positives into account at all; they only count hits against
existing blocklists (which do have some FPs).


Additionally, we've increased how long we keep frequency data for
DNS queries against our whitelist from 10 days to 90 days.
However, it will take another 80 days before 90 days of data have
accumulated:

  http://www.surbl.org/dns-queries.whitelist.counts.txt
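A retention window like this can be pictured as a fixed-length
queue of per-day counts, where the oldest day drops off as each
new day is added. The sketch below is a hypothetical illustration
(the class and method names are made up, and it is not how the
hourly-dns or daily-dns scripts are written); it also shows why
the totals only cover the full 90 days once 90 daily entries have
been collected.

```python
# Hypothetical sketch of a rolling N-day window of query counts;
# not the actual SURBL scripts.
from collections import Counter, deque


class RollingCounts:
    """Keep per-domain query counts for the most recent `days` days."""

    def __init__(self, days=90):
        # A deque with maxlen discards the oldest day automatically.
        self.window = deque(maxlen=days)

    def add_day(self, day_counts):
        """Append one day's {domain: query_count} mapping."""
        self.window.append(Counter(day_counts))

    def days_accumulated(self):
        """How many days of data the window currently holds."""
        return len(self.window)

    def totals(self):
        """Sum the counts over every day still in the window."""
        total = Counter()
        for day in self.window:
            total.update(day)  # Counter.update adds counts together
        return total
```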

I rewrote the scripts to handle the additional data more
efficiently:

  http://www.surbl.org/hourly-dns
  http://www.surbl.org/daily-dns      (run around midnight)

After things stabilize, we will probably change from the current
sample of 10,000 DNS queries every 2 hours to 20,000 every hour.
The results will then represent about half a million queries per
day, up from about 120,000 today. The larger samples should make
the results more accurate, but absolute numbers from before the
change won't be comparable with later ones. Percentages and
relative rankings should remain comparable, though.
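The arithmetic behind the new sample size, and the reason
percentages survive the change while raw counts don't, can be
shown in a few lines. The per-domain counts below are made-up
numbers for illustration only: if the larger sample has the same
mix of queries, every absolute count scales up but each domain's
share of the total stays the same.

```python
# Daily sample sizes before and after the proposed change.
def daily_sample(queries_per_sample, hours_between_samples):
    """Queries sampled per day for a given sampling schedule."""
    return queries_per_sample * 24 // hours_between_samples


before = daily_sample(10_000, 2)  # 12 samples a day
after = daily_sample(20_000, 1)   # 24 samples a day


def shares(counts):
    """Convert absolute counts to percentages of the total."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


# Hypothetical counts: same query mix, four times the sample size.
old_counts = {"example.com": 60, "example.org": 40}
new_counts = {"example.com": 240, "example.org": 160}
print(before, after)
print(shares(old_counts) == shares(new_counts))
```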

Jeff C.
-- 
Jeff Chan
mailto:jeffc at surbl.org
http://www.surbl.org/
