To explore intersections of existing lists that might have fewer
false positives, we've generated some stats counting the DNS hits
that each individual list gets, out of the hits against all
blocklists, along with the same counts for permutations of those
lists. For completeness, we've added checking of AB and PH as
individual (not permuted) lists. The output can be found at:
http://www.surbl.org/permuted-hits.out.txt
[sc][ws][ob][jp] 762 records of 82592 67115 hits of 232463 is 28%
[sc][ws][ob] 857 records of 82592 67325 hits of 232463 is 28%
[sc][ws][jp] 899 records of 82592 70622 hits of 232463 is 30%
[sc][ws] 1066 records of 82592 71725 hits of 232463 is 30%
[sc][ob][jp] 788 records of 82592 69526 hits of 232463 is 29%
[sc][ob] 916 records of 82592 70292 hits of 232463 is 30%
[sc][jp] 934 records of 82592 73050 hits of 232463 is 31%
[sc] 1193 records of 82592 75597 hits of 232463 is 32%
[ws][ob][jp] 16383 records of 82592 144989 hits of 232463 is 62%
[ws][ob] 21793 records of 82592 150159 hits of 232463 is 64%
[ws][jp] 33123 records of 82592 186633 hits of 232463 is 80%
[ws] 58471 records of 82592 209710 hits of 232463 is 90%
[ob][jp] 17145 records of 82592 150595 hits of 232463 is 64%
[ob] 44636 records of 82592 168053 hits of 232463 is 72%
[jp] 34669 records of 82592 196112 hits of 232463 is 84%
[ab] 368 records of 82592 61920 hits of 232463 is 26%
[ph] 996 records of 82592 307 hits of 232463 is 0%
The records columns show the size of each list or intersection.
The hits columns show how many DNS queries, out of all blocklist
hits, matched that list or intersection.
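
For anyone who wants to post-process the output, here is a minimal
Python sketch that parses one line and recomputes the percentage.
The names LINE_RE, parse_line and hit_percent are made up for this
example and are not taken from the permuted-hits script. The
truncation (rather than rounding) of the percentage is an assumption
inferred from the figures above.

```python
import re

# Pattern for one line of the permuted-hits output, for example:
# "[sc][ws] 1066 records of 82592 71725 hits of 232463 is 30%"
LINE_RE = re.compile(
    r"^((?:\[[a-z]+\])+) (\d+) records of (\d+) (\d+) hits of (\d+) is (\d+)%$"
)

def parse_line(line):
    """Split one output line into its list names and counts."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognized line: %r" % line)
    lists = re.findall(r"\[([a-z]+)\]", m.group(1))
    records, total_records, hits, total_hits, percent = map(int, m.groups()[1:])
    return {
        "lists": lists,                  # e.g. ["sc", "ws"]
        "records": records,              # size of the list or intersection
        "total_records": total_records,  # records across all lists
        "hits": hits,                    # DNS hits matching this row
        "total_hits": total_hits,        # hits against all blocklists
        "percent": percent,              # percentage as printed
    }

def hit_percent(hits, total_hits):
    # Assumption: the published percentages look truncated, not rounded
    # (67325 / 232463 is about 28.96% but is printed as 28%).
    return hits * 100 // total_hits

row = parse_line("[sc][ws] 1066 records of 82592 71725 hits of 232463 is 30%")
```

Comparing hit_percent(row["hits"], row["total_hits"]) with
row["percent"] is a quick sanity check on a parsed line.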
This is run nightly around midnight using the script:
http://www.surbl.org/permuted-hits
This gives some measure of the performance of the different lists,
though it likely undercounts rapidly changing data since it's
based on the previous ten days of data. The more quickly
changing lists like AB and SC have higher detection rates in
actual, real-time operation. The stats above also do not take
into account false positives at all, just hits against existing
blocklists (which do have some FPs).
Additionally, we've increased the number of days that frequency
data of DNS queries against our whitelist is kept from 10 to 90.
However, it will take another 80 days before a full 90 days has
accumulated:
http://www.surbl.org/dns-queries.whitelist.counts.txt
I re-wrote the scripts to accommodate the additional data more
efficiently:
http://www.surbl.org/hourly-dns
http://www.surbl.org/daily-dns (run around midnight)
After things stabilize we will probably change from the current
10,000 DNS queries sampled every 2 hours to 20,000 sampled every
hour. This will make the results represent about a half million
queries per day. The larger sample sizes should make the results
more accurate, but it means absolute numbers from the past won't
be comparable. Percentages and relative rankings should always
be comparable though.
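
The arithmetic behind those numbers, as a small Python sketch. The
function name queries_per_day and the example counts for a single
list are hypothetical and only there to illustrate why absolute
numbers shift while percentages stay comparable.

```python
def queries_per_day(sample_size, interval_hours):
    """Total sampled queries per day for a given sample size and interval."""
    samples_per_day = 24 // interval_hours
    return sample_size * samples_per_day

current = queries_per_day(10000, 2)   # 12 samples of 10,000
planned = queries_per_day(20000, 1)   # 24 samples of 20,000

# Hypothetical counts for one list under each scheme: the absolute number
# grows with the sample size, while the share of all queries stays the same.
old_count, new_count = 30000, 120000
old_share = old_count * 100 // current
new_share = new_count * 100 // planned
```

Here current works out to 120,000 queries per day and planned to
480,000, which is the "about a half million" mentioned above.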
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
As of 12 November 2004, we have added data from
fraud.rhs.mailpolice.com into ph, joining our existing phishing
data from mailsecurity.net.au. This has doubled the size of
our phishing list to about 1000 records. Unlike other SURBLs,
data from this fraud list includes a few deliberate subdomains
as found in URIs. (Because SURBL clients are expected to reduce
subdomains to base domains, an occasional mismatch in domain
levels between data and client should not cause false positives.)
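
To illustrate the reduction step, here is a rough Python sketch of
how a client might reduce a hostname to its base domain before
querying. It is an assumption-laden simplification: the suffix set
below is a tiny hypothetical sample, the function name base_domain
is made up, and real clients use their own, much longer tables of
two-level TLDs.

```python
# Hypothetical, deliberately tiny set of two-level suffixes.
TWO_LEVEL_SUFFIXES = {"co.uk", "com.au", "net.au"}

def base_domain(hostname):
    """Reduce a hostname to its registered (base) domain, naively."""
    labels = hostname.lower().rstrip(".").split(".")
    # Keep three labels when the last two form a known two-level suffix.
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    # Otherwise keep the last two labels.
    return ".".join(labels[-2:])

# A hypothetical list entry that was deliberately kept as a subdomain.
listed = {"secure.badsite.example"}

# The client reduces the hostname first, so it asks about the base domain.
queried = base_domain("secure.badsite.example")
matched = queried in listed
```

In this sketch the client ends up asking about badsite.example, so
the subdomain entry is simply not matched. The level mismatch shows
up as a lookup that finds nothing, not as a wrong match.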
Thanks to Jay Swackhamer of MailPolice for gathering this data
and making it available to us.
At the time of this message, overlap between
fraud.rhs.mailpolice.com and mailsecurity.net.au's phishing list
was 36 records. Overlap between fraud.rhs.mailpolice.com and
all other existing SURBLs, including ph.surbl.org, is 89 records.
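
Overlap counts like these amount to a set intersection over the
list records. A minimal Python sketch, assuming each list is just a
collection of domain names; the function names and the example
domains are hypothetical, not real list data.

```python
def normalize(domains):
    """Lowercase entries and drop blanks and trailing dots before comparing."""
    return {d.strip().lower().rstrip(".") for d in domains if d.strip()}

def overlap(list_a, list_b):
    """Return the set of records that appear in both lists."""
    return normalize(list_a) & normalize(list_b)

# Hypothetical sample data standing in for two phishing lists.
list_a = ["Phish-One.example", "phish-two.example", "phish-three.example"]
list_b = ["phish-two.example", "phish-one.example.", "other.example"]

common = overlap(list_a, list_b)
common_count = len(common)
```

With the sample data above, two records are common to both lists.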
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
We'd like to welcome a new public SURBL name server,
g3.surbl.org, and thank its administrator:
Alex Broens of Apexis - Switzerland
Without our public nameservers and the help of their
administrators, SURBLs would not be possible.
Our thanks to all of them!
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/