OK, for completeness (or to thoroughly compound the confusion ;-), here are joins of the (much smaller) percentiled dmoz and wikipedia lists:
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile.srt
255838 255838 3955772 dmoz-50thpercentile.srt
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile.srt
28323 28323 402512 wikipedia-70thpercentile.srt
against the SURBL whitelist and blocklist domains (and WS):
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile-whitelist.txt
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile-blocklist.txt
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile-blocklist.summed...
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile-blocklist.ws
2962 2962 36518 dmoz-50thpercentile-whitelist.txt
236 236 3312 dmoz-50thpercentile-blocklist.txt
236 2360 24355 dmoz-50thpercentile-blocklist.summed.txt
233 2330 24044 dmoz-50thpercentile-blocklist.ws
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile-whitelist.t...
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile-blocklist.t...
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile-blocklist.s...
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile-blocklist.w...
1260 1260 14702 wikipedia-70thpercentile-whitelist.txt
47 47 574 wikipedia-70thpercentile-blocklist.txt
47 470 4685 wikipedia-70thpercentile-blocklist.summed.txt
45 450 4471 wikipedia-70thpercentile-blocklist.ws
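For anyone who wants to try a join like the ones above, here is a minimal sketch in Python. It assumes each input holds one domain per line (which is what the matching line and word counts of the .srt, whitelist and blocklist files suggest) and that matching is exact and case-insensitive. The function name and the sample domains are made up for illustration; this is not necessarily how the files above were produced.

```python
def join_domains(corpus_lines, surbl_lines):
    """Return the sorted domains that appear in both inputs.

    Assumption: one domain per line, compared case-insensitively.
    """
    corpus = {line.strip().lower() for line in corpus_lines if line.strip()}
    surbl = {line.strip().lower() for line in surbl_lines if line.strip()}
    return sorted(corpus & surbl)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real lists.
    corpus = ["example.com", "example.org", "spammy.example"]
    blocklist = ["spammy.example", "other.example"]
    print(join_domains(corpus, blocklist))
```

With real files, the same function can be called with two open file objects, since iterating over a file yields its lines.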
One reason I didn't mention these before is that they're kind of midway between the larger lists and the smaller one combining them all (with 37 records), so I didn't want to focus on them.
For comparison purposes, the percentiled lists are much smaller than the non-percentiled ones, because there are many domains in each corpus with only one entry. Here are the original (un-percentiled) sizes compared with the percentiled ones:
http://spamcheck.freeapp.net/whitelists/dmoz.srt
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile.srt
2300851 2300851 38065969 dmoz.srt
255838 255838 3955772 dmoz-50thpercentile.srt
http://spamcheck.freeapp.net/whitelists/wikipedia.srt
http://spamcheck.freeapp.net/whitelists/hpercentile.srt
173828 173828 2633441 wikipedia.srt
28323 28323 402512 wikipedia-70thpercentile.srt
So you can see why the matches of the percentiled data against SURBLs are fewer.
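To put rough numbers on that, here is a small sketch that computes what fraction of each original list survives percentiling, using the line counts from the wc output in this message. The helper name is only for illustration.

```python
# (original lines, percentiled lines), taken from the wc output in this message
sizes = {
    "dmoz": (2300851, 255838),
    "wikipedia": (173828, 28323),
}


def kept_fraction(original, percentiled):
    """Fraction of the original list that remains after percentiling."""
    return percentiled / original


if __name__ == "__main__":
    for name, (original, percentiled) in sizes.items():
        print(f"{name}: {kept_fraction(original, percentiled):.1%} of domains kept")
```

That works out to roughly 11% of the dmoz domains and 16% of the wikipedia domains.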
Jeff C.
--
"If it appears in hams, then don't list it."