First, great work Jeff.
While 102k domains isn't nearly as large as the 2.3M in dmoz, it's certainly more than the 12k or so whitelist records we currently have. How does the intersected list look as a potential whitelist?
I think it is just plain nuts to whitelist all of these! Why? If we aren't trying to whitelist the most popular sites, then what the heck is the point? We could keep whitelisting millions of legit domains forever. The popular ones are the most important.
Here is one from the above list. Why would listing this help us? http://oigawa-railway.co.jp/ (looks like a really popular site, huh!)
Please also take a look at these blocklist hits (potential FPs) and share what you think:
http://spamcheck.freeapp.net/whitelists/wikipedia-dmoz-blocklist.summed.txt
I picked a few of these that may give us problems, and none of them met our current criteria for listing. (sissy-world.com, good grief that had to be a man at one time!) Now that we can see whitelisted domains in the crossref page, I don't see a problem with whitelisting all the ones on this list, because if they do start spamming again, we can see they are whitelisted and remove them.
so: -1 for adding all those intersected to WL
+1 for whitelisting the blacklist hits.
--Chris