Alexa by Amazon.com has a top 500 list on its site, which it derives from stats collected via its Alexa toolbar plugin. This may be a good source of whitelist data.
Any site ranking that high has the potential to cause a lot of collateral damage if blacklisted. These appear to be sites that lots of real-life users *do* visit regularly, as opposed to sites that advertisers suggest they visit, so they are likely to be mentioned in legitimate personal or business e-mail. Sites popular enough to be on the list probably have far more to lose than to gain from spamming anyway.
I took the HTML from Alexa's five pages, which listed 100 sites each, did a bit of text editing and hey presto: here's the list as an attached ASCII file.
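For anyone who would rather script the extraction than edit by hand, here is a rough sketch of one way to do it. It is not what I actually did (that was manual text editing), and it assumes each ranked site shows up as an ordinary `<a href="http://...">` link in the saved pages. The names `LinkHostCollector`, `extract_domains` and the `skip_suffixes` default are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkHostCollector(HTMLParser):
    """Collects the hostname of every absolute <a href=...> link."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                host = urlparse(value).hostname  # None for relative links
                if host:
                    self.hosts.append(host)


def extract_domains(html, skip_suffixes=("alexa.com", "amazon.com")):
    """Return unique hostnames in page order, minus a leading 'www.'.

    skip_suffixes is an assumption: links back to the listing site
    itself are navigation, not list entries.
    """
    parser = LinkHostCollector()
    parser.feed(html)
    domains = []
    for host in parser.hosts:
        host = host.lower()
        if host.startswith("www."):
            host = host[4:]
        if any(host == s or host.endswith("." + s) for s in skip_suffixes):
            continue
        if host not in domains:
            domains.append(host)
    return domains
```

Feeding it the five saved pages one after another and writing the result one domain per line would give an ASCII list like the attached one, though the output would still want a look by eye.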
A quick check against my local blacklist yielded exactly 0 intersections :-)
The following entries appeared in suspicious mail or as sender addresses and had been investigated by my filter (WHOIS lookup, etc.), but were not classified as spam domains:
163.net 39.net 888.com 8u8.com chosun.com ctinets.com dreamwiz.com eastday.com enet.com.cn etang.com freeservers.com globo.com km169.net linksynergy.com marktplaats.nl mingpao.com mingpaonews.com mym.net mypcera.com nastydollars.com nate.com naver.com nifty.com no-ip.com opendiary.com rambler.ru sayclub.com trafficmp.com xaonline.com yesky.com
About a third of the top 500 sites (160) were already in my local whitelist. I'll probably add the rest to my whitelist too.
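The blacklist and whitelist comparisons above boil down to set operations. A minimal sketch, assuming each list is a plain text file with one domain per line; `load_domains` and `compare` are hypothetical helper names, not part of my filter.

```python
def load_domains(path):
    """Read one domain per line, ignoring blank lines and '#' comments."""
    with open(path, encoding="ascii", errors="replace") as fh:
        return {
            line.strip().lower()
            for line in fh
            if line.strip() and not line.lstrip().startswith("#")
        }


def compare(top_sites, blacklist, whitelist):
    """Split the top-sites list by how the local lists already treat it."""
    top, black, white = set(top_sites), set(blacklist), set(whitelist)
    return {
        # Should be empty if the blacklist has no popular sites in it.
        "blacklisted": sorted(top & black),
        "already_whitelisted": sorted(top & white),
        # Candidates for the whitelist; blacklisted names are held back
        # for manual review (a design choice in this sketch).
        "to_add": sorted(top - white - black),
    }
```

Called as `compare(load_domains("top500.txt"), load_domains("blacklist.txt"), load_domains("whitelist.txt"))`, with those file names being placeholders, it reports the intersection with the blacklist, the part already whitelisted, and the remainder.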
Can anybody here bulk-check these against SURBL, in case any of them are listed?
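A bulk check could look roughly like the sketch below. SURBL is queried over DNS by looking up the domain prepended to the list's zone; a name that is not listed does not resolve. The zone `multi.surbl.org` is my assumption about which list to use, the function names are made up, and the `resolve` parameter exists only so the lookup can be swapped out for testing.

```python
import socket


def surbl_lookup(domain, zone="multi.surbl.org", resolve=socket.gethostbyname):
    """Return the A record for domain in the given zone, or None.

    Caveat: a resolver failure is indistinguishable from "not listed"
    here, so a None result is only as good as the DNS setup behind it.
    """
    try:
        return resolve(f"{domain}.{zone}")
    except socket.gaierror:
        return None


def bulk_check(domains, zone="multi.surbl.org", resolve=socket.gethostbyname):
    """Map each domain that got an answer to the address returned."""
    hits = {}
    for domain in domains:
        answer = surbl_lookup(domain, zone=zone, resolve=resolve)
        if answer is not None:
            hits[domain] = answer
    return hits
```

The sketch returns the raw address rather than a yes/no, since the meaning of the individual 127.x.x.x values should be taken from SURBL's own documentation. For 500 names this is 500 DNS queries, so it is worth checking the usage policy first.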
Joe