On Sunday, October 10, 2004, 12:52:59 AM, Jeff Chan wrote:
http://www.surbl.org/dns-queries.unmatched.30thpercentile.txt
Quite a few appear OK to whitelist, like democrats.org, purdue.edu, latimes.com, charter.net, zdnet.com, etc. I'll probably go ahead and whitelist obvious ones like these, so some of them will likely have moved off this "unmatched" list and onto the whitelist hits by the time you read this.
OK I went ahead and whitelisted a few dozen obvious whitehats from this unmatched DNS query data:
http://spamcheck.freeapp.net/whitelists/unmatched-9oct04.sort
Most of them are very obviously whitehats. A couple I had to look up. None of these are FPs; I'm just adding them to keep them off the lists, and eventually out of the DNS queries from SpamAssassin.
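For anyone curious how whitelisting moves a domain off the "unmatched" list, here's a rough Python sketch. It assumes (my assumption, not a description of the actual SURBL scripts) that a queried domain counts as "unmatched" when it is in neither the SURBL lists nor the whitelist. The function name classify_queries and the sample domains and counts are hypothetical.

```python
from collections import Counter


def classify_queries(query_counts, listed, whitelisted):
    """Split queried domains into listed, whitelisted and unmatched buckets.

    query_counts: mapping of domain -> number of DNS queries seen (hypothetical input).
    listed, whitelisted: sets of lowercase domains without a trailing dot.
    """
    buckets = {"listed": Counter(), "whitelisted": Counter(), "unmatched": Counter()}
    for domain, count in query_counts.items():
        # Normalize case and any trailing root dot before comparing.
        d = domain.strip().lower().rstrip(".")
        # A domain should never be both listed and whitelisted; check listed first.
        if d in listed:
            buckets["listed"][d] += count
        elif d in whitelisted:
            buckets["whitelisted"][d] += count
        else:
            buckets["unmatched"][d] += count
    return buckets


if __name__ == "__main__":
    # Hypothetical query volumes; spammy-example.biz is a made-up name.
    queries = {"zdnet.com": 900, "latimes.com": 700, "spammy-example.biz": 650}
    result = classify_queries(queries, listed=set(), whitelisted={"zdnet.com"})
    # Unmatched domains, highest query volume first.
    for domain, count in result["unmatched"].most_common():
        print(count, domain)
```

Whitelisting latimes.com in this sketch would move its 700 queries from the "unmatched" bucket to the "whitelisted" one, which is the effect described above.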
At the same time, I noticed a couple of domains in this data that belong to companies with multiple legitimate domains.
For example, tmcs.net belongs to Ticketmaster, which in turn belongs to the parent company iac.com, which owns many other Internet content companies like Expedia, match.com, etc. Since they all have apparently legitimate uses, I tried to find most of their domains, then whitelisted them all as:
http://spamcheck.freeapp.net/whitelists/iac.sort
Likewise, Reed Electronics publishes many professional electronics journals like Electronic Design News (EDN), and they have what is essentially a links page to their various organizations, so I whitelisted them all:
http://spamcheck.freeapp.net/whitelists/reedelectronics.sort
Reed Electronics is owned by elsevier.com.
There were also a few stray Yahoo domains in those top unmatched queries that we didn't already have whitelisted, so I took our existing Yahoo domain list and added more hand-checked Yahoo domains from DMOZ, Wikipedia, and the DNS query data into:
http://spamcheck.freeapp.net/whitelists/yahoo.sort
It's probably still not a complete list of Yahoo domains, but it's a pretty good start, especially judged by query volume.
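Folding hand-checked additions into an existing list like yahoo.sort is basically a case-insensitive union followed by a sort. Below is a minimal Python sketch. It assumes one domain per line with blank lines and '#' comments skipped; that file format and the helper name merge_domain_lists are assumptions for illustration, not necessarily how the .sort files are actually built.

```python
def merge_domain_lists(*sources):
    """Union several iterables of domain lines into one sorted, de-duplicated list."""
    merged = set()
    for source in sources:
        for line in source:
            # Normalize whitespace, case, and any trailing root dot.
            domain = line.strip().lower().rstrip(".")
            # Skip blank lines and '#' comment lines (assumed file convention).
            if not domain or domain.startswith("#"):
                continue
            merged.add(domain)
    return sorted(merged)


if __name__ == "__main__":
    # Hypothetical inputs: an existing list plus hand-checked additions.
    existing = ["yahoo.com", "yahoo.co.uk"]
    hand_checked = ["Yahoo.com", "geocities.com", "# from DMOZ", ""]
    for domain in merge_domain_lists(existing, hand_checked):
        print(domain)
```

Here the duplicate "Yahoo.com" collapses into the existing entry, so the output is geocities.com, yahoo.co.uk, and yahoo.com, one per line.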
On the blackhat side, there are definitely some spammers in the top unmatched DNS queries that should be checked and probably listed in SURBLs:
http://www.surbl.org/dns-queries.unmatched.30thpercentile.txt
I'll leave it to those of you who enjoy listing spammers (more than whitelisting hammers ;-) to look into some of these...
Since spam domains tend to be a lot more dynamic than ham domains, I'd recommend checking this list every few days.
There are certainly more ham domains in there too; if anyone spots any, please report or whitelist them.
Cheers,
Jeff C. -- "If it appears in hams, then don't list it."