Alexa by Amazon.com has a top 500 list on its site, which it derives from stats collected via its Alexa toolbar plugin. This may be a good source of whitelist data.
Any site ranking that high has the potential to cause a lot of collateral damage if blacklisted. These appear to be sites that lots of real-life users *do* visit regularly, as opposed to sites that advertisers suggest they visit, so they are likely to be mentioned in legitimate personal or business e-mail. Sites popular enough to be there probably have far more to lose than to gain from spamming anyway.
I took the HTML from Alexa's five pages which listed 100 sites each, did a bit of text editing and hey presto: here's the list as an attached ASCII file.
A quick check against my local blacklist yielded exactly 0 intersections :-)
The following entries appeared in suspicious mail or as sender addresses and had been investigated by my filter (WHOIS lookup, etc.), but were not classified as spam domains:
163.net 39.net 888.com 8u8.com chosun.com ctinets.com dreamwiz.com eastday.com enet.com.cn etang.com freeservers.com globo.com km169.net linksynergy.com marktplaats.nl mingpao.com mingpaonews.com mym.net mypcera.com nastydollars.com nate.com naver.com nifty.com no-ip.com opendiary.com rambler.ru sayclub.com trafficmp.com xaonline.com yesky.com
About a third of the top 500 sites (160) were already in my local whitelist. I'll probably add the rest to my whitelist too.
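For anyone who wants to repeat the comparison, here is a minimal Python sketch of the set arithmetic described above. It assumes each list is plain text with whitespace-separated domains. The helper names `load_domains` and `compare` and the sample data are hypothetical, made up for illustration; this is not the actual procedure or data used for the numbers quoted here.

```python
def load_domains(text):
    """Parse whitespace-separated domains into a normalized (lowercase) set."""
    return {d.lower().rstrip(".") for d in text.split()}


def compare(top_sites, blacklist, whitelist):
    """Return (blacklisted, already_whitelisted, candidates) as sorted lists."""
    blacklisted = sorted(top_sites & blacklist)         # collateral-damage risks
    already = sorted(top_sites & whitelist)             # nothing to do for these
    candidates = sorted(top_sites - whitelist - blacklist)  # could be added
    return blacklisted, already, candidates


if __name__ == "__main__":
    # Hypothetical sample data; real lists would be read from files.
    top = load_domains("naver.com nifty.com NO-IP.com rambler.ru")
    black = load_domains("spam-domain.invalid")
    white = load_domains("naver.com")
    hits, known, new = compare(top, black, white)
    print(len(hits), "blacklist intersections")
    print("already whitelisted:", known)
    print("candidates to add:", new)
```

Using sets keeps the check independent of list order and duplicate entries, and lowercasing avoids missing a match because of capitalization.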
Anybody here who can bulk-check these against SURBL, in case there are listed sites?
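A bulk check of this kind could be sketched roughly as follows. SURBL is queried over DNS: the domain is prepended to the list zone, and an answer in 127.0.0.0/8 means the domain is listed, while a failed lookup means it is not. The sketch assumes the combined `multi.surbl.org` zone and the system resolver; the function names are hypothetical, and the `resolve` parameter exists only so the lookup can be swapped out for testing.

```python
import socket

SURBL_ZONE = "multi.surbl.org"  # assumption: the combined SURBL zone


def surbl_query_name(domain, zone=SURBL_ZONE):
    """Build the DNS name to look up for a given domain."""
    return f"{domain.strip().lower().rstrip('.')}.{zone}"


def is_listed(domain, resolve=socket.gethostbyname):
    """True if the lookup returns an address in 127.0.0.0/8."""
    try:
        answer = resolve(surbl_query_name(domain))
    except OSError:
        # No such name (or any other resolver failure): treat as not listed.
        return False
    return answer.startswith("127.")


def bulk_check(domains, resolve=socket.gethostbyname):
    """Return the subset of domains that appear to be listed."""
    return [d for d in domains if is_listed(d, resolve)]
```

Note that this treats every resolver error as "not listed", so a network outage would look like a clean result; a more careful version would distinguish the two cases.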
Joe
On Thursday, September 23, 2004, 9:15:59 PM, Joe Wein wrote:
Alexa by Amazon.com has a top 500 list on its site, which it derives from stats collected via its Alexa toolbar plugin. This may be a good source of whitelist data.
Any site ranking that high has the potential to cause a lot of collateral damage if blacklisted. These appear to be sites that lots of real-life users *do* visit regularly, as opposed to sites that advertisers suggest they visit, so they are likely to be mentioned in legitimate personal or business e-mail. Sites popular enough to be there probably have far more to lose than to gain from spamming anyway.
I took the HTML from Alexa's five pages which listed 100 sites each, did a bit of text editing and hey presto: here's the list as an attached ASCII file.
A quick check against my local blacklist yielded exactly 0 intersections :-)
[...]
About a third of the top 500 sites (160) were already in my local whitelist. I'll probably add the rest to my whitelist too.
Anybody here who can bulk-check these against SURBL, in case there are listed sites?
Joe
Way ahead of you Joe. I whitelisted the Alexa 500 when we started, so you won't find them on SURBLs. :-) I don't mention it because I don't want to know what Alexa's licensing policies are. Thanks for thinking of it though. :-)
I agree with your reasoning. Popular sites are more likely to be legitimate and get mentioned in hams, and blocklisting them could cause a lot of FPs. So they should stay off.
And yes, it does include some hosting sites and ISPs in Asia that get occasionally mentioned in casual spam. Most of these ISPs have AUPs *on their own domains* so that *their own domains* are probably not a major source of spam hosting. This does not prevent us from listing any of their customers who spam.
Does anyone else have other potential whitelist sources like this?
Jeff C. -- "If it appears in hams, then don't list it."
Hi Joe,
At 21:15 23-09-2004, Joe Wein wrote:
The following entries appeared in suspicious mail or as sender addresses and had been investigated by my filter (WHOIS lookup, etc.), but were not classified as spam domains:
[snip]
no-ip.com
That domain is for a dynamic DNS service provider. It is used by a lot of "poor" people. The domain itself is definitely not a spam domain. Some of its subdomains have been used in mailings which are in strict compliance with the CAN-SPAM Act of 2003. :)
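To illustrate the distinction between the provider's own domain and its customers' subdomains, here is a small Python sketch. It assumes a whitelist that matches exact names only, which is an assumption for illustration and not a description of how any particular filter or list treats no-ip.com. The helper names and the customer hostname are hypothetical.

```python
def normalize(host):
    """Lowercase a hostname and drop any trailing dot."""
    return host.strip().lower().rstrip(".")


def is_whitelisted(host, whitelist):
    """Exact match only: subdomains are NOT covered by a parent entry."""
    return normalize(host) in whitelist


def is_subdomain_of(host, domain):
    """True if host sits strictly below domain (e.g. a customer hostname)."""
    return normalize(host).endswith("." + normalize(domain))


whitelist = {"no-ip.com"}
customer = "someuser.no-ip.com"  # hypothetical customer hostname

print(is_whitelisted("no-ip.com", whitelist))   # the provider itself
print(is_whitelisted(customer, whitelist))      # a customer subdomain
print(is_subdomain_of(customer, "no-ip.com"))
```

With exact matching, whitelisting the provider does not automatically exempt every hostname its customers create, so those can still be judged on their own.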
Regards, -sm