[SURBL-Discuss] Whitelist data: Alexa.com top 500

Jeff Chan jeffc at surbl.org
Fri Sep 24 06:36:23 CEST 2004


On Thursday, September 23, 2004, 9:15:59 PM, Joe Wein wrote:
> Alexa by Amazon.com has a top 500 list on its site, which it derives from
> stats collected via its Alexa toolbar plugin. This may be a good source of
> whitelist data.

> Any site making that high score has the potential to cause a lot of
> collateral damage if blacklisted, since these appear to be sites that lots
> of real-life users *do* visit regularly, as opposed to sites that
> advertisers suggest they visit, so they are likely to be mentioned in
> legitimate personal or business e-mail. Probably sites popular enough to be
> there have far more to lose than to gain from spamming anyway.

> I took the HTML from Alexa's five pages which listed 100 sites each, did a
> bit of text editing and hey presto: here's the list as an attached ASCII
> file.

> A quick check against my local blacklist yielded exactly 0 intersections :-)

[...]

> About a third of the top 500 sites (160) were already in my local whitelist.
> I'll probably add the rest to my whitelist too.

> Anybody here who can bulk-check these against SURBL, in case there are
> listed sites?

> Joe
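The check Joe describes (a top-site list compared against a local blacklist and whitelist) comes down to set arithmetic. Below is a minimal Python sketch. It assumes hypothetical plain-text files with one domain per line. The helper names `load_domains` and `compare` are illustrative only and do not belong to any existing tool.

```python
from typing import Iterable, Set, Tuple


def load_domains(lines: Iterable[str]) -> Set[str]:
    """Normalize lines (one domain each, hypothetical format) into a set.

    Lowercases entries, drops a trailing dot, and skips blank lines and
    lines starting with '#'.
    """
    domains = set()
    for line in lines:
        entry = line.strip().lower().rstrip(".")
        if entry and not entry.startswith("#"):
            domains.add(entry)
    return domains


def compare(top_sites: Set[str], blacklist: Set[str],
            whitelist: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (conflicts, already_whitelisted, to_add).

    conflicts:           top sites that also appear on the blacklist (review these)
    already_whitelisted: top sites already present on the whitelist
    to_add:              remaining top sites, excluding any conflicts
    """
    conflicts = top_sites & blacklist
    already_whitelisted = top_sites & whitelist
    to_add = top_sites - whitelist - blacklist
    return conflicts, already_whitelisted, to_add
```

With Joe's numbers (500 top sites, 160 already whitelisted, no blacklist overlap), `already_whitelisted` would hold 160 entries and `to_add` the other 340. Conflicts are kept out of `to_add` on purpose, so that a listed domain is reviewed by hand rather than whitelisted automatically.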

Way ahead of you Joe.  I whitelisted the Alexa 500 when we
started, so you won't find them on SURBLs.  :-)  I don't
mention it because I don't want to know what Alexa's
licensing policies are.  Thanks for thinking of it
though.  :-)
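Joe's bulk-check question can also be answered directly, because SURBL data is published as DNS zones. A domain is looked up by prepending it, unreversed, to the zone name: a 127.0.0.x answer means the domain is listed, and a non-existent name means it is not. The Python sketch below assumes the combined zone `multi.surbl.org` and uses only the standard `socket` module. The helper names are hypothetical. The resolver is passed in as a parameter so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Dict, Iterable, Optional

# Assumed combined SURBL zone; pass a different zone if needed.
DEFAULT_ZONE = "multi.surbl.org"


def query_name(domain: str, zone: str = DEFAULT_ZONE) -> str:
    """Build the lookup name: the domain (not reversed) followed by the zone."""
    return f"{domain.strip().lower().rstrip('.')}.{zone}"


def surbl_status(domain: str, zone: str = DEFAULT_ZONE,
                 resolve: Callable[[str], str] = socket.gethostbyname
                 ) -> Optional[str]:
    """Return the 127.0.0.x answer if the domain appears listed, else None."""
    try:
        answer = resolve(query_name(domain, zone))
    except socket.gaierror:
        # NXDOMAIN or any other resolution failure: treated as "not listed".
        return None
    # Only loopback-range answers are treated as listings.
    return answer if answer.startswith("127.") else None


def bulk_check(domains: Iterable[str], **kwargs) -> Dict[str, str]:
    """Return {domain: answer} for every domain that appears to be listed."""
    hits = {}
    for domain in domains:
        status = surbl_status(domain, **kwargs)
        if status is not None:
            hits[domain] = status
    return hits
```

Fed the non-blank lines of Joe's attached list, `bulk_check` should return an empty dict if Jeff's whitelisting holds. The last octet of any answer encodes which component lists matched. Treating lookup failures as "not listed" keeps the sketch simple, but it means a DNS outage would also look like a clean result.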

I agree with your reasoning.  Popular sites are more
likely to be legitimate and get mentioned in hams, and
blocklisting them could cause a lot of FPs.  So they
should stay off.

And yes, the list does include some hosting sites and ISPs
in Asia that occasionally get mentioned in casual spam.
Most of these ISPs have AUPs *on their own domains*
so that *their own domains* are probably not a major
source of spam hosting.  This does not prevent us from
listing any of their customers who spam.

Does anyone else have other potential whitelist sources
like this?

Jeff C.
--
"If it appears in hams, then don't list it."