[SURBL-Discuss] Took top percentiles of DMOZ and wikipedia domains, some results

Ryan Thompson ryan at sasknow.com
Sat Oct 9 20:14:59 CEST 2004


Jeff Chan wrote to SURBL Discuss:

> xiloo.com             1667 days         342 NANAS
>
> Aside from those two, the rest may be candidates for
> whitelisting, though I did not check them further.
> (Note also that GetURI does not count NANAS; I did those
> few manually.)

Hi Jeff,

The reason for that is twofold:

1. Obtaining that information automatically, although relatively easy
    and good for a fun time, expressly violates Google's ToS. They have a
    client library for automated queries, but it only allows 1000 queries
    per account per day, and doesn't yet work with Google Groups.

2. As we know, raw NANAS counts can be extremely misleading. For
    instance, as you pointed out a few days ago, yahoo.com has > 0.5M
    NANAS hits.

By forcing someone to click on the "[ NANAS ]" link, GetURI plays nicely
with Google, and encourages people to hand-check NANAS hits to look for
spamvertised examples. I'd worry that, with raw counts automatically
displayed, some would draw false conclusions from "xx NANAS".

> May I ask for some help in checking these?

Sure, I'll peek at a few right away.

- Ryan

-- 
   Ryan Thompson <ryan at sasknow.com>

   SaskNow Technologies - http://www.sasknow.com
   901-1st Avenue North - Saskatoon, SK - S7K 1Y4

         Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
   Toll-Free: 877-727-5669     (877-SASKNOW)     North America

