Hi Jeff,
Jeff Chan wrote:
Doing a little preliminary checking of this particular dataset leads me to wonder a little how appropriate it might be for SURBLs. In particular I found over a hundred whitelist hits of sites like aol.com, att.net, btopenworld.com, budweiser.com, clara.net, cnet.com, comcast.net, he.net, lsu.edu, match.com, mindspring.com, msn.com, rr.com, sina.com, texas.net, tripod.com, umich.edu, victoriassecret.com, washington.edu, etc.:
I did a quick check on a few domains and I do not share your conclusion.
# grep aol.com domains
adultaol.com
register.oscar.aol.com
sex-aol.com
sexonaol.com
usaol.com

# grep att.net domains
adultonly.home.att.net
borderjumper.home.att.net
brookeb.home.att.net
chrisd054.home.att.net
dating.home.att.net
divinenews.home.att.net
lilcindy.home.att.net
livevids.home.att.net
livevids2.home.att.net
livevids3.home.att.net
livevids4.home.att.net
models.home.att.net
models2.home.att.net
personals.home.att.net
pvelasquez.home.att.net
sasha69.home.att.net
sex-ads.home.att.net
sexworld.home.att.net
xxxmovies.home.att.net

# grep -w au.com domains
aotoys.au.com
condoms.au.com
freeporn.au.com
hornytoad.au.com
muff.au.com
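So aol.com, att.net and au.com themselves are not in the database and are not blacklisted. For aol.com, the only subdomain in the blacklist is register.oscar.aol.com; the other hits are separate domains whose names merely contain the string "aol.com". For au.com and att.net there are only adult subdomains in the blacklist. This is OK.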
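The distinction matters because a plain grep reports every line that contains the pattern, not only the domain itself. As a rough sketch (the function name classify and its return labels are made up for illustration and are not part of the blacklist tooling), the cases can be separated like this:

```python
def classify(entry: str, domain: str) -> str:
    """Classify a blacklist entry relative to `domain` (hypothetical helper)."""
    entry = entry.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    if entry == domain:
        return "exact"           # the domain itself is listed
    if entry.endswith("." + domain):
        return "subdomain"       # a host below the domain is listed
    if domain in entry:
        return "substring-only"  # a different domain that merely contains the string
    return "unrelated"


# A few entries copied from the grep output above.
entries = ["adultaol.com", "sex-aol.com", "sexworld.home.att.net"]

for e in entries:
    print(e, classify(e, "aol.com"), classify(e, "att.net"))
```

Only an "exact" result would mean that aol.com or att.net itself is blocked.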
that's after excluding the adult/urls list which had about 300 whitelist hits, including more hosting providers like terra.es, etc. Recall that our whitelists are not too complete, so there may be other legitimate domains that are included. We can't be blocking on aol.com, cnet.com, msn.com, etc.
Clearly some of these (shared hosting) sites may have been used to host sex content, but since RBLs are domain-based, and SURBLs are registrar-domain-based, I'm having some doubts about how useful this particular data source might be for SURBL use.
ftp://ftp.univ-tlse1.fr/pub/reseau/cache/squidguard_contrib/adult.tar.gz
Perhaps there are other lists of sex domains that are more selective?
Jeff C.
The domain terra.es is also not in the domains list. It does appear in the url list, e.g. personal.telefonica.terra.es/web/sex, terra.es/personal2/amateursexual, etc.
I think that the only database that can be used by SURBL is the domains database and that the url database is not suitable to be used by SURBL since URLs are difficult to translate to a DNS query string.
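To illustrate why, here is a minimal sketch of how a lookup name might be built. It rests on several assumptions: the zone multi.surbl.org, the tiny TWO_LEVEL_SUFFIXES set (a stand-in for a real list of two-level suffixes), and both function names are chosen for illustration only. No DNS query is actually sent.

```python
from urllib.parse import urlsplit

ZONE = "multi.surbl.org"                  # assumed lookup zone
TWO_LEVEL_SUFFIXES = {"au.com", "co.uk"}  # hypothetical, far from complete


def registered_domain(host: str) -> str:
    """Reduce a hostname to the domain a SURBL-style list would key on."""
    labels = host.lower().rstrip(".").split(".")
    # Keep one extra label when the last two labels form a known suffix.
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def surbl_query_name(url_or_host: str) -> str:
    """Build the DNS name to look up; the URL path cannot be represented."""
    # urlsplit only fills in the hostname when a scheme or "//" is present.
    if "//" not in url_or_host:
        url_or_host = "//" + url_or_host
    host = urlsplit(url_or_host).hostname
    return f"{registered_domain(host)}.{ZONE}"


print(surbl_query_name("terra.es/personal2/amateursexual"))
print(surbl_query_name("personal.telefonica.terra.es/web/sex"))
print(surbl_query_name("freeporn.au.com"))
```

Both terra.es URLs reduce to the same query name, so listing either of them would amount to listing all of terra.es, whereas an entry from the domains list such as freeporn.au.com maps to a name of its own.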
I assume that something went wrong when you verified the quality of the database. If you have any questions you can also contact me off list.
-Marc