On Tue, 20 Jul 2004 15:27:52 +0200, Marc Kool m.kool@vioro.nl wrote:
Hi Jeff,
Jeff Chan wrote:
Doing a little preliminary checking of this particular dataset leads me to wonder a little how appropriate it might be for SURBLs. In particular I found over a hundred whitelist hits of sites like aol.com, att.net, btopenworld.com, budweiser.com, clara.net, cnet.com, comcast.net, he.net, lsu.edu, match.com, mindspring.com, msn.com, rr.com, sina.com, texas.net, tripod.com, umich.edu, victoriassecret.com, washington.edu, etc.:
I did a quick check on a few domains and I do not share your conclusion.
# grep aol.com domains
adultaol.com
register.oscar.aol.com
sex-aol.com
sexonaol.com
usaol.com
register.oscar.aol.com is the server used by AOL Messenger and ICQ to log in - how on earth does this count as an adult website, much less a sex site?!!
# grep att.net domains
adultonly.home.att.net
borderjumper.home.att.net
brookeb.home.att.net
chrisd054.home.att.net
dating.home.att.net
divinenews.home.att.net
lilcindy.home.att.net
livevids.home.att.net
livevids2.home.att.net
livevids3.home.att.net
livevids4.home.att.net
models.home.att.net
models2.home.att.net
personals.home.att.net
pvelasquez.home.att.net
sasha69.home.att.net
sex-ads.home.att.net
sexworld.home.att.net
xxxmovies.home.att.net
Ahh the plot thickens... Subdomains..
# grep -w au.com domains
aotoys.au.com
condoms.au.com
freeporn.au.com
hornytoad.au.com
muff.au.com
Still more..
So aol.com and att.net and au.com are not in the database and not blacklisted. No subdomain of aol.com is in the blacklist.
What is register.oscar.aol.com if it isn't a subdomain?
For au.com and att.net there are only adult subdomains in the blacklist. This is ok.
However, SURBLs in general don't use subdomains. I've just run a test on my personal SURBL, and SpamCopURI doesn't currently look at subdomains. I suspect this is because of the requirement for a lookup per domain level, which would obviously both make things inefficient and leave room for a denial of service.
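To make the cost concrete, here is a rough Python sketch of what "a lookup per domain level" would mean. The function name and the zone "surbl.example" are made up for illustration; this is not how SpamCopURI works, only a way to count the queries a subdomain-aware client would need.

```python
def lookup_names(host, zone="surbl.example"):
    """Return one DNS query name per domain level of host.

    The zone name is a placeholder, not a real SURBL zone.
    """
    labels = host.lower().rstrip(".").split(".")
    # Walk from the full hostname down to the two-label domain;
    # the bare TLD on its own is never queried.
    return [".".join(labels[i:]) + "." + zone for i in range(len(labels) - 1)]


if __name__ == "__main__":
    for name in lookup_names("register.oscar.aol.com"):
        print(name)
```

A four-label hostname needs three queries here instead of one, and a spammer could inflate that number simply by putting very deep hostnames in a message.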
I assume that something went wrong when you verified the quality of the database.
I think the levels of understanding of what was in the DB and what SURBL was able to do were what went wrong.
Given my very quick testing, I think it would probably be worth giving this data a try. We would most likely need to work out how to remove the subdomained entries: the list is huge, and any efficiency we can gain by removing excess data would obviously be useful.
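One possible way to separate out the subdomained entries, sketched in Python. It assumes the list is one hostname per line and uses a naive label count; the function name and the optional set of two-level suffixes (such as "co.uk") are my own additions, and a real tool would need a proper list of such suffixes.

```python
def split_entries(entries, two_level_suffixes=frozenset()):
    """Split hostnames into (plain domains, subdomained entries).

    two_level_suffixes is a caller-supplied set such as {"co.uk"};
    it is an assumption of this sketch, not part of the dataset.
    """
    domains, subdomains = [], []
    for entry in entries:
        name = entry.strip().lower().rstrip(".")
        if not name:
            continue  # skip blank lines
        labels = name.split(".")
        # Treat the suffix as two labels if it is in the supplied set,
        # otherwise as one (e.g. "com").
        suffix_len = 2 if ".".join(labels[-2:]) in two_level_suffixes else 1
        if len(labels) <= suffix_len + 1:
            domains.append(name)
        else:
            subdomains.append(name)
    return domains, subdomains


if __name__ == "__main__":
    sample = ["adultaol.com", "register.oscar.aol.com", "freeporn.au.com"]
    keep, drop = split_entries(sample)
    print("keep:", keep)
    print("drop:", drop)
```

Whether the dropped entries should be discarded or kept in a separate list for a client that does check subdomains is a policy decision this sketch does not make.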
The data is somewhat preemptive: just because you have an adult content website doesn't always mean you are spamming. In fact, I'm sure there are an awful lot of adult sites which never spam.
I do, however, feel that there is a need for this kind of data. There are a lot of organisations which have liability concerns if their users receive pornographic messages (schools), and many people who find adult content offensive (churches etc).
I reckon let's give it a go for a while like we did 6dos - what's the worst that can happen? We might get another SURBL - well more content is always a good thing in that case :)

--
Regards,
David Hooton