On Tuesday, July 20, 2004, 10:59:27 AM, Marc Kool wrote: (David Hooton wrote:)
However, SURBLs in general don't use subdomains. I've just run a test on my personal SURBL, and SpamCopURI doesn't currently look at subdomains. I suspect this is because of the requirement for a lookup per domain level, which would both make things inefficient and leave room for a denial of service.
Hmmm. I am afraid that spammers will abuse this property of SpamCopURI.
Actually the design decision to reduce subdomains to base domains was made to eliminate the abuse by spammers of using randomized subdomains....
Since AOL, ATT, MSN, and other legitimate ISPs and their subdomains are not often professional spammer destinations, it seemed more important to catch the deliberate randomizers. That assumption may hold less well for sex sites.
This is what I stated in the original proposal: let's make a SURBL list for adult-related URIs, not necessarily spammers. I know that SURBL is meant to fight spam, but it is so easy to extend with functionality to ban emails that refer to adult sites that I think SURBL is the place to do it, instead of creating a new mechanism in SA.
I agree about some of the value in this, certainly for squid use. I can think of a few different ways to proceed:
1. Discard all subdomains: probably too drastic for squid use since some legitimate sites could be lost, but probably appropriate for SURBL use.
2. Fold subdomains to registrar domains: creates too many false positives (at least for SURBL use) of sites hosted on otherwise legitimate hosting providers like att.net, etc. Would also break some squid matches.
3. Include the subdomains (the fully qualified domain names) in the list as they appear in the data: this will prevent the registrar domains (like att.net) from matching in SURBLs, and it's also faithful to the original data, which can be a good thing in general and is probably preferable for squid use.
The main problem is that most code for using SURBLs on the client (mail server) side tries to reduce the subdomains down to base domains. So clients will tend not to match deliberately included subdomains. That can be an OK thing for SURBLs. Essentially it tells SURBLs to ignore the subdomains. If we wanted SURBLs to actually match these spam sites, we'd have to check the full subdomains.
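To make the side effect concrete, here is a rough sketch of the reduction step in Python. It is not taken from SpamCopURI or any other client; the function names, the tiny two-level TLD set, and the example domains are all made up for illustration, and real clients use a much longer TLD list.

```python
# Hypothetical sample of two-level TLDs; a real client would load a full list.
TWO_LEVEL_TLDS = {"co.uk", "com.au", "co.jp"}


def base_domain(host: str) -> str:
    """Reduce a hostname to its base (registered) domain."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    # Keep three labels under a two-level TLD (example.co.uk), otherwise two.
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_TLDS else 2
    return ".".join(labels[-keep:])


def client_matches(host: str, listed: set) -> bool:
    """Mimic a client that only ever looks up the reduced base domain."""
    return base_domain(host) in listed


# Made-up list data: one base domain and one deliberately included subdomain.
listed = {"spammer-example.com", "adult.hosting-example.net"}

# A randomized subdomain still hits the listed base domain.
print(client_matches("x7f3q.spammer-example.com", listed))   # True
# A listed subdomain is never queried, because the client reduces it first.
print(client_matches("adult.hosting-example.net", listed))   # False
```

The second result is the "ignore the subdomains" behaviour described above: the entry is in the data, but a reducing client only ever asks about hosting-example.net, which is not listed.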
For squid use, #3 is probably the most desirable since it best captures the original data.
So #3 would probably get the best results for both squid and SURBLs (by side effect of not matching the registrar domains). It's probably the best compromise under the current designed uses of both squid and SURBLs.
Comments?
Jeff C.