[SURBL-Discuss] Guestbook spam

Michael Renzmann mrenzmann at otaku42.de
Tue Nov 14 07:11:15 CET 2006


Hi.

> I agree we could investigate some more to get also web based patterns
> going, not that hard to do.

This is where I jump in with the suggestion I already mentioned on Friday :)

From what I've seen in the list of spamvertised sites (the one I used
for my tests), many of them belong to masshosters such as aol.com,
alice.it, or blog providers. These services seem to be attractive to
spammers, since many of them offer free webspace suited for hosting
link farms and whatnot.

Currently there are two approaches to handling this in an rhsbl: put the
whole domain on the block list, or exclude (whitelist) it. Neither
approach is ideal.

Blocking those domains raises the number of false positives, because it
also blocks legitimate websites hosted with that provider. Whitelisting
them avoids that problem but raises the number of false negatives (i.e.
spam sites hosted there are no longer caught). Something in between the
two extremes would be nice.

As far as I can tell from my brief investigation, these "big hosters"
offer their customers one of two schemes:

  1. http://customer.hoster.tld/...
  2. http://host.hoster.tld/customer/...

The "customer" part of the URI is what needs to be looked at in order to
distinguish spammers from non-spammers.
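To illustrate where the "customer" part sits in each scheme, here is a minimal Python sketch. The function name `customer_part` and the constants `SCHEME_SUBDOMAIN`/`SCHEME_PATH` are hypothetical, and it assumes the caller already knows the hoster's registered domain and which scheme it uses:

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical identifiers for the two hosting schemes listed above.
SCHEME_SUBDOMAIN = 1  # http://customer.hoster.tld/...
SCHEME_PATH = 2       # http://host.hoster.tld/customer/...


def customer_part(url: str, hoster: str, scheme: int) -> Optional[str]:
    """Return the customer identifier embedded in url, or None if absent."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != hoster and not host.endswith("." + hoster):
        return None  # URI is not hosted by this hoster at all
    if scheme == SCHEME_SUBDOMAIN:
        # The label immediately to the left of the hoster's domain.
        prefix = host[: -len(hoster) - 1]
        return prefix.split(".")[-1] or None
    if scheme == SCHEME_PATH:
        # The first non-empty path segment.
        segments = [s for s in parts.path.split("/") if s]
        return segments[0] if segments else None
    raise ValueError(f"unknown hosting scheme: {scheme}")


print(customer_part("http://spammer.masshoster.tld/cheap-viagra.html",
                    "masshoster.tld", SCHEME_SUBDOMAIN))  # → spammer
```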


Example:
The examined URI is http://spammer.masshoster.tld/cheap-viagra.html. As
described in the surbl.org implementation guidelines, the first lookup
would be for masshoster.tld. The lookup resolves, and the last octet of
the result is treated as a bitmask (similar to how it is done for
multi.surbl.org). Since the domain belongs to a known masshoster that
uses hosting scheme 1, this is signalled by having the corresponding bit
set in the response.
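Decoding that bitmask on the client side could look roughly like the sketch below. The flag values `SCHEME_1_FLAG` and `SCHEME_2_FLAG` are hypothetical and chosen purely for illustration; an actual zone would have to pick bits that do not collide with its existing list flags:

```python
import ipaddress
from typing import Optional

# Hypothetical flag bits for the proposed enhanced zone (illustration only).
SCHEME_1_FLAG = 0x80  # masshoster, customers at customer.hoster.tld
SCHEME_2_FLAG = 0x01  # masshoster, customers at host.hoster.tld/customer/


def hosting_scheme(answer: str) -> Optional[int]:
    """Map an A-record answer such as '127.0.0.128' to a hosting scheme."""
    last_octet = ipaddress.IPv4Address(answer).packed[-1]
    if last_octet & SCHEME_1_FLAG:
        return 1
    if last_octet & SCHEME_2_FLAG:
        return 2
    return None  # no masshoster bit set: an ordinary listing


print(hosting_scheme("127.0.0.128"))  # → 1
```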

The application now does a second lookup, this time for
spammer.masshoster.tld (if the hoster used scheme 2, the lookup would be
for masshoster.tld.customer). If that lookup resolves, the URI is spam;
otherwise it's ham. The result of the first lookup is not itself counted
as a hit in either case; it only tells the application that a second
lookup is needed and how to build it.
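The whole two-step procedure can be sketched as follows. This is a minimal illustration, not a reference client: it assumes the caller has already determined the URI's registered domain, it uses the `emulti.surbl.org` zone name suggested further down in this post, and the flag bits are hypothetical. The resolver is a parameter so the logic can be exercised without real DNS. How a scheme-2 path segment that is not a valid DNS label should be handled is left open here, as it is in the proposal:

```python
import socket
from typing import Callable, Optional
from urllib.parse import urlsplit

# Hypothetical values: zone name as suggested in this post, flag bits
# chosen purely for illustration.
ZONE = "emulti.surbl.org"
SCHEME_1_FLAG = 0x80  # customers at customer.hoster.tld
SCHEME_2_FLAG = 0x01  # customers at host.hoster.tld/customer/

Resolver = Callable[[str], Optional[str]]


def dns_resolver(name: str) -> Optional[str]:
    """Return an IPv4 answer for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


def is_spam(url: str, registered: str, resolve: Resolver = dns_resolver) -> bool:
    """Two-step lookup; `registered` is the URI's registered domain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != registered and not host.endswith("." + registered):
        raise ValueError("registered domain does not match the URI host")

    # First lookup: the registered domain itself.
    answer = resolve(f"{registered}.{ZONE}")
    if answer is None:
        return False
    flags = int(answer.rsplit(".", 1)[1])  # last octet as bitmask

    if flags & SCHEME_1_FLAG:
        # Scheme 1: the label directly left of the registered domain.
        customer = host[: -len(registered) - 1].split(".")[-1]
        if not customer:
            return False
        second = f"{customer}.{registered}"
    elif flags & SCHEME_2_FLAG:
        # Scheme 2: the first path segment, appended after the domain.
        segments = [s for s in parts.path.split("/") if s]
        if not segments:
            return False
        second = f"{registered}.{segments[0].lower()}"
    else:
        # No masshoster flag: an ordinary listing, as in existing zones.
        return True

    # The first answer only marked a masshoster; the verdict comes
    # from the second lookup alone.
    return resolve(f"{second}.{ZONE}") is not None
```

Passing `records.get` for some dict of hypothetical zone data as `resolve` is a convenient way to try the logic offline.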


Advantages:
1.
The modifications needed for an existing rhsbl (zone file) that
implements this enhancement, as well as for the client-side applications
that use it, are not hard to implement IMO. The enhancement relies on
mechanisms that are already in use. As far as I can tell, no changes are
needed to the DNS servers.

2.
The second lookup is needed only for known masshoster domains. The
lookup application does not have to guess blindly whether a domain
belongs to a known masshoster or which "hosting scheme" it probably uses.

3.
The enhancement makes it possible to raise the number of "true"
positives without the side effect of more false positives, at least as
long as the rhsbl provider applies the same care as for the rest of its
blocklist.


In order to be backward compatible, this enhancement should not be
applied to a lookup zone that is queried by "non-enhanced" applications,
at least if that zone previously had masshosters whitelisted. The fact
that a masshoster domain now resolves in the first lookup would be
misinterpreted by applications that are not aware of the enhancement,
resulting in more false positives. It would be better to "mirror" such
zones (for example multi.surbl.org) to a new one (for example
emulti.surbl.org, with "e" for "enhanced" ;)) and apply the changes
there.
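To make the compatibility concern concrete, here is a minimal sketch of what a client that predates the enhancement effectively does. The function name `legacy_is_listed` and the zone data are hypothetical: the masshoster's registered domain carries an illustrative flag answer, and only the spamming customer is actually listed.

```python
from typing import Callable, Optional

Resolver = Callable[[str], Optional[str]]


def legacy_is_listed(registered: str, zone: str, resolve: Resolver) -> bool:
    """What a non-enhanced client does: any answer at all counts as a hit."""
    return resolve(f"{registered}.{zone}") is not None


# Hypothetical enhanced-zone data: the masshoster is flagged, but only the
# spamming customer is listed.
records = {
    "masshoster.tld.emulti.surbl.org": "127.0.0.128",
    "spammer.masshoster.tld.emulti.surbl.org": "127.0.0.2",
}

# A legacy client only asks about the registered domain, so every customer
# of the masshoster, spammer or not, appears to be listed.
print(legacy_is_listed("masshoster.tld", "emulti.surbl.org", records.get))  # → True
```

Keeping the enhanced data in a separate zone means such clients never see the flag answers at all.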


I have to admit that I'm quite new to the concept of rhsbls, and chances
are that I'm missing important points here. I'd welcome any (fair)
comments and suggestions.

Bye, Mike

