[SURBL-Discuss] Guestbook spam
mrenzmann at otaku42.de
Tue Nov 14 07:11:15 CET 2006
> I agree we could investigate some more to get also web-based patterns
> going, not that hard to do.
This is where I jump in with the suggestion I already mentioned on Friday :)
From what I've seen in the list of spamvertised sites (the one I used
for my tests), it seems that many of them belong to masshosters such as
aol.com or alice.it, or to blog providers. These services seem to be
attractive to spammers, since many of them offer free webspace well
suited for hosting link farms and the like.
Currently there are two approaches to handling this in an rhsbl: put the
whole domain on the block list, or exclude (whitelist) it. Both
approaches are far from ideal.
Blocking those domains means that the number of false positives will
rise, as this step will also block legitimate websites hosted with that
provider. Whitelisting them circumvents that problem, but results in a
higher number of false negatives (i.e. it won't catch spamsites).
Something in between both extremes would be nice.
As far as I can tell from my little investigation, these "big hosters"
use one of two schemes for their customers:

1. The customer appears as a subdomain:
   http://customer.masshoster.tld/...
2. The customer appears as the first path component:
   http://masshoster.tld/customer/...

In both cases, the "customer" part of the URI is what needs to be looked
at in order to distinguish spammers from non-spammers.
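For illustration, here is a small sketch of extracting the "customer"
part under both schemes. The hoster names, and the mapping of hosters to
schemes, are invented for the example:

```python
from urllib.parse import urlparse

# Known masshosters and their hosting scheme (hypothetical data).
# Scheme 1: customer is a subdomain; scheme 2: customer is the
# first path component.
MASSHOSTERS = {"masshoster.tld": 1, "pathhoster.tld": 2}

def customer_part(uri):
    """Return the customer label of a masshosted URI, or None."""
    parsed = urlparse(uri)
    host = parsed.hostname
    for domain, scheme in MASSHOSTERS.items():
        if scheme == 1 and host.endswith("." + domain):
            # Everything left of the masshoster's registered domain.
            return host[: -len(domain) - 1]
        if scheme == 2 and host == domain:
            parts = [p for p in parsed.path.split("/") if p]
            return parts[0] if parts else None
    return None

print(customer_part("http://spammer.masshoster.tld/cheap-viagra.html"))  # spammer
print(customer_part("http://pathhoster.tld/spammer/index.html"))         # spammer
```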
Example: the examined URI is
http://spammer.masshoster.tld/cheap-viagra.html. As described in the
surbl.org implementation guidelines, the first lookup would be for
masshoster.tld. The lookup resolves, and the last octet of the result is
treated as a bitmask (similar to how it is done for multi.surbl.org).
Since the domain belongs to a known masshoster, and that masshoster uses
hosting scheme 1, this is signalled by having the corresponding bit set
in the response.
The application now does a second lookup, this time for
spammer.masshoster.tld (if the hoster used scheme 2, the lookup would be
for masshoster.tld.spammer). If that lookup resolves, the URI is spam;
otherwise it's ham. The first lookup result is not taken into
consideration in either case.
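The whole two-stage procedure for a known masshoster could look roughly
like this. The zone name and the resolver are stand-ins (a dict replaces
the actual DNS query so the sketch is self-contained):

```python
def classify(uri_domain, customer, scheme, resolve):
    """
    Two-stage lookup sketch for a known masshoster domain. `resolve` is
    any callable returning an A-record string for a query name, or None
    if the name does not resolve (in a real client this would be a DNS
    query against the rhsbl zone).
    """
    first = resolve(uri_domain + ".rhsbl.example")  # zone name is hypothetical
    if first is None:
        return "ham"  # domain not listed at all
    # Known masshoster: the first answer only signals the scheme and is
    # not taken into consideration for the spam/ham decision itself.
    if scheme == 1:
        query = customer + "." + uri_domain + ".rhsbl.example"
    else:  # scheme 2: the customer label is appended after the domain
        query = uri_domain + "." + customer + ".rhsbl.example"
    return "spam" if resolve(query) is not None else "ham"

# Stub resolver standing in for DNS (all data invented):
zone = {
    "masshoster.tld.rhsbl.example": "127.0.0.64",
    "spammer.masshoster.tld.rhsbl.example": "127.0.0.2",
}
print(classify("masshoster.tld", "spammer", 1, zone.get))   # spam
print(classify("masshoster.tld", "innocent", 1, zone.get))  # ham
```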
The modifications needed to implement this enhancement in an existing
rhsbl (zone file), as well as in the applications that make use of it on
the client side, are not hard to make IMO. The enhancement relies on
mechanisms that are already in use, and no changes are needed to the DNS
servers, as far as I can tell.
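On the zone side, the change amounts to adding records like the
following (a hypothetical fragment of such an rhsbl zone; all names and
return values are invented):

```
; Known masshoster, scheme 1 (customer as subdomain):
; last-octet bit 64 signals "masshoster, scheme 1".
masshoster.tld           IN A  127.0.0.64
; Listed customers resolve in the second lookup:
spammer.masshoster.tld   IN A  127.0.0.2

; Known masshoster, scheme 2 (customer as path component):
pathhoster.tld           IN A  127.0.0.128
pathhoster.tld.spammer   IN A  127.0.0.2
```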
The second lookup becomes necessary only for known masshoster domains.
No blind guessing is needed on the application side about whether a
domain is a known masshoster or which "hosting scheme" it uses - the
first response tells it.
The enhancement allows raising the number of true positives without the
negative side effect of additional false positives - at least as long as
the rhsbl provider applies the same care as for the rest of the
blocklist.
To remain backward compatible, this enhancement should not be applied to
a lookup zone that is queried by "non-enhanced" applications, at least
if that zone had masshosters whitelisted before. The fact that a
masshoster domain now resolves in the first lookup would be
misinterpreted by applications that are not aware of the enhancement,
resulting in a higher number of false positives. It would be better to
"mirror" such zones (for example multi.surbl.org) to a new one (for
example emulti.surbl.org, with "e" for "enhanced" ;)) and apply the
enhancement only to the new zone.
I have to admit that I'm quite new to the concept of rhsbls, and chances
are that I'm missing important points here. I'd be glad for any (fair)
comments and suggestions.