[SURBL-Discuss] Guestbook spam

Michael Renzmann mrenzmann at otaku42.de
Fri Nov 10 18:40:09 CET 2006


Hi all.

>> May I suggest that you try checking web spams with SURBLs and see
>> what the hit rate is like.  If the hit rate is significantly less
>> than for mail spam, then it may not be worth using our data (and
>> generating the DNS queries) for the website checking application.
> Will do so. I'm currently preparing the logged data and will see what rate
> we get for that. Will report back when I have the results.

Done, but the results are disappointing (and somewhat surprising).

I threw together a list of all recognized/blocked posts sent to
madwifi.org during the last 4 months, and added a list of all blocked spam
posts sent to trac-hacks.org during the last week. After refining the list
as described in the implementation guidelines, removing well-known domains
and the "(roughly) top 200 domains not blacklisted by SURBL", 854 domains
remained [1]. These 854 domains were then tested against a selection of 14
RHSBLs [2], some of which (such as porn.rhs.mailpolice.com) are very
specialized.
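
For readers who want to repeat this kind of test, here is a rough sketch of
how a single domain can be checked against an RHSBL. It is not the script
from [4]. It assumes the usual convention that a listed domain resolves as
<domain>.<zone> to an address in 127.0.0.0/8; the function names and the
injectable "resolve" parameter are made up for illustration.

```python
import socket


def rhsbl_query_name(domain, zone):
    """Build the DNS name to look up: <domain>.<zone>, lowercased."""
    return f"{domain.strip('.').lower()}.{zone.strip('.').lower()}"


def is_listed(domain, zone, resolve=socket.gethostbyname):
    """Return True if <domain>.<zone> resolves to a 127.x.x.x address.

    `resolve` is a hypothetical hook so the lookup can be replaced in tests.
    """
    try:
        address = resolve(rhsbl_query_name(domain, zone))
    except OSError:
        # socket.gaierror is an OSError; any lookup failure (including
        # NXDOMAIN) is treated as "not listed" in this sketch.
        return False
    return address.startswith("127.")


def count_hits(domains, zones, resolve=socket.gethostbyname):
    """Count, per zone, how many of the given domains are listed."""
    return {
        zone: sum(is_listed(d, zone, resolve) for d in domains)
        for zone in zones
    }
```

Note that this treats timeouts the same as "not listed", so a real run
should probably distinguish the two before trusting the counts.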

Rank 1, with 139 positives, is multi.surbl.org. This is quite surprising,
since surbl.org focuses on e-mail spamvertisements. bsb.empty.us, which
afaik focuses on website and comment spam, is at rank 7 with just 7(!)
positives. The full ranklist is at [3], and the scripts used for testing
as well as the "raw" results can be found at [4].
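
To put those counts into perspective, the hit rates can be worked out with
a few lines like the following. This is only a sketch: it uses the two
counts quoted above and the total of 854 tested domains, the other twelve
lists are left out, and the helper name is made up.

```python
TOTAL_DOMAINS = 854  # domains left after refining the list

# Positives per list, as quoted in this mail (other lists omitted).
positives = {
    "multi.surbl.org": 139,
    "bsb.empty.us": 7,
}


def hit_rates(counts, total):
    """Return (zone, hits, rate) tuples, sorted by hits in descending order."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(zone, hits, hits / total) for zone, hits in ordered]


for zone, hits, rate in hit_rates(positives, TOTAL_DOMAINS):
    print(f"{zone:20s} {hits:4d} {rate:.1%}")
```

That works out to roughly 16.3% for multi.surbl.org and 0.8% for
bsb.empty.us.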


Conclusions:
============
1.
While I already expected quite some difference between the
spamvertisements distributed by e-mail and those distributed on websites,
the recognition rate advantage of multi.surbl.org over bsb.empty.us is
surprising. However, a recognition rate of 16% (139 of 854 domains) is
still not good enough to justify putting additional load on surbl.org for
website spam recognition.

2.
It seems that it could be worth starting yet another (more specialized)
RHSBL for the described purpose. A few Trac hackers have already started
working on that.



I'd like to discuss an idea I have in mind that could improve the
recognition rate for RHSBLs (including surbl.org), but I have to rush
back home now. I'll put that in a new mail on Monday.

Bye, Mike

[1] http://otaku42.de/static/spam-audit/rbltest/domains.lst.txt
[2] http://otaku42.de/static/spam-audit/rbltest/rhsbl.lst.txt
[3] http://otaku42.de/static/spam-audit/rbltest/ranklist.txt
[4] http://otaku42.de/static/spam-audit/rbltest/



