[SURBL-Discuss] Guestbook spam

Gerhard W. Recher (rbl) rbl at clean-mx.com
Fri Nov 10 22:33:12 CET 2006


Michael Renzmann wrote:
> Hi all.
>
>   
>>> May I suggest that you try checking web spams with SURBLs and see
>>> what the hit rate is like.  If the hit rate is significantly less
>>> than for mail spam, then it may not be worth using our data (and
>>> generating the DNS queries) for the website checking application.
>>>       
>> Will do so. I'm currently preparing the logged data and will see what rate
>> we get for that. Will report back when I have the results.
>>     
>
> Done, but the results are disappointing (and somewhat surprising).
>
> I threw together a list of all recognized/blocked posts sent to
> madwifi.org during the last 4 months, and added a list of all blocked spam
> posts sent to trac-hacks.org during the last week. After refining the list
> as described in the implementation guidelines, removing well-known domains
> and the "(roughly) top 200 domains not blacklisted by SURBL", 854 domains
> remained [1]. These 854 domains have been tested against a selection of 14
> RHSBLs [2], some of them (such as porn.rhs.mailpolice.com) being very
> specialized.
>
> Rank 1, with 139 positives, is multi.surbl.org. This is quite surprising,
> since surbl.org focuses on e-mail spamvertisements. bsb.empty.us, which
> afaik focuses on website and comment spam, is on rank 7 with just 7(!)
> positives... the full ranklist is at [3], and the scripts used for testing
> as well as the "raw" results can be found at [4]
>
>
> Conclusions:
> ============
> 1.
> While I already expected that there is quite some difference between the
> spamvertisement distributed by e-mail and that distributed on websites,
> the recognition rate advantage of multi.surbl.org vs. bsb.empty.us is
> surprising. However, 16% recognition rate is still not good enough to
> justify adding additional load on surbl.org for website spam recognition.
>
> 2.
> It seems that it could be worthwhile to start yet another (more
> specialized) RHSBL for the described purpose. A few Trac hackers have
> already started working on that.
>
>
>
> I'd like to discuss an idea I have in mind that could improve the
> recognition rate for RHSBLs (including surbl.org), but I have to rush
> back home now. I'll put that in a new mail on Monday.
>
> Bye, Mike
>
> [1] http://otaku42.de/static/spam-audit/rbltest/domains.lst.txt
> [2] http://otaku42.de/static/spam-audit/rbltest/rhsbl.lst.txt
> [3] http://otaku42.de/static/spam-audit/rbltest/ranklist.txt
> [4] http://otaku42.de/static/spam-audit/rbltest/
>
>
> _______________________________________________
> Discuss mailing list
> Discuss at lists.surbl.org
> http://lists.surbl.org/mailman/listinfo/discuss
>   
Dear Michael,

I don't disagree entirely, only in part.

I took your list and queried our own RBL server. In short, 789 out of
the 854 domains on your list are recognized by our service, clean-mx.
For comparison:
surbl 139/854
uribl 66/854

See the results at http://support.clean-mx.de/clean-mx/rbltest_results.txt
This list is based on your input (and preserves its order).
By the way, you should not block virgilio.it .....
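
As a rough illustration of how counts like these can be produced: an
RHSBL is usually queried by prepending the domain to the list's zone and
doing an ordinary DNS lookup, and a 127.x.x.x answer is taken as "listed".
The sketch below assumes that convention; the function names and the
injectable resolver are made up for this example and are not taken from
Michael's scripts or from our server.

```python
import socket


def rhsbl_query_name(domain, zone):
    """Build the DNS name to look up: <domain>.<zone>."""
    return f"{domain.strip('.').lower()}.{zone.strip('.').lower()}"


def is_listed(domain, zone, resolve=socket.gethostbyname):
    """Return True if the lookup answers with a 127.x.x.x address.

    Simplification: any 127.x answer counts as a hit; individual lists
    may give particular return codes a different meaning.
    """
    try:
        answer = resolve(rhsbl_query_name(domain, zone))
    except socket.gaierror:
        # No such name: the domain is not on this list.
        return False
    return answer.startswith("127.")


def hit_rate(domains, zone, resolve=socket.gethostbyname):
    """Return (hits, total, percent) for a list of domains against one zone."""
    domains = list(domains)
    hits = sum(is_listed(d, zone, resolve) for d in domains)
    percent = 100.0 * hits / len(domains) if domains else 0.0
    return hits, len(domains), percent
```

With the figures above, 139/854 is about 16.3%, 66/854 about 7.7%, and
789/854 about 92.4%.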

Website spamming, whether of blogs, guestbooks, etc., takes a different
approach from the point of view of its originators.

1) It's tricky to tweak pages on the web for abuse.
2) This is time-consuming, so only a few will do it in the "wild".
3) It's much easier to mail all this stuff over a botnet.
4) The message of all these spammers is always the same... buy .... look
at .. follow this financial tip.... help me... and so on.
5) They have to attract readers to their message, so they must always
use the same sort of acrobatic linguistic tokens...

At the very least, the same set of keywords that stops mail spam is
sufficient to detect and stop web spam.
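
A minimal sketch of what such a keyword check could look like, purely
for illustration. The keyword list is hypothetical, loosely based on the
phrases in point 4 above, and is not the set our service uses; the
function names and the threshold are assumptions as well.

```python
import re

# Hypothetical keyword list, loosely based on the phrases in point 4.
SPAM_KEYWORDS = ("buy", "look at", "financial tip", "help me")


def keyword_hits(text, keywords=SPAM_KEYWORDS):
    """Return the keywords that occur in text as whole words or phrases."""
    lowered = text.lower()
    return [kw for kw in keywords
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered)]


def looks_like_spam(text, threshold=1):
    """Flag a post when it contains at least `threshold` keywords."""
    return len(keyword_hits(text)) >= threshold
```

The same function can be applied to a mail body or to a guestbook
entry, which is the point being made here.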

I fully agree that the spamvertised domains in web spam differ a bit
from those in mail spam, but not by much.



Yours, Gerhard
(Feel free to contact me off-list....)


