Re: [SURBL-Discuss] Re: RFC: sex site domain SURBL

21 Jul 2004


      Jeff Chan wrote:
...
On Tuesday, July 20, 2004, 6:58:15 AM, David Hooton wrote:
...
On Tue, 20 Jul 2004 15:27:52 +0200, Marc Kool m.kool@vioro.nl wrote:
...
...
I did a quick check on a few domains and I do not share your conclusion.
I think we have a slight case of culture clash here.  This
adult data is meant to be used in a proxy server where
the data is apparently matched literally against URI data
from web requests, etc.
SURBLs are designed to be used with specific email message body
scanning programs that attempt to reduce the domains found in
message body URIs to their registrar (base) domain so that
subdomains like "models.home.att.net" are reduced to the
base domain "att.net" before being included in a SURBL
or checked against a SURBL.
This is new for me and it is clear.
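As a rough illustration of the reduction described above, here is a minimal Python sketch. It is not the code used by any SURBL client; the function name and the tiny set of two-label suffixes are assumptions made up for this example, and a real implementation would need a much fuller suffix list.

```python
# Hypothetical, deliberately tiny set of two-label suffixes under which
# domains are registered. This is an assumption for illustration only.
TWO_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def reduce_to_base_domain(host: str) -> str:
    """Reduce a hostname from a message-body URI to its base domain."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) <= 2:
        # Already a base domain (or a bare name): nothing to strip.
        return ".".join(labels)
    if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES:
        # Keep three labels, e.g. "www.example.co.uk" -> "example.co.uk".
        return ".".join(labels[-3:])
    # Default: keep only the last two labels.
    return ".".join(labels[-2:])


print(reduce_to_base_domain("models.home.att.net"))
print(reduce_to_base_domain("abc1.xyz.spammerdomain.com"))
```

Both the ISP-hosted page and the randomized spammer host collapse to a two-label name, which is why data keyed on full hostnames does not line up with SURBL data.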
...
The main reason we did this was to defeat the "random
subdomain" spammers who generate random subdomains to
try to defeat simple URI pattern matching or to key
their spams to confirm the recipient addresses.  Examples
might be "abc1.xyz.spammerdomain.com" and
"abc2.xyz.spammerdomain.com".  Those we want to reduce
to just "spammerdomain.com" since the randomized/keyed
versions may occur only once and the sc.surbl.org data
engine tries to increase the likelihood of inclusion
in the list with an increasing number of reports.
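To see why reduction matters for report counting, consider the following sketch. The fixed threshold is an assumption chosen for illustration; the thread only says that more reports make inclusion more likely, not how the sc.surbl.org engine actually weighs them. The helper names are hypothetical.

```python
from collections import Counter


def base_domain(host):
    # Naive reduction: keep the last two labels only.
    # This ignores suffixes such as "co.uk".
    return ".".join(host.lower().split(".")[-2:])


def domains_to_list(reported_hosts, threshold):
    """Return base domains reported at least `threshold` times."""
    counts = Counter(base_domain(h) for h in reported_hosts)
    return sorted(d for d, n in counts.items() if n >= threshold)


reports = [
    "abc1.xyz.spammerdomain.com",
    "abc2.xyz.spammerdomain.com",
    "abc3.xyz.spammerdomain.com",
    "www.example.org",
]

# Counted literally, every randomized host appears exactly once.
print(max(Counter(reports).values()))
# Counted by base domain, the three reports add up.
print(domains_to_list(reports, threshold=3))
```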
It may be useful to read about the sc.surbl.org data:
Yep, the reasons why this is done are clear, but the approach is not flawless.
There are ISPs (e.g. myisp.net) that give customers a subdomain,
e.g. myspamsite.myisp.net, which cannot be included in a SURBL.
I also assume that the percentage of this type of domain is not so big...
*snip*
...
...
Given my very quick testing I think it would probably be worth giving
this data a try. We would most likely need to work out how to remove
the subdomained entries - the list is huge, and any efficiency we can gain
by removing excess data would obviously be useful.
Good suggestion, but perhaps slightly tricky to implement,
depending on the data.
I can easily use a regex to delete entries with subdomains
like "xxxmovies.home.att.net" so that "att.net" does not
get on the list.  But that would only be effective if the
deliberately randomized domains like "abc.xyz.spammerdomain.com"
were reduced to "spammerdomain.com" in the source data, otherwise
we would lose both.
In other words, if the data is a literal transcription of
everything found in spams, including randomized URIs like
"abc.xyz.spammerdomain.com", then we will lose the latter if I
discard all subdomains.
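The trade-off can be made concrete with a small sketch. The regex and the sample entries are assumptions for illustration, not the filter anyone in the thread actually wrote.

```python
import re

# Matches entries with three or more labels, i.e. anything with a subdomain.
# Naive: it would also match registrations such as "example.co.uk".
HAS_SUBDOMAIN = re.compile(r"^[^.]+\.[^.]+\.[^.]+")


def drop_subdomain_entries(entries):
    """Keep only entries that have no subdomain part."""
    return [e for e in entries if not HAS_SUBDOMAIN.match(e)]


# Case 1: the source data holds the randomized host literally.
literal = ["xxxmovies.home.att.net", "abc.xyz.spammerdomain.com", "plainsite.com"]
print(drop_subdomain_entries(literal))

# Case 2: the source data already holds the reduced base domain.
reduced = ["xxxmovies.home.att.net", "spammerdomain.com", "plainsite.com"]
print(drop_subdomain_entries(reduced))
```

In the first case the spammer's domain disappears together with the att.net entry; only in the second case does the filter remove the ISP-hosted page while keeping "spammerdomain.com".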
So Marc, can you tell us if the randomized domains that spammers
frequently used are reduced to the base domains in the adult
data, i.e. "spammerdomain.com" and not "abc.xyz.spammerdomain.com"?
Nope :-(
...
Jeff C.
There are indeed "different cultures"
surbl: fight spam of which lots is adult related
squidguard: block adult sites of which only a small percentage spams
_I assume that most sites that (want to) fight spam also (want to) block adult sites_.
For the record: my original proposal would make sex.surbl.org more
of a squidguard-based list than a surbl-based list.
One of the reasons to propose sex.surbl.org was the fact that the SURBL lists
lag behind reality.  In July I received 156 spams, of which 16 were not
detected by SA+SOME_SARE_RULES+OWN_RULES+SURBL because the SURBL lists
were not updated fast enough (the 16 spams were marked as spam at a later
time because by then SURBL listed them and the SA rating went up).
This is not meant to criticize anybody, just to state a fact.
I observed that many spams from new domains 
- share IP addresses
- automatically forward you to a known sex site (in the squidguard database)
and proposed sex.surbl.org
I hate to say it :-) but if the implementation causes too many headaches,
the proposal as it is now can be disregarded.
However, I see some value for the squidguard adult database to be used by software
behind spamtraps: if a URI is retrieved and redirects you to a known sex site,
the URI can be added automatically (= fast) to a SURBL list.
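The spamtrap idea might look roughly like the sketch below. All function names, the example hosts, and the choice to pass the redirect resolver as a parameter are assumptions for illustration; the thread does not describe an implementation. The default resolver uses Python's standard urllib, which follows HTTP redirects.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def follow_redirects(uri, timeout=10):
    """Return the final URL after HTTP redirects (makes a network request)."""
    with urlopen(uri, timeout=timeout) as response:
        return response.geturl()


def host_of(url):
    """Lower-cased hostname of a URL, or an empty string if there is none."""
    return (urlsplit(url).hostname or "").lower()


def check_spamtrap_uri(uri, known_adult_hosts, resolve=follow_redirects):
    """Return the host of `uri` if it redirects to a known adult host, else None."""
    start = host_of(uri)
    final = host_of(resolve(uri))
    if final != start and final in known_adult_hosts:
        return start
    return None


# Offline demonstration with a stand-in resolver and made-up hosts.
known = {"known-adult.example"}
fake_resolver = lambda u: "http://known-adult.example/landing"
print(check_spamtrap_uri("http://new-spam.example/x?id=1", known, resolve=fake_resolver))
```

Note that retrieving a URI from a spam can itself confirm the recipient address when the URI is keyed, so this only makes sense for addresses that are spamtraps anyway.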
Marc
