On Tuesday, April 27, 2004, 1:47:41 PM, John Fawcett wrote:
Jeff Chan wrote on Mon Apr 26 03:58:58 CEST 2004
The underlying principle as I see it is that most major sites will have functional anti-abuse and anti-spam policies, so a base domain is either good or bad. I know that seems simplistic, but it's easy and fast to implement AND it seems to match reality pretty well.
There may come a point where a "big" domain starts to appear in spam despite an outward image of "antispam" and the choice which the current infrastructure offers is to block all of it or none of it. Blocking all of it may produce too many FPs whereas blocking a subset may be acceptable. At the moment there haven't been any tough decisions to take on listing or whitelisting. Everything has been very clear cut. It's not guaranteed to stay that way.
I don't know how real a risk this is, but I was worried about an infrastructure which effectively ties our hands on this point. (Once SURBL is deployed in many different client programs, I suppose it will be hard to change anything about the "public interface" to the data....)
That's a valid concern about making the mechanisms too rigid, but the thing to remember is that legitimate parent domain operators like .uk or yahoo.com have a strong incentive to keep their child domains (subdomains) clean of spammer hosting and other abuse.
In other words I don't see the mixed case happening too often, simply due to the best interests of most legitimate sites in *staying* legitimate. But you're right we should not design ourselves into a corner unnecessarily.
That's an interesting idea. Basically you want to signal redirection to higher domain levels with a special result for levels that should never get checked like co.uk.
That might be doable, but it would require extra logic on the client side, as you note. That already sounds more complex than I like, though I see what you're getting at. Better to control what goes into the data (i.e. never let a ccTLD entry like co.uk itself in) and make sure the client is following similar rules.
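For example, that rule could be enforced on the data side with a simple check. Here's a rough Python sketch; the table contents and function name are illustrative, not actual SURBL code:

    # Hypothetical data-side sanity check: never let a bare registration
    # suffix (like co.uk) or a bare TLD into the list itself.
    CCTLD_SUFFIXES = {"co.uk", "org.uk", "ab.ca", "nom.fr"}  # illustrative subset

    def safe_to_list(domain):
        if domain in CCTLD_SUFFIXES:
            return False   # listing it would block every child domain at once
        if "." not in domain:
            return False   # a bare TLD such as "uk"
        return True

    assert safe_to_list("bigspammer.co.uk")
    assert not safe_to_list("co.uk")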
What I was proposing is a *change* to client processing logic, but not added *complexity*; rather a simplification.
What I mean is that currently the client has to contain
- processing logic
- data on the ccTLDs
(in the long run there will be multiple versions of the ccTLD data implemented in various client versions at any one time. Users will have to upgrade the client software to keep current with domain data).
Using an A record in surbl to indicate "this domain is not listed but a subdomain is. Try again" would mean the client just follows a simple processing rule. It doesn't need to know anything about specific domain data.
The logic would be something like this. Let's assume that "url" is found in an email:
    level = 2
    more_info_result = '127.0.0.255'
    listed_result = '127.0.0.2'

    do {
        result = query_dns( extract_domain(url, level) )
        level = level + 1
    } while (result == more_info_result)

    if (result == listed_result) { score it }
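For concreteness, that loop might look like this in Python. This is a sketch only: the 127.0.0.255 "try deeper" answer, the zone name, and the label-extraction helper are assumptions from this discussion, not an existing SURBL convention:

    import socket

    ZONE = "multi.surbl.org"     # illustrative list zone
    MORE_INFO = "127.0.0.255"    # hypothetical "a subdomain is listed; try deeper"
    LISTED = "127.0.0.2"         # conventional "listed" answer

    def last_labels(hostname, n):
        # e.g. last_labels("www.foo.co.uk", 2) -> "co.uk"
        return ".".join(hostname.split(".")[-n:])

    def lookup(hostname):
        level = 2
        while level <= hostname.count(".") + 1:   # don't walk past the full name
            try:
                result = socket.gethostbyname(last_labels(hostname, level) + "." + ZONE)
            except socket.gaierror:
                return None                       # NXDOMAIN: not listed at this level
            if result != MORE_INFO:
                return result                     # definitive answer
            level += 1                            # signal says: check one label deeper
        return None

    # if lookup("spammer.co.uk") == LISTED: score it

The only ccTLD knowledge the client needs is the meaning of the two answer codes; all the domain-specific data stays on the server.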
Yes, this is a nice, modular approach. Though we may want to adjust the specifics, it's a good idea to make the handling of ccTLDs uniform across data and clients somehow.
Another approach would be for the SURBL data side to borrow the same SpamAssassin ccTLD modules that the two SA clients are using. The key is that we're all handling them similarly.
Remember that the goal is to capture the registered domain, whatever form that happens to take.
We will always catch bigspammer.co.uk with the current strategy.
I think that some of the ccTLDs have a mixed assignment strategy. This means that they should sometimes be checked at the 2nd level and sometimes at the 3rd level. The current logic always checks at a single predefined level.
Not quite; it's table-driven at least on the data side.
If co.uk is in the ccTLD table then the third level is checked, i.e. spammerdomain.co.uk. Since secondlevelspamdomain.uk is *not* in the table it would get checked at the second level... *and caught*. :-)
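As a rough sketch of that table-driven rule (Python here just for illustration; the table holds only the examples from this thread):

    CCTLD_TABLE = {"co.uk", "org.uk", "ab.ca", "nom.fr"}  # illustrative entries

    def registered_domain(hostname):
        labels = hostname.split(".")
        depth = 2                                  # default: check the second level
        if ".".join(labels[-2:]) in CCTLD_TABLE:
            depth = 3                              # known suffix: go one label deeper
        return ".".join(labels[-depth:])

    print(registered_domain("www.spammerdomain.co.uk"))   # spammerdomain.co.uk
    print(registered_domain("secondlevelspamdomain.uk"))  # secondlevelspamdomain.uk
    print(registered_domain("www.foobar.ab.ca"))          # foobar.ab.ca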
Eric or Justin, what is the Perl or SA module currently being used on the client side to handle ccTLDs again please? I should probably look into using it on the data side too.
The two examples I saw were .fr and .ca.
Currently we check .ca at the third level, but it is possible to register a second-level domain under .ca, which we would never catch, so bigspammer.ca will get through.
The signalling is not at the TLD. It's at whatever level is in the table. We don't list .ca, but we do list ab.ca. That means foobar.ab.ca gets checked at the third level and somenewspamdomain.ca gets checked at the second level.
We check .fr at the second level, however there are many "standard" second-level domains (like .nom.fr), which means we probably want to be checking those at the third level. (Translation: any bigspammer.nom.fr domain is immune to the current strategy, unless we want to upset everyone who has a nom.fr domain by listing nom.fr itself.)
In this case there's a lack of data on the .fr ccTLDs. If someone could research that and get them to me I'll add them to our table. (Ditto any other countries. :-) FWIW I just added nom.fr, so the .fr entries in our table are now:
nom.fr tm.fr gouv.fr asso.fr avocat.fr notaire.fr barreau.fr mairie.fr
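Picking up the registered_domain() sketch from earlier, with nom.fr now in its table the two cases come out the way we want:

    print(registered_domain("bigspammer.nom.fr"))  # bigspammer.nom.fr (third level)
    print(registered_domain("bigspammer.fr"))      # bigspammer.fr (second level)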
The danger in lacking ccTLD data is not that spammers will get away with hosting (spammer.fr will always get caught even if the ccTLD data is missing), but that a shared second-level domain like nom.fr might get added to the blocklists, hitting its legitimate users, i.e. false positives.
I know a lot of what I argue for above seems simplistic when a more complex solution could have more interesting results, but very often the simpler solutions are better, especially in terms of resource consumption.
You're right to argue for the most simple solution. (In fact there are probably simpler solutions than the one I suggested!). My concern was to avoid inflexibility in the infrastructure and automatic immunity for various classes of domains.
Which is always a good concern. In some ways the simpler the solution the more flexible it will be and vice versa.
Jeff C.