Re: [SURBL-Discuss] RFC: SURBL software implemetation guidelines

19 Apr 2004

      On Sunday, April 18, 2004, 6:08:11 PM, Simon Byrnand wrote:
...
At 12:43 19/04/2004, Jeff Chan wrote:
...
...

Extract base (registrar) domains from those URIs. This
includes removing any and all leading host names, subdomains,
www., randomized subdomains, etc. In order to determine the
base domain it may be necessary to use a table of country code
TLDs (ccTLDs) such as the partially incomplete one SURBL uses.
[...]
...
If a spammer were to register a domain in NZ it would look like:
...
spammer.co.nz or spammer.net.nz or spammer.gen.nz etc.... randomised 
subdomains that they could create on their own nameservers would look like 
a65423xyz.spammer.co.nz or awef3242.fssf342.spammer.co.nz etc...
...
Will the current code (of both SpamCopURI, and the backend processing of 
the surbl servers for that matter) incorrectly strip this off to co.nz? I 
ask, because I have definitely seen DNS queries from SpamCopURI trying to 
look up co.nz.sc.surbl.org which is wrong - that would cover a large 
fraction of the websites under the NZ domain hierarchy; it should be looking 
up spammer.co.nz, never co.nz.
...
Is there any reliable way for the code to know what a base registrar domain 
is and how many tiers there are under that domain hierarchy? (May also be a 
non-trivial problem)
The traditional solution to ccTLDs (Country Code TLDs) seems to
be to make a table of them, and make sure any extracted domains
are one domain level longer.  So for company.co.nz, don't take
co.nz as the base domain, but instead use company.co.nz since we
know from the table that co.nz is a two level country code TLD.
My slightly incomplete table of ccTLDs is at:
http://spamcheck.freeapp.net/two-level-tlds
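The table lookup described above could be sketched roughly as follows. This is only an illustration, not the code used by SURBL, SpamCopURI, or SpamAssassin: the function name base_domain is hypothetical, and the table holds just the three NZ entries mentioned in this thread rather than the full list.

```python
# Hypothetical, deliberately tiny table: only the NZ entries from this thread.
# A real table would be loaded from a complete two-level ccTLD list.
TWO_LEVEL_TLDS = {"co.nz", "net.nz", "gen.nz"}


def base_domain(hostname, two_level_tlds=TWO_LEVEL_TLDS):
    """Return the base (registrar) domain of a host name, or None.

    If the last two labels form a known two-level ccTLD, keep one
    more label; otherwise keep the last two labels.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return None
    last_two = ".".join(labels[-2:])
    if last_two in two_level_tlds:
        if len(labels) < 3:
            # A bare two-level ccTLD such as co.nz is not a base domain.
            return None
        return ".".join(labels[-3:])
    return last_two


if __name__ == "__main__":
    for host in ("a65423xyz.spammer.co.nz", "www.example.com", "co.nz"):
        print(host, "->", base_domain(host))
```

Returning None for a bare co.nz is one possible design choice; it means a caller never builds a query such as co.nz.sc.surbl.org.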
I think SpamAssassin (3.0?) in general has code to do that.
I'm sure SpamCop's internal processing of URIs also takes it into
account.  I'm not sure how Eric's SpamCopURI currently handles
it.  I do know that the current sc.surbl.org data engine will
capture them correctly and I have somewhat of a kludge to get
rid of the two level ccTLDs that would otherwise get through
by letting the engine do all the processing on them, then
suppressing their output with a whitelist which includes the two
level ccTLD domains.  Probably it would be better to increase
the cutoff to three levels instead of two in my code whenever
handling a two-level ccTLD such as co.nz to prevent the processing
of two-level ccTLDs themselves in the first place while still
leaving the processing of longer ccTLD domains (i.e. complete
ones like company.co.nz) in place.
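To make the contrast concrete, here is a rough sketch of the two options: suppressing two-level ccTLDs at output time with a whitelist, versus choosing the cutoff per host name up front. The function names and the three-entry table are assumptions for illustration only; the actual sc.surbl.org engine code is not shown in this thread.

```python
# Hypothetical sample table; the real list is much longer.
TWO_LEVEL_TLDS = {"co.nz", "net.nz", "gen.nz"}


def suppress_whitelisted(candidates, whitelist=TWO_LEVEL_TLDS):
    """Whitelist approach: process everything, then drop any
    candidate that is itself a two-level ccTLD before output."""
    return [d for d in candidates if d.lower() not in whitelist]


def cutoff_levels(hostname, two_level_tlds=TWO_LEVEL_TLDS):
    """Cutoff approach: keep three labels when the host name ends
    in a two-level ccTLD, otherwise keep two."""
    labels = hostname.lower().rstrip(".").split(".")
    return 3 if ".".join(labels[-2:]) in two_level_tlds else 2


if __name__ == "__main__":
    print(suppress_whitelisted(["spammer.co.nz", "co.nz", "example.com"]))
    print(cutoff_levels("a65423xyz.spammer.co.nz"))
    print(cutoff_levels("www.example.com"))
```

With the cutoff approach a bare co.nz is never produced as a candidate, so no whitelist pass is needed for it afterwards.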
So the quick answer is that the data side of sc.surbl.org has it
pretty much covered, and I'm not sure about the message parsing
side of things in SA 2.63 and 3.0.
Jeff C.
