Fwd: Re: [SURBL-Discuss] checking plain domains in message bodies against SURBLs reportedly effective

5 Sep 2004


      This is a forwarded message
From: Theo Van Dinter felicity@kluge.net
To: SURBL Discussion list discuss@lists.surbl.org, SpamAssassin Developers spamassassin-dev@incubator.apache.org
Date: Saturday, September 4, 2004, 10:36:53 AM
Subject: [SURBL-Discuss]  checking plain domains in message bodies against SURBLs reportedly effective
===8<==============Original message text===============
On Sat, Sep 04, 2004 at 10:45:44AM -0600, Ryan Thompson wrote:
...
> Yep. Good idea, overall. There are a few gotchas:
>
> TLD extensions sometimes map file extensions. We might have to whitelist
> command.com, and the entire country of Poland. :-)
>
> Since the domain is in plain text and doesn't contain a protocol or
> subdomain (i.e., 'www'), I haven't yet seen a mail client that will
> display it as a clickable URL.

This is generally the tack we're taking in SpamAssassin -- if a general
MUA doesn't display it as a link, then we don't consider it a URL.

Another issue for the generic domains thing is performance -- lots of
messages have lots of things that could potentially look like a domain,
and querying for them all adds a bit of a load on the client and the
server.
For instance:  /\b([a-zA-Z0-9_.-]{1,256}\.[a-zA-Z]{2,6})\b/
in theory (I haven't tested it), will grab anything that looks like a
generic domain name in text.  If you check that list against a list of
valid TLDs, you'd probably end up with a decent list, but you'd hit the top
issue quoted above, where in "Go take a look at command.com" it isn't clear
whether it's a URL or a filename.
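
As a rough illustration of the approach described above, here is a minimal
Python sketch, not SpamAssassin code. It applies the pattern (with the dot
before the TLD escaped), keeps only candidates whose last label is in a TLD
list, and de-duplicates them so each name would be queried at most once. The
names CANDIDATE, VALID_TLDS and candidate_domains are hypothetical, and the
TLD set is a tiny placeholder rather than a real list.

```python
import re

# Pattern from the message, with the dot before the TLD escaped.
CANDIDATE = re.compile(r"\b([a-zA-Z0-9_.-]{1,256}\.[a-zA-Z]{2,6})\b")

# Assumption: illustrative subset only; a real check would load a full TLD list.
VALID_TLDS = {"com", "net", "org", "pl"}


def candidate_domains(text):
    """Return unique, lowercased candidates whose last label is a listed TLD."""
    found = []
    for match in CANDIDATE.finditer(text):
        # Trim stray leading/trailing punctuation that the character class allows.
        name = match.group(1).lower().strip(".-")
        if "." not in name:
            continue
        tld = name.rsplit(".", 1)[-1]
        # De-duplicate so repeated mentions do not trigger repeated lookups.
        if tld in VALID_TLDS and name not in found:
            found.append(name)
    return found
```

Note that "Go take a look at command.com" still yields command.com, and a
script name such as setup.pl passes the TLD check as well, so the filename
ambiguity remains and would need a whitelist or some other heuristic.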
-- 
Randomly Generated Tagline:
"Brevity is the soul of lingerie." - Dorothy Parker

===8<===========End of original message text===========
